Computer-Aided Diagnosis for Mammographic Microcalcification Clusters
by Mugdha Tembey
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science
Department of Computer Science and Engineering, College of Engineering, University of South Florida
Co-Major Professor: Maria Kallergi, Ph.D.
Co-Major Professor: Dmitry Goldgof, Ph.D.
Rafael Perez, Ph.D.
Date of Approval: November 07, 2003
Keywords: neural network, filtering, segmentation, detection, shape analysis, feature selection, Receiver Operating Characteristics (ROC)
Copyright 2003, Mugdha Tembey
DEDICATION Aai, Baba, Neha and Dhiraj: for being my strength
ACKNOWLEDGEMENTS
In my two and a half years at USF and the Imaging Science Research Division (ISRD) I have been fortunate enough to meet some very nice people, some who have influenced this work directly and others indirectly. I wish to take this space to mention and appreciate them. First and foremost, I would like to thank Dr. Maria Kallergi, Director of ISRD and my advisor, for giving me the opportunity to work on this project. She offered me an assistantship that lasted through my Masters and introduced me to this exciting field. I am extremely grateful for her continued support and guidance over the last two and a half years, which has been instrumental in the completion of this project. I also thank Dr. Dmitry Goldgof for agreeing to supervise my work along with Dr. Kallergi. His direction and involvement in this work have been very helpful. I thank Dr. Rafael Perez for being on my committee and for the time that he took to review my work.
Among the people at ISRD: Dr. John Heine offered his filtering programs and his advice, Angela was very patient with all my questions and demands about the data, and Joe with my mess and system problems. Anand and Madhu were always there to discuss and assist when I was stuck on a problem, sitting through numerous trials before my thesis presentation and proof-reading parts of this manuscript. Their help has been invaluable.
Among my friends at USF: Ananth, Swapnali and I came here together and they have stuck with me since then. Ananth has been my complaint-box and a very loyal and uncomplaining one, I must say. Ayush was a willing driver for those much-needed trips to Dunkin Donuts for coffee, Prashant was ever-ready to take a break to eat, and Sanket always had an encouraging word. Vimal and Girish have been great to know and hang around with and I appreciate all their help. USF has been a lot of fun because of these guys.
Finally, my deepest thanks and appreciation go to my parents, Aai and Baba, who taught me the importance of a good education. Their unconditional love and support, and the trust that they placed in me, have been very reassuring; I would never be here without them. Many thanks to Shireeshmama who, with Aai and Baba, had a solution for my initial inhibitions about coming here, and pushed me to apply for my Masters. My sister, Neha, took it upon herself to always be there for me and for Aai-Baba, and she has been great at it. At the same time, she also helped me keep the important things in perspective. Dhiraj, with all his love, encouragement, strength and confidence, has been a dream come true. He never let me lose focus and he helped me look at a half-empty glass as half-full. He taught me not to stop at my dreams, but to go out there and make them a reality. Also, the love and support of my extended family has been extremely heartening. All their names would make this list too long, but I am sure you know you are in my thoughts. Thank you for all the support and help; I could not have done this without you.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER 1 INTRODUCTION
  1.1 Computer-Aided Diagnosis for Breast Cancer
  1.2 Scope of the Work
  1.3 Thesis Organization
CHAPTER 2 BREAST CANCER DETECTION TECHNIQUES
  2.1 Mammography Applications-Screening and Diagnostic
  2.2 Mammographic Views
  2.3 Mammographic Interpretation
    2.3.1 Mammographic Abnormalities
      2.3.1.1 Calcifications
  2.4 Limitations of Mammography
CHAPTER 3 COMPUTER APPLICATIONS IN MAMMOGRAPHY
  3.1 Computer-Aided Diagnosis (CAD-Dx)
    3.1.1 Pre-processing
    3.1.2 Detection/Segmentation
    3.1.3 Feature Selection
    3.1.4 Classification
  3.2 Two-View Analysis
CHAPTER 4 MATERIALS AND METHODS
  4.1 Database
  4.2 CAD-Dx Algorithm
    4.2.1 Pre-processing
    4.2.2 Segmentation/Detection
    4.2.3 Feature Extraction
      4.2.3.1 Shape Analysis
        4.2.3.1.1 Regional Descriptors
        4.2.3.1.2 Boundary Descriptors
      4.2.3.2 Demographic Features
    4.2.4 Classification
    4.2.5 Feature Ranking and Selection
CHAPTER 5 RESULTS AND DISCUSSION
  5.1 Large Scale Evaluation (Single-View)
    5.1.1 Database Characteristics
    5.1.2 Algorithm Optimization
    5.1.3 Training and Testing of NevProp4
    5.1.4 Feature Ranking and Selection
  5.2 Two-View Feature Estimation
CHAPTER 6 CONCLUSIONS
REFERENCES
LIST OF TABLES
Table 2.1 Forms (Shape, Morphology, Distribution) of Individual Microcalcifications that Suggest Benign or Malignant Disease as Listed in the Literature [Kallergi 2003]
Table 2.2 Intensity and Group Descriptors of Calcifications Listed in the Clinical Literature [Kallergi 2003]
Table 2.3 BIRADS Descriptors for Calcifications With Associated Genesis Type (B = Probably Benign; M = Suggestive of Malignancy; U = Uncertain) [Kallergi 2003]
Table 4.1 Dataset Used for the Study
Table 4.2 Feature Set Used For Classification of Calcification Clusters
Table 5.1 Operating Points for the Single-View Dataset
Table 5.2 Sequence of Features Selected for the Single-View Dataset of 260 Clusters. The feature # correspondence can be found in Table 4.2
Table 5.3 Two-View Feature Estimation Experiments
Table 5.4 Operating Points for Experiment # 1
Table 5.5 Sequence of Features Selected for Experiment # 1
Table 5.6 Operating Points for Experiment # 2
Table 5.7 Sequence of Features Selected for Experiment # 2
Table 5.8 Operating Points for Experiment # 3
Table 5.9 Sequence of Features Selected for Experiment # 3
Table 5.10 Summary of Selected Features for the Single-View Set and the Datasets Used in Experiments 1 to 3 of Table 5.3
Table 5.11 Comparison of Cluster Classifications in CC and MLO Views
Table 5.12 Operating Points for Experiment # 4
Table 5.13 Sequence of Features Selected for Experiment # 4
Table 5.14 Best Feature Combinations and Classification Experiments. Age Was Considered Only Once
Table 5.15 Operating Points for Experiment # 5
Table 5.16 Operating Points for Experiment # 6
Table 5.17 Operating Points for Experiment # 7
Table 5.18 Operating Points for Experiment # 8
LIST OF FIGURES
Figure 2.1 Patient Positioning of a CC Mammographic View of the Right Breast [ACS 2003]
Figure 2.2 CC and MLO Mammographic Projections Taken During Screening Mammography [Imaginis 2003]
Figure 2.3 CC and MLO Representative Views of A Right Breast
Figure 2.4 CC and MLO Views of a Left Breast of a Patient With Marked Abnormalities
Figure 3.1 Schematic Representation of a CAD System
Figure 3.2 Schematic Illustration of an Automated CAD-Dx System
Figure 4.1(a) ROI of A Benign Calcification Cluster
Figure 4.1(b) ROI After Pre-processing Operation (Filtering) to Smooth Background and Isolate the Calcification Clusters
Figure 4.1(c) ROI After Segmentation of Calcification. FP Signals Are Also Included In The Output
Figure 4.2(a) ROI of A Malignant Calcification Cluster
Figure 4.2(b) ROI After Pre-processing Operation (Filtering) to Smooth Background and Isolate the Calcification Clusters
Figure 4.2(c) ROI After Segmentation of Calcification. FP Signals Are Also Included In The Output
Figure 4.3 Architecture of NevProp4 Used for the Study
Figure 4.4 SFS Illustrated For a Set of Five Input Features
Figure 5.1 Histogram of Size of Calcification Clusters in Single-View Dataset
Figure 5.2 Histogram of Contrast of Calcification Clusters in Single-View Dataset
Figure 5.3 Number of Hidden Units Vs Classification Error
Figure 5.4 Error Plot for Single-View Dataset
Figure 5.5 ROC Plot for the Single-View Dataset
Figure 5.6 Error Plots for Experiment # 1
Figure 5.7 Error Plots for Experiment # 2
Figure 5.8 Error Plots for Experiment # 3
Figure 5.9 Error Plots for Experiment # 4
Figure 5.10 ROC Plot for Experiment # 4
Figure 5.11 Error Plots for Experiment # 5
Figure 5.12 Error Plots for Experiment # 6
Figure 5.13 Error Plots for Experiment # 7
Figure 5.14 Error Plots for Experiment # 8
Figure 5.15 ROC Plots of Experiments 5-8
Figure 5.16 Partial Area ROC Plots for Experiments 5-8 Where A TPF of 0.9 Is Selected
LIST OF ABBREVIATIONS
ACR American College of Radiology
ACS American Cancer Society
AMA American Medical Association
ANN Artificial Neural Network
BIRADS Breast Imaging Reporting and Data System
CC Cranio-Caudal (Mammographic View)
CAD Computer-Aided Detection
CAD-Dx Computer-Aided Diagnosis
DM Digital Mammography
FN False Negative
FP False Positive
GA Genetic Algorithm
ML Medio-Lateral (Mammographic View)
MLO Medio-Lateral Oblique (Mammographic View)
MLP Multilayer Perceptron
MRI Magnetic Resonance Imaging
ROC Receiver Operating Characteristics
ROI Region of Interest
SD Standard Deviation
SFS Sequential Forward Selection
TN True Negative
TP True Positive
US Ultrasound
COMPUTER-AIDED DIAGNOSIS FOR MAMMOGRAPHIC MICROCALCIFICATION CLUSTERS
Mugdha Tembey
ABSTRACT
Breast cancer is the second leading cause of cancer deaths among women in the United States, and microcalcification clusters are one of the most important indicators of breast disease. Computer methodologies help in the detection of lesions and in the differentiation between benign and malignant lesions, and they have the potential to improve radiologists' performance and breast cancer diagnosis significantly. A Computer-Aided Diagnosis (CAD-Dx) algorithm has been previously developed to assist radiologists in the diagnosis of mammographic clusters of calcifications. It consists of the following modules: (a) detection of all calcification-like areas, (b) false-positive reduction and segmentation of the detected calcifications, (c) selection of morphological and distributional features and (d) classification of the clusters. Classification was based on an artificial neural network (ANN) with 14 input features that assigned a likelihood of malignancy to each cluster.
The purpose of this work was threefold: (a) optimize the existing algorithm and test it on a large database, (b) rank classification features and select the best feature set, and (c) determine the impact of single-view and two-view feature estimation on classification and feature ranking. Classification performance was evaluated with the NevProp4 artificial neural network trained with the leave-one-out resampling technique. Sequential forward selection was used for feature selection and ranking.
Mammograms from 136 patients, containing one or two views of a breast with a calcification cluster, were digitized at 60 microns and 16 bits per pixel. A total of 260 regions of interest (ROIs), each centered on a calcification cluster, were defined to build the single-view dataset. 100 of the 136 patients had a two-view mammogram, which yielded 202 ROIs that formed the two-view dataset. Classification and feature selection were evaluated with both datasets. To decide on the optimal features for two-view feature estimation, several combinations of CC and MLO view features were attempted.
On the single-view dataset the classifier achieved an A_Z = 0.8891 with 88% sensitivity and 77% specificity at an operating point of 0.4; 12 features were selected as the most important. With the two-view dataset, the classifier achieved a higher performance, with an A_Z = 0.9580 and sensitivity and specificity of 98% and 80% respectively at an operating point of 0.4; 10 features were selected as the most important.
CHAPTER 1 INTRODUCTION
Breast cancer is the most common cancer among women in the United States, with an estimated 211,300 new cases anticipated to be diagnosed in 2003 that will account for 32 percent of the total cancer cases [Jemal et al 2003, Holmes and Muss 2003]. Breast cancer is also the leading cause of cancer deaths for women between the ages of 20 and 59 years [Jemal et al 2003] and the second leading cause of cancer death among women in general, second only to lung cancer. In the year 2003, 40,200 people (39,800 women and 400 men) are expected to die of breast cancer [Jemal et al 2003]. Statistics show that the lifetime risk of breast cancer in the United States has almost tripled in the past 50 years. In the 1940s, a woman's lifetime risk of breast cancer was 1 in 22; it increased to 1 in 8 by the year 2002 [MBCC 2002].
Before the 1990s, breast cancer mortality rates had been constant for nearly four decades. Breast cancer mortality declined by 1.6% during 1989-1995 and by 3.5% during 1995-1999 [MBCC 2002]. Most medical experts agree that this decline in the mortality rate can be attributed to increasing public awareness, which led to the early detection of breast cancer followed by proper treatment and regular follow-up. This is in agreement with previous studies that have shown that early diagnosis and
suitable treatment can significantly improve the chance of survival for patients with breast cancer [Chan et al 1995, Lester 1984].
X-ray mammography is the current mainstay for detecting non-palpable cancers (when they are small and have not spread, hence considered curable) [Shen et al 1993]. The US Food and Drug Administration reports that mammography can find 85 to 90 percent of breast cancers in women over 50 and can discover abnormalities in the breast up to two years before they can be felt [Bassett and Gambhir 1991, Imaginis 2003].
1.1 Computer-Aided Diagnosis for Breast Cancer
Even though mammography is the chosen technique for early detection of breast cancer, around 10%-30% of women with breast cancer who undergo mammography show negative mammograms [Giger 1993, Giger 2002]. In approximately two-thirds of these false-negative (FN) mammograms, i.e., mammograms with suspicious abnormalities which the radiologist failed to detect, the cancer was evident in retrospect [Feig 2002]. In addition to FNs, normal lesions could be mistaken for disease, leading to false-positive (FP) calls and recommendations for biopsy and/or additional workup. These procedures cause the patient considerable mental and physical trauma that could often be avoided. Records indicate that only 15-34% of the biopsies performed on non-palpable, mammographically suspicious lesions actually prove to be malignant [Kahn et al 1997, Lo 2003].
Studies show that missed cancers could be due to (a) technical reasons, (b) interval cancers or (c) human error. Technical reasons include inadequate compression or poor
image quality due to improper positioning or faulty equipment. Fatigue or oversight on the part of the radiologist [Giger 1993, Giger 2002], varying decision criteria or distraction by more prominent image features could result in interpretation errors and a missed abnormality [Vyborny et al 2000]. It has been suggested that double reading, i.e., independent mammogram interpretation by two radiologists, may increase the sensitivity and specificity of mammographic screening by 10% to 15% [Giger 1993, Thurfjell 1994, Vyborny et al 2000, Feig 2002]. However, the rise in costs, in addition to the increased workload on the radiologists, does not make double reading a cost-effective option [Giger 1993, Petrick 2000, Thurfjell 1994].
The idea that a machine could overcome most, if not all, human and technical limitations or reduce workload and costs led to the development of a class of computer algorithms in which the computer alerts the radiologist to locations on the mammogram that are suspicious and worthy of further investigation. Computer-Aided Detection (CAD) algorithms, as they were aptly named, were shown to significantly improve detection [Huo et al 2002, Feig 2002] and provide benefits similar to those of a second reading by another radiologist. CAD can also detect 50% to 90% of the missed cancers that are visible in retrospect [Feig 2002]. With computerized detection improving on the sensitivity of the radiologists, using the computer to make a benign or malignant diagnosis of the detected region was the natural next step. This new class of computer methodologies, referred to as Computer-Aided Diagnosis (CAD-Dx) schemes, is the topic of this research.
1.2 Scope of the Work
CAD and CAD-Dx systems improve detection and diagnosis by emulating the methodology of the radiologist with several modules, each with a specific functionality, that analyze the image to locate the abnormality and classify it. Pre-processing, segmentation and detection, feature extraction and finally classification define the general flow for an automated classification system. The work for this thesis concerns the optimization of the classification module of the CAD-Dx system, also incorporating feature selection, for microcalcification clusters.
In the initial stages of development of this CAD-Dx system, filtering, segmentation and feature extraction techniques were optimized to enhance calcifications, detect them and extract shape and distribution related features [Gavrielides et al 1997, Kallergi 2003]. The shape and distribution related features were combined with a single demographic feature for characterization on a small database [Kallergi 2003]. For this work we expanded the original database to form the single-view dataset used in this study. The existing algorithm was optimized with this single-view database and tested with a new classifier for class characterization, i.e., the NevProp4 neural network model [Goodman 1998], using leave-one-out resampling. A sequential forward selection (SFS) methodology [Bishop 1995] was implemented to select the most relevant features while maintaining classifier performance. The area under the ROC curve (A_Z) [Metz 1978] was computed as an index for the performance of each feature in distinguishing between malignant and benign microcalcifications.
To determine the impact of combining two-view cluster information, the paired clusters, i.e., those that were seen on both views, were separated from the single-view dataset to form the two-view dataset. Classification and feature selection were implemented with the paired dataset as well as with single CC and MLO view data. Several experiments with different combinations of features selected with SFS from single CC and MLO views were performed to explore the potential of combining two-view information.
1.3 Thesis Organization
This thesis has been organized as follows: Chapter 2 provides a general introduction to breast cancer and the role of mammography in screening and diagnosis. Chapter 3 discusses the state-of-the-art in CAD and CAD-Dx for mammography. In Chapter 4, the developed CAD-Dx system and its modules are described, including the materials and methods used in this study. Chapter 5 presents and discusses the results of the study. Finally, the conclusions are presented in Chapter 6.
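The evaluation loop outlined in Section 1.2 (sequential forward selection scored by the area under the ROC curve, with leave-one-out resampling) can be illustrated with a minimal sketch. This is not the thesis implementation: a nearest-centroid scorer stands in for the NevProp4 network, the area is estimated with the nonparametric rank (Mann-Whitney) statistic rather than a fitted ROC curve, and the function names (`auc`, `loo_scores`, `sfs`) and the toy feature values are hypothetical, chosen only for illustration.

```python
def auc(scores, labels):
    """Rank-based estimate of the area under the ROC curve.

    Fraction of (malignant, benign) pairs in which the malignant case
    receives the higher score; ties count as half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows, feats):
    """Mean of the chosen feature columns over a set of rows."""
    return [sum(r[f] for r in rows) / len(rows) for f in feats]


def sq_dist(row, center, feats):
    """Squared Euclidean distance restricted to the chosen features."""
    return sum((row[f] - c) ** 2 for f, c in zip(feats, center))


def loo_scores(X, y, feats):
    """Leave-one-out scores from a nearest-centroid stand-in classifier.

    Each case is scored by a model built from all other cases; a higher
    score means the case lies closer to the malignant centroid.
    Assumes every class keeps at least one case after one is held out.
    """
    scores = []
    for i, held_out in enumerate(X):
        train = [(x, t) for j, (x, t) in enumerate(zip(X, y)) if j != i]
        benign = centroid([x for x, t in train if t == 0], feats)
        malignant = centroid([x for x, t in train if t == 1], feats)
        scores.append(sq_dist(held_out, benign, feats)
                      - sq_dist(held_out, malignant, feats))
    return scores


def sfs(X, y, n_select):
    """Greedy forward selection: add the feature that maximizes the AUC."""
    selected, history = [], []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        best_f, best_auc = None, -1.0
        for f in remaining:
            a = auc(loo_scores(X, y, selected + [f]), y)
            if a > best_auc:  # strict: the earliest feature wins ties
                best_f, best_auc = f, a
        selected.append(best_f)
        remaining.remove(best_f)
        history.append(best_auc)
    return selected, history


# Toy data (made up): column 0 separates the classes, columns 1-2 do not.
X_demo = [
    [0.1, 1.0, 0.5], [0.2, 0.0, 0.1], [0.3, 1.0, 0.9], [0.4, 0.0, 0.6],
    [1.1, 1.0, 0.4], [1.2, 0.0, 0.8], [1.3, 1.0, 0.7], [1.4, 0.0, 0.3],
]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = benign, 1 = malignant
order, az_history = sfs(X_demo, y_demo, 2)
```

In this toy run the separating column is ranked first. With a real classifier the same loop applies, but every candidate feature set requires one full leave-one-out training pass, so the cost grows quickly with the number of features and cases.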
CHAPTER 2 BREAST CANCER DETECTION TECHNIQUES
Screen-film x-ray mammography is the only proven medical technique today that can detect breast cancer at an early stage [Shen 1993, Paquerault 2000, Lo 2003] and increase the likelihood of cure and long-term survival of the patients [Bassett and Gambhir 1991, Kahn et al 1997]. Due to the effectiveness of mammography in the early detection of breast cancer, the American Cancer Society (ACS), the American College of Radiology (ACR) and the American Medical Association (AMA) recommend that women over the age of 40 have a screening mammogram annually. In clinical practice, radiologists interpret individual mammograms for characteristic signs of potential cancerous lesions.
This chapter gives a brief introduction to the conventional and new techniques for breast cancer imaging. The algorithms tested in this work use digitized mammograms, so our focus is more on mammography than on other methodologies.
2.1 Mammography Applications-Screening and Diagnostic
Breast cancer is discovered by breast self-examination, clinical breast examination, or mammographic screening.
Mammography refers to an x-ray examination of the breast, and a mammogram is an x-ray projection of the three-dimensional structures of the breast on a film. It is used to detect and diagnose breast disease in women who show abnormal breast symptoms (symptomatic), i.e., palpable lump, pain or nipple discharge, and also in women who have no breast complaints (asymptomatic), for early detection and prevention purposes. Mammography uses dedicated x-ray systems that produce a high-quality image of the breast tissue on film. The breast is compressed for a few seconds between two plates. Figure 2.1 shows the positioning of the breast between the two plates of the x-ray machine. Compression, although uncomfortable, is necessary for high-quality mammograms [Bassett and Gambhir 1991]. The resulting image is a black and white image of the breast tissue on film. A high-quality mammogram is one in which the maximum possible breast tissue is imaged on the film [Bassett 1998]. Films are viewed on dedicated light boxes or digital mammography workstations by trained radiologists [ACS 2003, RadiologyInfo 2003].
There are two general types of mammography: screening and diagnostic [Bassett and Gambhir 1991, Vyborny et al 2000]. Screening mammography aims at detecting clinically occult breast cancer, i.e., cancer that is still too small to be felt by a physician or during self-examination by the patient. It is the regular low-dose x-ray examination of the breast that is performed on asymptomatic women for early detection and prevention purposes. Diagnostic mammography is an x-ray examination of symptomatic women that is performed to pinpoint the exact size and location of the palpable or non-palpable abnormalities detected during physical examination or routine screening mammography.
It also aims to image the surrounding tissue and lymph nodes for possible spread of the disease. Even though it is usually more time-consuming and costly than screening mammography, it sheds more light on the likelihood of malignancy for the abnormality and may avoid the need for a biopsy of the lesion.
2.2 Mammographic Views
The breasts are based on the curvilinear chest wall, which makes it difficult for all the breast parenchyma to be included in a single mammographic projection [Giger et al 2000]. Standard mammographic screening therefore involves two images of each breast, the Cranio-Caudal (CC) and the Medio-Lateral Oblique (MLO) views. For the CC view, the compression is applied from top to bottom and the x-ray is incident on the top, with the screen-film cassette positioned at the bottom. Figure 2.1 shows the positioning of the breast for the CC view of the right breast.
Figure 2.1 Patient Positioning of A CC Mammographic View of the Right Breast [ACS 2003].
For the MLO view, the compression is applied at an angle, generally 45-60 degrees, and the x-ray is incident on the medial portion between the two breasts while the film cassette is placed on the outer side. The oblique view is the most effective single view, as it depicts the upper-outer quadrant and the axillary tail of the breast, where most breast carcinomas occur [Bassett and Gambhir 1991]. Screening mammography involves two views of each breast, usually a CC and an MLO view, as shown in Figure 2.2. Representative CC and MLO view mammograms of a right breast are shown in Figure 2.3. A Medio-Lateral (ML) view is now usually acquired in addition to the MLO view; it is similar to the MLO view, except that it is taken at an angle of 90 degrees to the chest wall.
To confirm the suspicion of malignancy, diagnostic mammography involves several views of the breast in addition to the CC and MLO views acquired during screening mammography. A diagnostic workup may include magnification, spot compression and cleavage views, or ML and latero-medial (LM) views [Imaginis 2003]. Furthermore, the screening mammography views may be repeated if they were not of adequate quality.
Figure 2.2 CC and MLO Mammographic Projections Taken During Screening Mammography [Imaginis 2003].
Figure 2.3 CC and MLO Representative Views of A Right Breast.
Magnification views are often used to evaluate microcalcifications. A small magnification device is used that brings the breast closer to the x-ray source and further away from the film plate, allowing the acquisition of zoomed-in images, generally at 2 times magnification, of the region of interest (ROI). Magnification views provide a clearer assessment of the number of calcifications or the tissue of a suspicious area or a mass.
A spot compression view involves compression of a small area of the breast concentrated around the suspicious tissue, using a small compression plate or cone. By applying compression to only a specific area of the breast, the effective pressure is increased on that spot, resulting in better tissue separation and visualization of the suspicious abnormality. Like magnification views, spot compression views show the borders of an abnormality or questionable area better than the standard mammography views.
A cleavage view (also called valley view) images only the central portion between the two breasts and is performed when there is an abnormality on the medial edge of the mammogram that cannot be entirely seen on the standard views. To capture as much medial tissue as possible, both breasts are placed on the plate at the same time, and the medial half of both breasts, which might have been left out in the individual views, is imaged.
The views taken in screening and diagnostic mammography only help localize a suspicious lesion. Biopsy is the only definitive way to determine whether a woman has breast cancer. A radiologist recommends a patient for a biopsy based on the diagnostic mammogram and other clinical information.
2.3 Mammographic Interpretation
Radiologists interpret or read mammograms for the detection and diagnosis of potential abnormalities. However, mammographic interpretation can be hampered
because of the way normal and abnormal tissue looks on the mammogram. As mentioned earlier, 10-30% of women with breast cancer who undergo mammography have negative mammograms, in two-thirds of which the cancers are visible in retrospect [Giger 1993, Giger 2002]. Also, only about 15-34% of the biopsies performed on non-palpable, mammographically suspicious lesions prove to be malignant [Kahn et al 1997, Lo 2003]. The goal is to detect all the abnormalities that are present on the mammogram with no misdiagnoses of normal lesions. FN and FP interpretations could be due to (a) technical reasons, i.e., inadequate compression or poor image quality due to improper positioning or faulty equipment, (b) interval cancers, i.e., cancers that develop in the period between two successive mammograms, or (c) human errors such as fatigue or oversight on the part of the radiologist, varying decision criteria or distraction by more prominent image features [Giger 1993, Giger 2002]. Even though independent double reading is a possible way to reduce the number of missed cancers [Giger 1993, Thurfjell 1994, Vyborny et al 2000, Feig 2002], the additional costs for the patient and workload for the radiologist do not make it a very popular choice [Giger 1993, Petrick 2000, Thurfjell 1994].

Mammograms should ideally be interpreted as true positive (TP) or true negative (TN), i.e., cases that are correctly classified as diseased and normal respectively. The sensitivity is the probability that a test result will be positive when the disease is present; expressed as a percentage, it is the TP rate. The specificity is the probability that a test result will be negative when the disease is absent; expressed as a percentage, it is the TN rate, i.e., (1 - FP rate).
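The two definitions above can be written out as a short Python sketch. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not results from this work.

```python
def sensitivity(tp: int, fn: int) -> float:
    """TP rate: fraction of diseased cases that are read as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """TN rate: fraction of normal cases that are read as negative."""
    return tn / (tn + fp)


def false_positive_rate(tn: int, fp: int) -> float:
    """FP rate: fraction of normal cases that are read as positive."""
    return fp / (tn + fp)


# Hypothetical reading results: 100 diseased and 100 normal cases.
sens = sensitivity(tp=90, fn=10)
spec = specificity(tn=40, fp=60)
fpr = false_positive_rate(tn=40, fp=60)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} 1-FPR={1 - fpr:.2f}")
```

With these made-up counts the specificity and (1 - FP rate) coincide, as the definition requires.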
In clinical terms, sensitivity is a measure of the radiologist's performance in detecting the abnormalities whereas specificity defines the radiologist's ability to avoid generating false alarms [Woods et al 1993, Gavrielides 1996]. It has been shown that computer algorithms trained to detect suspicious lesions can improve the sensitivity of the radiologists [Huo et al 2002].

2.3.1 Mammographic Abnormalities

The presence of a breast abnormality in a mammogram is usually indicated by signs directly due to the abnormality, or indirectly due to the abnormality's effect on its surroundings [Shen et al 1994]. The most prominent signs of abnormalities visible on mammograms are clustered microcalcifications, architectural distortion, asymmetric densities, nipple retractions, spiculated masses, circumscribed or well-defined masses and ill-defined or irregular masses [Sickles 1984]. Of these, the clustered microcalcifications and masses are the most common. This work targets microcalcification clusters and these are discussed in more detail below.

2.3.1.1 Calcifications

Calcifications are one of the most important indicators of breast disease and breast cancer is often associated with them. They are small calcium deposits in the breast that can be detected by mammography and are divided into two categories: macrocalcifications and microcalcifications. Calcifications are not themselves malignant or benign; they tend to be associated with malignant or benign tissue. However, for ease of expression, calcifications associated with malignant or benign tissue are referred to as
malignant calcifications and benign calcifications respectively. This thesis follows the same terminology.

Macrocalcifications are coarse, scattered calcium deposits usually associated with benign conditions. They can be classified as benign on visual inspection and, hence, rarely need to be biopsied to confirm diagnosis.

Microcalcifications are tiny specks of calcium in the breast, with individual dimensions between 0.1 and 1 mm, usually found in an area of rapidly dividing cells. A number of microcalcifications grouped together is termed a cluster and it may be a strong indication of cancer. A cluster is defined as at least 3 microcalcifications within a 1-cm² area [Kopans 1991, Gavrielides et al 1997]. The calcifications usually have an irregular shape and are very small in size. The detection of 30-50% of non-palpable cancers and 70% of non-palpable minimal cancers, i.e., non-infiltrating and less than 0.5 cm, can be attributed solely to the presence of microcalcifications [Feig and Yaffe 1995].

Discriminating characteristics of calcifications include size, shape, density, number and distribution [Gavrielides 1996, Feig and Yaffe 1995]. Radiologists have attempted to characterize benign and malignant calcifications based on these characteristics as shown in Tables 2.1 and 2.2 [Kallergi 2003]. The development of the Breast Imaging Reporting and Data System (BIRADS) lexicon was based on these characteristics and is shown in Table 2.3 [Kallergi 2003].
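The cluster criterion stated above (at least 3 microcalcifications within a 1-cm² area) can be illustrated with a minimal Python sketch. It assumes that calcification centroids are given in millimetres and that the 1-cm² area is an axis-aligned 10 mm by 10 mm square; the cited definition does not fix the shape of the area, so this is only one possible reading, and the function name `has_cluster` is hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x_mm, y_mm) centroid of one calcification


def has_cluster(points: Sequence[Point],
                min_count: int = 3,
                side_mm: float = 10.0) -> bool:
    """Return True if some axis-aligned square of side `side_mm`
    contains at least `min_count` of the given centroids.

    Any such square can be shifted until its left edge and its bottom
    edge each touch a point, so it is enough to test squares whose
    lower-left corner takes x and y values from the point set.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    for x0 in xs:
        for y0 in ys:
            inside = sum(
                1 for (x, y) in points
                if x0 <= x <= x0 + side_mm and y0 <= y <= y0 + side_mm
            )
            if inside >= min_count:
                return True
    return False


# Hypothetical centroids (mm): three close together, then one far away.
print(has_cluster([(0.0, 0.0), (3.0, 4.0), (9.0, 9.0)]))
print(has_cluster([(0.0, 0.0), (3.0, 4.0), (20.0, 20.0)]))
```

The exhaustive search is cubic in the number of points, which is acceptable for the handful of calcifications typically found in one ROI.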
Table 2.1: Forms (Shape, Morphology, Distribution) of Individual Microcalcifications that Suggest Benign or Malignant Disease as Listed in the Literature [Kallergi 2003]. Probable genesis type: B = probably benign; M = suggestive of malignancy; U = uncertain.

Form Descriptor                                             Probable Genesis Type
Linear                                                      M
Branching or V, W, X, Y, Z shapes                           M
Small numerous irregular                                    U
Punctiform in a monomorphic group                           B
Punctiform in a polymorphic group or of varying size        M
Angular                                                     U
Smooth Dense                                                B
Hollow or ring or radiolucent or eggshell                   B
Annular                                                     B
Fine with major variations or very fine, hardly visible     M / U
Worm-like                                                   M
Bean-form                                                   M
Undulating line of various lengths                          M
Amorphous                                                   U
Roundish or faceted                                         B
Tea cup-like                                                B
Clumpy with rounded edges                                   B
Ovoid                                                       B

Table 2.2: Intensity and Group Descriptors of Calcifications Listed in the Clinical Literature [Kallergi 2003]. Probable genesis type: B = probably benign; M = suggestive of malignancy; U = uncertain.

Intensity and Group Descriptors                             Probable Genesis Type
Blurred contours                                            B
Pale                                                        B
Polymorphic group                                           M
Monomorphic group                                           B
Small, clustered                                            M
Linear tubular in parallel tracks (vascular)                B
Table 2.3: BIRADS Descriptors for Calcifications With Associated Genesis Type (B = Probably Benign; M = Suggestive of Malignancy; U = Uncertain) [Kallergi 2003].

Category                  Descriptor                                                      Genesis Type
Morphology or character   Skin (lucent centered)                                          B
                          Vascular (linear tubular with parallel tracks)                  B
                          Coarse or popcorn like                                          B
                          Large rod-like                                                  B
                          Round (larger than 0.5 mm)                                      B
                          Eggshell or rim (thin walled lucent centered, cystic)           B
                          Milk of calcium (varying appearance in projections)             B
                          Dystrophic (irregular in shape, over 0.5 mm, lucent centered)   B
                          Punctate (round smaller than 0.5 mm)                            B
                          Suture (linear or tubular, with knots)                          B
                          Spherical or lucent center (smooth and round or oval)           B
                          Amorphous or indistinct                                         U
                          Pleomorphic or heterogeneous granular                           M
                          Fine linear                                                     M
                          Fine linear branching                                           M
Distribution              Clustered                                                       U
                          Segmental                                                       U/M
                          Regional                                                        U
                          Diffuse/Scattered                                               B
                          Linear                                                          M
Number                    1-5                                                             U
                          5-10                                                            U
                          >10                                                             U

Apart from microcalcifications, the presence of a mass also indicates the existence of disease. Most masses of a malignant nature appear irregular or with a spiculated margin. Some masses have calcifications associated with them and these are referred to as calcified masses. Figure 2.4 shows the two mammographic views of a left breast with marked abnormalities. The black arrows show the positions of the masses identified by the radiologist while the white arrows point to the calcification clusters. Note the difference in location of the abnormalities on the mammographic views due to the
positioning and the compression of the breast during mammographic imaging.

(a) CC-View (b) MLO-View
Figure 2.4 CC and MLO Views of Left Breast of a Patient With Marked Abnormalities.

2.4 Limitations of Mammography

Section 1.1 covered some limitations of mammographic imaging, namely the FP interpretations that lead to unnecessary biopsies and the large number of missed cancers
that are visible in retrospect. Additionally, mammograms, as normally viewed, display only 3% of the information they detect [Laine et al 1995], the main obstacle being the low contrast between the normal and malignant glandular tissues, especially in younger women. Mammography is thus unable to maintain a high sensitivity while screening younger women or those with excessively dense breasts [Mitka 2003]. A clear distinction between malignant (cancerous) and benign (non-cancerous) abnormalities might not always be possible due to the overlap in the breast tissue, even with the availability of multiple views. The indeterminate features for masses and calcifications as visible on the mammograms may lead to confusion, rendering an inaccurate decision and leading to biopsies. The very small size of microcalcification clusters, especially in the early stages of tumor development, makes them extremely difficult to view. Calcifications have a high attenuation property due to which they appear as white (or high intensity) spots on mammograms. Conversely, mammography does detect some cancers "early," but many of these are not potentially lethal (FPs) and their detection causes needless anxiety to the patient.

The need to overcome the limitations of mammography led to research in the development of (a) computer methodologies that would exploit the underlying image information, enabling better detection and diagnosis, and (b) new imaging techniques that would provide a cost-effective solution with better visualization. Our work on a CAD-Dx system follows the first approach. Of the imaging techniques on the horizon, Digital Mammography (DM), Ultrasound (US) and Magnetic Resonance Imaging (MRI) are believed to have the potential to measure up to the standards set by mammography [Samuels 1998, Bassett 1998, Williams et al 1998, Mitka 2003]. In the following section
we provide a succinct description of these; a comprehensive summary of the numerous screening methods can be found elsewhere [Samuels 1998, Williams et al 1998, Mitka 2003].

Digital mammography is regarded as the next step in the evolution of mammography [Samuels 1998, Bassett 1998]. It is filmless, as an electronic system (consisting of an x-ray source and an electronic detector) generates electronic images that can be viewed on high resolution monitors. DM offers several advantages over conventional mammography including more image information, elimination of hard-copy film, rapid transmission of images over long distances and easy adaptability to computer-aided detection and diagnosis systems [Bassett 1998].

Ultrasound reflects the acoustic characteristics of the breast tissue. It has been proposed as an alternative to mammography for women with dense breasts [Mitka 2003] and is routinely used for the diagnostic evaluation of masses. Its non-ionizing nature makes it suitable for the detection of small non-palpable invasive breast cancers (missed in mammography) in young, pregnant and lactating women [Samuels 1998]. However, the resolution of ultrasound is not adequate for calcifications, and its results are always correlated with mammography [Samuels 1998].

MRI is also a non-ionizing technique with multi-planar capability that requires a dedicated breast coil and an intravenous contrast medium (gadolinium), and it is primarily aimed at evaluating architectural features of masses. A 5-year German study of 462 women with a high risk of developing breast cancer showed that MRI sensitivity at detecting the 51 cancers that developed in the women was 96% compared to 43% of mammography and
47% of US. MRI is not adequate for imaging microcalcifications and has a lower specificity, leading to more false-positive results and unnecessary biopsies in the misdiagnosed patients [Samuels 1998, Williams et al 1998, Mitka 2003].
CHAPTER 3
COMPUTER APPLICATIONS IN MAMMOGRAPHY

Chapter 2 provided an overview of the different imaging techniques with an emphasis on mammography, its benefits and limitations. It also stressed the importance of microcalcification clusters as an early indicator of breast cancer. Despite its limitations, mammography is the best available imaging technique for microcalcification detection. In a recent study, screen-film mammography showed 91.9% sensitivity and 39.3% specificity for calcifications [Fischer et al 2002]. There is room for improvement, however, improvement that may be achieved with the use of CAD and CAD-Dx.

Traditionally the radiologist reads the different mammographic views for signs of disease and makes a decision whether or not to recommend a biopsy depending on the degree of suspicion. A highly suspicious region is recommended for a biopsy to verify the presence or absence of malignant disease while a less suspicious region may be followed up periodically. A standard 12-bit mammogram will contain 2^12, i.e., 4096 shades of gray, while human observer perception is limited to 2^4 (16). Computer algorithms could detect a broader range of gray scale differences and extract the maximum possible information from the image, hence detecting lesions invisible to the human eye. CAD algorithms aim to increase the efficiency of screening procedures by using a computer
system acting as a second reader and indicating locations of suspicious abnormalities.

Figure 3.1 Schematic Representation of a CAD System.

Development of CAD systems has been extensively researched in the field of medical image analysis for the past 15 years and has led to the FDA approval of three commercial CAD systems currently on the market, namely the Second Look (CADx Medical Systems), ImageChecker (R2 Technology Inc.) and MammoReader II (iCAD) [Kallergi 2003,
Vyborny et al 2000, Feig 2002]. Studies showed that CAD was able to detect 50% to 90% of the cancers visible in retrospect [Giger 1993, Feig 2002, Giger et al 2000]. Figure 3.1 illustrates a typical CAD system. The digital image is passed through pre-processing and detection/segmentation techniques to localize the regions. Pre-processing involves image enhancement to aid a better segmentation, where the object(s) of interest are localized and separated from the background. The CAD output is a mammographic view that has the suspicious locations (true and/or false) marked on it using different symbols for calcifications and masses. The radiologist considers this CAD output in his/her interpretation before making a final decision regarding the degree of suspicion of an abnormality and the follow-up.

3.1 Computer-Aided Diagnosis (CAD-Dx)

CAD-Dx followed CAD by adding another level of automation. CAD-Dx algorithms characterize detected lesions as benign or malignant, thus fusing detection and diagnosis [Vyborny et al 2000, Giger et al 2000]. Conceptually a CAD-Dx system can be visualized as having two main blocks, i.e., one for detection and one for classification, where the detection block can be the CAD system described earlier or a human observer. Specifically, in CAD-Dx systems, the detected lesions can be marked out by the radiologist or a human expert [Floyd et al 1994, Lo et al 1997, Kahn et al 1997, Wu et al 1993] and the feature set presented to the system, which analyzes the features and comes up with a prediction of malignancy. In contrast, the entire procedure from presenting the system with a digitized mammogram to receiving an output can be automated so that all the processing happens in the background and the radiologist is presented with a final
value. The radiologist can interpret this as a percent likelihood of malignancy and use it to make a final diagnostic decision and initiate proper patient management procedures [Jiang 1999, Thiele et al 1996, Chan et al 1998].

Like a CAD system, a CAD-Dx system takes a digitized mammogram as an input. Extracting useful lesion information from the image for the purpose of classification requires additional image processing and pattern recognition techniques. Figure 3.2 shows a block diagram with the various modules that may be a part of a CAD-Dx algorithm. Pre-processing and detection/segmentation are a part of the detection block while feature extraction and classification are additional CAD-Dx functionalities. A preliminary feature extraction step is required to extract the class characteristics from the detected lesions and combine them into a one-dimensional feature vector that the classifier can interpret. Classification is the final block that provides the class separation. A classifier trained with representative input samples can generalize to predict the class of an unknown sample. The output of the classifier can either be a percent likelihood of malignancy [Kallergi 2003, Vyborny et al 2000] or a standardized rating scale [Vyborny et al 2000] for the lesion.

The CAD-Dx systems provide a second opinion to the radiologist before he/she makes the final diagnosis [Jiang 1999, Kallergi 2003, Floyd et al 1994]. CAD-Dx systems can help reduce the false positives and decrease patient morbidity as well as the number of surgical biopsies performed [Giger 1993, Jiang et al 1996-a, Jiang 1999]. A review of various pre-processing, segmentation, feature extraction and classification techniques reported in the literature is given in the following sections.
Figure 3.2 Schematic Illustration of an Automated CAD-Dx System.
3.1.1 Pre-processing

Accurate detection of the suspicious lesions is necessary so that the extracted features represent the class characteristics, enabling proper training of the classifier and subsequently a consistent classification. A pre-processing operation that suppresses the unimportant image features and artifacts while simultaneously enhancing the features of interest often aids an accurate detection. Pre-processing is usually done using image processing methods or filtering techniques [Gavrielides et al 2002]. Several methods have been proposed for the enhancement of mammographic images to facilitate detection and segmentation. Some of these are based on the difference-image technique [Chan et al 1987, Chan et al 1990], wavelets [Heine et al 1997, Qian et al 1994] and median filter, contrast-reversal filter and box-rim filter [Chan et al 1988].

3.1.2 Detection/Segmentation

In the literature various techniques for detection and segmentation of the calcification clusters have been described: wavelet-based techniques [Chan et al 1987, Chan et al 1990, Strickland and Hahn 1996, Yoshida et al 1995], neural networks [Chan et al 1995, Woods et al 1993, Wu et al 1992, Zhang et al 1994], linear discriminant analysis (LDA) [Chan et al 1998], k-nearest neighbor (kNN) [Woods et al 1993], fuzzy logic [Cheng et al 1998, Verma and Zakos 1973], gray-level thresholding and clustering [Fam et al 1988, Davies and Dance 1992] and, more recently, support vector machines (SVM) [El-Naqa et al 2002]. Woods et al [1993] have compared the detection accuracy of microcalcifications with six classifiers (2 Bayesian, 2 dynamic neural
networks, a standard Backpropagation network and a kNN). They report the best results, averaged over the 5 training sets, for the Quadratic classifier (QC) on the 200-sample training set with 171.1 detections, while on the same training set the linear classifier (LC) reaches an average total detection rate of 169.5.

3.1.3 Feature Selection

The next step after detection and segmentation of the lesion is the extraction of features that would best describe the class to which it belongs. Reported algorithms usually employ combinations of morphological, texture, and intensity-related features as well as demographic information related to the patient [Kallergi 2003]. A list of features used in the literature for microcalcifications can be found in [Verma and Zakos 2001] while [Chan et al 1998] provide a summary of the available techniques.

Gavrielides et al [1997] have extracted six shape features and used their average and standard deviation values (12 features) for classification of microcalcification clusters as malignant or benign. They report their results with a three-layer feedforward neural network with quick-propagation training and leave-one-out resampling. Their results show the features extracted by the shape analysis to be robust features for classification. More recently, Kallergi [Kallergi 2003] has reported an A_Z = 0.98 with the original 12 features and two additional features (1 regional and 1 demographic) classified with a Backpropagation neural network. The above-mentioned work was developed at this facility [Kallergi 2003, Gavrielides et al 1997], so we have used their expertise and retained the 14 features that emulate the visual analysis system reported by Lanyi [1985, 1986].
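The construction of the 12 cluster-level shape features described above (average and standard deviation of six per-calcification shape features) can be sketched in Python as follows. This is only an illustration under stated assumptions: the six shape features are treated as anonymous numeric columns, the population standard deviation is used because the text does not say which estimator was applied, and the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def cluster_feature_vector(per_calc_features: Sequence[Sequence[float]]) -> List[float]:
    """Collapse per-calcification features into one cluster-level vector.

    `per_calc_features` holds one row per calcification and one column
    per shape feature. The result lists the column means first and the
    column standard deviations second, so six input columns give a
    12-element vector.
    """
    columns = list(zip(*per_calc_features))
    means = [mean(col) for col in columns]
    # Assumption: population standard deviation (divide by N).
    stds = [pstdev(col) for col in columns]
    return means + stds


# Hypothetical cluster of three calcifications with six shape features each.
cluster = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
]
vector = cluster_feature_vector(cluster)
print(len(vector), vector[:6])
```

The regional and demographic features mentioned in the text would simply be appended to this vector to reach 14 inputs.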
3.1.4 Classification

These extracted features represent the class to which they belong and are used for class characterization. Many different pattern recognition techniques have been used for classification of the detected microcalcifications. Some of these include artificial neural networks (ANNs) [Chan et al 1995, Lo et al 1997, Lo et al 1999, Jiang et al 1997, Tourassi et al 2001, Markopoulos 2001, Giger and Huo 1999, Floyd et al 1994, Wu et al 1993, Kahn et al 1997] and k-nearest neighbor (kNN) [Gavrielides et al 1997, Hojjatolesami and Kittler 1996]. ANNs have been used in medical image analysis for a variety of applications.

Kahn et al use a Bayesian network (MammoNet) with five patient-history findings, two physical findings, and 15 mammographic findings extracted by experienced radiologists to determine the probability of malignancy. Tested on 77 cases, MammoNet yielded an A_Z = 0.881 ± 0.045 and gave a sensitivity of 92% and a specificity of 88.5%. Lo et al [1997] have used a three-layer Backpropagation network with 8 mammographic features (extracted by the radiologists) and the patient age to predict if the malignant lesions were invasive or in situ. Their ANN implementation was able to correctly predict the invasion among the malignant lesions with an A_Z = 0.91 ± 0.03 and identify all 28 in situ cancers (100% specificity) and 48 of the 68 invasive cancers (71% sensitivity). Floyd et al [1994] have used a three-layer Backpropagation network to predict malignancy from mammographic findings for a set of 260 patient records and achieved an A_Z = 0.94 (±0.01), which surpassed the radiologists' performance. The mammographic findings were provided by the radiologists. Wu et al [1993] used an ANN that was trained on 14
image features extracted from the mammograms by expert mammographers. The network reached an A_Z = 0.89, which exceeded the values of the attending radiologists (A_Z = 0.84) and the radiology residents (A_Z = 0.80). Tourassi et al [2001] have used a constraint satisfaction neural network (CSNN) approach for breast cancer diagnosis using 16 mammographic and clinical findings reported by experienced mammographers. The CSNN was developed and evaluated on a database of 500 nonpalpable breast lesions containing both masses and calcifications and reported a total A_Z = 0.84 ± 0.02, which is comparable to the predictive performance achieved by experienced radiologists and Backpropagation networks.

However, a drawback of these methods is the human intervention required for feature extraction from the mammographic images. The features extracted may differ with the mammographer or the radiologist, which is a major concern. In contrast, an automated extraction of lesion features and their classification is a more consistent approach. Markopoulos et al [2001] reported an automated feature extraction and classification method for microcalcifications, limiting human input to the localization of the calcifications. With the eight extracted features, the computer achieved an A_Z = 0.937, which surpassed the A_Z that was attained by the physicians as a group (0.810). Here, even though the feature extraction was automated, the initial identification of calcifications by a human observer again makes this approach operator-dependent.

A classification technique that incorporates segmentation and detection with feature extraction to provide a malignant or benign diagnosis would be the optimal solution. Jiang et al have developed such an automated method for differentiating
malignant from benign microcalcifications where the eight image features were both extracted and analyzed by the computer. They report an A_Z = 0.92 for the computerized method as against 0.89 for the radiologist. In a later study, Jiang et al used computer-extracted features to compare the computer-aided performance of the radiologist to the performance without aid. With the computer-predicted output as an aid, the sensitivity increased from 73.5% to 87.4% and the specificity from 31.6% to 41.9%. In a preliminary study, Thiele et al [1996] have developed an automated method to examine the usefulness of computer texture analysis of the breast tissue surrounding the calcifications to improve the sensitivity of malignant diagnosis. Shen et al [1994] use a region growing technique to obtain the contours of calcifications. Using three shape features they have reported 100% correct classification on 143 calcifications using the nearest neighbor method.

A large number of features can be detrimental to the performance of a classification system, especially for finite sample sizes, making feature selection mandatory for the reduction of input dimensions to facilitate an accurate classification. Kupinski et al [1997] investigated various methods of feature selection for two different classifiers used in the computerized detection of mass lesions. A stepwise method, a genetic algorithm (GA) and individual feature analysis were employed with linear discriminants and artificial neural networks. The GA-based selection with the ANN reported an A_Z = 0.98, followed by the sequential forward selection (SFS) with 0.97. Chan et al [1998] report the results of an automated feature extraction and classification method. They compare the genetic algorithm (GA) based feature selection to the feature selection method based on
the stepwise discriminant analysis (LDA) procedure on a data set of 145 mammographic microcalcification clusters. The features selected by the GA-based method are comparable to or slightly better than those selected by the stepwise LDA method. Yu and Guan [1999] recommend the General Regression Neural Networks (GRNN) for selecting the most discriminating features for the automatic detection of clustered microcalcifications. SFS and sequential backward selection (SBS) are used with GRNN for the selection from 31 features extracted from 105 clusters, of which the SBS achieves a mean TP rate of 90% at the cost of 0.5 FPs per image.

This work focuses on computer-aided diagnosis for microcalcification clusters. We have used shape and distribution related features along with a single demographic feature as the inputs to a three-layer feedforward ANN for classification.

3.2 Two-View Analysis

Even though we now routinely use two views per breast for screening, there was an initial debate regarding the number of views deemed necessary. With the Sandviken study in 1974, one-view screening with an oblique view was introduced. The Two Countries Swedish Trial and the Nijmegen study used one-view oblique mammography too [Warren et al 1996]. In the late 1980s, following the recommendations of the Forrest report, most mammography screening in the United Kingdom was done using a single MLO view of each breast [Bryan and Brown 1995]. In the Health Insurance Plan (HIP) trial in New York, 1963 [Thurfjell et al 1994], two views were used (CC and ML). Thus the technique of using one view was dominant only in Europe, and even the earliest screening programs in the United States used two views, with one-view
screening only being performed for research purposes to show the effectiveness of the two-view process.

The supporters of the one-view approach claimed that it would not only halve the cost of x-ray film and the radiation dosage but also decrease the interpretation time. However, it was shown that no single view is able to correctly identify all the mammographically detectable cancers [Anderson et al 1978, Bassett 198, Muir et al 1984, Libshitz 1976, Kreager and Kornguth 1994, Bassett 1980, Andersson 1984, Lundgren 1977]. Furthermore, adherence to a single-view technology may result in more requests for additional images to further characterize the abnormalities. Bryan and Brown [1995] believe that the addition of a second view helps in detecting more early tumors and results in fewer FP recalls. It also provides additional information to determine if a tumor on one view needs any further investigation. A carcinoma may not be visible on a single view and yet have a very typical appearance on another view [Anderson et al 1978]. In such cases the second view prevents a missed cancer.

Between the ML and the MLO views, the oblique view is more useful in imaging all the breast locations for probable cancers. The oblique view images the largest area of the breast and also gives more information [Muir et al 1984], as the positioning is such that the x-ray beam has a shorter course through the denser tissue as compared to the other two views. It also images some portion of the chest wall and the tail of the glandular tissue, which can seldom be visualized in a CC view and not too often in an ML view. The CC or the ML view does not include the chest wall [Bassett et al 1980]. Statistics showed that malignancies that were detected solely on the oblique view were in
the upper-outer quadrant of the breast and in the axillary tail. Hence, 6-10% of cancers would be missed without this view [Bassett and Gold 1983]. The oblique view is also the best view in which the tumors in the juxtathoracic part of the breast can be visualized. So the two-view screening method involves a CC and an MLO view instead of an ML view [Anderson et al 1978, Bassett 1987, Muir et al 1984, Kreager and Kornguth 1994, Lundgren 1977, Lundgren and Helleberg 1982].

Currently the CC and the MLO views are used in screening mammography. If a radiologist finds a suspicious object (microcalcification cluster or mass) in the MLO view, he/she would look for the same in the CC view (and vice versa) to make sure that it is indeed real and not an artifact simulating a calcification. CAD-Dx attempts to simulate the analysis of the radiologist and achieve more consistent and accurate results. If the performance of the radiologist can increase with simultaneous reading from two views, it is reasonable to assume that the performance of the CAD-Dx may increase if two-view information is used. To our knowledge, research to date is focused on using single-view information.

Two-view correlation has been used to reduce FP detections in CAD systems and in three-dimensional (3D) breast representations. Specifically, Highnam et al [1998] and Kita et al [1998] developed a model-based method for finding a curve in the mediolateral oblique mammogram which corresponds to the potential positions of a point marked in the CC mammogram. [Paquerault et al 2001, Paquerault et al 2002] showed a new method that correlates geometrical, morphological and textural information from the segmented suspicious objects on the two standard views for reducing FPs on the
individual selected views and improving sensitivity for mass detection. [Sahiner et al 2001] also used the correspondence of the detected structures on the two views of the same breast for FP reduction in computerized detection of masses. They used radial bands and nipple-to-object distances to define the search area for the pairings of objects on the other view. [Kita et al 2001] have extended their previous work to calculate curved epipolar lines by incorporating a simulation of object deformation into the stereo camera geometry, with which they determined not only the correspondences but also the 3D location of the lesion within the compressed breast. [Kita et al 2002] have constructed a pilot test study based on the method proposed in [Kita et al 2001]; their work describes the system and analyzes the errors in the 3D locations of the lesions.

These works, though not directly related to CAD-Dx, recognize the importance of and the need for utilizing both views of the breast in computer applications. In a study closer to this work, Good et al [1999] attempted to identify a number of single-view features that could be expected to be relatively invariant under compression and hence correlated for ipsilateral projections of masses. Information is combined from single-view features to form the multi-view feature vector; each multi-view feature was defined to be the absolute value of the logarithm of the ratio of the corresponding single-view features. In preliminary results, an A_Z = 0.82 ± 0.03 using the multi-view information with a Bayesian classifier is reported.
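The multi-view feature described above, the absolute value of the logarithm of the ratio of corresponding single-view features, can be sketched in Python as follows. The sketch assumes strictly positive feature values and the natural logarithm, since the base is not stated in the text; the function name and the sample values are hypothetical.

```python
import math
from typing import List, Sequence


def multi_view_features(cc: Sequence[float], mlo: Sequence[float]) -> List[float]:
    """Combine matching single-view features into multi-view features.

    Each output value is |log(f_CC / f_MLO)|: it is 0 when the feature is
    identical on both views and grows as the two views disagree. Taking
    the absolute value makes the result independent of view order.
    """
    if len(cc) != len(mlo):
        raise ValueError("both views must provide the same features")
    # Assumption: every feature value is strictly positive.
    return [abs(math.log(a / b)) for a, b in zip(cc, mlo)]


# Hypothetical single-view feature values for one lesion.
cc_view = [2.0, 1.0, 0.5]
mlo_view = [1.0, 1.0, 1.0]
print(multi_view_features(cc_view, mlo_view))
```

A feature that is invariant under compression yields values near zero for a correctly paired lesion, which is what makes the combined vector useful to a classifier.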
CHAPTER 4
MATERIALS AND METHODS

This chapter provides the details of the dataset that was used for training and testing the proposed methodology. It also describes the algorithms used for the development of the different modules of the CAD-Dx system.

4.1 Database

The mammograms from 136 patients were analyzed to create the dataset used for this work. Each patient case comprises one or two views of the same breast, i.e., CC and MLO. The right and left breasts of the same patient were considered to be separate cases. All the mammograms in the 136 cases showed the presence of at least one biopsy-proven benign or malignant microcalcification cluster. The mammograms were selected from the patient files of the H. Lee Moffitt Cancer Center & Research Institute at the University of South Florida [Kallergi 2003]. Original mammograms were acquired on a system accredited by the American College of Radiology (ACR). A DuPont Microvision film combined with a Kodak Min-R (one-sided) screen was used for all mammograms. Film digitization was done with a DBA (DBA Inc., Melbourne, FL) ImageClear R3000 CCD-based film digitizer with a pixel size of 30 µm and a pixel depth of
16 bits [Kallergi 2003]. Original images were resized to 60 µm by mathematical interpolation while retaining the pixel depth. Studies have shown that this resolution is sufficient for calcification classification while the computational load is significantly reduced [Kallergi et al 1998]. An ROI of 512 pixels in size that contained the cluster was defined on each image. A total of 260 ROIs were defined, 138 of which contained a benign cluster and 122 a malignant cluster. This will be referred to as the single-view dataset. Of the 136 patient cases, 101 had a two-view mammogram with the cluster visible in both views. This set yielded 202 ROIs (101 CC and 101 MLO) that formed our two-view dataset. Of these 202 ROIs, 104 were benign and 98 malignant. A balance was maintained between the benign and malignant class samples in both datasets to prevent any classifier bias towards a particular class during training. Table 4.1 presents the two datasets and their contents.

Table 4.1 Dataset Used for the Study.

              Single-View Set     Two-View Set
              CC       MLO        CC       MLO
Benign        64       74         52       52
Cancer        60       62         49       49
View-Total    124      136        101      101
Total         260                 202

4.2 CAD-Dx Algorithm

The CAD-Dx system designed and implemented by Kallergi [Kallergi 2003] follows the architecture of Figure 3.2. The various modules of the system are described in the following sections.
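The resizing from 30 µm to 60 µm mentioned in Section 4.1 halves the resolution in each direction. The text does not say which interpolation was used, so the sketch below is only one simple possibility: averaging non-overlapping 2x2 pixel blocks. The function name is hypothetical and the image is assumed to be a plain list of rows.

```python
def downsample_by_two(image):
    """Halve the resolution of a 2D list of pixel values by averaging 2x2 blocks.

    Assumption: block averaging stands in for the unspecified interpolation.
    A trailing odd row or column is dropped.
    """
    rows = len(image) - len(image) % 2
    cols = len(image[0]) - len(image[0]) % 2
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

# Hypothetical 4x4 patch at 30 µm becomes a 2x2 patch at 60 µm.
patch = [[0, 0, 8, 8], [0, 0, 8, 8], [4, 4, 2, 2], [4, 4, 2, 2]]
print(downsample_by_two(patch))
```

The averages are returned as floats; a real pipeline that retains the 16-bit pixel depth would round them back to integers.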
4.2.1 Pre-processing

The object of interest in the ROI, i.e., the calcification cluster, is surrounded by breast tissue that masks the calcifications and prevents accurate detection, as shown in Figures 4.1(a) and 4.2(a). A pre-processing step, usually noise-reducing, is applied to improve image and calcification contrast. In this work a novel filter, referred to as the donut filter, was applied to the image; it maintained calcifications while suppressing unimportant image features. The filter is a band-pass filter resembling a donut in the Fourier domain. It was designed and developed by Dr. John Heine and was used for the first time in this work [Heine et al 2003, Kallergi et al 2004]. Previously, a symmlet wavelet filter was used for this purpose [Kallergi 2003]. Figures 4.1(b) and 4.2(b) show representative output images of the filter for a benign and a malignant cluster respectively. Compared to Figures 4.1(a) and 4.2(a), background mammographic structures are removed while calcifications are preserved. This simplifies the following segmentation/detection step.

4.2.2 Detection/Segmentation

Segmentation of the calcifications was done using a simple thresholding approach on the filtered images [Kallergi 2003]. The thresholding was based on the histogram of each image. Only those calcifications with a cluster size less than 10 pixels and high intensity values were retained. Spots smaller than 4 pixels (0.0144 mm^2) in area, irrespective of their configuration, were eliminated from the final segmentation step. This
criterion on the minimum size of the segmented objects was based on empirical observations and the visibility limits reported for calcifications in mammography [Kallergi 2003]. The criterion helped to reduce the number of FP signals entering the shape analysis and classification stages and to increase classification performance. The work for this thesis did not focus on a detailed optimization of the detection/segmentation methodologies. The basic intention was to have a detection/segmentation step that preserved the morphology and distribution of the true calcifications in a cluster as much as possible and had a relatively consistent performance. The approach described above yielded the desired result in previous studies [Kallergi 2003, Gavrielides et al 1997]. Figure 4.1(c) shows the segmented ROI for a benign cluster while Figure 4.2(c) shows the segmented ROI for a malignant cluster. The segmented image is a binary image with white spots that depict the calcifications.

4.2.3 Feature Extraction

Features, i.e., characteristics of the objects of interest, if selected carefully, represent the maximum relevant information that the image has to offer for a complete characterization of a lesion. Feature extraction methodologies analyze objects and images to extract the most prominent features that are representative of the various classes of objects. Features are used as inputs to classifiers that assign the objects to the class that they represent. Tables 2.1 and 2.2 provided a summary of the various descriptors that are used as malignancy indicators for calcifications. A clinical visual system based on the shape and distribution of calcification clusters has been shown to lead to a sensitivity of 97.6% and a specificity of
Figure 4.1(a) ROI of a Benign Calcification Cluster.
Figure 4.1(b) ROI After Pre-processing Operation (Filtering) to Smooth Background and Isolate Calcifications.
Figure 4.1(c) ROI After Segmentation of the Calcifications. FP Signals Are Also Included in the Output.
Figure 4.2(a) ROI of a Malignant Calcification Cluster.
Figure 4.2(b) ROI After Pre-processing (Filtering) to Smooth Background and Isolate Calcifications.
Figure 4.2(c) ROI After Segmentation of the Calcifications. FP Signals Are Also Included in the Output.
73.3% [Lanyi 1985, Lanyi 1986]. This system led to the establishment of the BIRADS categories listed in Table 2.3. During mammographic interpretation radiologists assess features that describe the shape and distribution of the calcifications and clusters, compare them to the BIRADS lexicon and make a diagnosis. The CAD-Dx system used here reproduces the visual differential diagnosis system by restricting the feature set to primarily shape and distribution related features. Hence the features are selected through shape analysis of the segmented calcifications (Figures 4.1(c) and 4.2(c)) as described below.

4.2.3.1 Shape Analysis

Shape analysis of calcifications was extensively researched in the initial stages of development of this algorithm [Gavrielides et al 1996, Kallergi 2003, Gavrielides et al 1997]. Prior studies have yielded 12 morphological features, 1 distributional feature and 1 demographic feature [Kallergi et al 1998]. These are summarized in Table 4.2. Mean values characterize individual calcifications; standard deviations (SD) characterize the cluster. The numbers in the first column correspond to the order in which the features were added to the feature vector. The last column describes the nature of the selected features. Table 4.2 also groups features by major categories: regional, boundary and demographic descriptors. The regional descriptors describe the object as a region and hence include size measurements. Boundary descriptors describe the shape and the contour of an object [Kallergi 2003]. Area, Compactness and the number of calcifications
are regional descriptors, while the Fourier coefficients of the boundary pixels and the three moment-based features are boundary descriptors.

Table 4.2 Feature Set Used for Classification of Calcification Clusters.

Feature # assigned   Feature                               Nature of feature
14                   Age of the patient                    Demographic feature
For individual calcification
1                    Mean Area of calcification            Describes the morphology
2                    Mean Compactness                      Describes the morphology
3                    Mean Moments                          Describes the margins
4                    Mean Fourier Descriptor (FD)          Describes the margins
5                    Mean Eccentricity                     Describes the margins
6                    Mean Spread (S)                       Describes the margins
13                   Number of calcifications in cluster   Regional descriptor
Distribution for the entire cluster
7                    SD Area                               Describes the morphology
8                    SD Compactness                        Describes the morphology
9                    SD Moments                            Describes the margins
10                   SD Fourier Descriptor                 Describes the margins
11                   SD Eccentricity                       Describes the margins
12                   SD Spread                             Describes the margins

4.2.3.1.1 Regional Descriptors

Area of an object (A) is the simplest shape parameter that can be computed from a detected object on an image. It can be defined as the number of pixels contained within (and including) the boundary of a segmented object of interest [Kallergi 2003, Castleman 1979]. Although it is the most basic shape parameter, it offers a good description of the size of the object. The Perimeter of an object is the circumferential distance around the boundary and can be obtained from the boundary chain code [Gonzalez and Woods 1993]. Chain codes are used to represent a boundary by a connected sequence of straight line segments
of specified length and direction. Since the chain code is a complete representation of an object or a curve, it can be used to calculate boundary features such as the perimeter, length and width of an object. In the case of an 8-connected chain code, the even-numbered segments have length 1 and the odd-numbered segments have length sqrt(2). Thus, the boundary perimeter P can be represented by the following equation [Kallergi 2003]:

$$P = N_e + \sqrt{2}\, N_o$$

where $N_e$ is the number of even steps in the boundary chain code and $N_o$ is the number of odd steps. Even though the perimeter is not used directly as a feature, it is required for the computation of Compactness, which is described below. Compactness is a dimensionless quantity that provides a simple measure of contour complexity versus the area enclosed [Gavrielides 1996, Shen et al 1994]. It does not depend on the size of the object and the image plane and is independent of translation, rotation and scale. It is one of the most common features used in pattern recognition and classification techniques [Kallergi 2003]. Compactness can be defined in a variety of ways, and the choice of definition may affect classification, as elaborated elsewhere [Kallergi 2003]. We chose the definition of compactness that yielded the lowest classification error for this application [Kallergi 2003]:

$$C = \frac{P^2}{4 \pi A}$$

where P is the perimeter of the object and A is the area of the object. According to this definition a circle is theoretically the most compact object, with the smallest value $C = 1$. Elongated objects have a value of $C > 1$ [Kallergi 2003]. So, a larger
value of compactness describes an irregular and elongated object while a smaller value is representative of a more symmetric object [Kallergi 2003]. The number of calcifications (n) in a cluster represents the cluster size and thus can be considered a regional descriptor. A cluster was defined earlier in this thesis as a group of 3 or more calcifications within an area of 1 cm^2. The number of calcifications in a cluster was divided into five ranges based on the cluster definition: 3 to 5, 5 to 10, 10 to 15, 15 to 20, and more than 20. The normalized median value of the range was used to represent each range.

4.2.3.1.2 Boundary Descriptors

We used boundary descriptors based on the Fourier coefficients of the boundary pixels and on the moments of the segmented calcifications for this work [Gavrielides 1996]. Fourier descriptors give a complete and flexible description of an object's shape. In addition they can be translation, rotation and scale invariant [Kallergi 2003]. This work uses the measure FF defined in [Shen et al 1994] based on the Normalized Fourier Descriptors (NFDs) [Kallergi 2003]. FF can be defined as:

$$FF = \frac{\sum_{n=-N/2+1}^{N/2} |NFD(n)| / |n|}{\sum_{n=-N/2+1}^{N/2} |NFD(n)|}$$

The NFD(n) is defined as:

$$NFD(n) = \begin{cases} 0, & n = 0 \\ S(n)/S(1), & n = 1, 2, \ldots, N/2 \\ S(n+N)/S(1), & n = -1, -2, \ldots, -N/2+1 \end{cases}$$
where

$$S(n) = \frac{1}{N} \sum_{k=0}^{N-1} s(k) \exp[-j 2 \pi n k / N], \quad n = 0, 1, 2, \ldots, N-1$$

are the complex coefficients of a Fast Fourier Transform (FFT) as described in [Shen et al 1994], with $s(k) = x(k) + j\,y(k)$ being a complex number describing the coordinate pair of each boundary pixel [Kallergi 2003]. The range of FF is [0, 1]; generally it is smaller for complex or rough shapes and larger for smooth shapes. FF is also insensitive to position, size, orientation, and starting point of the contour [Kallergi 2003]. The theory of moments gives a number of useful and practical shape descriptors [Shen et al 1994]. Of the six selected shape features, three are based on low-order and central moments, i.e., the moment-based measure (M) [Shen et al 1994, Kallergi 2003], eccentricity ($\varepsilon$) [Kallergi 2003, Castleman 1979] and spread (S) [Kallergi 2003]. If the coordinates of the N pixels of a segmented calcification contour are described by an ordered set $(x(i), y(i)),\ i = 1, 2, \ldots, N$, the Euclidean distances $z(i)$ of the vectors connecting the centroid of the segmented object and the ordered set of contour pixels form a one-dimensional representation of the contour [Kallergi 2003]. The p-th moment can then be defined as [Shen et al 1994, Kallergi 2003]:

$$m_p = \frac{1}{N} \sum_{i=1}^{N} [z(i)]^p$$

and the p-th central moment as [Kallergi 2003]:

$$\mu_p = \frac{1}{N} \sum_{i=1}^{N} [z(i) - m_1]^p$$

[Shen et al 1994] developed a set of shape features based on four low-order
moments. Higher-order moments are very sensitive to noise and hence were not considered. Similar to FF, the range of M is [0, 1]; it represents the roughness of a contour and increases as the irregularity of the shape increases. It is used to distinguish between the different shape categories of calcifications. Here, M has been defined as [Kallergi 2003, Castleman 1979]:

$$M = \frac{(\mu_4)^{1/4}}{m_1} - \frac{(\mu_2)^{1/2}}{m_1}$$

Eccentricity ($\varepsilon$) measures the degree to which an object's mass is concentrated along a particular axis. The range of values for $\varepsilon$ is [0, 1], where 0 defines a circular object and 1 a linear object. It can be represented as:

$$\varepsilon = \frac{(m_{2,0} - m_{0,2})^2 + 4\, m_{1,1}^2}{(m_{2,0} + m_{0,2})^2}$$

where, for an image $f(x, y)$, the moment of order $(p + q)$ was defined as:

$$m_{p,q} = \sum_x \sum_y x^p y^q f(x, y)$$

Spread (S) was based on the central moments of the boundary pixels. It measures how unevenly an object's mass is distributed about its centroid and takes values in the same range as M and $\varepsilon$, i.e., 0 to 1. Again, a lower value represents a circular object while a large value defines a linear and non-uniform object. Spread is defined as [Kallergi 2003]:

$$S = \mu_{2,0} + \mu_{0,2}$$

where

$$\mu_{p,q} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x, y)$$
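The contour-based descriptors of this section can be sketched in a few lines of Python. The sketch assumes the definitions P = N_e + sqrt(2) N_o, C = P^2 / (4 pi A), FF as the ratio of the sums of |NFD(n)|/|n| and |NFD(n)|, and M = (mu_4^(1/4) - mu_2^(1/2)) / m_1. All function names are hypothetical. The contour is assumed to be a list of (x, y) points ordered counter-clockwise, the centroid is approximated by the mean of the contour points, and a direct discrete Fourier transform is used instead of an FFT for brevity. This is an illustration, not the code used in this work.

```python
import cmath
import math

def chain_code_perimeter(codes):
    """Perimeter from an 8-connected chain code: even codes count 1, odd codes sqrt(2)."""
    n_even = sum(1 for c in codes if c % 2 == 0)
    n_odd = len(codes) - n_even
    return n_even + math.sqrt(2) * n_odd

def compactness(perimeter, area):
    """C = P^2 / (4 pi A); equals 1 for an ideal circle."""
    return perimeter ** 2 / (4 * math.pi * area)

def moment_measure(contour):
    """M from the centroid-to-contour distances z(i) (centroid approximated)."""
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    z = [math.hypot(x - cx, y - cy) for x, y in contour]
    m1 = sum(z) / n
    mu2 = sum((zi - m1) ** 2 for zi in z) / n
    mu4 = sum((zi - m1) ** 4 for zi in z) / n
    return (mu4 ** 0.25 - mu2 ** 0.5) / m1

def fourier_measure(contour):
    """FF from normalized Fourier descriptors of the complex contour s(k) = x + jy.

    Assumes counter-clockwise ordering so that S(1) is not (near) zero.
    """
    n = len(contour)
    s = [complex(x, y) for x, y in contour]
    # Direct DFT, S(m) = (1/N) sum_k s(k) exp(-j 2 pi m k / N)
    S = [sum(s[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)) / n
         for m in range(n)]
    num = den = 0.0
    for m in range(-n // 2 + 1, n // 2 + 1):
        if m == 0:
            continue                      # NFD(0) = 0 removes the position term
        mag = abs(S[m % n]) / abs(S[1])   # negative m maps to S(m + N)
        num += mag / abs(m)
        den += mag
    return num / den

# Hypothetical test shape: 64 points on a circle of radius 3 centred at (5, 5).
circle = [(5 + 3 * math.cos(2 * math.pi * k / 64),
           5 + 3 * math.sin(2 * math.pi * k / 64)) for k in range(64)]
print(fourier_measure(circle), moment_measure(circle))
```

For the ideal circle FF is close to 1 and M is close to 0, which matches the statement that smooth shapes give large FF and small M.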
4.2.3.2 Demographic Features

The risk of developing cancer increases with age. It is very rare in individuals less than 35 years of age, but its incidence rises by the age of 40, and it is of particular concern in the 40 to 50 age group, where it is the leading cause of cancer-related deaths [Giger et al 2000]. Clinical experience has identified age to be a powerful discriminator in diagnosing malignant disease [Kallergi 2003, Jemal et al 2002]. For a calcification cluster of moderate suspicion, a younger woman might be followed up annually but an older woman may be recommended for biopsy. Hence, the clinical importance of age in diagnosing disease led to the inclusion of patient age at the time of the breast exam as an additional feature. All 14 features were combined into a one-dimensional feature vector that serves as input to the classifier, which provides the class separation based on the information in the feature vector.

4.2.4 Classification

The task of the classifier component of the CAD-Dx system is to use the features provided by the feature extraction step to assign the objects of interest to a category or class [Duda et al 2002]. Our CAD-Dx system uses an artificial neural network (ANN) as a classifier. ANNs are computer models with a massively parallel structure inspired by the structure and function of the neurons in the human brain [Tourassi and Floyd 1997, Haykin 1999]. They consist of highly interconnected, nonlinear units known as neurons that are linked by weighted connections [Tourassi and Floyd 1997, Haykin
1999]. Studies have shown that ANNs can duplicate the human ability to learn and generalize while offering the computational power of computers [Tourassi and Floyd 1997]. They can thus be looked upon as powerful tools able to capture and represent the complex relationships between input-output pairs. They can represent both linear and non-linear relationships by learning from the data that they are presented with. ANNs have been commonly used in medical imaging to classify patterns into various classes. The most common neural network model is the multilayer perceptron (MLP), which is a supervised network, i.e., it needs to know the true output value of each input sample in order to learn. The MLP learns from the input sample and the corresponding known output value by changing the synaptic weights of the connections. Typically the network consists of a set of sensory units, i.e., source nodes that constitute the input layer, one or more hidden layers of computation nodes and an output layer of computation nodes [Haykin 1999]. The input signal propagates through the network in the forward direction, on a layer-by-layer basis [Haykin 1999]. The nodes across the layers are highly interconnected. The input samples are fed into the input layer and are multiplied by the interconnection weights as they pass from the input to the hidden layer. The output of the hidden nodes is not directly observable and is only used as input to the other nodes. The hidden nodes act as feature detectors: during learning they discover the salient features that characterize the training data by performing a non-linear transformation of the input data. The multiplied input samples are summed up and processed by a non-linear function in the hidden layer. As the processed data leave the hidden layer they are multiplied by the interconnection weights and then passed to the next hidden layer (if any) or to the output layer, where they are processed again to give the final output. The
multiplying weights represent a trained network that is able to map an unknown input sample to the correct output value by generalizing from the historical data, i.e., the weights. In a nutshell, during training the network encodes the input-output relationships into synaptic weights, compares the network output to the target output to compute an error measure and then adjusts the weights so that the error is reduced. This procedure is repeated until the network converges on a solution. It is expected that the network learns enough from the past to generalize to the future, i.e., an unknown sample. The MLP has been applied successfully to solve some difficult and diverse problems by using the error back-propagation training algorithm [Haykin 1999], which is based on the error-correction learning rule. Back-propagation learning consists of two passes through the different layers of the network, i.e., a forward and a backward pass. In the forward pass, the input pattern is applied to the input nodes and its effect is propagated through the layers one at a time. A set of outputs is produced as the response of the network to the input pattern. The synaptic weights are fixed during the forward pass, while they are adjusted in accordance with an error-correction rule during the backward pass [Haykin 1999]. For the correction, the actual response of the network is subtracted from the target response to produce an error signal that is then propagated backwards through the network against the direction of the synaptic connections. The synaptic weights are adjusted so as to move the network response closer to the desired response [Haykin 1999]. The learning process performed with this algorithm is called back-propagation learning.
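The forward and backward passes described above can be made concrete with a very small network. The following Python sketch uses plain gradient-descent back-propagation with sigmoid units and a squared-error measure; the class and function names, the learning rate, the network size and the toy logical-OR data are all hypothetical. It is not the NevProp4 implementation and does not use Quickprop.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """One hidden layer, one output node, sigmoid units (illustrative only)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # One weight row per node; the last entry of each row is the bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Forward pass: weights stay fixed, signals flow from input to output."""
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """Forward pass followed by a backward pass that adjusts the weights."""
        hidden, out = self.forward(x)
        # Error signal at the output: (target - actual) times the sigmoid slope.
        delta_out = (target - out) * out * (1.0 - out)
        # Propagate the error signal backwards through the output weights.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        self.w_out = [w + lr * delta_out * v
                      for w, v in zip(self.w_out, hidden + [1.0])]
        xb = list(x) + [1.0]
        self.w_hidden = [[w + lr * d * v for w, v in zip(row, xb)]
                         for row, d in zip(self.w_hidden, delta_hidden)]

def mean_error(net, data):
    """Mean of 0.5 * (target - output)^2 over the data."""
    return sum(0.5 * (t - net.forward(x)[1]) ** 2 for x, t in data) / len(data)

def train(net, data, epochs=2000):
    for _ in range(epochs):
        for x, t in data:
            net.train_step(x, t)

# Toy data (logical OR), used only to show that the error goes down.
toy = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]
net = TinyMLP(n_in=2, n_hidden=3, seed=1)
error_before = mean_error(net, toy)
train(net, toy)
error_after = mean_error(net, toy)
print(error_before, error_after)
```

The hidden-layer error signals are computed from the output weights before those weights are updated, which is the order the backward pass requires.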
The NevProp version 4 (NevProp4) ANN package, available in the public domain, was used in this work [Goodman and Harrell 1998]. NevProp4 is a three-layer feedforward back-propagation multilayer perceptron simulator developed as a multidisciplinary research project at the University of Nevada Center for Biomedical Modeling Research and based on Quickprop 1.0 by Scott Fahlman [Goodman and Harrell 1998]. Figure 4.3 shows the chosen ANN architecture.

Figure 4.3 Architecture of NevProp4 Used for the Study.

The chosen ANN implementation is a fully connected three-layer feedforward network trained using the Quickprop algorithm [Goodman and Harrell 1998]. Each layer has several nodes and all the nodes of a layer are connected to all the nodes of the next layer. The input sample was the one-dimensional feature vector of the 14 features. The number of nodes in the input layer corresponded to the number of input features, i.e., 14.
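To give a feel for the Quickprop training rule named above, the sketch below applies a simplified, single-weight version of it to a toy quadratic error function. Each step fits a parabola through the current and previous gradients (a secant step) and limits the growth of the weight change. The function name, the learning rate and the growth limit of 1.75 are assumptions made for illustration; the actual NevProp4 code handles many weights and further special cases.

```python
def quickprop_minimize(gradient, w, lr=0.1, max_growth=1.75, steps=50, tol=1e-12):
    """Minimize a one-dimensional error function given its gradient (simplified)."""
    prev_grad = gradient(w)
    prev_step = -lr * prev_grad              # first step: plain gradient descent
    w += prev_step
    for _ in range(steps):
        grad = gradient(w)
        if abs(grad) < tol:                  # gradient vanished: stop
            break
        denom = prev_grad - grad
        if denom == 0.0 or prev_step == 0.0:
            step = -lr * grad                # fall back to gradient descent
        else:
            step = grad / denom * prev_step  # jump to the parabola minimum
            limit = max_growth * abs(prev_step)
            if abs(step) > limit:            # scale back oversized changes
                step = limit if step > 0 else -limit
        w += step
        prev_grad, prev_step = grad, step
    return w

# Toy error surface E(w) = (w - 3)^2 with gradient 2 (w - 3).
w_final = quickprop_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)
```

On a quadratic surface the secant step lands on the minimum as soon as the growth limit allows it, which is why the method can converge in far fewer steps than fixed-rate gradient descent.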
The features were normalized before being passed as inputs to the network. The hidden layer consisted of 12 nodes. Empirical testing was performed to decide on the number of hidden nodes: for 14 inputs the number of hidden nodes was varied from 3 to 16 and the A_Z index was used as a measure of goodness, i.e., classifier performance. The ANN architecture with 12 nodes in the hidden layer gave the best performance on the entire dataset and was selected for classification. The output layer had a single node. The network learnt the input pattern by internally modifying the weighted connections and produced an output value between 0 and 1. The training of the network was done using Quickprop, an optimization of the back-propagation algorithm that is loosely based on Newton's method [Tveter 2003]. Quickprop is one of the fastest network training algorithms; it uses a different weight change value for each weight as the training proceeds and scales back the size of the weight changes when they are too large [Tveter 2003]. The relatively small dataset size did not permit a separation of the data into distinct training and test sets. Such a separation would have reduced the number of training samples, resulting in an under-trained network incapable of achieving the desired high classification rates. So, the generalization error of the classifier was estimated by the leave-one-out (LOO) cross-validation method [Kallergi 2003, Tourassi and Floyd 1997, Haykin 1999]. With this technique the network was trained on all but one of the input samples and tested on the excluded sample. For a total of N input samples, (N-1) were used for training and 1 was held out for testing. After testing, the left-out sample was returned to the dataset and a different one was left out for testing. This was repeated for all samples, i.e., N times.
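The leave-one-out procedure described above can be sketched generically in Python. The `train` and `predict` callables are placeholders for any classifier; here a nearest-class-mean rule on a single feature stands in for the ANN purely for illustration. All names and the toy data are hypothetical.

```python
def leave_one_out(samples, labels, train, predict):
    """Train on N-1 samples, test on the held-out one, and repeat N times."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]   # all samples except sample i
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        predictions.append(predict(model, samples[i]))
    return predictions

def train_nearest_mean(xs, ys):
    """Stand-in classifier: store the mean feature value of each class."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means

def predict_nearest_mean(means, x):
    """Assign the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

# Hypothetical one-feature dataset: 0 = benign, 1 = malignant.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
loo_predictions = leave_one_out(xs, ys, train_nearest_mean, predict_nearest_mean)
print(loo_predictions)
```

Every sample receives exactly one prediction, made by a model that never saw it during training, so the N predictions together give an estimate of the generalization error.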
The final output from the network was a set of predictions for all the samples that were used in training. The values of the predictions ranged from 0 to 1, where values closer to 0 represented benign samples and those closer to 1 represented malignant ones. The classification error rate was estimated by determining the fraction of incorrectly classified malignant and benign cases at various thresholds between 0 and 1. A threshold on the output may be considered a binary operating condition that leads to the separation of cases into benign and malignant groups [Kallergi 2003]. The range from 0 to 1 was divided into 10 buckets to yield 9 threshold values at 0.1 increments. Benign cases assigned a value above the selected threshold and malignant cases assigned a value below it were considered to be incorrectly classified [Kallergi 2003]. The fraction of benign and malignant samples incorrectly classified at the different thresholds yielded the error rate. The malignant and benign errors were plotted at each threshold along with an average error; we refer to this as the classification error plot or simply the error plot. A visual examination of the error plot reveals the optimal threshold suitable for clinical implementation of CAD-Dx, i.e., the operating point. We have also considered the area under the ROC curve (A_Z) as an index of the performance of the system. In the results section, a classification error plot accompanied by an operating point table is reported for each experiment. The operating point table tabulates the sensitivity and specificity values at a particular threshold.

4.2.5 Feature Selection and Ranking

A large number of features can be extracted from an image, of which only a few may be potentially useful while most may contain irrelevant or redundant information
that may degrade the classifier's performance. This is especially true with a limited sample size. The size of the training data required grows exponentially with the dimensionality of the input space; this phenomenon is referred to as the curse of dimensionality [Bishop 1995]. So, if we are forced to work with limited data (as we often are in practice), increasing the dimensionality of the input space rapidly leads to the case where the data are sparse and provide a very poor representation of the mapping, causing a decline in performance [Bishop 1995]. To enhance the accuracy of a classification model, the number of variables used must be reduced or the model must be simplified [Harrell et al 1996]. As a rule of thumb, [Harrell et al 1996] suggested that, for predictive discrimination that validates on a new sample, a model should have $p < m/10$ predictors, where p is the number of predictors and m is the size of the training sample. One of the simplest techniques for dimensionality reduction is to select a subset of the inputs that provides the maximum class information and discard the remaining ones, i.e., select a set of features f from the available features F such that $f \subset F$. This is particularly useful when there are inputs that carry very little or no information for the solution, or when there are very strong correlations between the inputs such that the same information is repeated in several variables [Bishop 1995]. A network with fewer inputs is appropriately constrained by a data set of limited size, leading to a trained network with better generalization properties [Bishop 1995]. Also, fewer inputs mean fewer weights, which may lead the network to train faster. Therefore, for an accurate and consistent classification, the model should be based on a small but strong set of features that characterize the input data and the class that it belongs to. A good feature is one that provides a distinct separation between the classes.
The selected subset of inputs
should ideally consist of good features that together provide the best class separation. A feature selection algorithm can be designed to pick these good or best features from the available set. A procedure for feature selection is characterized by a criterion that makes it possible to judge whether one set of features is better than another, and by a search procedure that searches through the candidate subsets of features. For this implementation we used A_Z, i.e., the area under the ROC curve [Metz 1978], as the selection criterion. The ROCKIT 0.9B software provided by Dr. Metz at the University of Chicago was used for all calculations. A_Z is a measure of the goodness of the classifier, and A_Z = 1 defines perfect classification. For a set of d features, an exhaustive search would involve 2^d possible feature combinations, which would be computationally very expensive. Principal Component Analysis (PCA), sequential search techniques and Genetic Algorithms (GA) are some other options for reducing the dimensionality of the input space [Bishop 1995]. Branch and Bound (BB), Sequential Forward Selection (SFS) and Sequential Backward Elimination (SBE) are variations of the sequential search techniques. The number of input features for our work is not too large (14 features) considering the size of the datasets. So, the simple SFS approach was selected to conduct the best subset search instead of the more complex paradigms. SFS started by considering each of the variables individually and selecting the one that gave the largest A_Z value. The selected feature was then combined with each of the other features in the full set, and the combination that yielded the highest A_Z for classification, i.e., minimized the error rate, was chosen. SFS is a bottom-up approach in which at each level a single feature is added to the selected set. The value of the criterion increased until a
certain point and then started to decline, i.e., the error rate of the classifier began to increase. The feature combination that accounted for the maximum A_Z before the decline started was assumed to be the most representative and informative, i.e., the best feature subset. SFS prevents irrelevant or redundant features from being added to the selected feature set [Woods 1994]. If a redundant feature combination was found, the A_Z value decreased, resulting in the removal of that feature from the set.

Figure 4.4 SFS Illustrated for a Set of Five Input Features.

Figure 4.4 illustrates the SFS approach with five features in the starting set. The structure looks like a tree that branches out from the best feature at each level. At the first level, feature (2) is selected and combined with the remaining features at the next level, yielding (3, 2) as the best set. The search continues by adding at each level the single feature that accounts for the greatest increase in the criterion function. This is a Depth-First
Search (DFS) procedure because the best feature combination is chosen at each level and further explored by going deeper and deeper into the tree. The possible breadth-wise combinations, i.e., the combinations of features (1), (3), (4) and (5) that are left out at the first level, are not considered other than in combination with the already selected features [Bishop 1995]. The SFS search procedure was implemented in the Interactive Data Language (IDL) (Research Systems Inc., Boulder, CO), which facilitates cross-platform development. The feature set at each level was presented to NevProp4, which computed the error rate for the classification. NevProp4 calculates the c-index or concordance index, a non-parametric measure of discrimination, i.e., the ability to separate output categories [Goodman and Harrell 1998]. This value has been stated to be approximately equal to the area under the ROC curve [Metz 1978], which we use as the criterion for selecting a feature.
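The greedy search and its criterion can be sketched as follows. The sketch is written in Python rather than IDL, and it takes the criterion as an arbitrary callable so that any classifier score can be plugged in. The c-index is computed here as the fraction of (malignant, benign) pairs that are ranked correctly, with ties counted as one half; this is a common way to compute it and is assumed, not taken from the NevProp4 source. All names are hypothetical.

```python
def c_index(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

def sequential_forward_selection(n_features, criterion):
    """Add one feature per level while the criterion keeps improving.

    `criterion` maps a list of feature indices to a score such as A_Z.
    """
    selected, best = [], float("-inf")
    remaining = list(range(n_features))
    while remaining:
        # Try each remaining feature together with the already selected ones.
        value, feature = max((criterion(selected + [f]), f) for f in remaining)
        if value <= best:        # criterion stopped increasing: stop the search
            break
        selected.append(feature)
        remaining.remove(feature)
        best = value
    return selected, best

# Hypothetical criterion values for subsets of three features.
toy_scores = {
    frozenset([0]): 0.60, frozenset([1]): 0.70, frozenset([2]): 0.65,
    frozenset([0, 1]): 0.75, frozenset([1, 2]): 0.80,
    frozenset([0, 1, 2]): 0.78,
}
subset, score = sequential_forward_selection(3, lambda s: toy_scores[frozenset(s)])
print(subset, score)
```

In the toy example feature 1 is chosen first, feature 2 is added next, and the search stops because adding feature 0 lowers the criterion. Subsets that do not contain the first selected feature are never evaluated beyond the first level, which is the depth-first behavior described above.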
CHAPTER 5
RESULTS AND DISCUSSION

The results of the experiments conducted to satisfy the three goals of this work are presented and discussed in this chapter.

5.1 Large Scale Evaluation (Single-View)

The large scale evaluation of the CAD-Dx system involved four steps: (a) database development and characterization, (b) algorithm modification in terms of filtering and classifier, (c) training and testing of the NevProp4 artificial neural network (ANN) on the large dataset, and (d) feature ranking and selection.

5.1.1 Database Characteristics

Our single-view database of 260 clusters was characterized by size and contrast estimating parameters following the recommendations widely accepted in the field of CAD [Nishikawa 1994]. Figures 5.1 and 5.2 present the histograms of the size and contrast of the 260 calcification clusters respectively. Size was defined as a range of the number of calcifications distinguishable on the image [Kallergi 2003]. Calcification
contrast was defined as the difference in image intensity between a single calcification and its immediate background divided by the background intensity [Kallergi 2003].

[Histogram: number of clusters versus number of calcifications per cluster (3 to 5, 5 to 10, 10 to 15, 15 to 20, > 20), for Benign Data (138) and Cancer Data (122).]
Figure 5.1 Histogram of Size of Calcification Clusters in the Single-View Dataset.

[Histogram: number of clusters versus calcification cluster contrast (< 0.2, 0.2 to 0.4, 0.4 to 0.6, 0.6 to 0.8, 0.8 to 1.0, > 1.0), for Benign Data (138) and Cancer Data (122).]
Figure 5.2 Histogram of Contrast of Calcification Clusters in the Single-View Dataset.
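The contrast measures used to characterize the database in this section reduce to simple arithmetic. The Python sketch below computes the contrast of a single calcification and a cluster-level value as the average over three calcifications. The function names and intensity values are hypothetical, and "most prominent" is interpreted here as "highest contrast", which is an assumption.

```python
def calcification_contrast(calc_intensity, background_intensity):
    """(calcification - background) / background for one calcification."""
    return (calc_intensity - background_intensity) / background_intensity

def cluster_contrast(contrasts):
    """Average contrast of the three most prominent calcifications.

    Assumption: prominence is ranked by contrast value.
    """
    top_three = sorted(contrasts, reverse=True)[:3]
    return sum(top_three) / len(top_three)

# Hypothetical intensities for four calcifications against a background of 1000.
values = [calcification_contrast(i, 1000.0) for i in (1100.0, 1500.0, 1300.0, 1400.0)]
print(cluster_contrast(values))
```

With these example numbers the three largest contrasts are 0.5, 0.4 and 0.3, so the cluster would fall into the 0.4 to 0.6 bin of the contrast histogram.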
Cluster contrast was defined as the average of three contrast values estimated from the three most prominent calcifications in the cluster [Kallergi 2003]. These histograms offer a measure of the database contents and the subtlety of the tested calcifications [Kallergi 2003]. The subtlety of the calcifications may not be a major issue in a pure classification task. However, since the proposed methodology includes an automated detection/segmentation step prior to classification, their representation was necessary in order to fully understand the impact on the final conclusion. The histograms in Figures 5.1 and 5.2 show that our dataset consisted of relatively small clusters of low contrast, a characteristic that usually challenges the detection and segmentation processes and may be used to indicate the relative difficulty of the dataset [Kallergi 2003].
5.1.2 Algorithm Optimization
Section 4.2.1 described the need for a pre-processing step that reduces noise and improves image and calcification contrast, and introduced the donut filter used in this work. The donut filter showed improvements over the previously used symmlet wavelet filter [Kallergi 2003]; it generally reduced edge artifacts and FP signals. Despite the improved performance, it should be noted that the filter was not fully optimized and could yield even better results. To optimize the segmentation stage of the algorithm, we tested a segmentation technique based on the Canny edge detector [Canny 1986] for separating the calcifications from the background in the donut-filtered image. The Canny-edge-based segmentation, however, did not preserve the entire shape of the calcifications: some of the detected regions were hollow in the center and consequently failed in the shape
analysis. The threshold-based segmentation described in Section 4.2.2, although basic, served our purpose of preserving the morphology and the distribution of the true calcifications in a cluster and was retained in this work as well [Kallergi 2003]. In the end, the donut filter and the threshold-based segmentation were chosen for the pre-processing and the detection/segmentation steps, respectively. Previously, the NevProp version 1.16 ANN implemented with the LOO was used as the classifier [Kallergi 2003]. However, we were unable to set up that ANN to save the best weights, which represent the learned state of the network. To classify an unknown case, the network needs to load the best weights, generalize from the learned information and then come up with a decision. This is a typical scenario in a real-time implementation of CAD-Dx. To remedy this deficiency, the latest NevProp4 version of the ANN was implemented. NevProp4 has several options that can be used by setting the appropriate switches in the input file. An in-depth summary of the features can be found elsewhere [Goodman and Harrell 1998]. Some of the NevProp4 features relevant to this work are the options to pre-randomize and/or pre-standardize training data, the default configuration as a fully connected three-layer network, and automatic training to prevent overfitting. NevProp4 also offers validation options such as data splitting, cross validation and bootstrapping. The inference options in NevProp4 facilitate interpretation by saving and retrieving the entire network in one file and by reporting all the connections in the system in a simple, grouped format during training. An important feature in NevProp4 is the estimate of the c-index or the concordance index, which is a non-parametric measure of discrimination, i.e., the ability to separate output categories [Goodman and
Harrell 1998]. The c-index is approximately equal to the area under the ROC curve [Metz 1978] and has been used in this work as a measure of the classification ability of the network for feature selection.
5.1.3 Training and Testing of NevProp4
NevProp4 had several parameters that needed to be properly defined for a robust final classifier model. The parameters that were empirically optimized were the standardization of the inputs, the maximum number of epochs, the number of hidden units, and the validation method. The initial randomization of the weights would not be effective if the training predictors were on different scales [Goodman and Harrell 1998]. Scaled input variables were also necessary for reliable optimization and later interpretation of the model. We therefore standardized the inputs to the network to a uniform scale. Two options were available: one was to transform the data into the range of -0.5 to +0.5 and the second was to scale the training data predictors to zero mean and units of standard deviation [Goodman and Harrell 1998]. The second option resulted in better classification and was selected. A minimum of 50-200 epochs is recommended for the network to continue optimization, the default being 200 [Goodman and Harrell 1998]. A numeric value for the maximum number of epochs is one of the command-line parameters needed to start the network training. We found that a maximum of 700 epochs was needed for the network to converge to an optimal solution. In the absence of hidden units the network only creates direct connections from input to output units, i.e., it becomes a generalized linear model with limited flexibility.
An increase in the number of hidden units introduces nonlinearity in the effects, creating a more complex model with greater flexibility. By default, NevProp4 sets the number of hidden units to one half the number of inputs. In our experiments, we varied the number of hidden units from 1 to 16 for 14 input units. Figure 5.3 shows the effect of the variation in the number of hidden units on the classification error. The classification error rate decreased as the number of hidden units was increased, reaching a minimum with 12 hidden units.
Figure 5.3 Number of Hidden Units Vs Classification Error. (Axes: Number of Hidden Units (x), Classification Error (y).)
Data splitting, cross validation and bootstrapping are the validation options offered in NevProp4. Data splitting was tried with ratios of 50-50 and 70-30, and 5-fold and 10-fold cross validations were also tested. However, the relatively small dataset size did not permit good classification on the test set. To use the maximum number of available samples for training, the generalization error of the classifier was estimated by the LOO resampling technique explained in Section 4.2.4.
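As an illustration of the LOO procedure, the sketch below holds out one sample at a time, standardizes the predictors using the mean and standard deviation of the remaining training samples only, and counts the held-out misclassifications. It is a simplified stand-in under stated assumptions: a nearest-centroid rule replaces the NevProp4 network, and all function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def standardize_fit(rows):
    """Per-feature mean and standard deviation of the training rows."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return mus, sds


def standardize_apply(row, mus, sds):
    return [(x - m) / s for x, m, s in zip(row, mus, sds)]


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (not the ANN): label of the closest class centroid."""
    centroids = {}
    for label in set(train_y):
        members = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(c) for c in zip(*members)]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(centroids, key=lambda lab: dist2(x, centroids[lab]))


def leave_one_out_error(rows, labels):
    """Fraction of samples misclassified when each one is held out in turn."""
    errors = 0
    for i in range(len(rows)):
        train_x = rows[:i] + rows[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        mus, sds = standardize_fit(train_x)  # fitted on the training fold only
        z_train = [standardize_apply(r, mus, sds) for r in train_x]
        z_test = standardize_apply(rows[i], mus, sds)
        if nearest_centroid_predict(z_train, train_y, z_test) != labels[i]:
            errors += 1
    return errors / len(rows)


# Hypothetical, well separated two-feature data (0 = benign, 1 = cancer).
toy_rows = [[0.0, 1.0], [0.2, 1.1], [0.1, 0.9], [5.0, 6.0], [5.2, 6.1], [4.9, 5.8]]
toy_labels = [0, 0, 0, 1, 1, 1]
print(leave_one_out_error(toy_rows, toy_labels))
```

With N samples the classifier is trained N times on N - 1 samples each, which is what allows nearly the whole dataset to be used for training at every step.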
The network achieved an Az of 0.8891 after classification of the single-view dataset. Figure 5.4 shows the classification error plot for the single-view dataset for the benign and cancer cases as well as the combination of the two (average performance). Table 5.1 lists the possible operating points.
Figure 5.4 Error Plots for the Single-View Dataset. (Axes: Threshold (x), Error (y); curves: Cancer, Benign, Average.)
Table 5.1 Operating Points for the Single-View Dataset.
Operating Point   Sensitivity   Specificity
0.4               88 %          77 %
0.5               79 %          79 %
0.6               66 %          89 %
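An operating-point table of this kind can be produced by thresholding the network outputs. The sketch below assumes that an output at or above the threshold is called cancer and that label 1 denotes a cancer cluster; the function name and the toy outputs are hypothetical and do not reproduce the values of Table 5.1.

```python
def sensitivity_specificity(outputs, labels, threshold):
    """Sensitivity and specificity when output >= threshold is called cancer."""
    tp = sum(1 for o, y in zip(outputs, labels) if y == 1 and o >= threshold)
    fn = sum(1 for o, y in zip(outputs, labels) if y == 1 and o < threshold)
    tn = sum(1 for o, y in zip(outputs, labels) if y == 0 and o < threshold)
    fp = sum(1 for o, y in zip(outputs, labels) if y == 0 and o >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical network outputs: three cancer cases, then three benign cases.
toy_outputs = [0.9, 0.55, 0.45, 0.45, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

for threshold in (0.4, 0.5, 0.6):
    sens, spec = sensitivity_specificity(toy_outputs, toy_labels, threshold)
    print(f"{threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Raising the threshold can only lower sensitivity and raise specificity, which is the trade-off visible across the rows of the table.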
Specifically, an operating threshold of 0.4 gives the highest sensitivity among the listed points, 88 %, at a specificity of 77 %. At thresholds of 0.5 or 0.6 the specificity increases at the cost of sensitivity, which is unacceptable. So 0.4 would probably be the preferred operating point in a clinical setting. Figure 5.5 illustrates the ROC curve obtained from the classification of the 260 clusters. The area under the ROC curve, Az, was estimated as 0.8891 with a standard error of 0.0199.
Figure 5.5 ROC Plot for the Single-View Dataset. (Axes: FPF (x), TPF (y); Az = 0.8891, Std. Error = 0.0199.)
5.1.4 Feature Ranking and Selection
Table 5.2 presents the order in which the features were selected by SFS and the Az values calculated for each combination. The Az value used for the feature selection does
not come from the ROCKIT software; it is the c-index value calculated by the NevProp4 ANN package. The first column in the table represents the order of feature selection. The rank of a feature is also equivalent to the level in the DFS tree [Section 4.4.5] at which the feature is selected. At each level l, (l - 1) features have already been selected. Level 12 yielded the highest value of Az, although an earlier level (# 8) showed a relatively high value as well. Interestingly, the classifier's performance never reached the level achieved with the original 14-feature set, i.e., Az = 0.8891. This feature selection outcome may be due to the way the SFS method operates and possibly to inter-feature relationships that are not fully evident in this process.
Table 5.2 Sequence of Features Selected for the Single-View Dataset of 260 Clusters. The feature # correspondence can be found in Table 4.2 (pg. 44).
Rank   Feature # chosen   Az
1      12                 0.5723
2      2                  0.6537
3      13                 0.6775
4      3                  0.6889
5      11                 0.7205
6      14                 0.7750
7      10                 0.8140
8      5                  0.8352
9      4                  0.8227
10     7                  0.8134
11     9                  0.8177
12     8                  0.8478
13     6                  0.8471
14     1                  0.8109
This result was instrumental in the partitioning of the dataset into clusters with corresponding clusters on the other view, i.e., two-view or paired dataset and those
visible on only one view (unpaired clusters). The next section elaborates on the two-view dataset.
5.2 Two-View Feature Estimation
In clinical practice, a cluster that is visible on both breast views has a greater chance of being detected and diagnosed accurately. Confusion in the appearance in one view leads the radiologist to look for more definitive characteristics in the other view. This was the motivation for investigating the potential usefulness of two-view information. The two-view dataset of 101 paired clusters described in Section 4.1 has been used for the two-view feature estimation. Three experiments were performed to scrutinize the importance of two-view information. Table 5.3 summarizes the three experiments.
Table 5.3 Two-View Feature Estimation Experiments.
                 Sample-size   Features Defined From
Experiment # 1   202           101 paired clusters in 101 CC and 101 MLO views.
Experiment # 2   101           101 clusters in 101 MLO views.
Experiment # 3   101           101 clusters in 101 CC views.
The results of each experiment are presented in the following sections.
Experiment # 1
From the set of 260 clusters, 202 clusters were identified as having both a CC (101) and an MLO (101) view. These clusters were classified in this experiment and the classification error plots are shown in Figure 5.6. The sensitivities and specificities corresponding to probable operating points are shown in Table 5.4.
Figure 5.6 Error Plots for Experiment # 1. (Axes: Threshold (x), Error (y); curves: Cancer, Benign, Average.)
Table 5.4 Operating Points for Experiment # 1.
Operating Point   Sensitivity   Specificity
0.4               91 %          63 %
0.45              87 %          70 %
0.5               81 %          75 %
0.6               68 %          85 %
A closer look would advocate using 0.4 as the operating point. Compared to Table 5.1, at the same operating point, there is an increase in sensitivity from 88 % to 91 % [Table 5.4]. However, this comes at the cost of a steep drop in specificity from 77 % [Table 5.1] to 63 % observed here. Such a significant drop in specificity in exchange for a minimal increase in sensitivity may not be acceptable in many cases. An operating point of 0.5 gives a better specificity, but again we would lose sensitivity at this point. The classifier achieved an Az of 0.8881 with a standard error of 0.0229, which is not a significant difference from the Az value achieved with the entire single-view set (Az = 0.8891). Feature selection was performed for the 202-cluster set as well. Table 5.5 presents the results of feature selection for Experiment # 1.
Table 5.5 Sequence of Features Selected for Experiment # 1.
Number   Feature # chosen   Az
1        1                  0.5893
2        11                 0.6520
3        4                  0.6699
4        9                  0.7213
5        7                  0.7676
6        2                  0.8223
7        5                  0.7935
8        8                  0.8248
9        10                 0.8457
10       13                 0.8490
11       12                 0.8345
12       14                 0.8935
13       3                  0.8510
14       6                  0.7938
The maximum Az, 0.8935, was achieved with 12 features and is higher than the Az obtained for the 260 clusters (Table 5.2 and Figure 5.5). This may be due to the exclusion of the 58 unpaired clusters or a bias in the set samples.
Experiment # 2
For this experiment the classifier was presented with only the features extracted from 101 MLO view clusters. Figure 5.7 shows the error plot while the operating points are in Table 5.6.
Figure 5.7 Error Plots for Experiment # 2. (Axes: Threshold (x), Error (y); curves: Cancer, Benign, Average.)
Table 5.6 Operating Points for Experiment # 2.
Operating Point   Sensitivity   Specificity
0.4               92 %          71 %
0.45              90 %          80 %
0.5               88 %          90 %
0.55              87 %          92 %
0.6               84 %          92 %
An Az of 0.9239 with a standard error of 0.0292 was achieved in this experiment. This value was higher than those of both the 260 and the 202 cluster sets (0.8891 and 0.8881, respectively). SFS was used to rank the features in this experiment as well and the results are shown in Table 5.7.
Table 5.7 Sequence of Features Selected for Experiment # 2.
Rank   Feature #   Az
1      1           0.6009
2      11          0.6805
3      3           0.7461
4      13          0.8277
5      4           0.8685
6      12          0.8673
7      2           0.8897
8      9           0.8654
9      7           0.8999
10     5           0.9227
11     10          0.8819
12     14          0.9372
13     6           0.9513
14     8           0.8956
Thirteen features were needed in this case to achieve the maximum Az of 0.9513, a value that surpassed the previous performance.
Experiment # 3
Here the classifier was presented with only those features that were extracted from 101 CC views. Figure 5.8 shows the classification error plots, while Table 5.8 shows the sensitivity and specificity values at each operating point in the error plot.
Figure 5.8 Error Plots for Experiment # 3. (Axes: Threshold (x), Error (y); curves: Cancer, Benign, Average.)
Table 5.8 Operating Points for Experiment # 3.
Operating Point   Sensitivity   Specificity
0.4               98 %          87 %
0.45              94 %          90 %
0.5               92 %          90 %
0.55              90 %          92 %
0.6               90 %          92 %
This experiment yielded the best performance of all tests. The Az parameter with all 14 features was estimated at 0.9682 with a standard error of 0.0190, which exceeds all previous performances. This may be due to the difference in feature presentations between the two views, the signal or the image quality, and differences in FP signals between the views. No specific conclusion may be derived in this work and further
investigation is necessary to determine the similarities and differences between views for individual calcifications and clusters. Table 5.9 shows the results from the feature selection on the 101 CC view dataset. Nine features were needed to achieve an Az of 0.9627, a value close to the one obtained with all 14 features (0.9682).
Table 5.9 Sequence of Features Selected for Experiment # 3.
Rank   Feature #   Az
1      3           0.6060
2      4           0.7111
3      14          0.7979
4      12          0.8324
5      10          0.9086
6      5           0.9337
7      6           0.9168
8      13          0.9368
9      8           0.9627
10     9           0.9482
11     1           0.9329
12     11          0.9345
13     2           0.9403
14     7           0.9596
In review, the single-view analysis of paired clusters showed significant differences in performance. Interestingly, mixing the CC and the MLO views of the same cluster in the same dataset and treating them as independent samples decreases classification power, probably due to bias issues. View separation always yielded better results in classification. Table 5.10 shows the combined results from the feature selection performed in the experiments of Table 5.3 and the large dataset of 260 clusters. The columns represent the dataset on which the feature selection was performed and the rows stand for the features.
An X symbol in a column marks a feature that was not selected in the best subset for that experiment. Rows marked with an asterisk (shaded in the original table) show the features that were always selected.
Table 5.10 Summary of Selected Features for the Single-View Set and the Datasets Used in Experiments 1 to 3 of Table 5.3.
                                                                         Two-View Dataset
#     Feature                               260 Single-View clusters   Experiment # 1   Experiment # 2   Experiment # 3
1     Average-Area of calcification         X                                                            X
2     Average-Compactness                                                                                X
3     Average-Moments                                                  X
4*    Average-Fourier Descriptor (FD)
5*    Average-Eccentricity
6     Average-Spread (S)                    X                          X
7     SD-Area                                                                                            X
8     SD-Compactness                                                                    X
9     SD-Moments                                                                                         X
10*   SD-Fourier Descriptor
11    SD-Eccentricity                                                                                    X
12*   SD-Spread
13*   Number of calcifications in cluster
14*   Age of the patient
Feature numbers 4, 5, 10, 12, 13 and 14 are the features that were always selected and, hence, may be considered as those carrying the most discriminatory class characterization information. Features # 13 and # 14 represent the number of calcifications within the cluster and the age of the patient, respectively, and were new additions to the previous set of 12 shape features [Kallergi 2003]. These new features seem to contribute significantly to cluster classification and the results are in agreement with clinical findings.
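The set of always-selected features follows mechanically from the ranking tables. The sketch below takes the feature order and Az values of Tables 5.2, 5.5, 5.7 and 5.9, keeps for each dataset the features up to the level with the highest Az, and intersects the four subsets. The variable and function names are illustrative only.

```python
# (feature order, Az per level) transcribed from Tables 5.2, 5.5, 5.7 and 5.9.
RANKINGS = {
    "260 single-view": (
        [12, 2, 13, 3, 11, 14, 10, 5, 4, 7, 9, 8, 6, 1],
        [0.5723, 0.6537, 0.6775, 0.6889, 0.7205, 0.7750, 0.8140,
         0.8352, 0.8227, 0.8134, 0.8177, 0.8478, 0.8471, 0.8109]),
    "Experiment 1": (
        [1, 11, 4, 9, 7, 2, 5, 8, 10, 13, 12, 14, 3, 6],
        [0.5893, 0.6520, 0.6699, 0.7213, 0.7676, 0.8223, 0.7935,
         0.8248, 0.8457, 0.8490, 0.8345, 0.8935, 0.8510, 0.7938]),
    "Experiment 2": (
        [1, 11, 3, 13, 4, 12, 2, 9, 7, 5, 10, 14, 6, 8],
        [0.6009, 0.6805, 0.7461, 0.8277, 0.8685, 0.8673, 0.8897,
         0.8654, 0.8999, 0.9227, 0.8819, 0.9372, 0.9513, 0.8956]),
    "Experiment 3": (
        [3, 4, 14, 12, 10, 5, 6, 13, 8, 9, 1, 11, 2, 7],
        [0.6060, 0.7111, 0.7979, 0.8324, 0.9086, 0.9337, 0.9168,
         0.9368, 0.9627, 0.9482, 0.9329, 0.9345, 0.9403, 0.9596]),
}


def best_subset(order, az):
    """Features selected up to and including the level with the highest Az."""
    best_level = max(range(len(az)), key=lambda i: az[i])
    return set(order[:best_level + 1])


subsets = {name: best_subset(order, az) for name, (order, az) in RANKINGS.items()}
always_selected = sorted(set.intersection(*subsets.values()))
left_out = {name: sorted(set(range(1, 15)) - s) for name, s in subsets.items()}
print(always_selected)
print(left_out)
```

The features reported as left out for each dataset correspond to the X marks of Table 5.10.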
We analyzed the prediction values for the cluster samples in Experiment # 2 and Experiment # 3 to look for the misclassifications in each case. Table 5.11 presents a summary of the classification of the clusters as predicted in each view. It should be noted that since CC and MLO are just two views of the same cluster, the pathology, i.e., the malignant or benign condition, should be the same for both.
Table 5.11 Comparison of Cluster Classifications in CC and MLO Views.
                                          CC-view (Experiment # 3)
                                          correct    incorrect
MLO-view (Experiment # 2)   correct       79         6
                            incorrect     14         2
This table complements the Az estimates reported earlier for these experiments and presents a different perspective. Of the 101 tested clusters, the classifier trained with only CC-view samples (Experiment # 3) classified 6 clusters incorrectly that the MLO-view trained classifier (Experiment # 2) classified correctly. Similarly, 14 clusters with correct class assignments by the CC-view trained classifier were classified incorrectly by the MLO-view trained classifier. There were just 2 clusters that were classified incorrectly by both the CC-view and the MLO-view trained classifiers. These results suggested that the combination of the CC and the MLO view in feature extraction could significantly improve CAD-Dx performance. Experiment # 4 was conducted to test the above hypothesis.
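A cross-tabulation such as Table 5.11 can be built from per-cluster correctness flags. In the sketch below the flags are synthetic, arranged only so that the counts match Table 5.11; the per-cluster predictions themselves are not part of this text, and the function name is hypothetical.

```python
from collections import Counter


def cross_tabulate(mlo_correct, cc_correct):
    """Count clusters by (MLO correct?, CC correct?)."""
    return Counter(zip(mlo_correct, cc_correct))


# Synthetic flags reproducing the four cell counts of Table 5.11.
mlo_flags = [True] * 79 + [True] * 6 + [False] * 14 + [False] * 2
cc_flags = [True] * 79 + [False] * 6 + [True] * 14 + [False] * 2

table = cross_tabulate(mlo_flags, cc_flags)
only_one_view_right = table[(True, False)] + table[(False, True)]
cc_errors = table[(True, False)] + table[(False, False)]
mlo_errors = table[(False, True)] + table[(False, False)]
print(table, only_one_view_right, cc_errors, mlo_errors)
```

The 20 clusters that are classified correctly in exactly one view are the cases where information from the other view could, in principle, correct the decision.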
Experiment # 4
In this test, the input features to the classifier were defined as the average of the features estimated in the CC view and the corresponding features estimated in the MLO-view. All the features except patient age were averaged; age was left out since CC and MLO views were from the same patient at the same time so an average would not change its value. The dimensions of the input remained the same and so did the sample-size. 101 CC and 101 MLO view clusters were averaged, so the sample-size remained 101. The resulting error plot is shown in Figure 5.9 and the sensitivity and specificity values corresponding to the various thresholds have been tabulated in Table 5.12.
Figure 5.9 Error Plots for Experiment # 4. (Axes: Threshold (x), Error (y); curves: Cancer, Benign, Average.)
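The feature averaging of this experiment amounts to an element-wise mean of the two 14-element feature vectors of a paired cluster, with age copied through unchanged. The sketch below assumes the features are stored in the order of Table 4.2, so that age (feature # 14) sits at zero-based index 13; the function name and the toy values are hypothetical.

```python
AGE_INDEX = 13  # feature # 14 (age of the patient), zero-based position


def average_views(cc_features, mlo_features, age_index=AGE_INDEX):
    """Average CC and MLO feature vectors of one cluster; age is copied as is."""
    if len(cc_features) != len(mlo_features):
        raise ValueError("both views must provide the same number of features")
    averaged = [(c + m) / 2 for c, m in zip(cc_features, mlo_features)]
    averaged[age_index] = cc_features[age_index]
    return averaged


# Toy paired cluster: 13 image-derived features per view plus a shared age.
cc = [2.0] * 13 + [55.0]
mlo = [4.0] * 13 + [55.0]
print(average_views(cc, mlo))
```

One averaged vector is produced per paired cluster, so 101 CC and 101 MLO vectors yield 101 samples of unchanged dimension.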
Table 5.12 Operating Points for Experiment # 4.
Operating Point   Sensitivity   Specificity
0.4               98 %          80 %
0.45              96 %          84 %
0.5               94 %          87 %
0.55              91 %          86 %
0.6               88 %          88 %
Results show that feature averaging yielded better results than the single-view test (Experiment # 1) and, for some operating points, better than the results of Experiments 2 and 3 (compare Table 5.12 to Tables 5.6 and 5.8). An Az of 0.9580 was achieved in this test with a standard error of 0.0225. The ROC curve is shown in Figure 5.10. This Az is lower than the Az obtained from the CC views alone (0.9682) but, considering the standard errors, the difference is not statistically significant. Compare Figure 5.10 to Figure 5.5, which shows the classification performance on the single-view dataset. The average feature classifier showed a significant increase in performance (Az = 0.9580) over the performance achieved with the single-view dataset (Az = 0.8891), which could indicate that the fusion of information from two views aids automated classification.
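One rough way to relate two Az estimates to their standard errors is a z statistic on their difference. The sketch below is only an approximation and not necessarily the test used in this work: it treats the two estimates as independent, although the datasets compared here share clusters, so a test that accounts for the correlation would be more appropriate. The function name is hypothetical.

```python
import math


def az_difference_z(az_a, se_a, az_b, se_b):
    """z statistic and two-sided p-value for az_a - az_b, assuming the two
    estimates are independent (an assumption, see the text above)."""
    z = (az_a - az_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# CC views alone (Experiment # 3) vs. averaged features (Experiment # 4).
z_cc, p_cc = az_difference_z(0.9682, 0.0190, 0.9580, 0.0225)
# Averaged features (Experiment # 4) vs. the 260-cluster single-view dataset.
z_sv, p_sv = az_difference_z(0.9580, 0.0225, 0.8891, 0.0199)
print(round(z_cc, 3), round(p_cc, 3))
print(round(z_sv, 3), round(p_sv, 3))
```

Under this approximation the first difference stays well below the conventional 1.96 cutoff while the second exceeds it, which is in line with the statements above.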
Figure 5.10 ROC Plot for Experiment # 4. (Axes: FPF (x), TPF (y); Az = 0.9580, Std. Error = 0.0225.)
SFS feature selection results are listed in Table 5.13 for Experiment # 4.
Table 5.13 Sequence of Features Selected for Experiment # 4.
Rank   Feature #   Az
1      13          0.6238
2      8           0.7159
3      5           0.7563
4      1           0.7979
5      10          0.8721
6      11          0.8516
7      14          0.8776
8      4           0.8987
9      9           0.9462
10     12          0.9580
11     2           0.9121
12     7           0.9278
13     6           0.9250
14     3           0.9215
Experiment # 4 achieved with just 10 features (Az = 0.9580) the same performance as with all 14 features (Az = 0.9580). Our experiments to this point investigated the impact of single-view and two-view (averaging) feature estimation on classification performance. Features were ranked in each test and the Az indices were compared. It is possible that combinations of the best feature sets from the previous ranking experiments would yield even better performances. In the remainder of this Chapter, we describe 4 tests done to investigate the impact of best feature combination on classification performance. Table 5.14 summarizes the four classification tests using best feature combinations from Experiments 2 and 3 for the 101 clusters.
Table 5.14 Best Feature Combinations and Classification Experiments. Age Was Considered Only Once.
                                                                                          Features from
                 Description                                                              CC         MLO   Total number
Experiment # 5   All the features from CC and MLO views. Age was considered only once.   13 + age   13    27
Experiment # 6   SFS features from CC & SFS features from MLO.                            9          13    21
Experiment # 7   SFS features from MLO and the corresponding features from CC.            13         13    25
Experiment # 8   SFS features from CC and the corresponding features from MLO.            9          9     17
The results from these experiments are presented in the following sections. Only the classification error plots are presented. A figure with all ROC curves is given at the end of the section to allow comparison of the various feature combination tests.
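The totals in Table 5.14 can be checked by treating every feature as view-specific except age, which is shared between views and counted once. In the sketch below the SFS subsets are read off Tables 5.7 and 5.9 (the first 13 and the first 9 ranked features, respectively); the names are illustrative only.

```python
AGE = 14
ALL_FEATURES = set(range(1, 15))
CC_SFS = {3, 4, 14, 12, 10, 5, 6, 13, 8}   # first nine ranks of Table 5.9
MLO_SFS = ALL_FEATURES - {8}               # first thirteen ranks of Table 5.7


def combined_size(cc_feats, mlo_feats, shared=AGE):
    """Number of inputs when CC and MLO features are concatenated and the
    shared feature (age) is included only once."""
    total = len(cc_feats) + len(mlo_feats)
    if shared in cc_feats and shared in mlo_feats:
        total -= 1
    return total


totals = {
    "Experiment 5": combined_size(ALL_FEATURES, ALL_FEATURES),
    "Experiment 6": combined_size(CC_SFS, MLO_SFS),
    "Experiment 7": combined_size(MLO_SFS, MLO_SFS),  # MLO subset mirrored in CC
    "Experiment 8": combined_size(CC_SFS, CC_SFS),    # CC subset mirrored in MLO
}
print(totals)
```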
Experiment # 5
Here all the features extracted from both CC and MLO view clusters were combined, which resulted in a total of 27 features since age was common to both. Figure 5.11 shows the classification error plots for this test and the operating points are listed in Table 5.15. An Az of 0.9572 was achieved with a standard error of 0.0185. Note that 27 features are probably too many for the tested sample size, i.e., 101 clusters. It is possible that an over-optimistic measure of classification was obtained in this test. A threshold of 0.4 would probably be chosen as the operating point, with a sensitivity of 94 % and a specificity of 77 %.
Figure 5.11 Error Plots for Experiment # 5. (Axes: Threshold (x), Error (y); curves: Cancer, Benign, Average.)
Table 5.15 Operating Points for Experiment # 5.
Operating Point   Sensitivity   Specificity
0.4               94 %          77 %
0.45              92 %          82 %
0.5               90 %          90 %
0.55              90 %          90 %
0.6               88 %          92 %
Experiment # 6
In this experiment, we used the 9 selected features from Experiment # 3 and the 13 selected features from Experiment # 2, i.e., a total of 21 features. Age was a feature that SFS selected in both the CC and MLO view experiments and, hence, it was considered only once to avoid a redundancy in features. The error plots for the classification (Figure 5.12) and the corresponding operating point table (Table 5.16) follow. Experiment # 6 achieved an Az of 0.9663 with a standard error of 0.0155. The result is comparable to the performance achieved in Experiment # 3. At a sensitivity of 94 % (threshold = 0.4) the classifier reached a specificity of 85 %.
Table 5.16 Operating Points for Experiment # 6.
Operating Point   Sensitivity   Specificity
0.4               94 %          85 %
0.45              92 %          87 %
0.5               91 %          88 %
0.55              88 %          88 %
0.6               88 %          87 %
Figure 5.12 Error Plots for Experiment # 6. (Axes: Threshold (x), Error (y); curves: Cancer, Benign, Average.)
Experiment # 7
The 13 best features selected by SFS in Experiment # 2 were combined with the corresponding features from CC-views (Experiment # 3) independent of the ranking, to form a set of 25 features. Again, age was common to both and hence selected just once. The classification error plots are shown in Figure 5.13 and the operating point data in Table 5.17.
Figure 5.13 Error Plots for Experiment # 7. (Axes: Threshold (x), Error (y); curves: Cancer, Benign, Average.)
Table 5.17 Operating Points for Experiment # 7.
Operating Point   Sensitivity   Specificity
0.4               95 %          80 %
0.45              95 %          85 %
0.5               95 %          88 %
0.55              92 %          89 %
0.6               90 %          90 %
The classifier achieved an Az of 0.9650 with a standard error of 0.0160. This is again comparable to the result from Experiment # 3 that gave the best classification performance. A threshold of 0.4 attained a sensitivity of 95 % at a specificity of 80 %. Similar to the previous experiments, the number of features, i.e., 25 may be large relative
to the sample-size of 101 clusters. So, this may be an over-optimistic estimate of the performance of the classification and additional tests are needed for conclusive results.
Experiment # 8
In the final experiment, the 9 best features selected from the CC-views were combined with the corresponding 9 features from the MLO-views. The latter were selected independent of ranking. Again, age of the patient was considered only once. Figure 5.14 shows the error plots while the operating-point list follows in Table 5.18.
Figure 5.14 Error Plots for Experiment # 8. (Axes: Threshold (x), Error (y); curves: Cancer, Benign, Average.)
Table 5.18 Operating Points for Experiment # 8.
Operating Point   Sensitivity   Specificity
0.4               96 %          82 %
0.45              93 %          82 %
0.5               91 %          82 %
0.55              90 %          89 %
0.6               89 %          90 %
This 17-feature combination achieved an Az of 0.9757 with a standard error of 0.0139, surpassing the CC-view results of Experiment # 3. A sensitivity of 96 % was achieved with a specificity of 82 % at a threshold of 0.4, similar to the results obtained from the CC-views (Table 5.8). A set of 17 features is not incompatible with our sample size. Figure 5.15 shows that the ROC curves for Experiments 5 to 8 overlap. Their differences can be better appreciated in the partial ROC plots of Figure 5.16. The ROC partial area index was recently proposed as a measure of clinical diagnostic performance and a more accurate index of performance when a high sensitivity is desired [Jiang et al 1996-b]. Partial area index estimates suggested that, for the curves of Figure 5.16, classification with all 14 features estimated by averaging the feature values in the CC and MLO views (Experiment # 4) yields overall the best benign/malignant discriminatory performance for mammographic calcification clusters.
Figure 5.15 ROC Plots of Experiments 5-8. (Axes: False Positive Fraction (x), True Positive Fraction (y); curves: Experiment # 5, Experiment # 6, Experiment # 7, Experiment # 8.)
Figure 5.16 Partial Area ROC Plot for Experiments 5-8 Where a TPF of 0.9 Is Selected. (Axes: False Positive Fraction (x), True Positive Fraction from 0.9 to 1 (y); curves: Experiment # 5, Experiment # 6, Experiment # 7, Experiment # 8.)
CHAPTER 6
CONCLUSIONS
This work investigated several aspects of the automated diagnosis of mammographic microcalcification clusters, with an emphasis on combining information from two breast views to improve benign/malignant cluster differentiation. We used 136 patient cases from the mammography database of the Imaging Science Research Division (ISRD). All the mammograms were collected from the patient files of the H. Lee Moffitt Cancer Center & Research Institute at the University of South Florida and showed the presence of at least one pathologically verified malignant or benign microcalcification cluster. A total of 260 regions of interest (ROIs), each containing a single cluster of calcifications in the center, were manually created from the mammograms and formed the single-view dataset. The ROIs were processed and analyzed to extract features related to the morphology and distribution of the clusters, which were combined with demographic information for cluster classification. A three-layer feedforward neural network, NevProp4 [Goodman and Harrell 1998], with Quickprop training and leave-one-out resampling was used for classification. The area under the ROC curve (Az) served as a measure of performance.
Work was done to optimize an existing CAD-Dx algorithm and test it on a large dataset. Furthermore, several experiments were conducted to investigate: (a) the impact of the feature selection approach on classification, particularly the use of a single-view vs. a two-view average for feature estimation, and (b) the impact of combining the best features from the various optimization stages on classification. The conclusions that can be drawn from this work are:
Among the selected features, the number of calcifications in the cluster and the age of the patient were the most important and were consistently ranked among the best in all experiments. These two features were new additions to a previously tested set and were proven to be powerful discriminants. Patient age was a demographic feature while the other thirteen features in the selected set were related to the shape and the distribution of the calcification clusters. The importance of the age feature suggested that the inclusion of other demographic information in the classification process might lead to a better and more robust performance.
Separating views, namely considering only CC or only MLO views for training of the classifier, led to better performances. These results indicated that CAD-Dx systems could benefit significantly from using single-view information rather than mixing views in the same set.
The classifier trained better with the CC-views of the clusters than with the MLO-views of the clusters. The reason for this outcome is not evident in this study but the result is interesting and worthy of further investigation.
An analysis of the classification results from separate CC and MLO views revealed a large number of clusters that were classified correctly in just one view while very few (just 2) were incorrectly classified in both. This suggested a need for using information from both views to reduce the classification errors.
The averaging of features in two views yielded high sensitivity and specificity values, supporting our hypothesis that view combination would lead to improved CAD-Dx performance.
For future work, larger datasets can be used that would permit separate training, validation and testing databases with statistically significant results. To overcome the drawback of a limited dataset we used leave-one-out resampling, which utilizes the maximum number of samples to train the network; bootstrapping could be an alternative resampling approach to eliminate potential biases. The neural network used in this study seems to give a good estimation of the classification error rate. However, other classifiers may be used to confirm this hypothesis before a clinical implementation is planned. Also, automating the parameter selection for the neural network and using theoretical approaches would eliminate the errors associated with empirical selection. The SFS method used here for feature ranking and selection carries the risk of a non-optimal tree search. To avoid this problem, SBE, genetic algorithms or best-first search could be used.
REFERENCES American Cancer Society (ACS) (2003). Mammograms and other Breast Imaging procedures. URL= http://www.cancer.org/docroot/cri/content/cri_2_6x_mammography_and_other_breast_imaging_procedures_5.asp?sitearea=cri. Andersson, I., J. Hildell, A. Muhlow, and H. Pettersson (1978). Number of projections in mammography: influence on detection of breast disease. Am J Roentgenol 130: 349-351. Andersson, I. (1981). Radiographic screening for breast carcinoma. III. Appearance of carcinoma and number of projections to be used at screening. Acta Radiologica Diagnosis 22, Fasc.4: 407-420. Bassett, L. W., J. J. Pagani, and R. H. Gold (1980). Pitfalls in mammography. Radiology 136: 641-645. Bassett, L. W., and R. H. Gold (1983). Breast radiography using the oblique projection. Radiology; 149(2): 585-587. Bassett, L. W., D. H. Bunnell, R. Jahanshahi, R. Gold, R. D. Arndt and J. Linsman (1987). Breast cancer detection: one versus two views. Radiology, 165: 95-97. Bassett, L.W, and S. Gambhir (1991). Breast imaging for the 1990s. Seminars in Oncology. 18: 80-86. Bassett, L. W., (1998). Incorporating new technologies into clinical practice. Radiology 206(2): 301-303. Bassett, L. W., (1998). Mammography and other techniques for screening and diagnosing breast cancer. Ted Mann Family Resource Center website, URL= http://cancerresources.mednet.ucla.edu/5_info/5c_archive_lec/1998/mammography.htm. Bishop, C. W., Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995. 90 Bryan, S., J. Brown, and R. Warren (1995). Mammography screening: an incremental cost effectiveness analysis of two view versus one view procedures in London, J Epidemiol Community Health. 49: 70-79.
Canny, J. F. (1986). A computational approach to edge detection. IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-8(6): 679-698.
Castleman, K. (1979). Digital Image Processing. Prentice Hall, Englewood Cliffs, NJ.
Chan, H-P., K. Doi, S. Galhotra, C. J. Vyborny, H. MacMahon, and P. M. Jokich (1987). Image feature analysis and computer-aided diagnosis in digital mammography. I. Automated detection of microcalcifications in mammography. Med Phys. 14: 538-548.
Chan, H-P., K. Doi, C. J. Vyborny, K. L. Lam, and R. A. Schmidt (1988). Computer-aided detection of microcalcifications in mammograms: methodology and preliminary clinical study. Invest Radiol. 23: 664-671.
Chan, H-P., K. Doi, C. J. Vyborny, R. A. Schmidt, C. E. Metz, K. L. Lam, T. Ogura, Y. Z. Wu, and H. MacMahon (1990). Improvement in radiologists' detection of clustered microcalcifications on mammograms: The potential of computer-aided diagnosis. Invest. Radiol. 25: 1102-1110.
Chan, H-P., S. C. Lo, B. Sahiner, K. Lam, and M. A. Helvie (1995). Computer-aided detection of mammographic microcalcifications: Pattern recognition with an artificial neural network. Medical Physics 22(10): 1555-1567.
Chan, H-P., B. Sahiner, K. L. Lam, N. Petrick, M. A. Helvie, M. M. Goodsitt, and D. D. Adler (1998). Computerized analysis of mammographic microcalcifications in morphological and texture feature spaces. Med. Phys. 25(10): 2007-2019.
Cheng, H-D., Y. M. Lui, and R. I. Freimanis (1998). A novel approach to microcalcification detection using fuzzy logic technique. IEEE Transactions on Medical Imaging 17(3): 442-450.
Davies, D. H. and D. R. Dance (1992). Automatic computer detection of clustered calcifications in digital mammograms. Phys Med Biol 37: 1385-1390.
Duda, R. O., P. E. Hart, and D. G. Stork (2002). Pattern Classification. 2nd edition, A Wiley-Interscience Publication, John Wiley & Sons, Inc, New York.
El-Naqa, I., Y. Yang, M. N. Wernick, N. P. Galatsanos, and R. Nishikawa (2002). Support vector machine learning for detection of microcalcifications in mammograms. IEEE Transactions on Medical Imaging 21(12): 1552-1563.
Fam, B. W., S. L. Olson, P. F. Winter, and F. J. Schloz (1988). Algorithm for the detection of fine clustered calcifications on film mammograms. Radiology 169: 333-337.
Feig, S. A., and M. J. Yaffe (1995). Digital mammography, computer-aided diagnosis and Telemammography. The Radiologic Clinics of North America, Breast Imaging 33(6): 1205-1230.
Feig, S. A. (2002). Clinical evaluation of computer-aided detection in breast cancer screening. Seminars in Breast Disease 5(4): 223-230.
Fischer, U., F. Baum, S. Obernauer, S. Luftner-Nagel, D. von Heyden, R. Vosshenrich, and E. Grabbe (2002). Comparative study in patients with microcalcifications: full-field digital mammography vs screen-film mammography. European Radiology (Publisher: Springer-Verlag Heidelberg) 12(11): 2679-2683.
Floyd, C. E., J. Y. Lo, A. J. Yun, D. C. Sullivan, and P. J. Kornguth (1994). Prediction of breast cancer malignancy using an artificial neural network. Cancer 74(11): 2944-2948.
Gavrielides, M. (1996). Shape analysis of mammographic calcification clusters. Masters thesis, Department of Electrical Engineering, University of South Florida, Tampa.
Gavrielides, M., M. Kallergi, and L. P. Clarke (1997). Automatic shape analysis and classification of mammographic calcifications. SPIE 3034: 869-876.
Gavrielides, M. A., J. Y. Lo, and C. E. Floyd (2002). Parameter optimization of a computer-aided diagnosis scheme for the segmentation of microcalcification clusters in mammograms. Med. Phys. 29(4): 475-483.
Giger, M. L. (1993). Computer-aided diagnosis. RSNA Categorical Course in Physics, 283-298.
Giger, M. L. and Z. Huo (1999). Artificial neural networks in breast cancer diagnosis: merging of computer-extracted features from breast images. Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress, IEEE. 3: 1768-1769.
Giger, M. L., Z. Huo, M. Kupinski, and C. J. Vyborny (2000). Computer-aided diagnosis in mammography. In Handbook of Medical Imaging, Volume 2, Medical Image Processing and Analysis (editors: Milan Sonka and J. Michael Fitzpatrick, ISBN: 0-8194-3622-4). A publication of SPIE, Bellingham, Washington, USA.
Giger, M. L. (2002). Computer-aided diagnosis in radiology. Acad. Radiol. 9: 1-3.
Gonzalez, R. C. and R. E. Woods (1993). Digital Image Processing. Addison-Wesley, Reading, MA.
Good, W. F., B. Zheng, Y-H. Chang, X. H. Wang, G. Maitz, and D. Gur (1999). Multi-image CAD employing features derived from ipsilateral mammographic views. Part of the SPIE Conference on Image Processing, SPIE 3661: 474-485.
Goodman, P. H. and F. E. Harrell Jr. (2001). NevProp manual with introduction to artificial neural network theory. University of Nevada, Reno 2001. URL: http://brain.unr.edu/publications/NevPropManual.pdf
Harrell, F. E., K. L. Lee, and D. B. Mark (1996). Tutorial in Biostatistics: Multivariate Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors. Statistics in Medicine 15: 361-387.
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. 2nd edition, Prentice Hall, Upper Saddle River, New Jersey.
Heine, J. J., S. R. Deans, D. K. Cullers, R. Stauduhar, and L. P. Clarke (1997). Multiresolution statistical analysis of high-resolution digital mammograms. IEEE Trans. Med. Imag. 16(5): 503-604.
Heine, J. J., C. Deng, and A. Bilhanan (2003). A practical adjustable two-dimensional band-pass filter. Submitted to OE Letters.
Highnam, R., Y. Kita, M. Brady, B. Shepstone, and R. English (1998). Determining correspondence between views. IWDM 1998: 4th International Workshop on Digital Mammography, pp. 111-118.
Hojjatoleslami, S. A., and J. Kittler (1996). Detection of clusters of microcalcification using a k-nearest neighbour classifier. IEEE, Digital Mammography 10: 1-6.
Holmes, C., and H. Muss (2003). Diagnosis and treatment of breast cancer in the elderly. CA Cancer J Clin. 53: 227-244.
Huo, Z., M. L. Giger, C. J. Vyborny, and C. E. Metz (2002). Breast cancer: effectiveness of computer-aided diagnosis - observer study with independent database of mammograms. Radiology 224: 560-568.
Imaginis (2003). General information on mammography. URL= http://imaginis.com/breasthealth/mammography.asp#what.
Jemal, A., A. Thomas, T. Murray, and M. Thun (2002). Cancer Statistics, 2002. CA Cancer J. Clin. 52: 23-47.
Jemal, A., T. Murray, A. Samuels, A. Ghafoor, E. Ward, and M. J. Thun (2003). Cancer Statistics, 2003. CA Cancer J Clin. 53: 5-26.
Jiang, Y., R. M. Nishikawa, D. E. Wolverton, C. E. Metz, M. L. Giger, R. A. Schmidt, C. J. Vyborny, and K. Doi (1996-a). Malignant and benign clustered microcalcifications: automated feature analysis and classification. Radiology 198: 671-678.
Jiang, Y., C. E. Metz, and R. M. Nishikawa (1996-b). A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 201: 745-750.
Jiang, Y., R. M. Nishikawa, D. E. Wolverton, C. E. Metz, R. A. Schmidt, and K. Doi (1997). Computerized classification of malignant and benign clustered microcalcifications in mammograms. Proc. 19th Intl Conf. IEEE/EMBS, pp. 521-523, Oct. 30-Nov. 2, Chicago, IL.
Jiang, Y., R. M. Nishikawa, R. A. Schmidt, C. E. Metz, M. L. Giger, and K. Doi (1999). Improving breast cancer diagnosis with computer-aided diagnosis. Acad. Radiol. 6(1): 22-33.
Kahn, C. E., L. M. Roberts, K. A. Schaffer, and P. Haddawy (1997). Construction of a Bayesian network for mammographic diagnosis of breast cancer. Comput. Biol. Med. 27(1): 19-29.
Kallergi, M., M. A. Gavrielides, L. He, C. G. Berman, J. J. Kim, and R. A. Clark (1998). A simulation model of mammographic calcifications based on the ACR BIRADS. Academic Radiology 5: 670-679.
Kallergi, M., L. He, M. Gavrielides, J. Heine, and L. P. Clarke (1998). Resolution effects on the morphology of calcifications in digital mammograms. Proc. of Medicon '98, Cyprus, June 14-17.
Kallergi, M. (2003). Computer Aided Diagnosis of Mammographic Microcalcification Clusters. To appear in Medical Physics.
Kallergi, M., J. J. Heine, and M. Tembey (2004). Computer-aided diagnosis of mammographic calcification clusters: impact of segmentation. To appear in the Handbook of Medical Image Analysis: Advanced Segmentation and Registration Models.
Kita, Y., R. Highnam, and M. Brady (1998). Correspondence between different view breast x-rays using a simulation of breast deformation. Computer Vision and Pattern Recognition Proceedings, pp. 700-707.
Kita, Y., R. Highnam, and M. Brady (2001). Correspondence between different view breast x-rays using curved epipolar lines. Computer Vision and Image Understanding 83: 38-56.
Kita, Y., E. Tohno, R. Highnam, and M. Brady (2002). A CAD system for the 3D location of lesions in mammograms. Medical Image Analysis 6: 267-273.
Kopans, D. B. (1991). Discriminating analysis uncovers breast lesions. Diagnostic Imaging pp. 96-101.
Kreager, J. A., and P. J. Kornguth (1994). Mammography for screening high-risk patients for cancer: value of including a lateral projection. AJR 162: 295-297.
Kupinski, M. A. (1997). Feature selection and classifiers for the computerized detection of mass lesions in digital mammography. IEEE pp. 2460-2463.
Lanyi, M. (1985). Morphologic analysis of microcalcifications: A valuable differential diagnostic system for early detection of breast carcinomas and reduction of superfluous exploratory excisions. In Early Breast Cancer, edited by J. Zander and J. Baltzer (Springer-Verlag, Berlin, Heidelberg, 1985).
Lanyi, M. (1986). Diagnosis and Differential Diagnosis of Breast Calcifications. (Springer-Verlag, Berlin, Heidelberg, 1986).
Lester, R. G. (1984). The contribution of radiology to the diagnosis, management, and care of breast cancer. Radiology 151: 1-7.
Libshitz, H. I., S. Fetouh, J. Isley, and R. G. Lester (1976). One-view Mammographic Screening? Radiology 120: 719-722.
Lo, J. Y., J. A. Baker, P. J. Kornguth, J. D. Iglehart, and C. E. Floyd (1997). Predicting breast cancer invasion with artificial neural networks on the basis of mammographic features. Radiology 203: 159-163.
Lo, J. Y., J. A. Baker, P. J. Kornguth, and C. E. Floyd (1999). Effect of patient history data on the prediction of breast cancer from mammographic findings with artificial neural networks. Acad Radiol 6: 10-15.
Lo, J. Y., M. Gavrielides, M. K. Markey, and J. L. Jesneck (2003). Computer-aided classification of breast microcalcification clusters: Merging of features from image processing and radiologists. Medical Imaging 2003: Image Processing, Proceedings of the SPIE 5032: 882-889.
Lundgren, B. (1977). The oblique view at mammography. British Journal of Radiology 50: 626-628.
Lundgren, B. and A. Helleberg (1982). Single oblique-view mammography for periodic screening for breast cancer in women. JNCI 68(3): 351-355.
Markopoulos, C., E. Kouskos, K. Koufopoulos, V. Kyriakou, and J. Gogas (2001). Use of artificial neural networks (computer analysis) in the diagnosis of microcalcifications on mammography. European J. Radiol. 39: 60-65.
Massachusetts Breast Cancer Coalition (MBCC) (2002). Facts about breast cancer in the United States: Year 2002. URL= http://www.mbcc.org/facts/general.shtml.
Metz, C. E. (1978). Basic principles of ROC analysis. Seminars in Nuclear Medicine vol. VIII, No. 4, pp. 283-298.
Mitka, M. (2003). Researchers seek mammography alternatives. JAMA 290(4): 450-451.
Muir, B. B., A. E. Kirkpatrick, M. M. Roberts, and S. W. Duffy (1984). Oblique-view mammography: Adequacy for screening-work in progress. Radiology 151(1): 39-41.
Nishikawa, R. M., M. L. Giger, K. Doi, C. E. Metz, F. F. Yin, C. J. Vyborny, and R. A. Schmidt (1994). Effect of case selection on the performance of computer-aided detection schemes. Med. Phys. 21(2): 265-269.
Paquerault, S., N. Petrick, H. Chan, and B. Sahiner (2001). Improvement of mammographic lesion detection by fusion of information from different views. Medical Imaging 2001: Proceedings of the SPIE 4322: 1883-1889.
Paquerault, S., N. Petrick, H-P. Chan, B. Sahiner, and M. A. Helvie (2002). Improvement of computerized mass detection on mammograms: Fusion of two-view information. Med Phys. 29(2): 238-247.
Paquerault, S., H-P. Chan, B. Sahiner, N. Petrick, L. Hadjiiski, M. N. Gurcan, C. Zhou, and M. A. Helvie (2002). Prediction of object location in different mammographic views using geometrical models. IWDM 2000, Proceedings of the 5th International Workshop on Digital Mammography (edited by Martin J. Yaffe), Toronto, Canada, pp. 748-755.
Petrick, N., H-P. Chan, B. Sahiner, M. A. Helvie, and S. Paquerault (2000). Preclinical evaluation of a CAD Algorithm for early detection of breast cancer. IWDM 2000, Proceedings of the 5th International Workshop on Digital Mammography (edited by Martin J. Yaffe), Toronto, Canada, pp. 329-333.
Qian, W., L. P. Clarke, M. Kallergi, and R. A. Clark (1994). Tree-structured non-linear filters in digital mammography. IEEE Trans. Med. Imaging 13(1): 25-36.
RadiologyInfo: The Radiology information resource for patients (2003). Mammography. URL= http://www.radiologyresource.org/content/mammogram.htm.
Sahiner, B., N. Petrick, H-P. Chan, S. Paquerault, M. A. Helvie, and L. M. Hadjiiski (2001). Recognition of lesion correspondence on two mammographic view - A new method for false-positive reduction of computerized mass detection. Medical Imaging 2001: Proceedings of the SPIE 4322: 649-655.
Samuels, T. H. (1998). Breast Imaging: A look at current and future technologies. Postgraduate Medicine Imaging Techniques Symposium: Breast Imaging vol. 104:5, November.
Shen, L., R. M. Rangayyan, and J. E. Leo Desautels (1993). An automatic detection and classification system for calcifications in mammograms. Proc. SPIE 1905: 799-805.
Shen, L., R. M. Rangayyan, and J. E. Leo Desautels (1994). Application of shape analysis to mammographic calcifications. IEEE Transactions on Medical Imaging 13: 263-274.
Sickles, E. A. (1984). Mammographic features of early breast cancer. AJR 143: 461-464.
Strickland, R. N., and H. I. Hahn (1996). Wavelet transform for detecting microcalcifications in mammograms. IEEE Trans. Med. Imag. 15: 218-229.
Thiele, D. L., C. Kimme-Smith, T. D. Johnson, M. McCombs, and L. W. Bassett (1996). Using tissue textures surrounding calcification clusters to predict benign vs malignant outcomes. Med. Phys. 23(4): 549-555.
Thurfjell, E., A. Taube, and L. Tabar (1994). One versus two view mammography screening: A prospective population-based study. Acta Radiologica 35: 340-344.
Thurfjell, E. L., K. A. Lernevall, and A. A. S. Taube (1994). Benefit of independent double reading in a population-based mammography screening program. Radiology 191(1): 241-244.
Tourassi, G. D. and C. E. Floyd (1997). The effect of data sampling on the performance evaluation of artificial neural networks in medical diagnosis. Med. Decis. Making 17: 186-192.
Tourassi, G. D., M. K. Markey, J. Y. Lo, and C. E. Floyd (2001). A neural network approach to breast cancer diagnosis as a constraint satisfaction problem. Med Phys 28(5): 804-811.
Tveter, D. R. (accessed 2003). Professional Basis of AI Backprop Hypertext Documentation. URL= http://www.dontveter.com/probp/prodoc/Q.html
Verma, B. and J. Zakos (2001). A Computer Aided Diagnosis system for digital mammograms based on fuzzy-neural and feature extraction techniques. IEEE Trans. on Information Technology in Biomedicine 5(1): 46-54.
Vyborny, C. J., M. L. Giger, and R. M. Nishikawa (2000). Computer-aided detection and diagnosis of breast cancer. Radiologic Clinics of North America 38(4): 725-740.
Warren, R. M. L., S. W. Duffy, and S. Bashir (1996). The value of the second view in screening mammography. The British Journal of Radiology 69: 105-108.
Williams, M. B., E. D. Pisano, M. D. Schnall, and L. L. Fajardo (1998). Future directions in imaging of breast diseases. Radiology 206(2): 297-300.
Woods, K. S., J. L. Solka, C. E. Priebe, C. C. Doss, K. W. Bowyer, and L. P. Clarke (1993). Comparative pattern recognition techniques for detection of microcalcifications. Proc. SPIE, vol. 1905, pp. 841-852.
Woods, K. S. (1994). Automated Image Analysis Techniques for Digital Mammography. Ph.D. Dissertation, Department of Computer Science and Engineering, University of South Florida.
Wu, Y., K. Doi, M. L. Giger, and R. M. Nishikawa (1992). Computerized detection of clustered microcalcifications in digital mammograms: applications of artificial neural networks. Med. Phys. 19: 555-560.
Wu, Y., M. L. Giger, K. Doi, C. J. Vyborny, R. A. Schmidt, and C. E. Metz (1993). Artificial neural networks in mammography: Application to decision making in the diagnosis of breast cancer. Radiology 187: 81-87.
Yoshida, H., W. Zhang, W. Cai, K. Doi, R. M. Nishikawa, and M. L. Giger (1995). Optimizing wavelet transform based on supervised learning for detection of microcalcifications in digital mammograms. In Proc. IEEE Int. Conf. Image Processing, vol. 3, Washington DC, pp. 152-155.
Yu, S., and L. Guan (1999). Feature selection using general regression neural networks for the automatic detection of clustered microcalcifications. IEEE pp. 1101-1104.
Zhang, W., K. Doi, M. L. Giger, Y. Wu, R. M. Nishikawa, and R. A. Schmidt (1994). Computerized detection of clustered microcalcifications in digital mammograms using a shift-invariant neural network. Med Phys. 21: 517-524.
Computer-aided diagnosis for mammographic microcalcification clusters [electronic resource] / by Mugdha Tembey.
[Tampa, Fla.] :
University of South Florida,
Thesis (M.S.C.S.)--University of South Florida, 2003.
Includes bibliographical references.
Text (Electronic thesis) in PDF format.
System requirements: World Wide Web browser and PDF reader.
Mode of access: World Wide Web.
Title from PDF of title page.
Document formatted into pages; contains 112 pages.
ABSTRACT: Breast cancer is the second leading cause of cancer deaths among women in the United States, and microcalcification clusters are one of the most important indicators of breast disease. Computer methodologies help in the detection of lesions and in the differentiation between benign and malignant lesions, and have the potential to improve radiologists' performance and breast cancer diagnosis significantly. A Computer-Aided Diagnosis (CAD-Dx) algorithm was previously developed to assist radiologists in the diagnosis of mammographic clusters of calcifications. It consists of four modules: (a) detection of all calcification-like areas, (b) false-positive reduction and segmentation of the detected calcifications, (c) selection of morphological and distributional features and (d) classification of the clusters. Classification was based on an artificial neural network (ANN) with 14 input features and assigned a likelihood of malignancy to each cluster. The purpose of this work was threefold: (a) optimize the existing algorithm and test it on a large database, (b) rank classification features and select the best feature set, and (c) determine the impact of single-view and two-view feature estimation on classification and feature ranking. Classification performance was evaluated with the NevProp4 artificial neural network trained with the leave-one-out resampling technique. Sequential forward selection was used for feature selection and ranking. Mammograms from 136 patients, containing one or two views of a breast with a calcification cluster, were digitized at 60 microns and 16 bits per pixel. A total of 260 regions of interest (ROIs) centered on calcification clusters were defined to build the single-view dataset. Of the 136 patients, 100 had a two-view mammogram, which yielded 202 ROIs that formed the two-view dataset. Classification and feature selection were evaluated with both datasets.
To decide on the optimal features for two-view feature estimation, several combinations of CC and MLO view features were attempted. On the single-view dataset the classifier achieved an Az = 0.8891 with 88% sensitivity and 77% specificity at an operating point of 0.4; 12 features were selected as the most important. With the two-view dataset, the classifier achieved a higher performance, with an Az = 0.9580 and sensitivity and specificity of 98% and 80% respectively at an operating point of 0.4; 10 features were selected as the most important.
Co-adviser: Goldgof, Dmitry
Co-adviser: Kallergi, Maria
Receiver operating characteristics (ROC).
Computer Science
USF Electronic Theses and Dissertations.