USF Libraries
USF Digital Collections

A learning approach to obtain efficient testing strategies in medical diagnosis


Material Information

A learning approach to obtain efficient testing strategies in medical diagnosis
Physical Description:
Fakih, Saif
University of South Florida
Place of Publication:
Tampa, Fla.
Publication Date:


Subjects / Keywords:
Markov decision process
solitary pulmonary nodule
reinforcement learning
rough sets
Dissertations, Academic -- Industrial Engineering -- Masters -- USF   ( lcsh )
government publication (state, provincial, territorial, dependent)   ( marcgt )
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )


ABSTRACT: Determining the most efficient use of diagnostic tests is one of the complex issues facing medical practitioners. It is generally accepted that excessive use of tests is common practice in medical diagnosis. Many tests are performed even though the incremental knowledge gained does not affect the course of diagnosis. With the soaring cost of healthcare in the US, there is a critical need to cut the costs of diagnostic tests while achieving a higher level of diagnostic accuracy. Various decision-making tools that assist physicians in diagnosis management have been presented in the literature. One such method, called the analytical hierarchy process, utilizes a multilevel structure of decision criteria for sequential pairwise comparison of available test choices. Many of the decision-analytic methods are based on Bayes' theory and decision trees. These methods use threshold treatment probabilities and performance characteristics of the tests, such as true-positive and false-positive rates, to choose among the available alternatives. Sequential testing approaches tend to prolong the diagnosis process, whereas the parallel testing approach generally involves a higher number of tests. This research is focused on developing a machine learning based methodology for finding an efficient testing strategy for medical diagnosis. The method, based on the patient parameters (both observed and tested), recommends test(s) with the objective of optimizing a measure of performance for the diagnosis process. The performance measure is a combined cost of the testing, the risk and discomfort associated with the tests, and the time taken to reach diagnosis. The performance measure also considers the diagnostic ability of the tests. The methodology is developed by combining tools from the fields of data mining (rough set theory, in particular), utility theory, Markov decision processes (MDP), and reinforcement learning (RL).
Rough set theory is used to extract diagnostic information in the form of rules from medical databases. Utility theory is used to bring three non-homogeneous measures (cost of testing, risk and discomfort, and diagnostic ability) into one cost-based measure of performance. The MDP framework, along with an RL algorithm, facilitates obtaining efficient testing strategies. The methodology is implemented on a sample problem of diagnosing Solitary Pulmonary Nodule (SPN). The results obtained are compared with those from four other approaches. It is shown that the RL based methodology holds significant promise for improving the performance of the diagnostic process.
Thesis (M.S.I.E.)--University of South Florida, 2004.
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Saif Fakih.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 83 pages.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001469407
oclc - 55731158
notis - AJR1161
usfldc doi - E14-SFE0000309
usfldc handle - e14.309
System ID:

This item is only available as the following downloads:

Full Text


A Learning Approach To Obtain Efficient Testing Strategies In Medical Diagnosis
by
Saif Fakih
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Industrial Engineering
Department of Industrial and Management Systems Engineering
College of Engineering
University of South Florida
Major Professor: Tapas K. Das, Ph.D.
Ali Yalcin, Ph.D.
Donald Berndt, Ph.D.
Date of Approval: March 15, 2004
Keywords: rough sets, reinforcement learning, solitary pulmonary nodule, Markov decision process
Copyright 2004 Saif Fakih


DEDICATION To my Mom, Dad, Lateef and Jyoti


ACKNOWLEDGEMENTS

I would like to thank Dr. Tapas Das for his immense support, easy availability and excellent guidance for my research. He is a great mentor, and his enthusiasm and dedication have been my biggest motivation for completing this thesis. I would also like to thank Dr. Ali Yalcin and Dr. Donald Berndt for agreeing to serve on my thesis committee. Hearty thanks go to my best friends Bhavani, Swetha, Aditi and Vikas for being very dependable and for encouraging my work. I would like to make a special note of my wife Jyoti for all the motivation, care, encouragement and support during my years of graduate studies. And finally, I would like to thank my parents for nourishing my dreams of higher education and for always being there.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
  1.1 Diagnostic tests and health care economics
  1.2 Diagnostic tests and defensive medicine
  1.3 Decision making in diagnostic processes
  1.4 Different methods of test selection for the diagnosis process
  1.5 Problem and research approach
  1.6 Existing computer systems in diagnostic process
  1.7 Outline of the thesis
CHAPTER 2 LITERATURE REVIEW
  2.1 Data mining in medicine
  2.2 History of data mining in medical decision making
  2.3 Applications of different data mining techniques in medicine
    2.3.1 Rule induction
      If-then rules
      Rough sets
      Association rules
    2.3.2 Decision trees
    2.3.3 Inductive logic programming
    2.3.4 Neural networks
      Supervised learning
      Unsupervised learning
  2.4 Applications of rough sets in medical diagnosis
  2.5 Different methods to optimize the diagnostic process
CHAPTER 3 RESEARCH OBJECTIVES
  3.1 Problem revisited
  3.2 Research objective
  3.3 Parameters and assumptions
    3.3.1 Data availability for rule generation
    3.3.2 Final diagnosis
    3.3.3 Cost function
    3.3.4 Test values from the distributions
    3.3.5 Patient arrival
CHAPTER 4 SOLUTION METHODOLOGY
  4.1 Overview of the solution methodology
  4.2 Model formulation (MDP)
  4.3 Actions from rough sets methodology
  4.4 A utility theory based cost estimation
    4.4.1 Direct cost
    4.4.2 Diagnostic utility
  4.5 RL algorithm
CHAPTER 5 TEST PROBLEM AND IMPLEMENTATION
  5.1 Introduction
  5.2 Solitary pulmonary nodule (SPN)
  5.3 Issues of SPN diagnosis
  5.4 SPN data availability for implementation
    5.4.1 Selection of attributes and their values
    5.4.2 Decision rules to define the action set
    5.4.3 Generation of attribute values
    5.4.4 Effective costs for tests/procedures
  5.5 Implementation
    5.5.1 Discretization of the available dataset
    5.5.2 Categorization of patient states
    5.5.3 Dynamic generation of states
    5.5.4 Generation of actions for a patient state
    5.5.5 Learning parameters
    5.5.6 Q-value updates
    5.5.7 Software framework
CHAPTER 6 RESULTS AND ANALYSIS
  6.1 Introduction
  6.2 RL policy
  6.3 Benchmarking RL policy
CHAPTER 7 CONCLUSION
  7.1 Future extensions




LIST OF TABLES

Table 1. Attributes utilized in the test problem
Table 2. Decision rule obtained from rough set analysis
Table 3. Test cases utilized in the implementation
Table 4. Utility values for the attributes
Table 5. Attributes, value ranges and corresponding discrete values
Table 6. Values for the learning parameters
Table 7. Comparison of cost averaged for first and last 500 patients
Table 8. Comparison of the total cost of diagnosis averaged over 3000 runs
Table 9. Policy comparison between HMO and RL approaches for set A
Table 10. Policy comparison between HMO and RL approaches for set B


LIST OF FIGURES

Figure 1. Average annual growth of national health expenditure
Figure 2. National health expenditure as percentage of GDP
Figure 3. Breakdown of personal healthcare expenditure
Figure 4. Rule for early diagnosis of rheumatic disease induced by CN2
Figure 5. A prolog clause induced by GKS
Figure 6. Architecture of multi-layered neural network
Figure 7. Overview of the solution methodology
Figure 8. Detailed flowchart for RL implementation
Figure 9. RL results for cost set A
Figure 10. RL results for cost set B
Figure 11. Comparison of results obtained using RL approach and benchmark approaches


A LEARNING APPROACH TO OBTAIN EFFICIENT TESTING STRATEGIES IN MEDICAL DIAGNOSIS
Saif Fakih
ABSTRACT

Determining the most efficient use of diagnostic tests is one of the complex issues facing medical practitioners. It is generally accepted that excessive use of tests is common practice in medical diagnosis. Many tests are performed even though the incremental knowledge gained does not affect the course of diagnosis. With the soaring cost of healthcare in the US, there is a critical need to cut the costs of diagnostic tests while achieving a higher level of diagnostic accuracy. Various decision-making tools that assist physicians in diagnosis management have been presented in the literature. One such method, called the analytical hierarchy process, utilizes a multilevel structure of decision criteria for sequential pairwise comparison of available test choices. Many of the decision-analytic methods are based on Bayes' theory and decision trees. These methods use threshold treatment probabilities and performance characteristics of the tests, such as true-positive and false-positive rates, to choose among the available alternatives. Sequential testing approaches tend to prolong the diagnosis process, whereas the parallel testing approach generally involves a higher number of tests.

This research is focused on developing a machine learning based methodology for finding an efficient testing strategy for medical diagnosis. The method, based on the patient parameters (both observed and tested), recommends test(s) with the objective of optimizing a measure of performance for the diagnosis process. The performance measure is a combined cost of the testing, the risk and discomfort associated with the tests, and the time taken to reach diagnosis. The performance measure also considers the diagnostic ability of the tests. The methodology is developed by combining tools from the fields of data mining (rough set theory, in particular), utility theory, Markov decision processes (MDP), and reinforcement learning (RL). Rough set theory is used to extract diagnostic information in the form of rules from medical databases. Utility theory is used to bring three non-homogeneous measures (cost of testing, risk and discomfort, and diagnostic ability) into one cost-based measure of performance. The MDP framework, along with an RL algorithm, facilitates obtaining efficient testing strategies. The methodology is implemented on a sample problem of diagnosing Solitary Pulmonary Nodule (SPN). The results obtained are compared with those from four other approaches. It is shown that the RL based methodology holds significant promise for improving the performance of the diagnostic process.


CHAPTER 1
INTRODUCTION

Diagnostic tests extend the information base beyond what is usually obtained through a physical examination and from medical history. They allow the doctor to obtain a deeper insight into the patient's medical condition. New and advanced diagnostic tests and tools are constantly being introduced to facilitate better understanding and treatment of disease, injury, and congenital or acquired abnormalities.

Diagnosis of a disease and its treatment are not separate processes. Although correct diagnosis helps narrow the appropriate treatment choices, it is often the case that treatment must be pursued without knowing the underlying patient state with certainty. The reason is that the diagnostic process is not a one-shot activity; it requires additional information about the underlying disease or problem. This may delay the treatment and worsen the patient's outcome. The process is often made more complex by the uncertainty associated with the patient's reaction to different treatment choices and by the high costs associated with various diagnostic (investigative) alternatives. Thus, in the course of patient management, one needs to carefully evaluate the benefits of possible diagnostic and treatment steps with regard to the overarching theme of the patient's well-being.


The diagnosis can be defined as the description of a health problem in terms of known diseases, and the diagnostic process is the set of actions needed to obtain the diagnosis. Since a diagnostic process has various technical, organizational, and economic aspects, it can be defined from an optimization point of view as follows [1]. The diagnostic process consists of a number of tests/procedures, individualized for a specific patient, that are sequenced in order to minimize
- the duration of the diagnostic process as a whole,
- the waiting time between individual tests or procedures,
- the overall cost of the process, and
- the risk and discomfort associated with the tests/procedures.

The actions in a diagnostic process involve determining the appropriate sequence of the tests/procedures, performing the selected test/procedure, and evaluating the results of the performed examinations in order to reach a diagnosis. Hence, the objective of improving a diagnostic process could be to optimize a utility measure that can be defined as a function of the costs of the tests and/or procedures, the risk and discomfort associated with the tests, the diagnostic ability of the tests, and the time taken by the process.

1.1 Diagnostic tests and health care economics

The United States has had the best medical research and the best scientists and physicians since World War II. Although it offers the best medical care in the world, there is an inexorable push to modify the way health care is delivered and financed. One of the major factors is the rising cost of health care. Since 1960, healthcare costs have been rising on a very steep curve and are projected to grow at an average annual rate of 7.3 percent (Figure 1), reaching $3.1 trillion. This increase would bring health care spending to approximately 17.7 percent of GDP by 2012, up from its 2002 share of 14.8 percent (Figure 2).

[Figure 1. Average annual growth of national health expenditure. Axes: calendar years (1966 to 2012) versus average annual growth.]
[Figure 2. National health expenditure as percentage of GDP.]

Some of the major factors causing health care cost inflation are listed as follows.
- U.S. demographics: an increasing portion of the population is over the age of 65.
- Waste of resources through fraud, abuse of the administrative structure, and errors in accounting, which amounts to approximately 15% of the total costs.
- Physician behavior, including
  o Defensive medicine: the ordering of tests, procedures and visits, or the avoidance of certain procedures or patients, due to concerns about medical malpractice liability risk, which is estimated to cost more than $50 billion per year.
  o Need for more information: medical educators have spent the past 50 years ingraining into physicians that more information is better.
  o Diagnostic uncertainty: the desire of a physician to order diagnostic tests or procedures to boost confidence and reduce uncertainty, weighed against the incremental cost.


This is also visible from the breakdown of expenditure for 2003 (Figure 3), which shows physicians and hospitals directly consuming 61 percent of the total expenditure; these are areas in which health care spending must be constrained.

[Figure 3. Breakdown of personal healthcare expenditure, 2003. Categories: hospital care; physician and clinical services; other professional services; dental services; other personal health care; nursing home and health care; medical products.]

1.2 Diagnostic tests and defensive medicine

Defensive medicine is an increasingly common practice that exacts huge costs, in both fiscal and human terms. For more than two decades, many physicians, researchers and government officials have claimed it to be the most damaging and costly result of the medical malpractice system. Defensive medicine [2] occurs when doctors order tests, procedures or visits, or avoid high-risk patients or procedures, to reduce their exposure to malpractice liability. When physicians perform extra tests or procedures primarily to reduce malpractice liability, they are practicing positive defensive medicine. When they avoid certain patients or procedures, they are practicing negative defensive medicine.


Over time, many procedures performed out of conscious concern about liability may become so ingrained in customary practice that the physician is no longer aware of the original motivation for doing them and comes to believe that such practices are medically indicated. Thus, although physicians may practice conscious defensive medicine in a limited set of clinical situations, additional defensive practices may result from the cumulative response of the medical profession to the malpractice system. Defensive medicine can also exist when it acts together with other motivations, such as belief in a procedure's effectiveness, the desire to reduce medical uncertainty, or financial gain. In particular, when the medical consequences of being wrong are severe, as in the case of a life-threatening or debilitating disease or condition for which early diagnosis would mean better or more effective treatment, the desire for certainty, and for the tests that can increase it, grows. Thus, the frequency of test ordering at any given probability of disease should be higher in patients suspected of having diseases that are more serious.

Some of the factors affecting the amount of defensive medicine are as follows.
- New technology: Perception of increasing risks may arise from the continual development of new diagnostic tools and improved therapy for serious conditions. With new medical technology, physicians may have greater uncertainty about the appropriate indications for its use and would therefore feel more concern about the potential for liability.
- Changing consequences of malpractice suits: Recent federal and state laws requiring reporting of malpractice claims to a central repository may increase the professional and financial significance of even a single lawsuit in the minds of physicians.

1.3 Decision making involved in diagnostic processes

A physician is visited by patients because of the presence of symptoms or a feeling of ill health. The patient wants to know the reason for the symptom and wants it eliminated quickly and painlessly. The physician's task is either to discover the cause of the symptom or to rule out the possibility of a serious disease. In this diagnosis process, decision making is involved in three different areas.
- With initial symptoms and examinations, the physician may postulate more than one diagnostic hypothesis. The physician often narrows down the list and chooses to pursue one hypothesis at a time. The choice is made based on the probability of the presence of the disease and the urgency of treating it. It also depends on the expertise of the physician, his/her prior experience with a similar case, etc.
- The physician uses clinical findings to confirm the selected hypothesis. This is usually done by ordering diagnostic tests and/or procedures. Depending on the circumstances, there could be a wide range of tests/procedures available to confirm the hypothesis. The physician decides on the testing strategy to use for the hypothesis.
- Using the results of the tests and/or procedures, the physician has to decide either to accept or to discard the current hypothesis.


Our main focus in this research is to develop a procedure to determine an effective strategy for obtaining additional information from diagnostic tests and/or procedures in order to complete the diagnosis process.

1.4 Different methods of test selection for the diagnosis process

There are two basic approaches to ordering diagnostic tests and/or procedures:
- Sequential approach
- Parallel approach

In a sequential approach, the physician orders tests and/or procedures one at a time. The results are obtained and analyzed, and their effect on the process of diagnosis is documented. The test/procedure ordered by the physician may confirm the diagnostic hypothesis. If not, the physician may order more tests. The approach has the following advantages and disadvantages.

Advantages:
- Avoidance of unnecessary tests and/or procedures saves dollars
- The patient can be shielded from the risk and discomfort caused by the tests and/or procedures

Disadvantages:
- The physician needs to select an appropriate sequence among the given alternatives, lack of which could lead to a missed diagnosis
- The time elapsed between the tests may not be good for the patient
- Prolonged stay in the health care institution before the whole process is completed
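The cost trade-off between the two basic approaches named above can be illustrated with a small calculation. The following is only a sketch under strong assumptions that are not in the thesis: the three tests, their dollar costs and their probabilities of confirming the hypothesis are made-up values, test outcomes are treated as independent, and time, risk and discomfort are ignored.

```python
from itertools import permutations

# Hypothetical tests: (name, cost in dollars, probability that the test
# alone confirms the diagnostic hypothesis). All values are invented.
TESTS = [("test_a", 100.0, 0.5), ("test_b", 300.0, 0.8), ("test_c", 50.0, 0.2)]


def parallel_cost(tests):
    """All tests are ordered at once, so every cost is always incurred."""
    return sum(cost for _, cost, _ in tests)


def sequential_expected_cost(ordered_tests):
    """Expected cost when tests are ordered one at a time.

    Testing stops as soon as one test confirms the hypothesis. Outcomes
    are assumed independent, which is a simplification.
    """
    expected = 0.0
    p_still_unconfirmed = 1.0
    for _, cost, p_confirm in ordered_tests:
        # A test is paid for only if no earlier test confirmed.
        expected += p_still_unconfirmed * cost
        p_still_unconfirmed *= 1.0 - p_confirm
    return expected


def best_sequence(tests):
    """Brute-force search for the ordering with the lowest expected cost."""
    return min(permutations(tests), key=sequential_expected_cost)


if __name__ == "__main__":
    print("parallel:", parallel_cost(TESTS))
    best = best_sequence(TESTS)
    print("best sequence:", [name for name, _, _ in best])
    print("expected sequential cost:", sequential_expected_cost(best))
```

With these invented numbers, ordering all three tests at once costs 450, while the cheapest sequence (test_a, test_c, test_b) has an expected cost of 245. The sketch deliberately leaves out the elapsed time between tests, which is exactly the disadvantage of sequential testing listed above.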


In parallel testing, all the tests are ordered at once, and the physician then deduces the most likely diagnosis from the data obtained. This method is more costly, as the knowledge obtained from the tests may be redundant. It may also put a strain on the institution's resources.

Let us look at the example of meningoencephalitis, a neurological disease that affects the dura sheet, a covering on the brain, and causes severe inflammation in the dura [3]. One of the most important problems in dealing with meningoencephalitis is to make a differential diagnosis between bacterial and viral infections as soon as possible. The treatments for these infections are independent, and the bacterial infection treatment is not effective for viral infection. Thus, immediate diagnosis is crucial, requiring an appropriate choice of laboratory tests. In such a situation, sequential testing may not be a good choice, as valuable time may be wasted before the physician can reach a diagnosis and start the treatment.

Another example is pulmonary embolism, which is often caused by deep vein thrombosis [4]. A pulmonary embolism is a blockage in an artery in the lungs caused by an embolus (a free-floating blood clot) that travels through the blood vessels (usually from a vein in a leg or in the pelvic area) to the lungs. Pulmonary embolism causes damage to lung tissue, disrupts proper functioning of the damaged lung, and can cause death. Various tests are available for its diagnosis, such as angiography, ventilation-perfusion nuclear medicine scans, contrast and radionuclide venography, sonography and Doppler sonography, computed tomography, etc. All the tests are expensive, and most of them have high morbidity and mortality associated with the procedure, especially angiography, which is considered the gold standard for diagnosis and yet is not a preferred choice. Thus, parallel testing would be a bad choice in this case, and the physician as well as the patient would perhaps prefer a sequential diagnosis with a minimal number of tests.

1.5 Problem and research approach

As seen in the examples given in the previous section, both testing strategies have pros and cons. The best strategy would be to combine both techniques, that is, to sequentially order a set of tests customized for each case. This research attempts to develop an approach that combines rough set theory, utility theory, Markov decision processes (MDP) and reinforcement learning (RL) in order to optimize the diagnostic process.

Rough set theory is used to mine a medical dataset consisting of old cases to obtain decision rules. These rules are used to determine the set of tests to be conducted on a patient for the given medical state. For a given state there could be more than one option, and the choice depends on the cost function obtained using utility theory. This cost function accounts for the dollars spent on conducting the test and the diagnostic performance of the test. The results obtained by the application of these two theories serve as input to the MDP framework, which is used to model the given diagnostic process. An RL algorithm is developed to efficiently solve the MDP model and derive testing strategies.
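To make the idea of learning a testing strategy from an MDP model more concrete, the following is a minimal tabular Q-learning sketch on a toy problem. It is not the RL algorithm developed in this thesis. Everything in it is an assumption made for illustration: the two tests, their costs and accuracies, the misdiagnosis penalty, the state encoding (the test results seen so far) and the learning parameters are all hypothetical, and the action set is fixed by hand rather than derived from rough set rules.

```python
import random
from collections import defaultdict

# Hypothetical tests: name -> (cost, probability the result matches the truth).
TESTS = {"cheap": (1.0, 0.70), "expensive": (5.0, 0.95)}
MISDIAGNOSIS_COST = 20.0          # assumed penalty for a wrong final diagnosis
ORDER = ("cheap", "expensive")    # position of each test result in the state


def actions(state):
    """Tests not yet performed, plus the option to stop and diagnose."""
    return [t for t, r in zip(ORDER, state) if r is None] + ["stop"]


def step(state, action, truth, rng):
    """Return (cost, next_state); next_state is None when the episode ends."""
    if action == "stop":
        # Diagnose from the most accurate result available, else guess 0.
        known = [r for r in reversed(state) if r is not None]
        guess = known[0] if known else 0
        return (0.0 if guess == truth else MISDIAGNOSIS_COST), None
    cost, accuracy = TESTS[action]
    result = truth if rng.random() < accuracy else 1 - truth
    nxt = list(state)
    nxt[ORDER.index(action)] = result
    return cost, tuple(nxt)


def q_learning(episodes=20000, epsilon=0.2, seed=0):
    """Learn expected cost-to-go for each (state, action) pair."""
    rng = random.Random(seed)
    q = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(episodes):
        truth = rng.randint(0, 1)  # hidden condition of a simulated patient
        state = (None, None)
        while state is not None:
            acts = actions(state)
            if rng.random() < epsilon:   # explore
                action = rng.choice(acts)
            else:                        # exploit: lowest estimated cost
                action = min(acts, key=lambda a: q[(state, a)])
            cost, nxt = step(state, action, truth, rng)
            future = 0.0 if nxt is None else min(q[(nxt, a)] for a in actions(nxt))
            visits[(state, action)] += 1
            alpha = 1.0 / visits[(state, action)]  # decaying step size
            q[(state, action)] += alpha * (cost + future - q[(state, action)])
            state = nxt
    return q


def greedy_action(q, state):
    """Action with the lowest learned cost-to-go in the given state."""
    return min(actions(state), key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = q_learning()
    print("first action for a new patient:", greedy_action(q, (None, None)))
```

Because costs are minimized rather than rewards maximized, the update uses `min` over the next state's actions. In this toy setting, diagnosing without any test has an expected cost of about 10 (a coin flip against a penalty of 20), so the learned policy should prefer ordering a test first; which test it prefers depends on the assumed costs and accuracies.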


1.6 Existing computer systems in diagnostic process

Various computer systems have been developed, and are being further enhanced, to assist clinicians in the process of diagnosis, typically with the intention that they be used during a clinical encounter with a patient. The diagnostic process can benefit from these systems, which can assist doctors in analyzing medical data and improving diagnostic accuracy. DXplain and Iliad are a couple of such systems in routine use.

DXplain, an example of a clinical decision support system, was developed at the Massachusetts General Hospital (Barnett et al., 1987). It is used to assist in the process of diagnosis, taking a set of clinical findings, including signs, symptoms, and laboratory data, and producing a ranked list of diagnoses. DXplain is in routine use at a number of hospitals and medical schools, mostly for clinical education purposes, but is also available for clinical consultation.

Iliad has been under development at the University of Utah School of Medicine's Department of Medical Informatics for several years. Iliad uses Bayesian reasoning to calculate the posterior probabilities of various diagnoses under consideration, given the findings present in a case. Iliad, which was developed primarily for diagnosis in internal medicine, is currently used as a teaching tool for medical students.

As more and more hospitals take advantage of the power of computers and begin to store medical data in digital formats, computer-based technologies can provide useful aids to assist the physician in the diagnosis of many diseases.
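The Bayesian reasoning attributed to Iliad above, and the ranked list of diagnoses produced by systems such as DXplain, can be illustrated with Bayes' rule applied to a set of mutually exclusive candidate diagnoses. The sketch below is not the implementation of either system; the function names, the diagnosis labels and all probabilities are hypothetical.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over mutually exclusive, exhaustive diagnoses.

    priors:      diagnosis -> P(diagnosis)
    likelihoods: diagnosis -> P(findings | diagnosis)
    Returns      diagnosis -> P(diagnosis | findings)
    """
    joint = {d: priors[d] * likelihoods[d] for d in priors}
    evidence = sum(joint.values())  # P(findings)
    if evidence == 0:
        raise ValueError("findings have zero probability under every diagnosis")
    return {d: p / evidence for d, p in joint.items()}


def ranked(post):
    """Diagnoses sorted from most to least probable."""
    return sorted(post.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Invented numbers: diagnosis_1 is more common a priori, but the
    # observed findings are far more likely under diagnosis_2.
    priors = {"diagnosis_1": 0.7, "diagnosis_2": 0.3}
    likelihoods = {"diagnosis_1": 0.1, "diagnosis_2": 0.8}
    for name, prob in ranked(posterior(priors, likelihoods)):
        print(f"{name}: {prob:.3f}")
```

In this invented case the findings overturn the prior ordering: the less common diagnosis ends up ranked first because the findings are much more likely under it.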


1.7 Outline of the thesis

This chapter introduces various aspects of decision making as related to the process of medical diagnosis. It also discusses some of the factors affecting the diagnostic process, such as healthcare economics and defensive medicine. Chapter 2 gives an overview of various data mining techniques and their application in the medical domain. Applications of rough set theory in medical diagnosis are also discussed. In Chapter 3, the problem is revisited and the research objectives are outlined. Various assumptions and inputs to the research are also enumerated. Chapter 4 covers the actual model building process and provides the solution methodology for the problem. Chapter 5 provides implementation details for the solution methodology, and results are presented in Chapter 6. Finally, conclusions and future research directions are provided in Chapter 7.


CHAPTER 2
LITERATURE REVIEW

2.1 Data mining in medicine

Medical informatics has become an integral part of a successful medical institution. Modern hospitals and health care institutions are well equipped with monitoring and other data collection devices, which provide relatively inexpensive means to collect and store data in inter- and intra-hospital information systems. The increase in data volume causes great difficulties in extracting useful information for decision support, particularly when data analysis is conducted manually. To overcome this problem, medical informatics uses the technologies developed in the new interdisciplinary field of knowledge discovery in databases (KDD), or data mining, encompassing statistical, pattern recognition, machine learning, and visualization tools to support the analysis of data and the discovery of regularities that are encoded within the data. KDD typically consists of the following steps [5]:
- understanding the domain, forming the dataset and cleaning the data
- extracting regularities hidden in the data and formulating knowledge in the form of patterns or rules, which is the core of the overall KDD process and is usually referred to as data mining
- post-processing of discovered knowledge and exploiting the results.


The knowledge obtained from the KDD process can be used to directly or indirectly enhance the provision of healthcare. For example, in clinical settings it can act as an intelligent assistant that tries to bridge the gap between data collection and data comprehension and helps the physician perform his or her task more efficiently and effectively. A physician with the right information at his or her disposal at the right time would be in a better position to reach correct decisions or perform correct actions within the given time constraint. Raw data as such are of little value, since their sheer volume and/or the very specific level at which they are expressed make their utilization in the context of problem solving impossible. A good data mining procedure applied to this same data set, however, can convert it to a wealth of information that can be used to support decision making in patient management.

2.2 History of data mining in medical decision making

Even though data mining has gained popularity in recent years, its underlying algorithms have been used since the early eighties, particularly in the advancement of medical expert systems. In the late seventies and early eighties, the development of medical expert systems was mainly focused on supporting diagnostic decision making in specialized medical domains. This work was pioneered by MYCIN [6], an expert system developed at Stanford in the early 1970s, whose job was to diagnose and recommend treatment for certain blood infections. It was followed by numerous other efforts leading to specialized diagnostic and prognostic systems, many of which were developed for supporting diagnosis in internal medicine [7, 8]. However, during this early stage of expert system


research, problems such as knowledge acquisition, knowledge representation, reasoning and explanation were encountered.

From the early days of expert systems, rules were proposed as a prime formalism for expressing knowledge in a symbolic way. They have the undisputed advantages of simplicity, uniformity, transparency and ease of inference, which over the years have made them one of the most widely adopted approaches for representing real-world knowledge. Rules elicited from domain experts are expressed at the right level of abstraction from the perspective of the expert, and are comprehensible to the expert since they are formulations of rules of thumb. However, human-defined rules risk capturing the biases of one expert, and although each rule individually may appear to form a coherent, modular chunk of knowledge, the analysis of rules as an integral whole can reveal inconsistencies, gaps, and other deficiencies.

Thus, knowledge acquisition became the hardest part of the expert system development task and was identified as the Feigenbaum bottleneck [9] in the construction of a knowledge base. For the effective use of expert system technology, a knowledge base needs to be consistent and as complete as possible throughout its deployment. To attain these desirable characteristics, manual knowledge maintenance should be facilitated and the system should be able to evolve on the basis of its problem-solving experience. The limitations of the first generation of expert systems, coupled with the relatively high costs (in human and other terms) involved in acquiring knowledge directly from the experts, as well as the fact that databases of example cases started becoming


readily available, made the learning of rules from such data especially appealing as a more efficient, less biased, and more cost-effective approach.

In the late eighties and early nineties it became apparent that knowledge acquired from experts alone is unsuitable for solving difficult problems when developing decision support systems. The analysis of data gathered in the daily practice of experts and stored systematically in databases could play an important role in supporting decision making. This led to the development of early machine learning algorithms aimed at the automatic extraction of rules or decision trees from data. Early machine learning systems dealing with real-world data, which may be erroneous (noisy) and incomplete, include CART, Quinlan's extensions to ID3, ASSISTANT, AQ, and CN2. The C4.5 system [10] is an efficient and probably the most popular machine learning system of the nineties.

Even though rules were extracted using machine learning algorithms, experts were still actively involved, but in a different and more constructive way. They provided example cases, validated rules for comprehensibility and desired quality, and ensured that the system provided adequate coverage for a particular domain. Furthermore, the expert provided important background knowledge for focusing and guiding the learning of rules. Irrespective of whether rules were learned from data or directly acquired from domain experts, attempts were made to keep them simple, intuitive and adequately expressive for the purpose of the particular application.

The nineties were characterized by massive data storage, and there was a need to interpret and understand this data. This led to challenging new areas of


KDD and data mining, where a major role was played by areas such as machine learning, statistics and mathematics, with the sole aim of extracting knowledge from raw data.

2.3 Application of different data mining techniques in medicine

Large collections of medical data are a valuable resource from which potentially new and useful knowledge can be discovered through data mining. The available datasets are often incomplete (missing data) and noisy (erroneous). The methods for extracting meaningful and understandable symbolic knowledge are referred to as data mining methods. These methods are particularly used to gain insight into the relationships and patterns hidden in the data. Of particular value to medicine are the required accuracy and interpretability of the results of data mining. Interpretability may be achieved by graphical representation of the data mining results or by symbolically expressed rules or relationships. To increase the chances of getting useful and interpretable results, data mining can benefit from medical experts who may specify additional (background) knowledge, interact with the mining process, and evaluate its results. Only accurate patterns and relationships that are expressed at the right level of abstraction, in the vocabulary used by medical experts, may be of use to a practitioner who will decide whether to adopt and utilize the extracted knowledge in daily decision making. The quality assessment of data mining methods is based both on the performance (classification and prediction


accuracy, misclassification cost, sensitivity, specificity, etc.) and on the understandability and significance of the discovered knowledge.

Various data mining techniques are used to extract medical knowledge aimed at supporting diagnosis, screening, prognosis, monitoring, therapy support or overall patient management. The most frequently applied methods in this context are supervised symbolic machine learning methods. For example, effective tools for inductive learning exist that can be used to generate understandable diagnostic and prognostic rules. Symbolic clustering, discovery of concept hierarchies, qualitative model discovery and learning of probabilistic causal networks fit in this framework as well. Sub-symbolic learning and case-based reasoning methods can also be classified in the data mining category. Other frequently applied sub-symbolic methods are the nearest neighbor method, the Bayesian classifier, and (non-symbolic) clustering.

Rule and tree induction are mostly concerned with the analysis of classificatory properties of data tables. Data represented in the tables may be collected from measurements or acquired from experts. Rows in the table correspond to objects (training examples) to be analyzed in terms of their properties (attributes) and the class (concept) to which they belong. In a medical setting, a concept of interest could be the set of patients with a certain disease or outcome. Supervised learning assumes that training examples are classified, whereas unsupervised learning concerns the analysis of unclassified examples.


2.3.1 Rule induction

If-then rules

These rules are generated by the rule induction system from the given set of classified examples. The rules have the form

If conditions then conclusion

where the conditions part contains one or more attribute tests of the form $A_i = v_{ik}$ for discrete attributes, and $A_i < \text{value}$ or $A_i > \text{value}$ for continuous attributes. The conclusion part has the form $C = c_i$, assigning a particular value $c_i$ to the class $C$. An example is covered by a rule if the attribute values of the example fulfill the conditions in the IF part of the rule. The following is an example of a generalized rule used for early diagnosis of rheumatic disease.

Figure 4. Rule for early diagnosis of rheumatic disease induced by CN2

If-then rule induction was studied by Michalski [11] and implemented in a series of AQ algorithms, for example the AQ15 system, which was also applied to the analysis of medical data. Like the AQ algorithms, the rule induction system CN2, which is among the best known of the if-then rule learners, uses the covering approach to construct a set of rules for each possible class $c_i$. The covering approach works as follows: CN2 constructs a rule that correctly classifies some examples, removes the positive examples covered by the rule from the training


set and repeats the process until no more examples remain. To construct a single rule that classifies examples into class $c_i$, CN2 starts with a rule with an empty antecedent (IF part) and the selected class $c_i$ as a consequent (THEN part). The antecedent of this rule is satisfied by all examples in the training set, and not only those of the selected class. CN2 then progressively refines the antecedent by adding conditions to it, until only examples of class $c_i$ satisfy the antecedent. To allow for the handling of imperfect data, CN2 may construct a set of rules which is imprecise, i.e., does not classify all examples in the training set correctly.

Rough sets

If-then rules can also be induced by using the theory of rough sets introduced by Pawlak [12, 13]. Rough sets (RS) are concerned with the analysis of classificatory properties of data aimed at approximations of concepts. RS can be used for both supervised and unsupervised learning.

In rough set theory, an information system consists of a data set represented as a table, where each row represents a case, an event or simply an object. Every column represents an attribute that can be measured for an object. More formally, it is a pair $\mathcal{A} = (U, A)$, where $U$ is a non-empty finite set of objects called the universe and $A$ is a non-empty set of attributes such that $a : U \to V_a$ for every $a \in A$, with $V_a$ the set of values of attribute $a$. Each object $x \in U$ is assumed to be described by a subset of attributes $B$, $B \subseteq A$.

The basic concept of RS is an indiscernibility relation. Two objects $x$ and $y$ are indiscernible on the basis of the available attribute subset $B$ if they have the


same values of attributes $B$. It is usually assumed that this relation is reflexive, symmetric and transitive. The set of objects indiscernible from $x$ using attributes $B$ forms an equivalence class and is denoted by $[x]_B$. There are extensions of RS theory that do not require transitivity to hold.

Let $X \subseteq U$, and let $\mathrm{Ind}_B(X)$ denote a set of equivalence classes of examples that are indiscernible, i.e., a set of subsets of examples that cannot be distinguished on the basis of attributes in $B$. The subset of attributes $B$ is sufficient for classification if for every $[x]_B \in \mathrm{Ind}_B(X)$ all the examples in $[x]_B$ belong to the same decision class. In this case crisp definitions of classes can be induced; otherwise, only rough concept definitions can be induced, since some examples cannot be decisively classified.

The main goal of RS analysis is the synthesis of approximations of concepts $c_i$. Let $X$ consist of the training examples of class $c_i$. $X$ may be approximated using only the information contained in $B$ by constructing the B-lower and B-upper approximations of $X$, denoted $\underline{B}X$ and $\overline{B}X$ respectively, where $\underline{B}X = \{x \mid x \in U, [x]_B \subseteq X\}$ and $\overline{B}X = \{x \mid x \in U, [x]_B \cap X \neq \emptyset\}$. On the basis of the knowledge in $B$, the objects in $\underline{B}X$ can be classified with certainty as members of $X$, while the objects in $\overline{B}X$ can only be classified as possible members of $X$. The set $BN_B(X) = \overline{B}X - \underline{B}X$ is called the B-boundary region of $X$; it consists of those objects that, on the basis of the knowledge in $B$, cannot be unambiguously classified into $X$ or its complement. The set $U - \overline{B}X$ is called the B-outside region of $X$ and consists of those objects which can be classified with certainty as not belonging to $X$. A set is said to be rough


(respectively crisp) if the boundary region is non-empty (respectively empty). The boundary region consists of examples that are indiscernible from some examples in $X$ and therefore cannot be decisively classified into $c_i$; this region consists of the union of equivalence classes, each of which contains some examples from $X$ and some examples not in $X$.

A main task of RS analysis is to find minimal subsets of attributes that preserve the indiscernibility relation. This is called reduct computation. Decision rules are generated from reducts by reading off the values of the attributes in each reduct. The main challenge in inducing rules lies in determining which attributes should be included in the conditional part of the rule. Rules synthesized from the (standard) reducts will usually result in large sets of rules and are likely to over-fit the patterns of interest. Instead of standard reducts, attribute sets that almost preserve the indiscernibility relation are generated. Good results have been obtained with dynamic reducts [14], which use a combination of reduct computation and statistical resampling. Also, several software tools exist for data analysis using rough sets, e.g., the Rosetta system, Grobian, and the Rough Set Library (RSL).

Rough sets have a significant number of applications in the medical domain. The list of applications includes extraction of diagnostic rules, image analysis and classification of histological pictures, modeling set residuals, EEG signal analysis and many others. As rough sets play an important role in this thesis, a more elaborate discussion of rough set methodology and its applications in medical diagnosis is provided later in the chapter.
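As a concrete illustration of the B-lower approximation, B-upper approximation, boundary region and outside region defined in this section, the following minimal Python sketch computes them for a small decision table. The patient identifiers, the attributes `fever` and `cough`, and the function names are hypothetical and chosen only for illustration; they are not taken from the thesis data or from any rough set library.

```python
from collections import defaultdict

def equivalence_classes(table, attrs):
    """Group objects that have identical values on every attribute in attrs."""
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return list(groups.values())

def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of target using attrs."""
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attrs):
        if cls <= target:   # class lies entirely inside X: certain members
            lower |= cls
        if cls & target:    # class overlaps X: possible members
            upper |= cls
    return lower, upper

# Hypothetical toy information system: five patients, two attributes.
table = {
    "p1": {"fever": "high", "cough": "yes"},
    "p2": {"fever": "high", "cough": "yes"},
    "p3": {"fever": "low", "cough": "yes"},
    "p4": {"fever": "low", "cough": "no"},
    "p5": {"fever": "high", "cough": "no"},
}
X = {"p1", "p3", "p5"}      # assumed set of patients with the disease
B = ["fever", "cough"]      # attribute subset used for discernment

lower, upper = approximations(table, B, X)
boundary = upper - lower    # objects that cannot be classified decisively
outside = set(table) - upper  # objects certainly not in X

print(sorted(lower), sorted(upper), sorted(boundary), sorted(outside))
```

In this toy table, p1 and p2 are indiscernible on B but only p1 belongs to X, so both fall in the boundary region and X is rough with respect to B.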


Association rules

Association rules have recently received a lot of attention in the data mining community for their applications in market-basket analysis. The problem of inducing association rules [15] is defined as follows: given a set of transactions, where each transaction is a set of items (i.e., literals of the form Attribute = value), an association rule is an expression of the form $X \Rightarrow Y$, where $X$ and $Y$ are sets of items. The intuitive meaning of such a rule is that transactions in a database which contain $X$ tend to contain $Y$. An example of such a rule is: 80% of patients with pneumonia also have high fever, and 10% of all transactions contain both of these items. Here 80% is called the confidence of the rule, and 10% the support of the rule. The confidence of the rule is calculated as the ratio of the number of records having true values for all items in $X$ and $Y$ to the number of records having true values for all items in $X$. The support of the rule is the ratio of the number of records having true values for all items in $X$ and $Y$ to the number of all records in the database. The problem is to find all association rules that satisfy minimum support and minimum confidence constraints.

All association rule learners approach the problem similarly. First, all itemsets that satisfy the minimum support level are generated, using different algorithms for traversing the lattice of subsets. In this way all frequent itemsets are discovered. In the second phase, association rules are constructed from the frequent sets without reconsulting the data (which makes the process efficient): all frequent itemsets are combined with each other to produce all possible rules


satisfying the minimum confidence constraint. If itemsets $A$ and $B$ can be found such that $A$ is a proper subset of $B$ and the ratio of the frequency of $B$ to the frequency of $A$ is above the specified minimum, then a rule of the form $A \Rightarrow C$ is generated, where $C$ is the set of items in $B$ but not in $A$.

Association rule learning has been applied in medicine, for example, to identify new and interesting patterns in surveillance data, as in the analysis of the Pseudomonas aeruginosa infection control data [16].

2.3.2 Decision trees

Decision trees are versatile tools for data mining, borrowed from AI and statistics. A decision tree has a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class or class distribution. The basic algorithm for decision tree induction is a greedy algorithm that constructs the tree in a top-down, recursive, divide-and-conquer manner. It involves recursive partitioning of the data used to build the tree model. The algorithm tries breaking up the data using every possible split on every field. It chooses the split that partitions the data into parts that are purer than the original. The process continues until no more useful splits can be found.

An early decision tree learner, ASSISTANT, which was developed specifically to deal with the particular characteristics of medical datasets, supports the handling of incompletely specified training examples (missing attribute values), binarization of continuous attributes, binary construction of decision trees, pruning of unreliable parts of the tree, etc. Recent


implementations of the ASSISTANT algorithm include ASSISTANT-R and ASSISTANT-R2. Instead of the standard informativity search heuristic, ASSISTANT-R employs ReliefF as a heuristic for attribute selection [17]. This heuristic is an extension of RELIEF [17, 18], a non-myopic heuristic measure capable of estimating the quality of attributes even in the presence of strong conditional dependencies between attributes. In addition, it improves the performance of the algorithm by using m-estimates of probabilities instead of relative frequencies. ASSISTANT-R2 is a variant of ASSISTANT-R that generates one decision tree for each class instead of a single tree for the whole domain. When classifying a new instance, all trees are tried. If several trees classify the instance into their corresponding classes, the most probable class is selected. If none of the trees "fires", the general tree for all the diagnoses generated by ASSISTANT-R is used.

The best known decision tree learner is C4.5 (C5.0 is its recent upgrade), which is widely used and has been incorporated into commercial data mining tools (e.g., Clementine and Kepler). The system is well maintained and documented, reliable, efficient and capable of dealing with large numbers of training examples. As such, it is considered to be one of the best data mining tools among those developed by the machine learning community.

2.3.3 Inductive logic programming

Inductive logic programming (ILP) systems learn relational concept descriptions from relational data. In ILP, induced rules typically have the form of Prolog clauses. A rule for ocular fundus image classification for glaucoma diagnosis, as induced by the ILP system GKS [19], which is specially designed to deal with low-level


measurement data including images, is given to illustrate the output of an ILP algorithm.

Figure 5. A Prolog clause induced by GKS

Compared to rules induced by a rule learning algorithm, which have the form IF Conditions THEN Conclusion, Prolog rules have the form Conclusion $\leftarrow$ Conditions. The rule in Figure 5, for example, means that the segment of the image is classified as undermining (i.e., not normal) if the conditions on the right-hand side of the clause are fulfilled. ILP has also been successfully applied to carcinogenesis prediction in the predictive toxicology evaluation challenge [20].

2.3.4 Neural networks

An artificial neural network (ANN) is an information processing paradigm that was inspired by the way biological nervous systems, such as the brain, process information. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems, and has been successfully applied to various areas of medicine, such as diagnostic aids, biochemical analysis, image analysis, and drug development. Artificial neural networks may be used for both supervised and unsupervised learning.


Supervised learning

Among the different neural network paradigms available for supervised learning, feed-forward multilayered neural networks [21] are most frequently used for modeling medical data. They are computational structures consisting of interconnected processing elements (PEs), or nodes, arranged in a multilayered hierarchical architecture. In general, a PE computes the weighted sum of its inputs and filters it through some sigmoid function to obtain the output. A typical architecture of a multi-layered neural network comprising an input, hidden, and output layer of nodes is given in Figure 6. The number of nodes in the input and output layers is domain-dependent and related to the number and type of attributes and the type of classification task. For example, for a two-class classification problem, a neural net may have two output PEs, each modeling the probability of a distinct class.

Figure 6. Architecture of multi-layered neural network
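The computation performed by a processing element and by a small feed-forward network can be sketched as follows. This is a minimal illustration in Python, assuming the logistic function as the sigmoid; the layer sizes, weights, biases and input values are hypothetical and were chosen only to show the data flow, not taken from any model in this thesis.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, one common choice of squashing function."""
    return 1.0 / (1.0 + math.exp(-z))

def pe_output(inputs, weights, bias):
    """A processing element: weighted sum of inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def forward(inputs, layers):
    """Propagate inputs through successive layers of (weights, bias) pairs."""
    activations = inputs
    for layer in layers:
        activations = [pe_output(activations, w, b) for w, b in layer]
    return activations

# Hypothetical network: 3 input attributes, 2 hidden PEs, 2 output PEs.
hidden_layer = [([0.4, -0.6, 0.1], 0.0), ([-0.3, 0.8, 0.5], 0.1)]
output_layer = [([1.2, -0.7], 0.0), ([-1.2, 0.7], 0.0)]

patient = [0.9, 0.2, 0.5]   # assumed normalized attribute values
outputs = forward(patient, [hidden_layer, output_layer])
print(outputs)              # one activation per output PE, each in (0, 1)
```

Each output value lies strictly between 0 and 1. Whether the outputs can be read as class probabilities depends on how the network is trained, which this sketch does not address.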


Weights are associated with each node and their values are determined from training instances. The most popular learning algorithm for supervised learning is backpropagation [21]. Backpropagation initially sets the weights to some arbitrary values and then, considering one or several training instances at a time, adjusts the weights so that the error (the difference between the expected and the obtained values of the nodes at the output level) is minimized. Such a training step is repeated until the overall classification error across all of the training instances falls below some specified threshold.

One of the major drawbacks of using neural net models is the lack of explanation capabilities: the induced weights together with the network's architecture do not usually have an obvious interpretation, and it is usually difficult or even impossible to explain why a certain decision was reached. In spite of this limitation, multi-layered neural networks often have equal or superior predictive accuracy when compared to symbolic learners or statistical approaches. They have been extensively used to model medical data. Example application areas include survival analysis, clinical medicine, pathology and laboratory medicine, molecular sequence analysis, pneumonia risk assessment, and prostate cancer survival. Other types of neural networks for supervised learning include the Hopfield recurrent network, used in tumor boundary detection [22], and neural networks based on adaptive resonance theory mapping (ARTMAP). Applications of ARTMAP in medicine include classification of cardiac arrhythmias [23] and treatment selection for schizophrenic and unipolar depressed in-patients [24].
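The training loop described above (arbitrary initial weights, error-driven weight adjustment, repetition until the overall error falls below a threshold) can be sketched in its simplest form. The Python example below trains a single sigmoid unit by gradient descent on squared error, which is the one-layer special case of backpropagation rather than a full multi-layer implementation. The learning rate, threshold, initial weights and the logical-OR training set are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(inputs, weights, bias):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(examples, lr=0.5, threshold=0.01, max_epochs=20000):
    """Adjust weights one instance at a time until the summed error is small."""
    weights, bias = [0.1, -0.1], 0.0   # arbitrary starting values
    total_error = float("inf")
    for _ in range(max_epochs):
        total_error = 0.0
        for inputs, target in examples:
            out = predict(inputs, weights, bias)
            err = target - out                 # expected minus obtained
            total_error += 0.5 * err * err
            delta = err * out * (1.0 - out)    # error times sigmoid slope
            weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
            bias += lr * delta
        if total_error < threshold:            # stopping criterion
            break
    return weights, bias, total_error

# Hypothetical training set: logical OR of two binary attributes.
or_examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias, final_error = train(or_examples)
predictions = [round(predict(x, weights, bias)) for x, _ in or_examples]
print(predictions, final_error)
```

In a multi-layer network the same error term is propagated backwards through the hidden layers using the chain rule; that step is omitted here to keep the sketch short.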


Unsupervised learning

Unsupervised learning involves the identification of groups of instances with similar attribute values from an unclassified dataset. The most frequently used neural network approach to unsupervised learning is Kohonen's self-organizing maps (SOM) [25]. Typically, a SOM consists of a single layer of output nodes. An output node is fully connected with the nodes at the input layer. Each such link has an associated weight. There are no explicit connections between nodes of the output layer.

The learning algorithm initially sets the weights to some arbitrary values. At each learning step, an instance is presented to the network, and a winning output node is chosen based on the instance's attribute values and the nodes' present weights. The weights of the winning node and of the topologically neighboring nodes are then updated according to their present weights and the instance's attribute values. The learning results in an internal organization of the SOM such that two similar instances, when presented, yield similar patterns of the network's output node values. Hence, data analysis based on SOM may be additionally supported by proper visualization methods that show how the patterns of output nodes depend on the input data. As such, SOM may not only be used to identify similar instances, but can, for example, also help to detect and analyze time changes of input data. Example applications of SOM include analysis of ophthalmic field data, classification of lung sounds, clinical gait analysis, and analysis of molecular similarity.
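A single SOM learning step of the kind described above can be sketched as follows. This minimal Python example assumes output nodes arranged on a one-dimensional line, squared Euclidean distance for choosing the winner, and a fixed learning rate and neighborhood radius. These choices, the function name `som_step`, and the numbers used are illustrative assumptions; practical SOM implementations usually use a two-dimensional grid and shrink both parameters over time.

```python
def som_step(weights, instance, lr=0.5, radius=1):
    """Present one instance: pick the winning node, update it and its neighbors."""
    # Squared Euclidean distance between the instance and each node's weights.
    dists = [sum((w - x) ** 2 for w, x in zip(node, instance)) for node in weights]
    winner = dists.index(min(dists))
    updated = []
    for i, node in enumerate(weights):
        if abs(i - winner) <= radius:   # winner and its topological neighbors
            node = [w + lr * (x - w) for w, x in zip(node, instance)]
        updated.append(node)
    return winner, updated

# Hypothetical map with three output nodes and two input attributes.
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
instance = [0.9, 0.8]

winner, new_weights = som_step(weights, instance)
print(winner, new_weights)
```

Here the third node is closest to the instance, so it and its immediate neighbor move toward the instance while the first node is left unchanged.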


2.4 Application of rough sets in medical diagnosis

Various applications of rough set theory have been reported in the data mining and rough set literature and, to a limited extent, in the medical informatics literature. The applications are broadly classified into two categories, diagnosis and outcome prediction, and feature selection, with an overwhelming majority of papers falling into the former. Rough sets are used for diagnosis and outcome prediction by synthesizing IF-THEN rules.

Pawlak [12, 13] introduced rough set theory in the early 1980s as a tool for representing and reasoning about imprecise or uncertain information. As the methodology has matured, several interesting applications of the theory have surfaced in the medical domain.

Fibak et al. [26] applied rough sets to analyze a database of patients with duodenal ulcer treated by highly selective vagotomy (HSV) at a hospital in Poznan, Poland. From preoperative information, the goal was to predict the long-term success of the operation, as evaluated by a surgeon, in terms of four outcome classes. The analysis was done with discretization carried out using expert medical knowledge, and a semi-manual search for minimal approximate discerning attribute sets. As a result, descriptive models of each outcome class could be formulated in natural language, and a small set of decision rules was also synthesized.

The HSV study by Fibak et al. is one of the few data analysis studies, regardless of methodology, that have managed to cross the clinical deployment barrier. Slowinski [27] reports that the developed prediction models were consulted for screening a group of subsequent candidates for surgery, with


the result that the group accepted for surgery had a clearly more advantageous distribution among the outcome classes than the group from which the models were originally derived. Other papers pertaining to the HSV study include [28, 29].

Vinterbo et al. [30] report on an experiment where a rough set predictor for myocardial infarction was synthesized using data collected at an emergency room in Edinburgh, Scotland, and evaluated on data collected at another emergency room in Sheffield, England. The behavior and stability of classifiers across site boundaries is of great practical interest, and this issue had previously been investigated by others [31, 32] using other methods on the same sets of data. Both of these studies reported performance losses when crossing site boundaries, while Vinterbo et al. conclude that the rough set predictor seems to generalize well across sites.

Wojcik and Ziarko [33] used a rough set approach to analyze a database on women with breast cancer, and induced prognostic rules for determining short-term and long-term follow-up survival. The rules were subsequently analyzed and interpreted from a clinical perspective. Although in general the rules were found to support well-known, established facts about factors influencing survival, new insight was reportedly gained on the impact that some medical and social factors may have on long-term survival patterns.

Woolery and Grzymala-Busse [34] reported on the development of a prototype expert system for assessing preterm birth risk. The goal of the system was to predict preterm deliveries, using a knowledge base of rules extracted from


empirical data by means of a machine learning system based on rough sets. The prototype expert system was verified to be more accurate than traditional manual techniques in predicting preterm birth.

Other medical applications of rough sets for predictive modeling reported in the literature include diagnosis of progressive encephalopathy in children [35] and diagnosis of acute appendicitis [36, 37].

2.5 Different methods to optimize the diagnostic process

Although the bulk of the research in the eighties and early nineties was focused on decision making related to medical diagnosis, efforts were also made to optimize the diagnostic process. The objective is to reduce excessive use of diagnostic tests and/or procedures, to conserve healthcare resources and to protect the patient from unnecessary discomfort. The methods range from selecting the most appropriate tests customized for an individual case to optimizing the process for the institution as a whole.

The analytical hierarchy process (AHP), a multiple criterion decision making method, was used to assist physicians in sequentially selecting the most appropriate tests for medical diagnosis [38]. It uses a multilevel structure of elements to address large, complex problems in a progressive manner. Qualitative preferences and quantitative information are weighted with respect to each other at the level directly above, using methods of sequential pair-wise comparison. To determine the next optimal step, the information gained in the previous phase is evaluated using Bayesian analysis and used by the AHP. The process of alternating AHP with Bayesian analysis could be repeated as often as


necessary for the required number of informed choices. Other examples include the use of a modified Bayes formula method, analogous to logistic regression, to analyze the 16 test combinations for the diagnosis of hepatolenticular degeneration (Wilson's disease) [38], and the application of rough sets and Boolean reasoning to select tests and procedures for the diagnosis of coronary artery disease (reference to the paper).

On the other hand, evolutionary algorithms are used to improve the efficiency of healthcare institutions. DIAPRO, the Diagnostic Process Optimization Framework, was introduced by Vili Podgorelec [1] to optimize the diagnostic process in a hospital setting. This integrated computerized environment performed various activities, including individualization of the examinations, optimization of the number of examinations, scheduling of examinations, performance of preventive maintenance programs for various equipment, and maximization of the accuracy, specificity and sensitivity of diagnosis.


CHAPTER 3
RESEARCH OBJECTIVES

In this chapter, the problem of optimizing the process of medical diagnosis is revisited in Section 3.1. The research approach is given in Section 3.2, and the assumptions along with their justifications are presented in Section 3.3.

3.1 Problem revisited

The problem considered in this research is the optimization of the medical diagnostic process, individualized for every patient. A physician, after the first interaction with the patient, formulates one or more hypotheses for the patient's condition and then takes steps to evaluate the most viable hypothesis. Such an evaluation may require ordering one or more tests and/or procedures based on the initial information, consisting of visible symptoms and the physical examination. If tests are ordered, the results may provide enough information for the physician to reach a diagnosis. If not, further testing may be required. Several factors affect the physician's decision to order tests, such as actual cost, associated risk and discomfort, time elapsed in the testing process and, above all, the diagnostic utility of the test. This research is focused on developing a decision aid that, based on patient parameters (both observed and tested), recommends tests in order to improve the medical diagnosis process by optimizing a utility function that takes into account factors such as cost, risk, discomfort, time to reach diagnosis, and diagnostic ability.


3.2 Research approach

The research approach is as follows.

1. To mine data for the given medical domain using rough set methodology and elicit decision rules that define the action set for the implementation of the reinforcement learning algorithm.
2. To formulate the problem of optimizing the process of medical diagnosis as a Markov decision problem (MDP), and to use a simulation based reinforcement learning algorithm for the solution.
3. To develop a detailed implementation strategy for the optimization process and its use in a real world scenario.

3.3 Parameters and assumptions

3.3.1 Data availability for rule generation

The data required for the rule generation is assumed to be available. Also, the information about the problem, whether explicit or hidden, is assumed to exist in the set of cases for the given domain. These cases are previous occurrences of the disorder that have been diagnosed.

3.3.2 Final diagnosis

It is assumed that a decision is always reached at the end of the diagnostic process. The decision could be a definite diagnosis of the problem or a decision to discard the current hypothesis in favor of an alternative hypothesis. It is also assumed that the methodology confirms one particular hypothesis and does not provide an alternative diagnosis.


3.3.3 Cost function

The cost of taking an action is the combined cost of conducting the test(s) and/or procedures required by the action. Calculating the cost of each test requires various parameters, such as the actual cost of the test, the cost of dealing with morbidity and mortality associated with the tests, and the probabilities of the various test outcomes. It is assumed that these parameters are available.

3.3.4 Test values from the distribution

The results of the tests and/or procedures conducted as an action selected for a state are obtained from the density functions elicited from the data. It is assumed that the conditional dependence between two or more tests and/or procedures could be provided by domain experts. However, in this research, the test outcomes are considered to be independent of each other in order to simplify the exposition of the methodology.

3.3.5 Patient arrival

Each patient is assigned an initial state based on the data collected from the patient on arrival.


CHAPTER 4
SOLUTION METHODOLOGY

4.1 Overview of the solution methodology

In this chapter, the problem of optimizing the process of diagnosis is modeled as a Markov Decision Process (MDP). Figure 7 gives an overview of the solution approach, which can be divided into three distinct tasks.

[Figure 7. Overview of the solution methodology. Three boxes in sequence: generation of rules using rough sets to define action sets; determination of utility for every action; learning (near) optimal diagnostic strategies using simulation and RL.]

The first task involves mining medical data for the given domain using rough sets to generate rules. These rules are used to determine the action set. The second task involves determination of the utility value for each action as a function of actual cost and diagnostic performance of


the test(s) associated with the action. The third task involves simulation based optimization of the decision process using reinforcement learning (RL).

4.2 Model formulation (MDP)

Let the system consist of N tests and/or diagnostic procedures. Define the test set as $\mathcal{N} = \{1, 2, \ldots, N\}$. The state of the diagnosis process at any decision epoch m can be given as a vector

$$X_m = (\tau_1, \tau_2, \ldots, \tau_m, \omega_1, \omega_2, \ldots, \omega_m, D),$$

where $\tau_1, \tau_2, \ldots, \tau_m$ denote the subsets of tests conducted at the decision epochs, such that $\tau_i \subseteq \mathcal{N}$ for all $i = 1, 2, \ldots, m$. The elements $\omega_1, \omega_2, \ldots, \omega_m$ denote the test results corresponding to $\tau_1, \tau_2, \ldots, \tau_m$ respectively. The last element of the patient state vector (D) contains the diagnostic decision, which could be a binary or a multi-valued variable.

The state of the diagnostic process changes when either a new patient arrives or the test results from the most recent action are obtained. The state change epochs are the decision epochs. It is assumed that the time between any two decision epochs is exponentially distributed with a parameter dependent on the system state at the first of those two epochs. Define $X = \{X_m : m \in G\}$ as the system state process, where G is the set of natural numbers. It can be easily shown that

$$P[X_{m+1} = j \mid X_0, \ldots, X_m] = P[X_{m+1} = j \mid X_m].$$

The above equation indicates that the stochastic process X is a Markov process. Let $D_m$ denote the decision (action) chosen at the mth decision epoch, where $D_m$ is a subset of the test set $\mathcal{N}$. Also, let $D_{X_m}$ denote the available action choices at state $X_m$. Then we have that $D_m \in D_{X_m}$. Define a decision process D as


$D = \{D_m : m \in G\}$. Then the combined process (X, D) is a Markov decision process.

4.3 Actions from rough set methodology

The synopsis of rough set theory presented below is adopted in part from [14]. An information system is a pair $\mathcal{A} = (U, A)$, where U is a nonempty finite set of objects and A is a nonempty finite set of attributes. For each attribute $a \in A$ there exists a mapping $a : U \rightarrow V_a$, where $V_a$ is the set of values for attribute a. Examples of U are medical cases, patients and observations. Some examples of the attributes are features, variables, tests, and procedures. A decision system is a special type of information system of the form $\mathcal{A} = (U, A \cup \{d\})$, where $d \notin A$ is the decision attribute. The decision attribute can be interpreted as a classifier on the object set U and is provided by a decision maker or an expert such as a physician. Decision systems are called training sets of examples in machine learning.

Let $V = \bigcup \{V_a : a \in A\}$ and $B \subseteq A$. Atomic formulae over the sets B and V are expressed in the form $a = v$, where $a \in B$ and $v \in V_a$. The set $F(B, V)$ is the least set containing all the atomic formulae over B and V, closed with respect to the propositional connectives $\wedge$ (conjunction), $\vee$ (disjunction) and $\neg$ (negation). The set $F(B, V)$ is called the set of conditional formulae of $\mathcal{A}$. The rules obtained using rough sets are expressed in the form $\varphi \Rightarrow d = v_d$, where $\varphi \in F(B, V)$ and $v_d \in V_d$. The elements $\varphi$ and $d = v_d$ of a decision rule are referred to as the predecessor and the successor. The predecessor, or conditional part of the decision rule, consists of one or more conditional formulae and is used to define actions for a given state in the system.


We can express the system state as $(F_x, D)$, where $F_x \in F(B, V)$ and D is the decision variable. Let $B_x \subseteq B$ be the attribute types present in the system state and $B_r \subseteq B$ be the attribute types present in the predecessor of a decision rule.

The actions for a given state are defined as follows. The attribute set $B_x$ is individually compared with $B_r$ for all the rules present in the rule set. Specifically, for each rule in the rule set, the attribute values for all $a \in B_x \cap B_r$ are compared. If $v_{a_r} \neq v_{a_x}$, then the rule is discarded; otherwise an action, which consists of the attribute set $B_r \setminus B_x$, is added to the action set of the given state.

The decision states are identified using a similar technique, as follows. For any patient state, if there exists a rule in the rule set such that $v_{a_r} = v_{a_x}$ for all $a_x \in B_x$ and $a_r \in B_r$, where $B_r \subseteq B_x$, then the given patient state is a decision state.

4.4 A utility theory based cost estimation

The cost of taking an action is the combined cost of each test and/or procedure included in the action. The performance measure of each test or procedure used in this research is a measure of the actual amount of money spent per unit of diagnostic performance, and will be referred to as effective cost. Effective cost can be given as [39]

$$\text{Effective cost} = \frac{\text{Direct cost}}{\text{Diagnostic utility}}$$

4.4.1 Direct cost

The direct cost of a test is the combined cost of conducting the test and the cost of dealing with the morbidity and mortality associated with some diagnostic tests. The cost of conducting the test is the actual list price for the test. The cost of


morbidity resulting from a diagnostic test can be estimated as $\sum_j C_j P_j$, where $C_j$ is the cost of managing the adverse condition j, and $P_j$ is the probability of that condition resulting from the procedure. The cost of mortality ($C_{mt}$) can be considered to be the average payoff value of a typical life insurance policy or the cost of a wrongful-death suit, and it is multiplied by the probability ($P_{mt}$) of mortality resulting from the test. Thus, the direct cost can be expressed as [39]

$$\text{Direct cost} = \text{cost of the test} + \sum_j C_j P_j + C_{mt} P_{mt}.$$

4.4.2 Diagnostic utility

Any diagnostic test has four traditional outcomes: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). A utility is associated with each outcome of the test. The value ranges between -1 (the worst possible outcome) and +1 (the best outcome). The diagnostic utility represents the probability weighted sum of the utilities of all possible outcomes of a diagnostic test and can be expressed according to [39] as follows:

$$DU = U(TP)\,P(TP) + U(TN)\,P(TN) + U(FP)\,P(FP) + U(FN)\,P(FN),$$

where U(x) is the utility of an outcome and P(x) is the probability of the outcome.

4.5 RL algorithm

Define a trajectory as the path along which a patient traverses during the diagnostic process. It begins with a new patient arrival and ends when a diagnosis is reached. Let $S = (x_1, x_2, \ldots, x_n)$ denote the set of states visited in a trajectory, where $x_n$ represents the patient state at the nth decision epoch of the trajectory. Let $\Delta = (\delta_1, \delta_2, \ldots, \delta_n)$ denote the set of actions in states $x_1, x_2, \ldots, x_n$


of S, and $C = (c_1(x_1, \delta_1), c_2(x_2, \delta_2), \ldots, c_n(x_n, \delta_n))$ denote the set of immediate costs at the decision epochs within a trajectory. Also, let m, T and maxruns denote the count of decision epochs over all trajectories, the count of trajectories, and the maximum number of trajectories to be simulated during the learning process, respectively.

The patient state space (E) is initialized with starting states only, and is appended with new states as they are visited along the trajectories. For each starting state and each new state appended to E, the corresponding Q-values are also initialized as $Q(j, \delta) = 0$, where $\delta \in \Delta_j$ and $\Delta_j$ is the set of actions available in state j. Initialize the sets S, $\Delta$ and C as empty sets. Choose appropriate values of $\alpha_0, \beta_0, \epsilon_0$ and $K_\alpha, K_\beta, K_\epsilon$, which are the initial values and the decay parameters for the learning ($\alpha$ and $\beta$) and exploration ($\epsilon$) rates. Define $\rho_i$, $\rho_f$, $m_t$ and indcost as temporary variables. Set $\rho_i = 0$, $\rho_f = 0$, n = 1, m = 0, $m_t = 0$, T = 1, $\alpha = \alpha_0$ and $\beta = \beta_0$, and begin the learning process.

1. Simulate a new patient arrival by randomly choosing an initial state.
2. If the current state has been previously visited, then go to Step 3; else
   o Identify the set of possible actions for the current state by comparing the information present in the state vector with the individual rules present in the rule set.
   o Append the Q array with all combinations of the current state and its actions.
3. Update $\epsilon$ as


   $$\epsilon = \frac{\epsilon_0}{1 + u}, \quad \text{where } u = \frac{m^2}{K_\epsilon + m}$$
   and $K_\epsilon$ is the decay constant of the exploration rate. With probability $(1 - \epsilon)$, choose an action $\delta_n \in \Delta_{x_n}$ for which $Q(x_n, \delta_n)$ is minimum; otherwise choose a random action from the set $\Delta_{x_n} \setminus \{\delta_n\}$.
4. Obtain results for the test(s) included in the action $\delta_n$.
5. Append the current state ($x_n$), the action chosen ($\delta_n$), and the immediate cost $c_n(x_n, \delta_n)$ as the nth element of the vectors S, $\Delta$ and C respectively.
6. Construct the next state by appending the test types and their results to the current state vector. Also, check whether the next state indicates that a diagnosis has been reached.
7. If a diagnosis has been reached, then
   o Obtain the cost per action within the trajectory (denoted as indcost) as
     $$\text{indcost} = \frac{\sum_{k=1}^{n} C_k + \left(c_1 + c_2\, n + c_3\, n^2\right)}{n},$$
     where the quadratic function above is used to assess the cost of the time (n) needed to reach diagnosis; $c_1$, $c_2$ and $c_3$ are constants that should be chosen suitably.
   o Update the average effective cost and the Q-values of the current state as: for k = 1 to n


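     o $Q_{new}(x_k, \delta_k) = (1 - \alpha)\, Q_{old}(x_k, \delta_k) + \alpha \left[ c(x_k, \delta_k) - \rho_f + \min_{b \in \Delta_{x_{k+1}}} Q(x_{k+1}, b) \right]$
     o $\rho_f = (1 - \beta)\, \rho_i + \beta\, \dfrac{m_t\, \rho_i + \text{indcost}}{m_t + 1}$
     o Set $m_t \leftarrow m_t + 1$ and $\rho_i \leftarrow \rho_f$
     o $\alpha = \dfrac{\alpha_0}{1 + u}$, where $u = \dfrac{m_t^2}{K_\alpha + m_t}$
     o $\beta = \dfrac{\beta_0}{1 + u}$, where $u = \dfrac{m_t^2}{K_\beta + m_t}$
   If T = maxruns
     Go to Step 8
   Else,
     Reinitialize S, $\Delta$ and C as empty sets.
     Set $m \leftarrow m + 1$ and n = 1
     Go to Step 1
   Else,
     current state $\leftarrow$ next state
     Set $m \leftarrow m + 1$ and $n \leftarrow n + 1$
     Go to Step 2.
8. Start a fresh simulation using the existing Q-values and no exploration for maxruns trajectories to estimate the average effective cost.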
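The end-of-trajectory update in Step 7 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: it assumes that every step of the trajectory is charged the same per-step cost (indcost), that a decision state with no further actions contributes a value of zero to the minimum, and that the running average cost is blended with the learning rate beta. The function and argument names are hypothetical.

```python
def end_of_trajectory_update(Q, trajectory, indcost, rho, m_t, alpha, beta):
    """Apply one end-of-trajectory update (a sketch under stated assumptions).

    Q          : dict mapping state -> dict mapping action -> Q-value
    trajectory : list of (state, action, next_state) tuples, in visit order
    indcost    : cost per action within the trajectory
    rho        : running average effective cost
    m_t        : number of completed trajectories so far
    """
    for state, action, next_state in trajectory:
        next_q = Q.get(next_state, {})
        # Assumption: a terminal (decision) state has no actions and value 0.
        best_next = min(next_q.values()) if next_q else 0.0
        old = Q[state][action]
        Q[state][action] = (1 - alpha) * old + alpha * (indcost - rho + best_next)

    # Blend the old average with the sample average over m_t + 1 trajectories.
    rho_new = (1 - beta) * rho + beta * (m_t * rho + indcost) / (m_t + 1)
    return rho_new, m_t + 1


if __name__ == "__main__":
    Q = {"s0": {"a": 0.0, "b": 0.0}, "s1": {"c": 0.0}}
    path = [("s0", "a", "s1"), ("s1", "c", "diagnosed")]
    rho, m_t = end_of_trajectory_update(Q, path, indcost=100.0, rho=0.0,
                                        m_t=0, alpha=0.1, beta=0.1)
    print(Q, rho, m_t)
```

Keeping the Q-table as a dictionary of dictionaries mirrors the dynamic growth of the state space described above: states are added only when they are first visited.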


The exploration and the learning rates are updated using the Darken-Chang-Moody (DCM) scheme [41]. A detailed flowchart of the RL algorithm is presented in Figure 8.
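As a minimal sketch of how a DCM-style decay and the exploration step might look in code, the following assumes the form rate = initial / (1 + u) with u = m^2 / (K + m), where K is a large decay constant and m is the running count. Because costs are minimized, the greedy action is the one with the smallest Q-value. The function names are illustrative and are not taken from the original PL/SQL implementation.

```python
import random


def dcm_rate(initial, constant, m):
    """Decayed rate after m updates: initial / (1 + m^2 / (constant + m))."""
    u = m * m / (constant + m)
    return initial / (1.0 + u)


def choose_action(q_values, epsilon, rng=random):
    """Pick the minimum-Q action with probability 1 - epsilon, else explore.

    q_values : dict mapping action -> Q-value (costs, so smaller is better)
    """
    greedy = min(q_values, key=q_values.get)
    others = [a for a in q_values if a != greedy]
    if others and rng.random() < epsilon:
        return rng.choice(others)  # explore among the non-greedy actions
    return greedy


if __name__ == "__main__":
    for m in (0, 1_000, 100_000, 1_000_000):
        print(m, dcm_rate(0.5, 100_000, m))
    print(choose_action({"F17": 550.0, "F2": 500.0}, epsilon=0.1))
```

With a large constant the rate stays close to its initial value for many updates and then decays roughly as 1/m, which is the behaviour usually sought from this kind of schedule.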


[Figure 8. Detailed flowchart for RL implementation. The flowchart contains the following boxes: new patient arrival (randomly assigned an initial state); check if actions are defined for the current state; define actions for the given state using decision rules; record details in the state space and Q-matrix tables; select action based on the Q-values or using exploration; obtain results for tests conducted through the selected action; construct the next state by appending test information to the current state; store intermediate cost and state/action pair; check if diagnosis has been reached; assign next state as current state; update Q-values and the learning parameters; maximum number of iterations reached; simulate the system without learning to assess the average effective cost of diagnosing a patient.]


CHAPTER 5
TEST PROBLEM AND IMPLEMENTATION

5.1 Introduction

In this chapter, a sample diagnosis scenario to test the proposed methodology is presented. Diagnosis of solitary pulmonary nodule (SPN) is considered for implementation. Some of the issues associated with the diagnosis of SPN are discussed in Section 5.3. Section 5.4 highlights the information requirements for the methodology implementation, and Section 5.5 provides the key aspects of the RL implementation in optimizing the diagnosis process.

5.2 Solitary pulmonary nodule (SPN)

A solitary pulmonary nodule or "coin lesion" has been defined as a single spherical lesion, 1 to 6 cm in size, completely surrounded by lung without associated atelectasis or adenopathy. In most cases, SPN is discovered incidentally on the chest radiograph. The cause of SPN can be a malignant disease such as bronchogenic carcinoma, lymphoma or sarcoma, or it may result from a benign condition such as hamartoma, granuloma or sclerosing hemangioma. Malignancy is estimated to occur in 20%-40% of the patients with SPN.

5.3 Issues of SPN diagnosis

SPN may be a sign of cancer, and virtually all affected patients undergo further diagnostic evaluation. The diagnostic evaluation may include, and may


not be limited to, retrieval and review of any previous chest radiographs, further imaging with conventional tomography or computed tomography (CT), bronchoscopy or transthoracic needle aspiration biopsy (TNAB), consultation with a pulmonologist and a surgeon, and, for some patients, thoracotomy or thoracostomy in conjunction with resection of the SPN.

The diagnostic challenge in the assessment of a patient with SPN is in establishing whether the nodule is benign or malignant. If the SPN is malignant, the physician should expedite the removal of the nodule and avoid thoracotomy in patients with surgically incurable lung neoplasms. If the SPN is benign, the physician should avoid thoracotomy because benign nodules rarely pose a risk and need not be resected.

The large amount of information and the risks associated with the diagnostic procedures complicate this process. For example, as many as 100 attributes obtained from various diagnostic procedures may be available to the physician to make the decision. It is usually difficult to effectively synthesize the array of available information, even for experienced physicians. Examples of the risks involved with the diagnostic procedures include the occurrence of pneumothorax in 15% to 20% of cases due to the fine needle aspiration (FNA) procedure and a 2% to 3% mortality rate due to thoracotomy. Hence, it is important to consider the morbidity and mortality associated with invasive procedures. Unfortunately, the information provided by non-invasive procedures can be misleading. For example, 20%-40% of resected lesions are generally found to be benign even after sophisticated computer


tomography (CT) imaging. However, it is important to promptly and positively identify malignancy in an SPN to ensure optimal treatment.

5.4 SPN data availability for implementation

The solution methodology presented in the previous chapter is implemented in the diagnosis of solitary pulmonary nodule. The methodology requires the following information for implementation.

5.4.1 Selection of attributes and their value

In the model formulation, the test set $\mathcal{N} = \{1, 2, \ldots, N\}$ is defined as the set of all tests and/or procedures associated with the diagnosis. Each test/procedure can provide one or more attributes that can be utilized in the process of diagnosis.

Table 1. Attributes utilized in the test problem

ATTRIBUTES                                                  IDENTIFIERS
Patient's age                                               F1
Computed tomography maximum radius                          F2
Computed tomography maximum area                            F3
Borders                                                     F4
Calcification type                                          F5
Location in thorax                                          F6
Nodes                                                       F7
Other sus. lesion                                           F8
Sex                                                         F9
PET PN image > bg                                           F10
Packets of cigarettes per year                              F11
Body mass index                                             F12
History of cancer                                           F13
Forced expiratory volume, 1 sec, percent predicted          F14
Adjusted DLCO, percent predicted                            F15
Forced vital capacity, percent predicted                    F16
Forced expiratory flow, 25-75%                              F17
PET standard uptake value                                   F18
Decision                                                    d

The patient state can thus be defined as


$$X = (\tau_1, \tau_2, \ldots, \tau_N, \omega_1, \omega_2, \ldots, \omega_N, Z, D),$$

where $\tau_m$, $m \in M$, denotes the subset of attributes obtained by the tests ordered at the mth decision epoch, and $\omega_m$ denotes the corresponding values of the attributes. Let A denote the set of all patient attributes that are used in the diagnosis process. The subsets of A represent the tests/procedures associated with the diagnosis. For the current implementation, 18 different attributes as provided in [40] are used, which are listed in Table 1. Each attribute is assigned an identifier, as shown in column 2 of the table.

5.4.2 Decision rules to define the actions set

Rough set analysis is utilized to find the minimal subsets of attributes (features) that preserve the indiscernibility relation. These minimal subsets, or reducts, are used to generate decision rules. The rules so generated are required to define the action set for each patient state. The rule set utilized in the current implementation is obtained from [40] and is listed in Table 2.

5.4.3 Generation of attribute values

The values for the attributes that constitute an action are generated from the probability distributions defined using the data set from [40]. The data set given in Table 3 consists of 50 cases with confirmed diagnosis. For each case, the data set contains values for all the 18 attributes. Probability distributions for attributes with discrete values were based directly on the frequencies of the attribute values in the data set. The attributes with continuous values, in contrast, were first discretized, and then the frequencies of these discrete values were used to define the distribution.
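The frequency-based generation of attribute values described in Section 5.4.3 can be sketched as follows. This is an illustrative Python fragment, not the original implementation: the function names are hypothetical, and the short list of values is made up for the example and is not meant to reproduce any column of the data set.

```python
import random
from collections import Counter


def empirical_distribution(values):
    """Relative frequency of each distinct value in a list of observations."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def sample_value(distribution, rng=random):
    """Draw one value according to the empirical frequencies."""
    outcomes = list(distribution)
    weights = [distribution[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical observations of a discrete attribute with values 1 to 4.
    observed = [1, 1, 3, 1, 2, 4, 2, 1, 1, 2]
    dist = empirical_distribution(observed)
    print(dist)
    print([sample_value(dist) for _ in range(5)])
```

Since the test outcomes are treated as independent of each other in this work, each attribute can be sampled from its own distribution in this way.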


Table 2. Decision rules obtained from rough set analysis

Rule #   Decision Rule
1        IF (F10<=1) AND (F17<=89) THEN (D=B)
2        IF (F2 IN [0.9,1.55]) AND (F5<=0) AND (F6>=2) AND (F8 IN [2.75,8.25]) THEN (D=B)
3        IF (F14>=78) AND (F17<=54) AND (F18<=4.1) THEN (D=B)
4        IF (F14 IN [94,109]) AND (F16>=103) THEN (D=B)
5        IF (F6<=2) AND (F10>=1) AND (F16<=94) THEN (D=M)
6        IF (F2<=0.9) AND (F10>=1) THEN (D=M)
7        IF (F10>=1) AND (F13<=0) AND (F14<=74) THEN (D=M)
8        IF (F15<=52) THEN (D=M)
9        IF (F14<=80) AND (F17>=31) THEN (D=M)
10       IF (F14 IN [81,87]) THEN (D=M)
11       IF (F6<=2) AND (F10>=1) AND (F14>=84) AND (F15>=63) THEN (D=M)
12       IF (F16<=93) AND (F17>=49) THEN (D=M)
13       IF (F6>=2) AND (F14>=91) AND (F15<=97) THEN (D=M)
14       IF (F2>=1.65) AND (F15>=116) THEN (D=M)
15-1     IF (F8<=0) AND (F10<=1) THEN (D=B)
16-2     IF (F8>=0) AND (F9<=0) AND (F18<=4.1) THEN (D=B)
17-3     IF (F2 IN [0.9,1.1]) AND (F4>=2) AND (F18<=8.25) THEN (D=B)
18-4     IF (F2 IN [1.1,1.55]) AND (F6>=2) AND (F8<=0) THEN (D=B)
19-5     IF (F2 IN [2.25,2.75]) AND (F4<=1) THEN (D=B)
20-6     IF (F2<=2) AND (F6<=2) AND (F10>=1) THEN (D=M)
21-7     IF (F2>=2.75) AND (F18<=8.6) THEN (D=M)
22-8     IF (F2<=2.75) AND (F13<=0) AND (F18>=5.3) THEN (D=M)
23-9     IF (F2<=3.25) AND (F18>=8.45) THEN (D=M)
24-10    IF (F2>=3.75) THEN (D=M)
25-11    IF (F2<=6) AND (F4<=1) AND (F6>=2) THEN (D=M)
26-12    IF (F7>=1) AND (F8<=0) THEN (D=M)
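To make the use of these rules concrete, the sketch below encodes three of the rules from Table 2 (rules 1, 6 and 8) and derives the actions and the decision for a patient state in the manner of Section 4.3. It is a hedged illustration only: representing each condition as a closed interval, and treating a rule as compatible when every attribute already known in the state satisfies its condition, are assumptions made for the example, as are all function names.

```python
import math

# Rules 1, 6 and 8 of Table 2, each condition as a closed interval (low, high).
RULES = [
    ({"F10": (-math.inf, 1), "F17": (-math.inf, 89)}, "B"),  # rule 1
    ({"F2": (-math.inf, 0.9), "F10": (1, math.inf)}, "M"),   # rule 6
    ({"F15": (-math.inf, 52)}, "M"),                         # rule 8
]


def satisfied(condition, value):
    low, high = condition
    return low <= value <= high


def actions_for(state, rules=RULES):
    """Attribute sets still to be obtained, one per rule compatible with the state."""
    actions = set()
    for conditions, _decision in rules:
        known = set(conditions) & set(state)
        # Discard the rule if any attribute already in the state contradicts it.
        if all(satisfied(conditions[a], state[a]) for a in known):
            missing = frozenset(set(conditions) - set(state))
            if missing:
                actions.add(missing)
    return actions


def diagnosis(state, rules=RULES):
    """Decision of the first fully satisfied rule, or None if none applies."""
    for conditions, decision in rules:
        if all(a in state and satisfied(conditions[a], state[a]) for a in conditions):
            return decision
    return None


if __name__ == "__main__":
    state = {"F1": 68, "F10": 1}
    print(actions_for(state))                  # attribute sets that could be ordered
    print(diagnosis({"F10": 1, "F17": 50}))    # rule 1 is fully satisfied
```

An action here is simply the set of attributes that a compatible rule still needs, which is why the same state can offer several alternative actions.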


Table 3. Test cases utilized in the implementation

Case F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15 F16 F17 F18 D
1 68 1 3.1 1 3 2 3 1 1 2 150 27 0 104 79 97 79 12 M
2 73 1.5 5.7 1 0 3 1 1 0 2 5 37 1 99 108 92 91 9.4 M
3 57 1.2 3.8 3 0 3 0 0 1 1 20 24 0 90 58 12 41 0 B
4 71 0.75 1.8 1 0 2 0 0 1 1 20 23 0 55 118 78 25 0 B
5 85 1.6 6.2 2 0 3 4 0 0 2 140 19 0 60 100 61 45 3.4 M
6 65 3 28.3 4 0 2 0 0 0 2 60 30 0 87 71 103 34 11.4 M
7 39 4 38.5 2 0 1 0 0 1 2 40 37 0 33 41 25 18 10.1 M
8 57 1.8 10.2 1 0 2 3 0 0 2 30 26 1 97 104 85 98 11 M
9 49 2.5 19.6 1 0 3 0 0 1 1 30 42 0 104 106 106 91 3.6 B
10 72 1.5 7.1 2 0 2 0 0 1 2 20 30 0 64 71 113 31 4.6 M
11 77 3 28.3 2 0 3 0 0 1 2 50 26 1 116 45 107 80 5.5 M
12 73 1 3.1 2 0 3 0 1 1 2 0 25 1 96 113 91 67 9.1 M
13 62 2 12.6 1 0 2 2 1 1 2 100 21 1 50 89 68 22 10.3 M
14 64 2 12.6 2 0 2 2 1 0 2 60 23 1 75 58 82 37 5.1 M
15 71 4 50.3 2 0 1 1 0 0 2 100 39 0 94 62 38 73 4.6 M
16 76 0.8 2 3 0 2 4 0 0 2 25 20 1 76 69 112 16 1.2 M
17 34 4 38.5 2 0 3 0 0 0 2 0 23 1 104 117 102 92 18 M
18 72 2 9.6 2 0 2 1 1 0 2 30 27 0 37 59 43 20 9.3 M
19 70 1.5 4.9 2 0 3 0 0 0 2 0 20 0 109 109 102 84 4.3 B
20 74 3 28.3 2 0 2 0 0 1 2 20 27 0 106 86 92 87 2.25 M
21 34 2 9.6 3 0 1 0 0 0 2 0 19 1 99 96 101 78 1.8 M
22 72 2 7.1 4 0 2 3 1 1 2 100 26 0 56 56 77 24 3.5 M
23 73 2 7.1 4 0 3 0 1 1 2 30 24 0 68 90 59 66 9.9 M
24 70 1 3.1 1 0 3 0 0 0 2 0 29 0 112 84 109 77 26 M
25 75 2 12.6 2 0 3 0 1 1 2 24 26 0 32 56 57 15 9.2 M
26 55 1 3.1 2 0 2 2 1 0 1 70 22 1 79 77 97 31 0 B
27 79 3 28.3 3 0 2 0 0 1 2 35 33 1 44 61 61 24 6.4 M
28 61 5 78.5 2 0 1 0 0 0 2 0 29 1 74 58 63 74 9.5 M
29 69 1 3.1 1 0 3 4 0 1 2 60 24 0 93 88 99 50 13.7 M
30 72 2 12.6 4 0 2 1 0 1 2 50 24 0 102 78 80 31 7.1 M
31 69 1.2 3.8 2 0 2 0 0 0 2 50 32 0 74 51 92 25 2.8 M
32 64 1.5 4.9 3 0 1 0 0 0 2 75 37 0 53 63 80 21 6.1 M
33 64 1 3.1 3 0 3 0 1 1 2 0 31 1 104 115 116 56 7.8 B
34 49 5 50.3 1 0 1 1 0 1 2 30 27 0 75 114 69 65 6.6 M
35 64 4 38.5 1 0 3 1 1 0 2 92 25 0 70 80 88 24 17.6 M
36 66 3.5 33.2 4 0 3 2 0 1 2 81 37 0 88 67 83 59 3.6 M
37 66 6 95 2 0 2 0 0 0 2 50 20 0 90 52 98 59 3.3 M
38 76 5 50.3 1 0 1 1 1 0 2 60 30 1 87 79 87 79 7.1 M
39 62 1 3.1 3 0 3 0 0 0 2 40 28 0 79 61 93 32 17.6 M
40 56 1.5 4.9 1 0 2 0 0 0 1 0 21 1 109 105 107 85 0 B
41 68 2.5 19.6 2 1 3 0 1 1 2 60 26 0 115 136 114 76 6.3 M
42 72 2.5 19.6 3 0 3 2 0 1 2 30 28 1 56 59 57 36 3.2 M
43 53 1 3.1 3 0 3 3 1 0 2 15 19 0 88 87 94 52 4 B
44 52 3.2 30.2 2 0 2 0 0 1 2 35 25 1 92 109 105 46 7.6 M
45 71 1 3.1 3 0 3 0 1 0 2 50 22 0 52 48 81 15 11.4 M
46 54 1 3.1 2 0 3 1 0 0 1 41 32 0 44 81 44 28 0 B
47 47 2.2 10.8 4 0 2 0 1 0 2 30 22 1 81 89 100 37 2.4 B
48 70 3 28.3 4 0 1 0 1 1 2 90 20 0 89 82 90 34 13.6 M
49 45 2 9.6 1 0 2 0 0 0 2 11 23 0 121 128 124 86 2.2 M
50 64 0.5 0.8 1 0 3 1 1 0 2 90 46 0 75 69 84 33 4.2 M


5.4.4 Effective cost for tests/procedures

The performance measure of each test or procedure used in this research is a measure of the actual amount of money spent per unit of diagnostic performance and is referred to as effective cost. The method to calculate effective cost is indicated in the previous chapter. For the purpose of implementation, the cost values for the attributes were chosen somewhat arbitrarily. Two sets of such values (denoted as set A and set B) are listed in Table 4.

Table 4. Utility values for the attributes

Attribute identifier   Effective cost
                       Set A   Set B
F1                     75      70
F2                     500     650
F3                     150     150
F4                     300     300
F5                     85      850
F6                     140     260
F7                     260     140
F8                     190     190
F9                     100     100
F10                    450     750
F11                    125     125
F12                    210     75
F13                    75      210
F14                    750     450
F15                    600     600
F16                    425     425
F17                    550     800
F18                    650     500

5.5 Implementation

This section discusses the key aspects of the implementation of the RL algorithm in optimizing the diagnosis process of solitary pulmonary nodule.


5.5.1 Discretization of the available dataset

The data set presented in Table 3 contains 10 attributes with continuous values and 8 attributes with discrete values. To reduce computational complexity, attributes with continuous values were discretized. The number of steps into which an attribute's value range was discretized was selected based on how frequently the attribute appears in the rule set: the higher the frequency, the larger the number of steps. Table 5 shows the attributes, their value ranges, and the corresponding number of discrete values. This process of discretization was carried out by visual inspection of the rule set. However, a more formal algorithmic procedure for this could be developed.

Table 5. Attributes, value ranges and corresponding discrete values

Attribute identifier       Value range   # of discretization steps
(* indicates continuous)
*F1                        35-85         2
*F2                        0.5-6         13
*F3                        0.8-95        2
F4                         1-4           4
F5                         0-3           3
F6                         1-3           3
F7                         0-4           5
F8                         0-1           2
F9                         0-1           2
F10                        0-1           2
*F11                       0-150         2
*F12                       19-46         2
F13                        0-1           2
*F14                       32-121        9
*F15                       41-136        5
*F16                       12-124        3
*F17                       15-98         5
*F18                       0-26          7

The discretized values were then used for 1) defining patient states, 2) transforming decision rules, and


3) generating values for various attributes during simulation runs. An example of decision rule transformation is as follows. The original rule says

IF (F10<=1) AND (F17<=89) THEN (D=B).

This was transformed to

IF (F10<=1) AND (F17<=4) THEN (D=B).

F10 is a binary valued attribute, and hence there was no change to its value in the rule transformation. The values for F17, which is a continuous valued attribute, were discretized in 5 steps as indicated in Table 5. Hence, its value was changed to 4 in the transformed rule.

5.5.2 Categorization of patient states

The patient states are divided into three categories: initial states, transient states and decision states. Initial states are the patient states at the beginning of a diagnosis process. The attributes present in the initial states are the patient's age (F1), the sex of the patient (F9), packets of cigarettes per year (F11), the patient's body mass index (F12), and history of cancer (F13). All states between the initial and the decision states are classified as transient states. These states indicate the assessed condition of the patient during the diagnosis process and are formed by appending attribute information obtained through additional testing. Decision states are those with attribute values that satisfy one or more decision rules.
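The rule transformation example in Section 5.5.1 can be illustrated with a simple equal-width binning. This is an assumption made purely for illustration, since the discretization in this work was carried out by visual inspection of the rule set; the function name is hypothetical. With the range 15-98 and 5 steps listed for F17 in Table 5, and zero-based step indices, the threshold 89 falls in step 4, which agrees with the transformed rule.

```python
def discretize(value, low, high, steps):
    """Zero-based index of the equal-width step that contains the value."""
    width = (high - low) / steps
    index = int((value - low) // width)
    # Clamp so that the upper end of the range falls in the last step.
    return min(max(index, 0), steps - 1)


if __name__ == "__main__":
    # F17 has the range 15-98 and 5 steps in Table 5.
    print(discretize(89, 15, 98, 5))   # → 4
    print(discretize(15, 15, 98, 5))   # → 0
```

A data-driven alternative, such as cut points chosen from the rule thresholds themselves, would follow the same interface and could replace the equal-width assumption.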


5.5.3 Dynamic generation of the states

The patient state is generated progressively by appending the attributes of the tests that are ordered, starting with an initial attribute vector of size 5 that represents the initial states. Since tests (attributes) can be chosen in any order, all possible combinations of attribute vectors (of all sizes) must be considered. For example, if two new attributes are to be appended beyond the 5 attributes in the initial states, then there are $\binom{13}{2}$ possible ways to do that. The patient state also contains the values of the attributes. Hence the number of possible patient states in the above example will be

$$n_i \times \binom{13}{2} \times d_1 \times d_2,$$

where $n_i$ indicates the number of initial states, and $d_1$ and $d_2$ represent the numbers of possible values of the two new attributes respectively.

Let K be the set of all possible attribute vectors of sizes 5 through 18. For any vector $k \in K$ of size n, the number of possible combinations of attribute values can be given as

$$S_k = \prod_{i=1}^{n} d_i,$$

where $d_i$ is the number of distinct values of attribute i and n is the number of attributes present in the vector. The cardinality of the set K is then given by

$$|K| = 1 + \sum_{j=1}^{13} \binom{13}{j},$$

where 1 indicates the fixed set of initial attributes, and the summation takes care of all the possible ways of appending to the initial vector. Thus, the size of the state space E is given by $\sum_{k \in K} S_k$ over all $k \in K$.
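The counting argument above can be checked with a short calculation. The sketch below assumes the numbers of discretization steps listed in Table 5, five initial attributes (F1, F9, F11, F12, F13) that are always present, and thirteen remaining attributes that may each be absent or present with one of its discrete values. It uses the identity that the sum, over all subsets, of the product of the level counts equals the product of (1 + d_i). Under these assumptions the count comes to roughly 6.7 x 10^10; it should be read as an illustration of the counting argument, and it may differ from the total reported for the implementation depending on the level counts actually used.

```python
from math import prod

# Number of discrete levels per attribute, taken from Table 5.
INITIAL_LEVELS = {"F1": 2, "F9": 2, "F11": 2, "F12": 2, "F13": 2}
TEST_LEVELS = {"F2": 13, "F3": 2, "F4": 4, "F5": 3, "F6": 3, "F7": 5, "F8": 2,
               "F10": 2, "F14": 9, "F15": 5, "F16": 3, "F17": 5, "F18": 7}


def state_space_size(initial_levels, test_levels):
    """Count states: every initial attribute is set, any subset of tests is known."""
    n_initial = prod(initial_levels.values())
    # Each optional attribute is either absent (1 way) or takes one of d values.
    return n_initial * prod(1 + d for d in test_levels.values())


if __name__ == "__main__":
    size = state_space_size(INITIAL_LEVELS, TEST_LEVELS)
    print(f"{size:,} states (about {size:.2e})")
```

A count of this order is what motivates generating states on demand instead of enumerating them in advance.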


Using the above method and the discretized values presented in Table 5, the size of the state space was found to be in the range of 6.69 x 10^11. For a state space of this size, it would be difficult to perform the necessary computations in a reasonable amount of time. Hence, the system is initiated with 32 different initial patient states, with only two possible values for each of the five initial attributes. The remaining states are dynamically generated, as needed, and are added to the state space along with their corresponding entries in the Q-matrix table.

5.5.4 Generation of actions for a patient state

An action is a set of new attributes that, when added to the current patient state attributes, fulfills the diagnosis criteria of one or more rules. Generation of actions for a patient state involves identifying all such attribute sets. This is accomplished by comparing the information available in the patient state at the given diagnostic stage with all the available rules.

5.5.5 Learning parameters

Three learning parameters are involved in the implementation of the methodology: $\alpha$, $\beta$ and $\epsilon$, used for updating the Q-values, for updating the average cost, and for determining the choice between exploration and exploitation when selecting an action, respectively.

Table 6. Values for the learning parameters

Learning parameter   Initial value   Constant
$\alpha$             0.1             1000000
$\beta$              0.1             250000
$\epsilon$           0.5             100000

The running value of the learning parameter at any given


time is determined by its initial value and the large constant used to control the decay rate. The values used for the current implementation are listed in Table 6.

5.5.6 Q-value updates

The Q-values for the state-action pairs involved in the trajectory are updated at the end of each diagnosis process. In the equation for updating Q-values, the immediate cost of taking an action consists of two parts. The first is the sum of the effective costs of the attributes involved in the action, and the second is the cost of time. It is assumed that a longer diagnosis process incurs additional cost beyond the cost of testing, and that this cost increases with time. This cost is obtained using a quadratic function given by

$$\text{cost of time} = 285.714 - 398.81\, n + 151.19\, n^2,$$

where n is the number of steps involved in the diagnosis process. The function was defined using regression on a set of cost values for each step of diagnosis. The values ranged from zero to the maximum cost of the diagnosis and were chosen somewhat arbitrarily. For updating the Q-values, the total cost is equally distributed over all the steps involved in the trajectory. The total cost is the sum of the costs of all the actions taken in the trajectory plus the cost of the steps involved.

5.5.7 Software framework

The application developed for the implementation of the proposed methodology is built on an Oracle 8i database. Data management is one of the major challenges in an RL implementation, especially when the state space is large. Oracle provides a resilient object-relational database management system (O-RDBMS) framework for efficient data structuring


and storage. The logic for the algorithm provided in Figure 8 was programmed using PL/SQL.
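Although the original logic was written in PL/SQL, the cost bookkeeping of Section 5.5.6 can be sketched in Python for illustration. The coefficients are those of the quadratic cost-of-time function; the negative sign of the linear term is an assumption, chosen because it makes the cost small for a one-step diagnosis and increasing afterwards. The function names are hypothetical.

```python
# Coefficients of the quadratic cost-of-time function (Section 5.5.6).
# Assumption: the linear coefficient is negative.
C1, C2, C3 = 285.714, -398.81, 151.19


def cost_of_time(n):
    """Cost attributed to a diagnosis process that takes n steps."""
    return C1 + C2 * n + C3 * n ** 2


def cost_per_step(action_costs):
    """Total trajectory cost spread equally over its steps.

    action_costs : effective cost of the action taken at each step
    """
    n = len(action_costs)
    return (sum(action_costs) + cost_of_time(n)) / n


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, round(cost_of_time(n), 3))
    # Two steps: F17 alone (550), then F2 and F10 together (500 + 450), set A.
    print(round(cost_per_step([550, 950]), 3))
```

Spreading the total equally means that every state-action pair on a long trajectory shares the penalty for the time taken, not only the last one.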


CHAPTER 6
RESULTS AND ANALYSIS

6.1 Introduction

In this chapter, the numerical results obtained by applying the proposed solution methodology to the diagnosis of solitary pulmonary nodule are presented. Results from the learning and the learnt phases of the RL implementation, along with a comparative study between the RL policy and several alternative policies, are presented.

6.2 RL policy

Figures 9 and 10 show the results from the learning and the learnt phases of the RL algorithm using costs from set A and set B respectively. The horizontal axis represents the number of patients used in the simulated diagnosis process, and the vertical axis represents the cost of diagnosis. Since a large number of patients (about 5000) were considered, costs for every 100 consecutive patients were plotted instead of individual entries. The learning curve shows the cost incurred during the learning phase of the RL methodology, and the learnt curve indicates the cost of implementing the learnt policy.

The trends of the learning curves clearly indicate a significant policy improvement, seen as a drop in the diagnosis cost as learning proceeds. The policy improvement during the learning process is also evident from the results


61 0 50000 100000 150000 200000 250000 135791113151719212325272931333537394143454749Number of patients (in 100)Cost of diagnosis (100 patients)Note: All values in generic unitsLearning Phase Learnt Phase Figure 9. RL results for cost set A 0 50000 100000 150000 200000 250000 300000 135791113151719212325272931333537394143454749Number of patients (in 100)Cost of diagnosis (100 patients)Note: All values in generic unitsLearnt Phase Learning Phase Figure10. RL results for cost set B


62 summarized in Table 7, where the cost of diagnosis averaged for the first and last 500 patients are shown. Table 7. Comparison of cost av eraged for first and last 500 patients First 500 runs Last 500 runs Average Cost 95% C.I Average Cost 95% C.I. Set A 1924.81 90.73 1346.75 98.25 Set B 2172 103.60 1213.6 113.93 Note: All values in generic units 6.3 Benchmarking RL policy The results obtained using the RL policy were benchmarked with four other alternative policies. The comparison is based on the total cost of diagnosis. The four policies chosen for benchmarking differ in their methods of selecting an action, which are described below. 1. Minimum cost policy (HMO policy): The action with minimum effective cost is chosen for any given patient state. 2. Random action policy: An action fr om all available actions is chosen randomly. 3. Minimum number of tests policy: The action that involves minimum number of tests to be perform ed at a time is selected 4. Maximum number of tests policy: The action that involves maximum number of tests to be perform ed at a time is selected. A visual comparison of the performances fo r the four alternate policies along with the RL policy for costs in set A is shown in Figure 11. As evi dent from the plots, the results obtained from t he RL policy fares better t han all the other approaches.
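The four benchmark action-selection rules of Section 6.3 can be sketched as follows. This is only an illustration under stated assumptions: an action is represented as a tuple of attribute identifiers, the number of tests in an action is taken to be the number of attributes it contains, ties are broken by taking the first candidate, and the individual attribute costs are hypothetical values chosen so that the action totals match those listed for cost set A.

```python
import random
from typing import Dict, List, Tuple

# An action is a tuple of attribute (test) identifiers. This representation,
# and the per-attribute costs below, are assumptions made for illustration.
Action = Tuple[int, ...]


def effective_cost(action: Action, costs: Dict[int, float]) -> float:
    """Sum of the effective costs of the attributes involved in an action."""
    return sum(costs[a] for a in action)


def minimum_cost_policy(actions: List[Action], costs: Dict[int, float]) -> Action:
    """HMO policy: pick the action with the lowest effective cost."""
    return min(actions, key=lambda a: effective_cost(a, costs))


def random_action_policy(actions: List[Action], rng: random.Random) -> Action:
    """Pick any available action uniformly at random."""
    return rng.choice(actions)


def minimum_tests_policy(actions: List[Action]) -> Action:
    """Pick the action that involves the fewest tests (first one on ties)."""
    return min(actions, key=len)


def maximum_tests_policy(actions: List[Action]) -> Action:
    """Pick the action that involves the most tests (first one on ties)."""
    return max(actions, key=len)


# Hypothetical patient state with three candidate actions.
actions = [(7, 8), (2,), (2, 18)]
costs = {2: 500.0, 7: 200.0, 8: 250.0, 18: 650.0}
```

None of these rules looks beyond the current state, which is the main difference from the RL policy they are compared against.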


A numerical performance summary of all five policies is presented in Table 8. It is noted from the results that the RL policy has a slightly wider confidence interval than some of the other benchmarked policies.

[Figure 11. Comparison of results obtained using RL approach and benchmark approaches. Horizontal axis: number of patients (in 100); vertical axis: cost of diagnosis (100 patients); curves: random action, minimum tests, RL, HMO, and maximum tests. All values in generic units.]

Table 8. Comparison of the total cost of diagnosis averaged over 3000 runs

Policy           Average cost   95% C.I.   Cost (%) above RL policy
RL               1338.52        46.2       N/A
Minimum cost     1863.37        59.67      28.17
Random action    2163.61        42.31      38.14
Minimum tests    2171.91        41.49      38.37
Maximum tests    1932.45        30.99      30.74

Note: All values in generic units

Since the HMO policy performed the best among the alternatives to RL, the actions adopted by the HMO and RL policies for some of the frequently visited states are summarized in Tables 9 and 10. These tables illustrate the action choices made by the two policies. Table 9 presents the actions chosen for five of the most frequently visited initial states in the simulation run with cost set A. Table 10 summarizes the action choices for the most frequently visited intermediate states, which were reached after the visits to the initial states, in the simulation run with cost set B. The first column of each table gives the patient state (identified by a coded number in the state space); in Table 10, the second column gives the number of times the state was visited during the simulation runs. The actions under the two policies are indicated by their attributes and the corresponding costs.

Table 9. Policy comparison between HMO and RL approaches for set A

                     HMO policy                        RL policy
Most visited state   Action attributes   Action cost   Action attributes   Action cost
21                   7,8                 450           7,8                 450
20                   7,8                 450           2                   500
5                    7,8                 450           7,8                 450
22                   7,8                 450           2,18                1150
23                   7,8                 450           7,8                 450

Note: All values in generic units

Table 10. Policy comparison between HMO and RL approaches for set B

                                        HMO policy                        RL policy
Most visited   Number of visits in      Action attributes   Action cost   Action attributes   Action cost
state          learning phase
1451           39                       14                  450           2,4                 950
1711           33                       14                  450           2,8                 1150
437            31                       7,8                 330           17                  800
414            29                       7,8                 330           6,16                685
869            29                       7,8                 330           14                  450

Note: All values in generic units


Although for the initial states (Table 9) the RL policy often chose the minimum cost action, for the intermediate states (Table 10) the RL policy actions are markedly different and carry much higher costs. Given that the RL policy provides the lowest average cost of diagnosis (Table 8), these action choices indicate that the learning agent is able to account for the long-term impact of its decisions in the diagnosis process.
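As a check on how the last column of Table 8 relates to the average costs, the short calculation below recomputes it. It assumes that the percentage is the gap between a benchmark policy and the RL policy divided by the benchmark's own average cost; this reading is inferred from the tabulated numbers rather than stated in the text, and it reproduces the table to within about 0.01 percentage points (the small differences presumably come from rounding of the averages).

```python
# Average cost of diagnosis per patient, copied from Table 8 (generic units).
average_cost = {
    "RL": 1338.52,
    "Minimum cost": 1863.37,
    "Random action": 2163.61,
    "Minimum tests": 2171.91,
    "Maximum tests": 1932.45,
}


def percent_above_rl(policy_cost: float, rl_cost: float) -> float:
    """Gap to the RL cost as a percentage of the benchmark's own cost.

    The choice of denominator is an assumption inferred from Table 8.
    """
    return 100.0 * (policy_cost - rl_cost) / policy_cost


gaps = {
    name: round(percent_above_rl(cost, average_cost["RL"]), 2)
    for name, cost in average_cost.items()
    if name != "RL"
}
```

Read this way, the column expresses the share of each benchmark's cost that the RL policy saves, for example roughly 28% of the cost of the minimum cost (HMO) policy.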


CHAPTER 7
CONCLUSION

A machine learning based methodology for developing efficient strategies for medical diagnosis is presented. The problem of diagnosis is first modeled as a Markov decision process (MDP). The patient state is defined as a vector of attributes of the diagnostic tests conducted and their numerical outcomes. For each patient state, possible actions are obtained from the set of decision rules that are generated by mining an available data set using rough set theory. Since there are a large number of patient states and a significant number of corresponding action choices, a reinforcement learning algorithm is developed for obtaining efficient diagnostic policies.

The methodology is implemented on the problem of Solitary Pulmonary Nodule (SPN) diagnosis. An existing data set of patients diagnosed with SPN, along with a rule set developed from this data using rough sets analysis, served as the input for the implementation of the methodology. The results show that the RL based methodology is a superior alternative to many other commonly used approaches.

Most of the work done in the area of medical decision making considers learning techniques such as neural networks, decision trees, rough sets, and reinforcement learning in isolation from each other. The proposed methodology combines the principles of rough set theory, utility theory, and reinforcement learning to provide a robust mechanism for developing efficient strategies for medical diagnosis. The system so developed learns all of its parameters from the databases of previous cases. Thus the methodology is data driven and is independent of any interaction with a domain expert.

As with the implementation of most real-world decision making models, the methodology presented is often burdened by very large state spaces. However, the concept of dynamic generation of system states, which allows dealing with only the previously encountered states rather than the whole state space, coupled with the RL based approach, makes this methodology computationally viable.

The methodology developed here is designed for offline use, i.e., the diagnostic strategies for all patient states are predetermined. The practicing physician needs only to specify the patient state and retrieve the corresponding diagnostic action. However, the methodology should be rerun periodically to obtain new diagnostic strategies using the updated databases.

7.1 Future extensions

Although the methodology presented in this research should cover any medical diagnosis process, the extensions listed below would have a positive impact on its implementation.

1. A system to suggest and evaluate alternative hypotheses based on the initial symptoms to diagnose a patient. This would then serve as an entry point for the presented methodology to come up with an efficient strategy to approach the diagnosis process.
2. Each test/procedure can provide one or more attributes that can be utilized to model the diagnosis process. The RL algorithm can be modified to account for the multiple-attribute information made available by conducting a single test when selecting an action.
3. The tests involved in the diagnosis process are considered independent of each other. Formal methods should be used to model joint probability distributions for the test outcomes.
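The dynamic generation of system states and the offline lookup described in this chapter can be illustrated with a minimal sketch. It is not the Oracle/PL/SQL implementation used in this work: the class name, the method names, the initial Q-value, and the example state and values are all hypothetical, and the sketch assumes that Q-values represent costs, so that the preferred action is the one with the lowest Q-value.

```python
from typing import Dict, Hashable, Iterable, Tuple

State = Hashable
Action = Tuple[int, ...]


class LazyQTable:
    """Stores Q-values only for states that have actually been encountered."""

    def __init__(self, initial_q: float = 0.0) -> None:
        self.initial_q = initial_q  # hypothetical starting value
        self.q: Dict[State, Dict[Action, float]] = {}

    def ensure_state(self, state: State, actions: Iterable[Action]) -> None:
        """Create entries for a state the first time it is visited."""
        if state not in self.q:
            self.q[state] = {a: self.initial_q for a in actions}

    def set_q(self, state: State, action: Action, value: float) -> None:
        """Overwrite the Q-value of a state-action pair already in the table."""
        self.q[state][action] = value

    def best_action(self, state: State) -> Action:
        """Offline lookup: the lowest-cost action for a previously seen state."""
        if state not in self.q:
            raise KeyError(f"state {state!r} was never encountered")
        return min(self.q[state], key=self.q[state].get)


# Hypothetical usage: a single state with two candidate actions.
table = LazyQTable()
table.ensure_state(20, [(7, 8), (2,)])
table.set_q(20, (7, 8), 1400.0)
table.set_q(20, (2,), 1250.0)
recommended = table.best_action(20)
```

Because entries are created only on a first visit, the memory used grows with the number of states actually reached rather than with the size of the full state space.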

