Detecting Financial Statement Fraud: Three Essays on Fraud Predictors, Multi-Classifier Combination and Fraud Detection Using Data Mining

by

Johan L. Perols

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Information Systems and Decision Sciences
Department of Accountancy
College of Business
University of South Florida

Co-Major Professor: Kaushal Chari, Ph.D.
Co-Major Professor: Jacqueline L. Reck, Ph.D.
Uday S. Murthy, Ph.D.
Manish Agrawal, Ph.D.

Date of Approval: April 10, 2008

Keywords: Earnings Management, Discretionary Accruals, Unexpected Productivity, Information Markets, Combiner Methods, Machine Learning

Copyright 2008, Johan L. Perols

Dedication

To Becca who provided support (in many ways), encouragement and motivation, helped me with my ideas, and believed in me more than I sometimes did; and to family and friends for providing the motivation for completing this dissertation.

Acknowledgments

To the faculty, administrators and fellow Ph.D. students in the information systems and decision sciences department and the accounting department, thank you for all the support and for creating an excellent learning environment. I especially want to acknowledge Dr. Chari, Dr. Reck, Dr. Murthy, Dr. Agrawal, Dr. Bhattacherjee and Ann Dzuranin. To my dissertation committee, thank you for reviewing, improving, supporting, criticizing and editing my dissertation. I am grateful to Dr. Chari for mentoring me, for emphasizing the importance of producing quality research, for guiding me in design science research, for working with me on the first essay and for prioritizing my development as a researcher over obtaining research assistance. I am thankful to Dr.
Reck for introducing me to and guiding me in the archival research method, for helping me shape and organize my dissertation, for working with me on the second essay, and for providing advice on how to combine the accounting and information systems concentrations. I am grateful to Dr. Murthy for introducing me to AIS research, for encouraging and supporting the addition of the accounting concentration, and for working with me on various projects. I thank Dr. Agrawal for helping me with my first conference paper, for working with me on the first essay, for coaching and helping me prepare for conference presentations and for taking time to discuss my research. Outside of my dissertation committee I also want to acknowledge Dr. Bhattacherjee for teaching me, both in and outside of class, about research and the importance of theory, and Ann Dzuranin for encouraging and pushing me to add the accounting concentration, for the many great discussions and for being a great friend.

Table of Contents

List of Tables
List of Figures
Abstract
Chapter 1. Dissertation Overview
    1.1. Research Framework
    1.2. Overview of the Three Essays
Chapter 2. Information Market Based Decision Fusion
    2.1. Introduction
    2.2. Related Research
    2.3. Information Market Based Fusion
        2.3.1 Information Markets
        2.3.2 Information Market Based Fusion
            2.3.2.1 Determining Final Odds
            2.3.2.2 Classifying Objects
            2.3.2.3 Distributing Payout
    2.4. Experimental Setup
        2.4.1 Base-Classifiers and Data
        2.4.2 Experimental Design and Factors
            2.4.2.1 Dependent Measure
            2.4.2.2 Combiner Method Factor
            2.4.2.3 Sensitivity Analysis
            2.4.2.4 Investigating the True Class of All Objects
        2.4.3 Time Lag, IMF Parameters and Base-Classifier Cost-Benefit Retraining
            2.4.3.1 Time Lag and Performance
            2.4.3.2 Selection of IMF Parameters
            2.4.3.3 Base-Classifier Cost-Benefit Retraining
    2.5. Results
        2.5.1 Relative Combiner Method Performance
            2.5.1.1 Overview
            2.5.1.2 Combiner Method Main Effect
            2.5.1.3 Sensitivity Analysis
            2.5.1.4 Investigating the True Class of All Objects
        2.5.2 Time Lag, IMF Parameters and Base-Classifier Cost-Benefit Retraining Overview
    2.6. Discussion Overview
        2.6.1 Combiner Method Performance
        2.6.2 Time Lag, IMF Parameters and Base-Classifier Cost-Benefit Retraining
        2.6.3 Combiner Method Design Considerations
    2.7. Conclusions and Future Research Directions
Chapter 3. The Effect of Discretionary Accruals, Earnings Expectations and Unexpected Productivity on Financial Statement Fraud
    3.1. Introduction
    3.2. Related Research
        3.2.1 Fraud Motivated by Prior Years' Earnings Management
        3.2.2 Fraud and Earnings Management Motivations
        3.2.3 Fraud in the Revenue Account
    3.3. Hypotheses Development
        3.3.1 Prior Years' Discretionary Accruals and Fraud
        3.3.2 Capital Market Expectations and Fraud
        3.3.3 Unexpected Labor Productivity and Fraud
    3.4. Research Design
        3.4.1 Variable Construction
            3.4.1.1 Total Discretionary Accruals
            3.4.1.2 Forecast Attainment
            3.4.1.3 Unexpected Revenue per Employee
            3.4.1.4 Control Variables
        3.4.2 Model for Hypotheses Testing
        3.4.3 Data Sample
            3.4.3.1 Experimental Sample
            3.4.3.2 Comparing Treatment and Control Samples
    3.5. Results
        3.5.1 Hypotheses Testing
    3.6. Additional Analyses
        3.6.1 Sensitivity Analyses
            3.6.1.1 Discretionary Accruals
            3.6.1.2 Real Activities Manipulation
            3.6.1.3 Additional Control Variables
            3.6.1.4 Industry Clustering
        3.6.2 Alternative Measure Design
            3.6.2.1 Revenue Fraud
            3.6.2.2 Total Discretionary Accruals Aggregation Periods
            3.6.2.3 Analyst Forecast Period
    3.7. Conclusions
Chapter 4. Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms
    4.1. Introduction
    4.2. Related Research
    4.3. Experimental Variables and Data
        4.3.1 Classification Algorithms
            4.3.1.1 Overview
            4.3.1.2 Algorithm Selection
            4.3.1.3 Algorithm Overview and Tuning
        4.3.2 Classification Cost
        4.3.3 Prior Probability of Fraud
        4.3.4 Dependent Variable
        4.3.5 Data
            4.3.5.1 Classification Objects: Fraud and Non-Fraud Firms Data
            4.3.5.2 Object Features - Financial Statement Fraud Predictors
    4.4. Experimental Procedures and Preprocessing
        4.4.1 Preprocessing
            4.4.1.1 Training Data Prior Fraud Probability
            4.4.1.2 Data Filtering
            4.4.1.3 Fraud Predictor Utility
        4.4.2 Classifier Evaluation
    4.5. Results
    4.6. Discussion
Chapter 5. Dissertation Conclusion
Chapter 6. References
Chapter 7. Appendices

List of Tables

Table 2.1 Base-Classifiers
Table 2.2 Datasets
Table 2.3 Experimental Variables
Table 2.4 Statistical Analysis Data
Table 2.5 Summary of Primary Results
Table 3.1 Sample Selection
Table 3.2 Industry Distribution of Fraud Firm
Table 3.3 Sample Descriptive Statistics for Study Variables
Table 3.4 Pearson and Spearman Correlations for Study Variables
Table 3.5 The Effect of Total Discretionary Accruals, Forecast Attainment and Unexpected Revenue per Employee on Financial Statement Fraud Likelihood Logistic Regression Results
Table 3.6 Alternative Total Discretionary Accruals Measure Logistic Regression Results
Table 3.7 Total Discretionary Accruals, Real Activities Manipulation and Financial Statement Fraud Logistic Regression Results
Table 3.8 Additional Control Variables Logistic Regression Results
Table 3.9 Major Industry, Total Discretionary Accruals, Forecast Attainment and Unexpected Revenue per Employee Logistic Regression Results
Table 3.10 Comparison of % RE and Diffemp Logistic Regression Results
Table 3.11 Comparison of Model Fit and Predictive Ability of % RE and Diffemp Logistic Regression Results
Table 3.12 Unexpected Revenue per Employee and Abnormal Change in Employees Logistic Regression Results
Table 3.13 Three Years, Two Years and One Year Total Discretionary Accruals Logistic Regression Results
Table 3.14 Alternative Analyst Forecast Measure Logistic Regression Results
Table 4.1 Sample Selection
Table 4.2 Prior Research Financial Statement Fraud Predictors
Table 4.3 Experimental Financial Statement Fraud Predictors
Table 4.4 Training Prior Fraud Probabilities: Selected Training Prior Fraud Probabilities for each Classifier at Different Levels of Evaluation Prior Fraud Probability and Evaluation Relative Error Cost
Table 4.5 Data Filtering: ERC for each Combination of Classifier and Data Filtering Method at Different Levels of Evaluation Prior Fraud Probability and Evaluation Relative Cost
Table 4.6 Attribute Selection: The Percentage of Folds in which Predictor was Selected for Each Classifier
Table 4.7 Preprocessing Result Overview: Selected Training Prior Fraud Probabilities, Data Filtering Methods and Predictors
Table 4.8 Descriptive Statistics of Classifier Estimate Relative Cost
Table 4.9 Regression Results for Testing Interactions between Classifier and Prior Fraud Probability, and Classifier and Relative Error Cost
Table 4.10 Comparison of Treatment Groups Tukey-Kramer HSD Connected Letters Report
Table 4.11 Classifier Average Estimated Relative Cost at Best Estimates of Relative Error Cost and Prior Fraud Probability Levels

List of Figures

Figure 1.1: Research Framework
Figure 2.1: Generic Classifier Combiner Architecture
Figure 2.2: IMF Flowchart
Figure 2.3: Binary Search
Figure 2.4: Payout Distribution Time Lag
Figure 2.5: Combiner Method (MAJ and IMF) x Diversity Interaction
Figure 3.1: Income-Increasing Discretionary Accruals of Fraud and Non-Fraud Firms
Figure 4.1: Classifier Comparison Estimated Relative Cost

Abstract

The goal of this dissertation is to improve financial statement fraud detection using a cross-functional research approach. The efficacy of financial statement fraud detection depends on the classification algorithms and the fraud predictors used and how they are combined. Essay I introduces IMF, a novel combiner method classification algorithm. The results show that IMF performs well relative to existing combiner methods over a wide range of domains. This research contributes to combiner method research and, thereby, to the broader research stream of ensemble-based classification and to classification algorithm research in general. Essay II develops three novel fraud predictors: total discretionary accruals, meeting or beating analyst forecasts and unexpected employee productivity. The results show that the three variables are significant predictors of fraud. Hence, Essay II provides insights into (1) conditions under which fraud is more likely to occur (total discretionary accruals are high), (2) incentives for fraud (firms' desire to meet or beat analyst forecasts), and (3) how fraud is committed and can be detected (revenue fraud detection using unexpected employee productivity). This essay contributes to confirmatory fraud predictor research, which is a substream of research that focuses on developing and testing financial statement fraud predictors. Essay III compares the utility of artifacts developed in the broader research streams to which the first two essays contribute, i.e., classification algorithm and fraud predictor research, in detecting financial statement fraud.
The results show that logistic regression and SVM perform well, and that out of 41 variables found to be good predictors in prior fraud research, only six variables are selected by three or more classifiers: auditor turnover, Big 4 auditor, accounts receivable and the three variables introduced in Essay II. Together, the results from Essay I and Essay III show that IMF performs better than existing combiner methods in a wide range of domains and better than stacking, an ensemble-based classification algorithm, in fraud detection. The results from Essay II and Essay III show that the three predictors created in Essay II are significant predictors of fraud and, when evaluated together with 38 other predictors, provide utility to classification algorithms.

Chapter 1. Dissertation Overview

The Association of Certified Fraud Examiners (ACFE 2006) estimates that occupational fraud totals $652 billion per year in the U.S., or about 5% of total revenues. A national survey conducted by KPMG in 2003 reported that 75% of organizations had experienced fraud in the three months leading up to the study (KPMG 2003). The potential benefit of fraud reduction is staggering; a 33% reduction in fraud would result in a 26% increase in average profits of American organizations. Detecting fraud is, however, difficult, and according to ACFE (2006), 25% of discovered fraud is detected by accident, as compared to proactive measures such as internal audits (20%), internal controls (19%) or external audits (12%).

1.1. Research Framework

The overarching goal of this dissertation is to improve financial statement fraud detection. Figure 1.1 shows how the three essays are related and how they contribute to the goal of improving financial statement fraud prediction. The efficacy of the detection depends on the classification algorithms and the fraud predictors used and how they are combined.
Essay III, "Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms," evaluates the utility of different classification algorithms and fraud predictors for predicting financial statement fraud. This research pushes the research frontier in data mining fraud detection research in the functional area of accounting information systems. Essay I, "Information Market Based Decision Fusion," introduces a new classification algorithm and contributes to classification algorithm research in the functional area of information systems. Three new fraud predictors are developed in Essay II, "The Effect of Discretionary Accruals, Earnings Expectations and Unexpected Productivity on Financial Statement Fraud." This research adds to fraud predictor research in the functional area of accounting. Thus, I use a cross-functional research approach focusing on two functional areas, accounting and information systems, and their confluence, accounting information systems, to improve financial statement fraud detection.

In the classification algorithm essay (Essay I) and the fraud predictor essay (Essay II) I approach specific research substreams with the intention of moving each substream forward. Essay I focuses on multi-classifier combination combiner method research, which is a research substream within multi-classifier combination research that, in turn, is a substream of the broader research stream of classification algorithms. Essay II extends confirmatory fraud predictor research within the broader research stream of financial statement fraud predictors. The third essay represents the nexus of the first two essays, and brings the two broader research streams together with the sole objective of improving fraud detection. This essay examines the utility of different combinations of classification algorithms and fraud predictors.
Thus, the first two essays can be seen as contributing to the basic research needed for the third essay to accomplish its objective of improving financial statement fraud detection. Note, however, that the goal of the third essay is not to test the utility of the algorithm developed in the first essay or to test the efficacy of the predictors developed in the second essay in detecting financial statement fraud. Rather, the third essay takes findings from the broader research streams to which the first two essays contribute, i.e., classification algorithms and fraud predictors, and examines the efficacy of various artifacts developed within these research streams in detecting financial statement fraud. Of course, the algorithm developed in the first essay and the predictors developed in the second essay are part of these research streams, and are, thus, included in the examination in the third essay. This might seem like a subtle difference but it is important. By not focusing on the artifacts developed in the first two essays, the third essay is not tied to these artifacts. This allows me to choose among a larger number of artifacts and does not limit my evaluation to the efficacy of the artifacts developed in the first two essays.

[Figure 1.1: Research Framework. Labels in the figure: Essay I, Combiner Methods; Multi-Classifier Combination; Classification Algorithms; Essay II, Confirmatory Fraud Prediction; Fraud Predictors; Essay III, Classification Algorithms and Fraud Predictors in Fraud Detection; Financial Statement Fraud Detection; Information Systems; Accounting; Design Science; Positivism.]

Furthermore, this allows Essay I to develop an artifact that contributes to combiner method research without being tied to a specific domain.
Similarly, by focusing on the confirmatory fraud research stream, Essay II can make a more theoretical contribution by developing artifacts that contribute to our understanding of conditions and incentives related to financial statement fraud, as opposed to focusing on developing artifacts that outperform existing predictors. It is also important to note that the contributions of the three essays are primarily dependent on the results reported in the respective essays. For example, the contribution of the multi-classifier combination combiner method developed in Essay I is determined based on an evaluation in Essay I of the performance of the proposed combiner method relative to combiner methods developed in prior research. This evaluation does not focus specifically on the fraud domain and instead evaluates the contribution of the combiner method across multiple domains.

In addition to my dissertation being cross-functional, I also use two paradigms to accomplish my research objective of improving financial statement fraud detection. Essay II is confirmatory hypotheses testing grounded in positivism, while Essays I and III follow the design science paradigm. I design and evaluate a novel IT artifact in Essay I, and in Essay III, I evaluate the utility of multiple classification algorithm and fraud predictor artifacts in the financial statement fraud domain.

1.2. Overview of the Three Essays

Essay I is titled: "Information Market Based Decision Fusion." In this essay, I design a novel combiner method based on theoretical and empirical findings in information market research to improve the performance over existing combiner methods. Combiner methods are used in multi-classifier combination to improve the classification performance of individual classifiers by combining the decisions of many individual classifiers, like artificial neural networks (ANN), logistic regression and decision trees (Kittler and Roli 2000).
I show through extensive experiments that when the true classes of objects are only revealed for objects classified as positive, IMF outperforms three benchmark combiner methods, Majority, Average and Weighted Average, when the positive ratio is low, and outperforms Majority and performs on par with Average and Weighted Average when the positive ratio is high. When the true classes of all objects are revealed, IMF outperforms Weighted Average and Majority, and at a marginal level of significance, outperforms Average.

Essay II is titled: "The Effect of Discretionary Accruals, Earnings Expectations and Unexpected Productivity on Financial Statement Fraud." The research objective in this essay is to improve our understanding of conditions and incentives behind financial statement fraud. I hypothesize that (1) earnings management in prior years is positively related to financial statement fraud; (2) firms that meet or exceed analyst forecasts are more likely to have committed fraud than firms that fail to meet analyst forecasts; and (3) unexpected productivity is positively related to financial statement fraud. I use an archival research approach to compare a set of fraud firms, hand-collected from SEC enforcement actions, to a set of matched non-fraud firms. The empirical results show support for all three hypotheses.

Essay III is titled: "Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms." The research objective in this essay is to compare the utility of a fairly comprehensive set of classification algorithms and fraud predictors in financial statement fraud prediction. With this objective in mind, I pose two specific research questions. (1) What classification algorithm provides the most utility given different assumptions about prior probabilities and costs of false positive and false negative classification errors?
(2) What predictors are useful to these algorithms for detecting financial statement fraud? I find that logistic regression and support vector machines (SVM) perform well relative to C4.5 (a decision tree), MultilayerPerceptron (a backpropagation neural network), stacking (an ensemble method), bagging (also an ensemble method) and IMF (an ensemble-based combiner method), while stacking and C4.5 consistently perform relatively poorly, where performance is measured using estimated relative cost. Furthermore, logistic regression and SVM provide the best performance under what is believed to be the most relevant prior probability and relative cost estimates. The results also show that out of 41 variables that have been found to be good predictors in prior fraud research, only six variables are selected by three or more classifiers: auditor turnover, total discretionary accruals, Big 4 auditor, accounts receivable, meeting or beating analyst forecasts, and unexpected employee productivity.

The remainder of the dissertation is organized as follows. Chapters 2, 3 and 4 contain the three essays, Essay I, Essay II and Essay III, respectively. Each essay is written as a stand-alone paper; thus, the three essays can be read in any order. Chapter 5 recaps the primary findings in the three essays and concludes with a discussion of how these results fit together.

Chapter 2. Information Market Based Decision Fusion

2.1. Introduction

In many decision-making scenarios, decisions of multiple human experts or classifiers are fused to determine the overall decision. Examples include: a group of accounting experts and classifiers making going-concern decisions and an ensemble of classifiers in a fraud detection application making decisions on whether a transaction is fraudulent.
Multi-classifier combination (MCC) is a technique that can be used to improve the classification performance in various classification problems by combining the decisions of multiple individual classifiers (Suen and Lam 2000). In MCC, individual classifiers, commonly referred to as base-classifiers, classify objects based on inputs consisting of object feature vectors (see Figure 2.1). These classifications or decisions are then combined using a combiner method into a single decision about the object's class label.

Figure 2.1: Generic Classifier Combiner Architecture

The basic premise behind MCC is that different classifiers in an ensemble have different strengths and weaknesses, and therefore provide complementary information (referred to as diversity in MCC) about the classification problem. These differences can be leveraged to improve classification performance by combining base-classifiers' decisions (Kittler et al. 1998).

Different combiner methods have been proposed and examined in the literature, and can be categorized based on whether they require training data. For example, Naive Bayes, Decision Templates and Weighted Average (WAVG) require training data, while Average (AVG), Majority (MAJ) and Product do not require training data. Existing combiner methods that require training data have limitations including the requirement for training data, and restrictive assumptions such as: 1) constant ensemble base-classifier composition; and 2) training data performance being a good proxy for subsequent actual performance. Experimental results generally indicate that MCC provides performance benefits, and that the performance of the MAJ and AVG methods is comparable or superior to that of methods requiring training (Duin and Tax 2000).
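To make the two training-free combiners concrete, the following is a minimal sketch for a two-class problem. It assumes that each base-classifier reports a probability estimate for the positive class and that a single classification threshold is used; the function names, the 0.5 default and the tie-breaking rule in MAJ are illustrative assumptions, not details taken from the experiments.

```python
from typing import Sequence


def combine_avg(probs: Sequence[float], threshold: float = 0.5) -> int:
    """AVG: compare the mean positive-class estimate with the threshold."""
    mean_p = sum(probs) / len(probs)
    return 1 if mean_p > threshold else 0


def combine_maj(probs: Sequence[float], threshold: float = 0.5) -> int:
    """MAJ: each base-classifier votes; the class with the most votes wins."""
    votes_pos = sum(1 for p in probs if p > threshold)
    votes_neg = len(probs) - votes_pos
    # Assumption: ties are resolved in favor of the negative class.
    return 1 if votes_pos > votes_neg else 0


# One confident positive estimate and two mildly negative ones.
estimates = [0.9, 0.4, 0.45]
avg_decision = combine_avg(estimates)   # mean is about 0.583, above 0.5
maj_decision = combine_maj(estimates)   # one positive vote against two
print(avg_decision, maj_decision)
```

The example is chosen so that the two combiners disagree: AVG uses the size of each estimate, while MAJ only counts which side of the threshold each estimate falls on.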
To improve performance while overcoming these limitations, I propose an information market based fusion approach for multi-classifier combination that 1) has superior performance, 2) does not require training data, and 3) can adapt to changes in ensemble composition and base-classifier performance.

In evaluating the effectiveness of the proposed approach, I compare IMF against three combiner methods, AVG, MAJ and WAVG. These methods have performed relatively well in prior research (Duin and Tax 2000) and have been used as benchmarks in recent MCC research[1]. For example, Zheng and Padmanabhan (2007) use AVG, which they refer to as Unweighted Average, and a version of WAVG where the weights are variance based, which they refer to as Variance Based Weighting. The experimental evaluation was performed using computational experiments with 17 datasets that were obtained from the UCI Machine Learning Repository (Newman et al. 1998) and 22 different base-classifiers from Weka (Witten and Frank 2005).

The rest of the chapter is organized as follows. In Section 2.2, I provide a review of related research. IMF is introduced in Section 2.3 along with an overview of information markets. I then present details on the computational experiments and results in Sections 2.4 and 2.5, respectively. In Section 2.6, I discuss these results and conclude in Section 2.7 with a review of my contributions and suggestions for future research.

[1] In Duin and Tax (2000), AVG is referred to as Mean and MAJ is referred to as Majority; eight additional combiner methods are evaluated: Bayes Rule (two different implementations), Nearest Mean, Nearest Neighbor, Maximum, Median, Minimum and Product. When combining the decisions of different base-classifiers trained using the same feature set, which is comparable to the MCC architecture that I use, their results (p. 23) show that Majority and Mean perform on par with or better than the other combiner methods.

2.2. Related Research

A classifier is a model that makes decisions about an object's class membership based on the object's feature set. Examples of classifiers include neural networks, logistic regression, decision trees and Bayesian classifiers (Witten and Frank 2005). Classifier performance is typically dependent on the problem domain as well as on the calibration of the classifier. Multiple classifiers are therefore typically tested in order to identify the best classifier for a given problem domain. However, it is generally difficult to determine which classifier(s) will perform well in subsequent classifications. Furthermore, classification for certain cases may even be improved by an inferior classifier (Kittler et al. 1998). Thus, by combining the decisions of diverse classifiers, it is possible to improve the overall performance.

Prior MCC research has primarily focused on one of two areas: (1) training and selection of ensemble base-classifiers; or (2) combination of base-classifier decisions. Methods such as bagging, boosting and stacking fall into the first category (Witten and Frank 2005), while combiner methods such as MAJ, AVG and WAVG fall into the second category. Recent research within the former stream has used ROC analysis to select dominant classifiers (Provost and Fawcett 2001), and Data Envelopment Analysis to select efficient classifiers (Zheng and Padmanabhan 2007) under various cost and class distributions, and then combine these classifiers' decisions. Zhao and Ram (2004) have investigated the appropriate cascading depth in cascade generalization, a variation of stacking, where classifiers are trained sequentially using the original input data and lower level classifiers' decisions.
This essay focuses not on classifier selection and training but on combiner methods. Prior research within the combiner method research stream has found that methods that use measurement data are typically more accurate than methods that handle unique labels, and that methods that require training data typically outperform methods that do not require training (Jain et al. 2000), but that MAJ and AVG, which do not require training, perform either at the same level as or significantly better than more complex methods (Duin and Tax 2000).

Another important, but largely overlooked, aspect of combiner methods is how well they fit with different system architectures. Software agents offer a new paradigm to support decision-making (Nissen and Sengupta 2006) where human-driven or autonomous software agents embodying classifiers and other intelligent algorithms can leverage their individual strengths to make collective decisions. The base-classifiers, combiner method and providers of object features in an MCC can be implemented as software agents in multi-agent systems. Research in data mining has implemented MCC agent systems for credit card fraud detection (Stolfo et al. 1997) and network intrusion detection (Lee et al. 2000).

In MCC multi-agent systems that are implemented in dynamic real world settings, the relative performance of base-classifiers and the ensemble composition can change over time as agents are retired, added or temporarily unavailable. Existing combiner methods that require training do not take this into consideration, and assume that the ensemble composition is static, and that individual classifier performance does not change subsequent to training and validation. I next introduce IMF, a combiner method that takes these issues into consideration.

2.3. Information Market Based Fusion

IMF is theoretically grounded in information markets.
More specifically, the IMF aggregation mechanism used in this essay is based on parimutuel betting markets.

2.3.1 Information Markets

Information markets are markets specifically designed for the purpose of information aggregation. Equilibrium prices, derived using conventional market mechanisms, provide information based on private and public information maintained by the market participants about a specific situation, future event or object of interest (Hanson 2003). Although the concept of information markets is fairly recent, the underlying notion of markets being capable of aggregating information is not new (Hayek 1945), and the efficient market hypothesis states that all private and public information is reflected in equilibrium prices (Fama 1970). Empirical research has found support for the efficient market hypothesis, and for information aggregation in information markets in general (Berg and Rietz 2003), and parimutuel betting markets in particular (Plott et al. 2003).

The combiner method presented in this essay is based on parimutuel betting, which originated in horse-race gambling in France in 1865, and since then has become a popular betting mechanism in the horse-racing world. Parimutuel means "wager mutual" and comes from the fact that in parimutuel betting, a winning wager (i.e., bet) receives a share of the total wagers (winning and losing bets less track commission) in proportion to this winning wager's share of all winning wagers. The final track odd for a given horse is the total amount bet on all the horses in the race divided by the total amount bet on the given horse. The payout for a winning horse is the product of the amount bet on it and its odd (less track commission). From an MCC perspective, the odd associated with a horse is of great importance as it represents the aggregated market information about the probability estimate of that horse winning the race.
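The odd and payout calculation described above can be sketched in a few lines. The horse names and amounts are made up for illustration, and treating the track commission as a fixed fraction deducted from the payout is an assumption of this sketch rather than something specified in the text.

```python
def parimutuel_odds(bets):
    """Final odd per horse: total amount bet divided by the amount bet on it."""
    total = sum(bets.values())
    return {horse: total / amount for horse, amount in bets.items() if amount > 0}


def payout(wager, odd, commission=0.0):
    """Payout of a winning wager: wager times odd, less the commission.

    Assumption: the commission is a fixed fraction of the gross payout.
    """
    return wager * odd * (1.0 - commission)


# Hypothetical pool: 100 units in total across three horses.
pool = {"A": 50.0, "B": 30.0, "C": 20.0}
odds = parimutuel_odds(pool)

# The reciprocal of each odd is the market's aggregated probability estimate.
implied = {horse: 1.0 / odd for horse, odd in odds.items()}

print(odds["A"], implied["A"])        # odd 2.0 for horse A, probability 0.5
print(payout(10.0, odds["A"]))        # a 10-unit winning bet on A returns 20.0
```

With no commission the implied probabilities sum to one, which is why the reciprocal of an odd can be read directly as the ensemble's probability estimate.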
I use parimutuel betting over mechanisms such as continuous double auctions since parimutuel betting does not suffer from liquidity problems that could potentially impact continuous double auction markets when there are large bid-ask spreads or when bid-ask queues are empty (Pennock 2004). Hence parimutuel mechanisms would work effectively, even when the ensemble of base-classifiers is small.

Plott et al. (2003) experimentally examined information aggregation and different betting behaviors in parimutuel betting markets using two private information models, Decision Theory Private Information (DTPI) and Competitive Equilibrium Private Information (CEPI), and one model with belief updating, Competitive Equilibrium Rational Expectations (CERI). Plott et al. (2003) found that DTPI and CEPI best described the behavior of human participants in their Probabilistic Information Condition experimental parimutuel betting market. In DTPI, agents only consider their own private information and ignore market prices when deciding on their bets and in forming beliefs. In CEPI, agents base their bets on the current market price, although they do not update their beliefs based on market prices. In both models, agents maximize their conditional expected utility given their private probability estimates and constraints such as available funds. In both DTPI and CEPI, prices are assumed to be in equilibrium; however, as each betting round starts without prices defined, the equilibrium must be obtained before the agents can place their final bets. Assuming no track take, in equilibrium, all potential payouts are equal to the total amount bet across all events.

2.3.2 Information Market Based Fusion

IMF is a multi-classifier combiner method based on a parimutuel betting information market that can be used in any classification application domain. I present IMF in the context of a fraud detection application.
In this application, object $t$ (i.e., transaction $t$) can be classified as fraudulent ($j=1$) or non-fraudulent ($j=2$) by an ensemble $E$ of agent classifiers. The set $J=\{1,2\}$ is the index set of the two classes (i.e., fraudulent and non-fraudulent). The ensemble $E$ has $m$ agents embodying different base-classifiers (referred to as agents) represented by indices $i$ in the index set $D=\{1,\dots,m\}$. While determining the class membership of object $t$, agent $i \in D$ uses the feature vector associated with $t$ to determine the posterior probability estimate $p_{itj} \in [0,1]$ that $t$ belongs to class $j \in J$. Agent $i$ bets $q_{itj}$ that object $t$ belongs to class $j$ and is paid according to the parimutuel mechanism based on four factors: (1) the agent's bets, $q_{itj}$; (2) the total bets on class $j$, $Q_{tj} = \sum_{i \in D} q_{itj}$; (3) the total bets on all classes, $Q_t = \sum_{j \in J} \sum_{i \in D} q_{itj}$; and (4) the true class of object $t$. Ensemble $E$'s overall probability estimate that $t$ belongs to $j \in J$ is given by $1/O_{tj} \in [0,1]$, where $O_{tj}$ is the odd that $t$ belongs to $j \in J$. The odd $O_{tj}$, which is equal to $Q_t/Q_{tj}$, is in equilibrium when the potential payouts $Q_{tj}O_{tj}$ for each $j \in J$ and the total bets $Q_t$ are equal (assuming no house commission), i.e., $O_{tj}$ is in equilibrium when $Q_{tj}O_{tj} = Q_t$.

Figure 2.2 provides an overview of IMF when the true class of objects is only determined for objects classified as positive. When all objects are investigated, Figure 2.2 is changed by eliminating the decision box, i.e., going straight from Classify Object to Distribute Payout. Investigations are however expensive, and in the real world only objects classified as positive are typically investigated. Unless otherwise noted, I will henceforth assume that only objects classified as positive are investigated.

In Figure 2.2, for each new object $t$, IMF first determines the final odds $O^f_{tj}$ that are equilibrium or near-equilibrium odds.
Establishing equilibrium odds is a nontrivial task because of the recursive relationship between $Q_{tj}$ and $O_{tj}$, where odds are based on agent bets and agents base their bets on odds. Therefore, multiple rounds of betting are required to determine the final odds that can then be used by agents to make their actual bets. In each round, odds are first updated based on all the agents' prior bets and then agents place new bets based on the current updated odds. After the final odds have been established, ensemble $E$'s overall probability estimate $1/O^f_{tj}$ is compared to a threshold value $C_j$ to determine if $t$ should be classified as belonging to class $j$. If object $t$ is classified as fraudulent ($j=1$) then the true class of $t$ is determined and winnings are distributed to the agents. In addition to establishing ensemble $E$'s probability estimate $1/O^f_{tj}$, IMF facilitates the redistribution of wealth among the agents based on the agents' bets and winnings. From an MCC perspective, IMF produces decisions that are wealth weighted probability estimates of the occurrence of event $j$. I next describe the components of IMF in detail as per the major steps depicted in Figure 2.2.

Figure 2.2: IMF Flowchart [flowchart: New Object; Determine Final Odds (Update Odds, Take Agent Bets); Classify Object; if $j=1$, Distribute Payout]

2.3.2.1 Determining Final Odds

The problem to determine odds $O_{tj}$ for object $t$ is given by P1.

P1:
$$Z_1 = \min_{O_{tj},\, M_j} \sum_{j \in J} M_j \tag{1}$$

S.T.
$$Q_{tj}O_{tj} - M_j = Q_t \quad \forall j \in J \tag{2}$$
$$M_j \ge 0 \text{ and } O_{tj} \ge 1 \quad \forall j \in J \tag{3}$$

The objective function $Z_1$ minimizes dummy variables $M_j$ that represent the differences between the total bets by all agents and the total payout for each outcome (2). At equilibrium, $M_1$ and $M_2$ are equal to zero. Due to the recursive relationship between $Q_{tj}$ and $O_{tj}$, I solve P1 using binary search (see Figure 2.3) to determine equilibrium or near-equilibrium odds.
Binary search starts with a lower bound $P_l=0$ and an upper bound $P_u=1$ for the probability that object $t$ belongs to class $j=1$. $O_{t1}$ is then computed using $O_{t1}=2/(P_l+P_u)$. It can be easily verified that $O_{t2} = O_{t1}/(O_{t1}-1)$ in the case of two class problems. The agents then place bets that maximize their individual utility, given their current wealth and probability estimates, and the current odds (Lemma 1 and Lemma 2 describe the optimal bets). The odds and bets are then used to evaluate whether the current odds are too high or too low. If the potential payout for $j=1$, i.e., $Q_{t1}O_{t1}$, is greater than the total bets $Q_t$, then odd $O_{t1}$ is too high, and the lower search space boundary $P_l$ is raised to the reciprocal of $O_{t1}$, i.e., $P_l=1/O_{t1}$. On the other hand, if the potential payout for $j=1$ is less than the total bets $Q_t$, then the odd $O_{t1}$ is too low and the upper search space boundary $P_u$ is lowered to the reciprocal of $O_{t1}$, i.e., $P_u=1/O_{t1}$. If the potential payout for $j=1$ is the same as the total bets $Q_t$, then the potential payouts for $j=1$ and $j=2$ are equal, i.e., $Q_{t1}O_{t1}=Q_{t2}O_{t2}$, the odds are in equilibrium and the search space is set to this single value $P_l=P_u=1/O_{t1}$. $O_{t1}$ is then set to the reciprocal of the mean of $P_l$ and $P_u$ and the agents place bets based on these odds. The updating of odds and agent bets continues iteratively until the search space is within tolerance $\varepsilon$, i.e., $P_u - P_l \le \varepsilon$. When binary search terminates, it is known that the optimal odds are within bounds $1/P_u$ and $1/P_l$.

Determining Agent Bets

Given the current market odds $O_{tj}$, the agent's probability estimates $p_{itj}$ of object $t$ being in class $j$, the agent's current wealth $w_{it}$ plus the periodic endowment $m$, and multiplier $k$ that determines the house enforced maximum bet $km$, agent $i$ solves the expected utility maximization problem P2 to determine the amount $q_{itj}$ to bet on classes $j=1,2$. The periodic endowment $m$ is given to all the agents in order to prevent them from running out of funds.
Given the utility function $U_i$ of agent $i$ as a function of wealth, problem P2 can be stated as follows:

P2:
$$Z_2 = \max_{q_{itj}} \; p_{it1}U_i(w_{it} + m - q_{it1} - q_{it2} + q_{it1}O_{t1}) + p_{it2}U_i(w_{it} + m - q_{it1} - q_{it2} + q_{it2}O_{t2}) \tag{4}$$

S.T.
$$q_{it1} + q_{it2} = \begin{cases} w_{it} + m & \text{if } (w_{it} + m) \le km \\ km & \text{if } (w_{it} + m) > km \end{cases} \tag{5}$$
$$q_{itj} \ge 0 \tag{6}$$

Figure 2.3: Binary Search

    Set search space bounds P_l = 0 and P_u = 1
    Set O_t1 = 2/(P_l + P_u)
    Take agent bets
    Do
        If Q_t1 O_t1 > Q_t then set P_l = 1/O_t1
        else if Q_t1 O_t1 < Q_t then set P_u = 1/O_t1
        else if Q_t1 O_t1 = Q_t then set P_l and P_u to 1/O_t1
        Set O_t1 = 2/(P_l + P_u)
        Take agent bets
    Until (P_u - P_l <= epsilon)

The objective function in P2 represents the expected utility of agent $i$ when it bets $q_{itj} \ge 0$ on event $j$. Constraint (5) dictates that the total amount of bets placed by agent $i$ on events $j=1$ and $j=2$ equals the lower of the agent's available funds $m + w_{it}$ and $km$, the house enforced maximum bet. $km$ limits the amount of influence the best performing agents in the ensemble could exert on the ensemble decision, due to the need to have all agents, not just the best performing agents, contribute to improving the success of the ensemble (Kittler et al. 1998).

P2 is general enough to incorporate any utility function to model an agent's risk aversion. I utilize a natural logarithm (ln) utility function (henceforth simply referred to as log utility), which has been widely used in prior research (Rubinstein 1976), for the following reasons: (1) log utility enables agents to place bets that yield optimal long run growth rates (Kelly 1956); (2) it is twice-differentiable and non-decreasing concave, leading to a decreasing absolute risk aversion (Rubinstein 1976); and (3) depending on which betting constraint is binding (see Lemma 1 and 2 below), log utility bets are either increasing in $p_{itj}$ and $w_{it} + m$ but not a function of $O_{tj}$, a betting behavior corresponding to DTPI (Plott et al. 2003), or increasing in $p_{itj}$, $w_{it} + m$ and $O_{tj}$, a betting behavior corresponding to CEPI (Plott et al. 2003).

Using log utility, problem P2 is transformed to either P3 or P4 depending on the binding constraint in (5) for a given agent. If $w_{it} + m \le km$ then $q_{it1} + q_{it2} = w_{it} + m$ and $w_{it} + m - q_{it1} - q_{it2} = 0$, leading to P3.

P3:
$$Z_3 = \max_{q_{itj}} \; p_{it1}\ln(q_{it1}O_{t1}) + p_{it2}\ln(q_{it2}O_{t2}) \tag{7}$$

S.T.
$$q_{it1} + q_{it2} = w_{it} + m \tag{8}$$
$$q_{itj} \ge 0 \tag{9}$$

Lemma 1: The optimal bets of agent $i$ in P3 while classifying $t$ are $q^*_{itj} = p_{itj}(w_{it} + m) \;\; \forall j \in J$.

Proof: See Appendix 1.

If $w_{it} + m > km$ then $q_{it1} + q_{it2} = km$ and $w_{it} + m - q_{it1} - q_{it2} = w_{it} + m - km$, which I denote by constant $a_{it}$. Thus P2 can be transformed to P4.

P4:
$$Z_4 = \max_{q_{itj}} \; p_{it1}\ln(a_{it} + q_{it1}O_{t1}) + p_{it2}\ln(a_{it} + q_{it2}O_{t2}) \tag{10}$$

S.T.
$$q_{it1} + q_{it2} = km \tag{11}$$
$$q_{itj} \ge 0 \tag{12}$$

Lemma 2: The optimal bets of agent $i$ in P4 while classifying $t$ are:

Solution a:
$$q^*_{it1} = p_{it1}km + a_{it}\frac{p_{it1}O_{t1} - p_{it2}O_{t2}}{O_{t1}O_{t2}} \quad \text{and} \quad q^*_{it2} = p_{it2}km + a_{it}\frac{p_{it2}O_{t2} - p_{it1}O_{t1}}{O_{t1}O_{t2}},$$
when $0 < p_{it1}km + a_{it}\frac{p_{it1}O_{t1} - p_{it2}O_{t2}}{O_{t1}O_{t2}} < km$ and $0 < p_{it2}km + a_{it}\frac{p_{it2}O_{t2} - p_{it1}O_{t1}}{O_{t1}O_{t2}} < km$;

Solution b: $q^*_{it1} = km$ and $q^*_{it2} = 0$, when $km \le p_{it1}km + a_{it}\frac{p_{it1}O_{t1} - p_{it2}O_{t2}}{O_{t1}O_{t2}}$; and

Solution c: $q^*_{it2} = km$ and $q^*_{it1} = 0$, when $km \le p_{it2}km + a_{it}\frac{p_{it2}O_{t2} - p_{it1}O_{t1}}{O_{t1}O_{t2}}$.

Proof: See Appendix 2.

Equilibrium Odds

The final odds $O^f_{tj}$ are equilibrium odds or near-equilibrium odds, where near-equilibrium is defined as being within bounds $1/P_u$ and $1/P_l$, and $P_u - P_l$ is less than or equal to tolerance $\varepsilon$. When binary search terminates, it is known that the optimal odds are within these bounds. As described in binary search, the final odds $O^f_{tj}$ are found for each object $t$ by iteratively updating the odds, and requiring agents to place bets using these odds until the odds provided to the agents and their subsequent bets result in $Q_{tj}O^f_{tj} \approx Q_t$, at which time the market closes. The following observations are made.
First, bets placed in betting rounds before the final odds have been established are only used for the purpose of updating the odds[2]. Second, if agent bets are discontinuous over $O_{tj}$ then the existence of equilibrium odds cannot be guaranteed (Carlsson et al. 2001). Lemma 3 shows that in IMF, when agents $i \in D_1$ bet as per Lemma 1, and agents $i \in D_{2a}$, $i \in D_{2b}$ and $i \in D_{2c}$ bet as per Lemma 2, solutions a, b and c, respectively, then equilibrium exists. However, even when equilibrium odds do exist, IMF may not always find them due to the recursive nature of $O_{tj}$ and $Q_{tj}$. Binary search used in IMF, nevertheless, guarantees a result that is at most $\varepsilon$ (a tolerance parameter) from the optimal probability.

[2] I make the assumption that agents do not act strategically by attempting to bluff about their private information, i.e., placing bets that do not maximize their utility given the current odds. This assumption is made to make the utility maximization problem more tractable. In defense of this assumption, the agents do not know when the market closes, i.e., they never know if the current odds are the final odds, and strategic behavior is therefore less likely even if allowed. Furthermore, Plott et al. (2003) found that strategic behavior was negligible among their human subjects in the Probabilistic Information Condition experiment even though the subjects knew that the market would stay open at least until an announced time.

Lemma 3: Given any combination of betting behaviors as per Lemma 1 and Lemma 2, equilibrium exists, and the equilibrium odd for $j=1$ is:
$$O_{t1} = \frac{\sum_{i \in D_1 \cup D_{2a}} p_{it2}(w_{it} + m) + \sum_{i \in D_{2c}} km}{\sum_{i \in D_1 \cup D_{2a}} p_{it1}(w_{it} + m) + \sum_{i \in D_{2b}} km} + 1$$

Proof: See Appendix 3.
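The binary search and the log-utility bets can be put together in a short sketch for the two-class case. The function names are hypothetical, and the interior bet used for capped agents is the first-order-condition solution of P4 as I read Lemma 2, so the code should be treated as an illustration under those assumptions rather than as the implementation used in the experiments.

```python
def optimal_bets(p1, wealth, m, k, o1, o2):
    """Log-utility bets (q1, q2) of one agent, following Lemma 1 and Lemma 2."""
    p2 = 1.0 - p1
    funds = wealth + m
    cap = k * m                       # house enforced maximum bet km
    if funds <= cap:                  # Lemma 1: all funds are bet
        return p1 * funds, p2 * funds
    a = funds - cap                   # wealth that stays out of the market
    q1 = p1 * cap + a * (p1 * o1 - p2 * o2) / (o1 * o2)
    if q1 >= cap:                     # Lemma 2, solution b
        return cap, 0.0
    if q1 <= 0.0:                     # Lemma 2, solution c
        return 0.0, cap
    return q1, cap - q1               # Lemma 2, solution a


def final_odds(agents, m, k, eps=1e-9):
    """Binary search for (near-)equilibrium odds.

    agents is a list of (p1, wealth) pairs; returns (o1, o2, final bets).
    """
    p_lo, p_hi = 0.0, 1.0
    while True:
        o1 = 2.0 / (p_lo + p_hi)      # reciprocal of the interval midpoint
        o2 = o1 / (o1 - 1.0)          # two-class identity 1/o1 + 1/o2 = 1
        bets = [optimal_bets(p1, w, m, k, o1, o2) for p1, w in agents]
        if p_hi - p_lo <= eps:
            return o1, o2, bets
        q1 = sum(b[0] for b in bets)
        q = q1 + sum(b[1] for b in bets)
        if q1 * o1 > q:               # odd too high: raise the lower bound
            p_lo = 1.0 / o1
        elif q1 * o1 < q:             # odd too low: lower the upper bound
            p_hi = 1.0 / o1
        else:                         # exact equilibrium
            p_lo = p_hi = 1.0 / o1


# Two uncapped agents: the wealthier one (funds 4 vs. 2) pulls the estimate.
o1, o2, bets = final_odds([(0.8, 3.0), (0.3, 1.0)], m=1.0, k=50.0)
print(1.0 / o1)   # close to (0.8*4 + 0.3*2) / 6, i.e. about 0.633
```

When every agent bets as per Lemma 1, the resulting estimate is the funds-weighted average of the agents' probabilities, which matches the Lemma 3 expression with the sets for solutions b and c empty.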
See Appendix 4 for an empirical evaluation of IMF when agent bets are discontinuous over $O_{tj}$ and an empirical evaluation of the ability of IMF to find the equilibrium odds when the agents bet as per Lemma 1 and Lemma 2.

2.3.2.2 Classifying Objects

Once the final odds $O^f_{t1}$ and $O^f_{t2}$ are available, the decision rule in (13) can be used to classify[3] object $t$.

$$\text{If } (1/O^f_{t1} \ge C_1) \text{ then classify class of } t \text{ as } j=1; \text{ else classify class of } t \text{ as } j=2 \tag{13}$$

In (13), if the reciprocal of the final odd for $j=1$ is higher than the threshold $C_1$ then object $t$ is classified as a member of the positive class, i.e., $j=1$, and agent $i$'s wealth is decreased by the amount of $i$'s final bets:

$$w_{it} = w_{it} - \sum_{j \in J} q_{itj} \tag{14}$$

The true class of $t$ is then investigated, and agent $i$'s wealth is updated with any potential winnings (see Section 2.3.2.3 below). If the object is classified as a member of the negative class, i.e., $j=2$, then the verification of the object class is not pursued further as investigations are not typically carried out for negative classifications. In this case, agent wealth is not updated with bets or winnings.

[3] MCC users might prefer rankings or raw probabilities (Saar-Tsechansky and Provost 2004). In these situations the generated ensemble probability estimates can be presented directly to the users.

2.3.2.3 Distributing Payout

Whenever object $t$ is classified as belonging to the positive class, detailed investigations are necessary to establish the true class of $t$. While final bets are deducted from the agents' wealth immediately, due to the time taken for investigations, there is a time lag corresponding to $v$ elapsed object classifications before winnings can be paid out. This mechanism is similar to sports betting (and other types of futures markets) where bets are collected when bets are placed and winnings are paid out after the game/race has been decided.
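The classification step and the immediate deduction of final bets can be sketched as follows. The function name and data layout are hypothetical, and whether the comparison with the threshold is strict is treated here as an assumption (non-strict in this sketch).

```python
def classify_and_collect(final_odd_1, threshold_1, wealth, final_bets):
    """Apply the decision rule and, for positives, deduct the final bets.

    wealth is a list of agent wealths; final_bets is a list of (q1, q2) pairs.
    Returns the class label (1 = positive, 2 = negative) and updated wealth.
    """
    ensemble_probability = 1.0 / final_odd_1
    if ensemble_probability >= threshold_1:   # assumed non-strict comparison
        # Positive: bets are collected now, winnings follow after investigation.
        updated = [w - (q1 + q2) for w, (q1, q2) in zip(wealth, final_bets)]
        return 1, updated
    # Negative: no investigation, so neither bets nor winnings are applied.
    return 2, list(wealth)


wealth = [10.0, 5.0]
bets = [(2.0, 1.0), (0.5, 0.5)]

label_pos, wealth_pos = classify_and_collect(1.6, 0.5, wealth, bets)
label_neg, wealth_neg = classify_and_collect(4.0, 0.5, wealth, bets)
print(label_pos, wealth_pos)   # odd 1.6 gives probability 0.625: positive
print(label_neg, wealth_neg)   # odd 4.0 gives probability 0.25: negative
```

Keeping the negative branch free of wealth updates mirrors the point that agents receive no feedback on objects that are never investigated.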
In Figure 2.4, $t$ is the current object being classified, $\tau$ is the object for which the investigation has just been completed, and $v$ is the number of objects that have been classified since $\tau$ was classified. Based on the investigation, if $\tau$ is found to be a positive, then agent $i$'s wealth is updated using (15), else $\tau$ is negative and agent $i$'s wealth is updated using (16). This is also followed when the true classes of both positive and negative classifications are investigated.

$$w_{it} = w_{it} + q_{i\tau 1}(Q_\tau / Q_{\tau 1}) \tag{15}$$
$$w_{it} = w_{it} + q_{i\tau 2}(Q_\tau / Q_{\tau 2}) \tag{16}$$

Figure 2.4: Payout Distribution Time Lag [timeline: $\tau$ classified; $v$ object classifications during which $\tau$ is investigated; $t$ classified and investigation of $\tau$ completed]

2.4. Experimental Setup

2.4.1 Base-Classifiers and Data

Using Weka (version 3.3.6), 22 heterogeneous base-classifiers were created using their default settings (see Table 2.1). The base-classifiers were trained and evaluated using 10-fold cross-validation on each of 17 datasets obtained from the UCI Machine Learning Repository (see Table 2.2). Datasets that included more than two classes were modified by either creating multiple subsets with only two classes in each subset or by combining classes. In order for computationally complex base-classifiers to complete the classification using a reasonable amount of resources, datasets with a large number of observations and/or attributes were filtered randomly based on records and/or attributes[4].

Table 2.1 Base-Classifiers

ADTree | MultilayerPerceptron
BayesNet | NaiveBayes
ConjunctiveRule | NBTree
DecisionStump | Nnge
DecisionTable | OneR
Ibk | PART
J48 | RandomForest
JRip | RBFNetwork
KStar | Ridor
LMT | SimpleLogistic
LWL | SMO

With an average dataset size of 5,350 records, a total of 2,000,900 base-classifier validation decisions (5,350 records × 17 datasets × 22 base-classifiers) were generated from the 10-fold cross-validation. These decisions were imported into Microsoft Access where combiner methods, implemented using Visual Basic, combined the data.
Furthermore, since IMF, MAJ, AVG and dynamic WAVG did not require training, I did not use n-fold cross-validation in the combiner method experiments. Each dataset was combined 96 times as described in Section 2.4.2, for a total of 8,745,888 (96 dataset combinations × 5,350 average dataset size × 17 datasets) ensemble decisions.

[4] The number of attributes/instances to delete was determined iteratively by first deleting only a few attributes/instances, running the most resource constraining base-classifier algorithm and then deleting more attributes/instances if needed. Attributes/instances selected for deletion were determined by assigning a random number to each attribute/instance using Microsoft Excel and then deleting the attributes/instances assigned the lowest number.

Table 2.2 Datasets (Min, Avg, Max and Std refer to base-classifier accuracy)

Dataset | Instances | Attributes | Positive Rate | Ensemble Diversity | Min | Avg | Max | Std
Adult | 32,561 | 14 | 24.1% | 0.847 | 75.9% | 82.0% | 86.0% | 3.5%
Wisconsin Breast Cancer | 699 | 110 | 34.5% | 0.894 | 76.8% | 94.0% | 97.1% | 4.2%
Contraceptive Choice | 1,473 | 10 | 57.3% | 0.698 | 60.0% | 66.1% | 71.0% | 2.6%
Horse Colic | 368 | 22 | 37.0% | 0.868 | 78.0% | 81.7% | 86.1% | 2.3%
Covertype (class 1 & 2) | 10,000 | 11 | 72.9% | 0.883 | 78.9% | 86.6% | 92.1% | 3.8%
Covertype (class 3 & 4) | 10,395 | 11 | 6.8% | 0.954 | 93.0% | 95.2% | 97.5% | 1.6%
Covertype (class 5 & 6) | 10,009 | 11 | 66.9% | 0.906 | 89.9% | 96.2% | 99.7% | 3.5%
Australian Credit App. | 690 | 15 | 55.5% | 0.857 | 76.2% | 83.7% | 85.8% | 2.6%
German Credit Approval | 1,000 | 20 | 30.0% | 0.778 | 63.7% | 72.0% | 75.7% | 2.9%
Pima Indians Diabetes | 768 | 8 | 34.9% | 0.842 | 68.8% | 73.5% | 77.9% | 2.5%
Thyroid Disease | 3,772 | 5 | 7.7% | 0.994 | 89.1% | 92.7% | 93.4% | 1.2%
Labor | 57 | 16 | 64.9% | 0.487 | 68.4% | 80.0% | 93.0% | 7.4%
Mushrooms | 8,124 | 5 | 48.2% | 0.775 | 77.0% | 91.2% | 96.6% | 7.4%
Sick | 3,772 | 12 | 6.9% | 0.956 | 93.9% | 96.8% | 98.4% | 1.2%
Spambase | 4,601 | 58 | 39.4% | 0.708 | 78.9% | 87.6% | 94.2% | 5.6%
Splice-junction Gene Seq. | 3,190 | 20 | 51.9% | 0.498 | 53.4% | 62.3% | 67.1% | 3.7%
Waveform | 3,345 | 40 | 49.4% | 0.790 | 77.6% | 86.9% | 92.7% | 4.5%
2.4.2 Experimental Design and Factors

The primary purpose of the computational experiments was to compare the effectiveness of IMF against the MAJ, AVG and WAVG methods. As such, combiner method is the primary factor of interest. The experiment also included six other independent variables, two factors (cost-to-benefit ratio and number of agents) and four covariates (dataset positive ratio, dataset size, dataset average base-classifier accuracy and ensemble diversity), used to evaluate the sensitivity of the results (see Table 2.3). As only main effects and second order interactions were investigated, and only interactions involving the combiner method factor, a full factorial design is not needed. Instead, two factorial block designs (4 combiner methods × 11 sets of agents, and 4 combiner methods × 13 cost-to-benefit ratios) were used, for a total of 96 treatment groups. The cost-to-benefit ratio factor was held constant at 1:10 in the 4 × 11 factorial design. The number of agents factor was held constant at 10 in the 4 × 13 factorial design. Net benefit and the covariates were measured for each of the 17 datasets within each of the 96 treatment cells for a total of 1,632 observations (4 × 11 × 17 + 4 × 13 × 17).
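As a quick check, the treatment and observation counts quoted above follow directly from the two block designs:

```latex
\begin{align*}
\text{treatment groups} &= 4 \times 11 + 4 \times 13 = 44 + 52 = 96 \\
\text{observations}     &= 4 \times 11 \times 17 + 4 \times 13 \times 17 = 748 + 884 = 1{,}632 = 96 \times 17
\end{align*}
```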
Table 2.3 Experimental Variables

Variable | Function | Description
Net Benefit | DV | FN cost avoidance × number of TP − investigation cost × (number of FP + number of TP)
Combiner Method | Main IV | IMF, AVG, WAVG, MAJ
Number of Agents | Manipulated Moderator | 2, 4, 6, ..., 22 agents in the ensemble
Cost-to-Benefit Ratio | Manipulated Moderator | 1:100, 1:50, 1:25, 1:10, 1:7.5, 1:5, 1:4, 1:3, 1:2, 1:1.5, 1:1, 1.5:1, and 2:1
Dataset Size | Measured Moderator | Number of dataset records
Dataset Average Agent Accuracy | Measured Moderator | Average dataset accuracy of the base-classifiers
Dataset Positive Ratio | Measured Moderator | Positive records / total number of records in dataset
Ensemble Diversity | Measured Moderator | Dataset average pairwise diversity measured using Yule's Q statistic

2.4.2.1 Dependent Measure

Performance measures used in MCC combiner method research include hit-rate (TP/(TP+FN)) and accuracy ((TP+TN)/(TP+TN+FP+FN)). However, accuracy and hit-rate only provide accurate measures of combiner method effectiveness under one specific scenario: when the number of positive and negative instances is the same, and the cost of FP and FN is the same. This is rarely true (Provost et al. 1998). More recently, Receiver Operating Characteristic (ROC) curves and the associated measure Area Under the ROC curve (AUC) have gained popularity, partially because they show how well algorithms handle the trade-off between true positive rate (TP/(TP+FN)) and false positive rate (FP/(FP+TN)), i.e., benefits and costs, without having to define a specific class distribution and cost assumption. ROC and AUC do not however allow for easy comparisons of combiner methods under specific distribution and cost assumptions that I am interested in.
ROC also does not provide a single measure that allows us to assess the statistical significance and sensitivity of the relative combiner method performance results to various factors such as the number of base-classifiers in the ensemble, cost-to-benefit ratio, and dataset size, average agent accuracy, positive ratio and diversity (Drummond and Holte 2006). Furthermore, ROC curves are created using true positive rate and false positive rate, and therefore cannot be used in situations where TN and FN are not identified, i.e., in domains where negative classifications are not investigated.

Another common performance measure, misclassification cost, which is the total cost of FP and FN classifications, overcomes many of these shortcomings (Lin et al. 2003). However, this measure still requires knowing FN and ignores the costs associated with TP classifications, such as investigation costs. Chan et al. (1999) use Cost Savings (CS), which takes into account costs associated with TP, FP and FN, but still requires knowing FN. I use a measure very similar to CS[5] that I call net benefit (NB). NB, like CS, allows us to overcome the problems described earlier and, in contrast to CS, does not require knowing FN. NB is calculated as the benefit derived from TP classifications (FN costs avoided) minus costs of investigating positive classifications, see (17). Like ROC curves, NB captures the trade-off between true positive rate and false positive rate.

[5] It is shown in Appendix 5 that NB is equivalent to CS, given that Transaction Amount and Overhead (Chan et al. 1999) are defined as being equivalent to FN Cost Avoidance and Investigation Cost, respectively. Also note that my definition of cost-to-benefit ratio is based on the same idea used in Chan et al.'s (1999) rule: only transactions with transaction amounts > overhead should be investigated.
To maximize NB, the classification threshold has to be selected so that it strikes an appropriate balance between the net benefit of TP classifications and the cost of FP classifications.

NB = FN cost avoidance*number of TP - investigation cost*(number of FP + number of TP)    (17)

The experiments compare the performance of various combiner methods using optimal thresholds for each treatment in order to isolate the treatment effect from noise introduced by using other mechanisms to determine the threshold. To determine the optimal thresholds, I run the MCC experiment 101 times for each treatment using a different threshold level (0, 0.01, 0.02, ..., 1) for each run. The threshold from the run that generates the highest total net benefit is then labeled as the optimal threshold for that specific treatment. By finding the best threshold for each combiner method, dataset, ensemble and cost-benefit ratio combination, the combiner methods are compared at optimal tradeoff levels for that specific combination, which I believe is more relevant than comparing the sensitivity, specificity, hit-rate, etc., of the combiner methods at other suboptimal levels. Furthermore, by comparing the combiner methods at a number of different cost-to-benefit ratios, the generalizability of the results to different domains that have different cost-to-benefit ratios is improved.

2.4.2.2 Combiner Method Factor

Since the primary objective is to compare the performance of IMF to existing combiner methods, combiner method is included as a factor that is manipulated at four levels: IMF, AVG, MAJ and WAVG. IMF is compared to MAJ, AVG and WAVG since prior research indicates that AVG and MAJ perform well compared to other existing combiner methods (Duin and Tax 2000). WAVG is included primarily because of its similarity to IMF, since IMF generates a wealth-weighted average. In MAJ, each base-classifier casts a vote on the class for which the base-classifier's probability estimate is higher than the classification threshold.
The class with the most votes is then selected as the ensemble's decision. In AVG, the mean of all the base-classifiers' probability estimates is compared to the threshold, and the class with a mean probability estimate that is higher than the threshold is selected as the ensemble's decision. In WAVG, different weights are assigned to the different base-classifiers' probability estimates when averaging these estimates. In order to maintain uniformity while comparing IMF to WAVG, I implement a dynamic version of WAVG where the weights are updated based on positive classifications only. The weights are determined as the ratio of an individual classifier's precision (TP/(TP+FP)) to the total precision of all the classifiers in the ensemble. I also test two alternative weighting schemes as detailed in Section 2.4.2.4.

2.4.2.3 Sensitivity Analysis

Number of Agents. The number of agents factor is manipulated at 11 levels: 2, 4, 6, ..., 22 agents. This manipulation is done since there is evidence from prior research that the number of agents in an ensemble could impact ensemble classification performance (Lam 2000). The agents are randomly selected at each of the treatment levels, but the selection process is cumulative in nature. For ensembles consisting of two agents, the two agents are randomly selected from the 22 existing base-classifiers; for ensembles with four agents, two additional agents are randomly selected from the remaining 20 base-classifiers and added to the existing ensemble, and so on. To test the sensitivity of the combiner method performance to the number of agents, I examine if the relative combiner method performance is moderated by the number of agents, while holding the cost-to-benefit ratio constant at 1:10.

Cost-to-Benefit Ratio. The benefit derived from TP classifications (FN cost avoidance minus investigation cost) and the cost of FP classifications (investigation cost) impact the net benefit provided by any classification effort.
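The three benchmark combiner rules described in Section 2.4.2.2 can be sketched in Python as follows. This is an illustrative reading of the prose, not the implementation used in the experiments: the function names are hypothetical, ties in MAJ are resolved as negative, and base-classifiers without any positive classifications yet are given a precision of zero. All three of these choices are assumptions.

```python
def majority_vote(probs, threshold):
    """MAJ: each base-classifier votes positive if its estimate exceeds the threshold.

    Assumption: a tie is resolved as a negative decision.
    """
    positive_votes = sum(1 for p in probs if p > threshold)
    return positive_votes > len(probs) / 2


def average(probs, threshold):
    """AVG: compare the unweighted mean estimate to the threshold."""
    return sum(probs) / len(probs) > threshold


def precision_weights(tp_counts, fp_counts):
    """WAVG weights: each classifier's precision divided by the total precision.

    Assumption: precision is 0 until a classifier has made a positive
    classification; if all precisions are 0, equal weights are used.
    """
    precisions = [
        tp / (tp + fp) if (tp + fp) > 0 else 0.0
        for tp, fp in zip(tp_counts, fp_counts)
    ]
    total = sum(precisions)
    if total == 0:
        return [1.0 / len(precisions)] * len(precisions)
    return [p / total for p in precisions]


def weighted_average_score(probs, weights):
    """WAVG: precision-weighted mean of the probability estimates."""
    return sum(w * p for w, p in zip(weights, probs))


# Hypothetical estimates from three base-classifiers for one object.
probs = [0.9, 0.4, 0.3]
weights = precision_weights(tp_counts=[9, 5, 1], fp_counts=[1, 5, 9])
wavg_score = weighted_average_score(probs, weights)
print(majority_vote(probs, 0.5), average(probs, 0.5), wavg_score > 0.5)
```

In this small example MAJ and AVG disagree: only one of three agents votes positive, but its high estimate pulls the mean above the threshold, and the precision weighting strengthens that effect.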
As the cost-to-benefit ratio is domain specific, I use a wide range of cost-to-benefit ratios, 13 in total, to explore the generalizability of the results: 1:100, 1:50, 1:25, 1:10, 1:7.5, 1:5, 1:4, 1:3, 1:2, 1:1.5, 1:1, 1.5:1, and 2:1. To clarify, the 1:100 ratio indicates that the net benefit of a TP classification (cost of fraud minus the investigation cost of detecting a fraud) is 100 times the cost of investigating a transaction (for example, cost of fraud = $10,100 vs. cost of investigation = $100). Note that the range of cost-to-benefit ratios used assumes that the net benefit of a TP is always positive, i.e., the FN cost avoided when making a TP classification is always more than the investigation cost. To examine the sensitivity of the combiner method performance to cost-to-benefit ratio, I investigate whether the relative combiner method performance is moderated by cost-to-benefit ratio, holding the number of agents constant at 10 agents.

Dataset Average Accuracy. Average base-classifier accuracy, measured as the percentage of all objects classified correctly for each dataset, is included as a possible interaction term given the possibility that relative combiner method performance could depend on the average accuracy of the base-classifiers in a given dataset. Thus, this interaction tests if the relative combiner method performance is moderated by the dataset average base-classifier accuracy.

Dataset Size. Dataset size refers to the number of records in the dataset, which varies from 57 to 32,561 records. Dataset size is included as a covariate primarily to examine the impact of size on the relative performance of IMF and WAVG to the other combiner methods. For example, if the dataset is very small, the extent of adjustment of weights in WAVG and redistribution of wealth in IMF is small. I therefore evaluate if the relative combiner method performance is moderated by the dataset size.
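The optimal-threshold search from Section 2.4.2.1, combined with the cost-to-benefit ratio definition given above, can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names are hypothetical, the investigation cost of 100 is arbitrary, the scores stand in for the output of any combiner method, and ties between thresholds are resolved in favor of the lowest threshold.

```python
def costs_from_ratio(cost_part, benefit_part, investigation_cost=100.0):
    """Translate a cost-to-benefit ratio into the two NB cost parameters.

    A ratio of cost_part:benefit_part means that the net benefit of a TP
    (FN cost avoided minus investigation cost) is benefit_part/cost_part
    times the investigation cost.
    """
    fn_cost_avoidance = investigation_cost * (1 + benefit_part / cost_part)
    return fn_cost_avoidance, investigation_cost


def optimal_threshold(scores, labels, fn_cost_avoidance, investigation_cost):
    """Try the 101 thresholds 0, 0.01, ..., 1 and keep the one with highest NB."""
    best_threshold, best_nb = None, None
    for step in range(101):
        threshold = step / 100
        tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
        nb = fn_cost_avoidance * tp - investigation_cost * (fp + tp)
        if best_nb is None or nb > best_nb:  # first (lowest) threshold wins ties
            best_threshold, best_nb = threshold, nb
    return best_threshold, best_nb


# Hypothetical combined scores and true classes at a 1:10 ratio.
avoidance, cost = costs_from_ratio(1, 10)
best_t, best_nb = optimal_threshold(
    [0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0], avoidance, cost
)
print(best_t, best_nb)
```

With a 1:100 ratio and an investigation cost of $100, `costs_from_ratio` returns an FN cost avoidance of $10,100, which matches the example in the text.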
Dataset Positive Ratio. The positive ratio of the dataset refers to the number of positive class objects divided by all objects in the dataset. Positive ratio is included as a covariate to test if the relative combiner method performance depends on the dataset positive ratio. Theoretically, a performance difference, if any, among combiner methods should be evident in datasets with positive ratios in the medium range, but not necessarily in datasets with very low (high) positive ratios, where any trivial classifier that always predicts the object as negative (positive) does well. Thus, differences in performance among the combiner methods are only expected when the trivial rule is ineffective. The range of dataset positive ratios over which the trivial rule is effective is, furthermore, impacted by the cost-to-benefit ratio level. The trivial rule that classifies everything as positive is effective over a wider range of dataset positive ratios (i.e., medium and high rather than just high dataset positive ratios) when the cost-to-benefit ratio is low (Witten and Frank 2005). Conversely, the trivial rule that classifies everything as negative is effective over a smaller range of dataset positive ratios (i.e., just extremely low rather than low positive ratios) when the cost-to-benefit ratio is low. Considering that the median of the experimental cost-to-benefit manipulations is close to 1:5, i.e., in the low range, I expect combiner method performance differences for low to medium dataset positive ratios, but not for medium to high dataset positive ratios. I therefore evaluate if the relative combiner method performance is moderated by the dataset positive ratio.

Ensemble Diversity. Base-classifier diversity describes the degree to which the ensemble base-classifiers differ in the errors they make. Diversity among the base-classifiers is incorporated in the experiment by using different learning algorithms for each base-classifier.
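The diversity moderator listed in Table 2.3, average pairwise Yule's Q, can be computed as in the following sketch. It uses the standard definition of the Q statistic for two classifiers on the unscaled range from -1 to 1; the function names and the handling of undefined pairs (returned as NaN and skipped when averaging) are assumptions made for this illustration, and any rescaling used in the experiments is not reproduced here.

```python
from itertools import combinations


def yules_q(correct_a, correct_b):
    """Yule's Q for two classifiers given per-object correctness flags (1/0).

    Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10), where N11 counts objects
    both classifiers get right and N00 objects both get wrong. High Q means
    the classifiers err on the same objects, i.e., low diversity.
    """
    n11 = sum(1 for a, b in zip(correct_a, correct_b) if a and b)
    n00 = sum(1 for a, b in zip(correct_a, correct_b) if not a and not b)
    n10 = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    n01 = sum(1 for a, b in zip(correct_a, correct_b) if not a and b)
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        return float("nan")  # assumption: Q is treated as undefined here
    return (n11 * n00 - n01 * n10) / denominator


def average_pairwise_q(correctness_by_classifier):
    """Mean Q over all classifier pairs, skipping undefined pairs."""
    values = [
        yules_q(a, b) for a, b in combinations(correctness_by_classifier, 2)
    ]
    defined = [q for q in values if q == q]  # NaN is not equal to itself
    return sum(defined) / len(defined) if defined else float("nan")


# Hypothetical correctness vectors for three base-classifiers on four objects.
ensemble = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0]]
print(average_pairwise_q(ensemble))
```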
Diversity is measured using Yule's Q statistic (Yule 1900) for each dataset. By measuring diversity I can evaluate if the relative performance of combiner methods is impacted by the level of complementary information provided by the base-classifiers in the different datasets.

2.4.2.4 Investigating the True Class of All Objects

To evaluate the external validity of the results to domains where the true object class is revealed for all objects, I perform an experiment where the performances of combiner methods are evaluated using both positive and negative classifications. In this experiment I examine a version of WAVG where weights are updated for both positive and negative classifications, as well as aWAVG. In aWAVG, the weights are determined based on AdaBoost: ln((1 - error rate)/error rate), where error rate is equal to (FP+FN)/(FP+FN+TP+TN).

2.4.3 Time Lag, IMF Parameters and Base-Classifier Cost-Benefit Retraining

IMF is a multi-classifier combiner method based on a pari-mutuel betting information market.

2.4.3.1 Time Lag and Performance

In the main experiment the true class of an object t is given instantly after t is classified, but in reality, it usually takes some time to determine the true class of t. In order to determine the performance impacts of such time lags, I perform an experiment where wealth w_it is not updated until v additional objects have been classified. v is manipulated at six different levels: 0%, 1%, 5%, 10%, 25% and 50% of the size of the dataset, for each of the 17 datasets, while the main experiment factors are held constant as follows: combiner method = IMF; number of agents = 10; and cost-to-benefit ratio = 1:10. Using these treatments I investigate if the net benefits from 0%-IMF (no time lag) and the net benefits from 1%-IMF, 5%-IMF, 10%-IMF, 25%-IMF and 50%-IMF are significantly different.

2.4.3.2 Selection of IMF Parameters

Binary Search Stopping Parameter. The tolerance value is used in binary search to determine when to stop the search.
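The role of a stopping tolerance in binary search can be illustrated with a generic bisection routine. This sketch is not the IMF search itself, whose objective is not reproduced here; the function `bisect_until`, the stand-in objective and the search interval [0, 1] are all hypothetical and chosen only to show how the tolerance trades precision against the number of iterations.

```python
def bisect_until(f, lo, hi, tolerance):
    """Binary search for the root of an increasing function f on [lo, hi].

    Assumes f(lo) <= 0 <= f(hi). The search stops once the remaining
    interval is no wider than the tolerance.
    Returns the midpoint of the final interval and the iteration count.
    """
    iterations = 0
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid  # root lies in the lower half
        else:
            lo = mid  # root lies in the upper half
        iterations += 1
    return (lo + hi) / 2, iterations


def toy_objective(x):
    """Hypothetical stand-in objective with its root at 0.3."""
    return x - 0.3


coarse_root, coarse_steps = bisect_until(toy_objective, 0.0, 1.0, 0.01)
fine_root, fine_steps = bisect_until(toy_objective, 0.0, 1.0, 0.000001)
print(coarse_steps, fine_steps)
```

Each halving of the interval costs one iteration, so tightening the tolerance by several orders of magnitude adds only a modest number of steps.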
To gain a better understanding of how to select an appropriate value for the tolerance, and to investigate if this selection is domain dependent, I run an experiment where different tolerance values are tested. For a given tolerance value (manipulated at 0.01, 0.001, ..., 0.00000000001), I run IMF on each of the datasets while holding other factors constant as follows: number of agents = 10; and cost-to-benefit ratio = 1:10. I am interested in investigating interactions between the tolerance and the different dataset characteristics in order to assess whether the choice of tolerance is domain dependent. Also, if no interactions exist, I am still interested in investigating the direct impact of the tolerance on net benefit.

Maximum Bet Multiplier k. To ensure that the ensemble is not completely dominated by a minority of better performing agents, while at the same time weighing the inputs of better performing agents more heavily, appropriate values of k need to be used. For a given value of k (manipulated at 1, 2, 5, 10, 25, 50, 75, 100, 125, 150, 200, 250, 350, 500, and 1000), I run IMF on all datasets with the number of agents factor set at 10 and the cost-to-benefit ratio set at 1:10. I am also interested in investigating interactions between k and dataset characteristics to determine whether the choice of k is domain specific.

2.4.3.3 Base-Classifier Cost-Benefit Retraining

The ensemble base-classifiers in the experiments are not trained using cost-sensitive learning and they are not retrained for each cost-to-benefit treatment level. The ensemble results are likely to change if the base-classifiers are retrained for different cost-to-benefit ratios. However, since all four combiner methods are tested using the same base-classifiers, I do not believe that this will systematically bias the relative performance of the combiner methods.
Nevertheless, I perform an experiment where the classification performances of various combiner methods are evaluated at five different cost-to-benefit ratios using an ensemble of five crisp base-classifiers used in two different modes: cost-to-benefit ratio retrained or not retrained. The retrained crisp base-classifiers are obtained by hardening measurement level base-classifiers at optimal thresholds for the different combinations of datasets and cost-benefit ratios. The base-classifiers that are not retrained are obtained by hardening the same base-classifiers using a threshold of 0.5. Holding the number of agents constant at five, I evaluate the effect of the interaction between base-classifier mode and combiner method on combiner method performance.

2.5. Results

2.5.1 Relative Combiner Method Performance

2.5.1.1 Overview

Table 2.4 provides an overview of the result data organized by the three result datasets used in the experiments. Two of the statistical analysis datasets are based on the 5 x 11 (combiner method by number of agents) and the 5 x 13 (combiner method by cost-to-benefit) factorial designs, while the third dataset is obtained by pooling the two statistical analysis datasets (possible as the interactions are not significant, as discussed in Section 2.5.1.3). I report two-tailed p-values throughout the chapter. For significance testing I use an alpha of 0.05, and 0.1 for marginal significance. To retain an experiment-wise error rate of 0.05, while balancing the risk of type II errors, I use a modified Bonferroni procedure (Jaccard and Wan 1996).

2.5.1.2 Combiner Method Main Effect

The combiner method main effect is tested using the model shown in (18) and the pooled result set described earlier. Note that for each combination of UCI dataset*number of agents and UCI dataset*cost-to-benefit ratio, the same four combiner methods are tested.
I therefore block for the dataset, number of agents, cost-to-benefit ratio, dataset*number of agents and dataset*cost-to-benefit ratio effects.

ln(net benefit) = β0 + β1*combiner method + block    (18)

The combiner method main effect is significant (p<0.0001) and the post-hoc analysis shows that IMF significantly outperforms AVG (p=0.0042), WAVG (p=0.0229) and MAJ (p<0.0001). See Table 2.5 for parameter estimates and standard errors.

2.5.1.3 Sensitivity Analysis

The sensitivity of the relative performance of the combiner methods to the number of agents in the ensemble and to the cost-to-benefit ratio is tested using the model shown in (19) for the 5 x 11 factorial design and the model shown in (20) for the 5 x 13 factorial design, respectively. Thus each combiner method is tested for all combinations of UCI dataset*number of agents and UCI dataset*cost-to-benefit ratio in the respective models. I therefore block for these interactions in the respective models.

ln(net benefit) = β0 + β1*combiner method + β2*number of agents + β3*combiner method*number of agents + block    (19)

ln(net benefit) = β0 + β1*combiner method + β2*cost-to-benefit ratio + β3*combiner method*cost-to-benefit ratio + block    (20)

The combiner method*number of agents (p=0.1407) and combiner method*cost-to-benefit ratio (p=0.7552) interactions are insignificant. This indicates that the performance advantages of IMF over AVG, WAVG and MAJ are not moderated by the number of agents in the ensemble or by the domain dependent cost-to-benefit ratio. Because prior research has not evaluated AVG, MAJ and WAVG under different cost assumptions, I perform further inspections of the interaction results using scatter plots and find that the performance differences among the combiner methods are stable over the different cost-to-benefit ratios tested. Thus, the earlier conclusion based on the statistical results is corroborated.
The sensitivity of the combiner method performance result to dataset average agent accuracy, size, positive ratio and ensemble diversity is tested using the same blocking factor and result set used for (18):

ln(net benefit) = β0 + β1*combiner method + β2*dataset average agent accuracy + β3*dataset size + β4*dataset positive ratio + β5*ensemble diversity + β6*combiner method*dataset average agent accuracy + β7*combiner method*dataset size + β8*combiner method*dataset positive ratio + β9*combiner method*ensemble diversity + block    (21)

Table 2.4 Statistical Analysis Data
Statistic | Combiner Method x Number of Agents: net benefit | ln(benefit) | Combiner Method x Cost-to-Benefit Ratio: net benefit | ln(benefit) | Combiner Method (pooled): net benefit | ln(benefit)
Low | 350 | 2.54 | 30 | 1.48 | 30 | 1.48
High | 71,274 | 4.85 | 763,879 | 5.88 | 763,879 | 5.88
Mean | 18,748 | 3.82 | 31,602 | 3.50 | 25,946 | 3.64
Standard Deviation | 24,471 | 0.68 | 96,569 | 1.00 | 74,323 | 0.89
Number of Treatments | 44 | | 52 | | 96 |
N | 748 | | 884 | | 1,632 |

The results do not show that relative combiner method performance is sensitive to the dataset size (p=0.7325) or dataset average agent accuracy (p=0.9803). I do, however, find that the combiner method*dataset ensemble diversity (p=0.0342) and combiner method*dataset positive ratio (p<0.0001) interactions are significant.

The interaction involving diversity appears to be driven by MAJ, as MAJ has a significant parameter estimate for the interaction (p=0.0039), while AVG (p=0.6363) and IMF (p=0.3227) are insignificant. This is verified by noting that the combiner method*diversity interaction is insignificant (p=0.7813) when MAJ is excluded from the analysis. When only including MAJ and IMF in the analysis, the interaction is significant (p=0.0099). I therefore only perform a detailed analysis of IMF vs. MAJ. Based on visual comparison (Figure 2.5) it appears that IMF outperforms MAJ at all diversity levels; however, the performance difference is smaller at low diversity levels (high Yule's Q).
However, even at low diversity levels (Q>75) IMF outperforms MAJ (p=0.0138). Thus, at all diversity levels IMF outperforms MAJ as per this test, and AVG and WAVG as per the insignificant interaction and significant main effect.

I explore the significant combiner method*positive ratio interaction (p=0.0342) by dividing the datasets into two groups based on the dataset positive ratio: a high group with about half the datasets (positive ratio > 40%) and a low group with the remaining datasets (positive ratio <= 40%). In each group, a model with the combiner method factor and the blocking variables as in (18) is then tested. IMF significantly outperforms AVG (p=0.0005), MAJ (p<0.0001) and WAVG (p=0.0021) in the low group. In the high group, IMF significantly outperforms MAJ (p=0.0139), but the performance advantage is insignificant with respect to AVG (p=0.3439) and WAVG (p=0.6023).

2.5.1.4 Investigating the True Class of All Objects

The results that are obtained when the true classes of both positive and negative classifications are revealed are statistically equivalent to the results presented in Sections 2.5.1.2 and 2.5.1.3 with the following exceptions: 1) results are not sensitive to either diversity (p=0.4599) or positive ratio (p=0.0847); 2) IMF still significantly outperforms MAJ (p=0.0078) and WAVG (p=0.0767), and the performance advantage over AVG is now only marginally significant (p=0.1264). Note that the p-values are two-tailed. The results also show that IMF outperforms aWAVG (p<0.0001). Table 2.5 summarizes these results.

2.5.2 Time Lag, IMF Parameters and Base-Classifier Cost-Benefit Retraining

Overview

The impact of time lag on net benefit is tested using the model shown in (22) and a statistical analysis dataset derived from holding the number of agents and cost-to-benefit ratio constant at 10 and 1:10, respectively. Since all UCI datasets are used for all the treatments in the model, I block for the dataset effect.
ln(net benefit) = β0 + β1*v + block    (22)

The lag-level main effect (p=0.9962) is insignificant, thereby indicating that time lag does not impact IMF performance.

Using the model shown in (23), I do not find any evidence that the value of the binary search stopping parameter (the tolerance), within the tested range (0.01, 0.001, ..., 0.00000000001), impacts the performance of IMF (p=0.5071). The impact of the tolerance on the performance of IMF is also not domain dependent. More specifically, while blocking for the dataset effect on net benefit, the tolerance*dataset positive ratio (p=0.1248), tolerance*dataset size (p=0.1856), tolerance*dataset average agent accuracy (p=0.5989) and tolerance*dataset diversity (p=0.8897) interactions are insignificant. Based on the results, in all experiments the tolerance is set to the middle value tested, 0.000001.

ln(net benefit) = β0 + β1*tolerance + β2*dataset average agent accuracy + β3*dataset positive ratio + β4*dataset size + β5*dataset diversity + β6*tolerance*dataset average agent accuracy + β7*tolerance*dataset positive ratio + β8*tolerance*dataset size + β9*tolerance*dataset diversity + block    (23)

In order to choose appropriate values for the maximum bet multiplier k and to investigate if the choice of k is domain dependent, I use the model shown in (24), where the block factor is dataset:

ln(net benefit) = β0 + β1*k + β2*dataset average agent accuracy + β3*dataset positive ratio + β4*dataset size + β5*dataset diversity + β6*k*dataset average agent accuracy + β7*k*dataset positive ratio + β8*k*dataset size + β9*k*dataset diversity + block    (24)

Figure 2.5: Combiner Method (MAJ and IMF) x Diversity Interaction [plot of log(Net Benefit) against Yule's Q for IMF and MAJ]

Table 2.5 Summary of Primary Results
Effects | Results | Sig.(a)
Net Benefit (only true class of objects classified as positive known) | |
 | Net Benefit IMF > Net Benefit AVG | p=0.0042
 | Net Benefit IMF > Net Benefit WAVG | p=0.0229
 | Net Benefit IMF > Net Benefit MAJ | p<0.0001
Parameter Estimates (standard errors) | AVG 0.00101 (0.00154); IMF 0.00621 (0.00154); MAJ 0.00568 (0.00154) |
Sensitivity Analysis | |
Number of Agents | Not Sensitive | p=0.1407
Cost-to-Benefit Ratio | Not Sensitive | p=0.7552
Dataset Size | Not Sensitive | p=0.7325
Average Agent Accuracy | Not Sensitive | p=0.9803
Diversity*Method > without MAJ | Not Sensitive | p=0.7813
Diversity*Method > without AVG/WAVG | Sensitive | p=0.0099
Net Benefit: Low Diversity | Net Benefit IMF > Net Benefit MAJ | p=0.0138
Net Benefit: High Diversity | Net Benefit IMF > Net Benefit MAJ | p<0.0001
Net Benefit: Low Positive Ratios | Net Benefit IMF > Net Benefit AVG | p=0.0005
 | Net Benefit IMF > Net Benefit WAVG | p=0.0021
 | Net Benefit IMF > Net Benefit MAJ | p<0.0001
Net Benefit: High Positive Ratios | Net Benefit IMF > Net Benefit AVG | p=0.3439
 | Net Benefit IMF > Net Benefit WAVG | p=0.6023
 | Net Benefit IMF > Net Benefit MAJ | p=0.0139
Parameter Estimates (standard errors) | AVG 0.00210 (0.00180); IMF 0.00799 (0.00180); MAJ 0.00509 (0.00180) |
Net Benefit (true classes of all objects known) | |
 | Net Benefit IMF > Net Benefit AVG | p=0.1264
 | Net Benefit IMF > Net Benefit WAVG | p=0.0767
 | Net Benefit IMF > Net Benefit MAJ | p=0.0078
 | Net Benefit IMF > Net Benefit aWAVG | p<0.0001
Sensitivity Analysis | |
Number of Agents | Not Sensitive | p=0.9848
Cost-to-Benefit Ratio | Not Sensitive | p=0.9944
Dataset Size | Not Sensitive | p=0.8293
Positive Ratio | Not Sensitive | p=0.0847
Average Agent Accuracy | Not Sensitive | p=0.4030
Ensemble Diversity | Not Sensitive | p=0.4599
(a) All p-values are two-tailed.

The k*dataset average agent accuracy (p=0.0036) and k*dataset diversity (p=0.0012) interactions are significant, while the k*dataset positive ratio (p=0.1581) and k*dataset size (p=0.1812) interactions are insignificant. Scatter plots with trend lines and the raw data tables for the standardized log net benefit of the different datasets at the 15 different k values indicate that the significant interactions are driven by extreme values of k. For low k values, net benefit decreases as the diversity decreases or the average agent accuracy increases, and vice versa for high k values.
However, k=50 consistently provides relatively good results, even when compared to extreme k values at their best performance levels. Furthermore, k=25 and k=75 also perform well. Results show that when only using k=25, k=50 and k=75, the k*dataset average agent accuracy (p=0.5697) and k*dataset diversity (p=0.9212) interactions are no longer significant. Based on these results, I set k=50 in all experiments.

Using the model shown in (25), the base-classifier cost-benefit retraining experiment results show an insignificant (p=0.2037) base-classifier mode and combiner method interaction, blocking for dataset, cost-to-benefit ratio and dataset*cost-to-benefit ratio effects. Thus, as expected, I do not find evidence of relative combiner method performance being moderated by base-classifier mode [6].

ln(net benefit) = β0 + β1*combiner method + β2*base-classifier mode + β3*combiner method*base-classifier mode + block    (25)

[6] Nevertheless, the reader should be aware that some of the crisp base-classifiers used in the experiments under extreme cost-benefit ratios could reduce the benefits of MCC. The effect of this potential problem is unknown, but based on the experiment just discussed it does not bias the relative performance results one way or the other.

2.6. Discussion

Overview

2.6.1 Combiner Method Performance

Based on the coefficients and standard errors from the main effects test (Table 2.5), IMF on average provides a 0.72%, 1.19% and 0.55% [7] greater impact on net benefit than AVG, MAJ and WAVG, respectively. These results are not sensitive to the number of agents in the ensemble, cost-to-benefit ratio, dataset size, dataset average agent accuracy or ensemble diversity. I do, however, find that the relationship between combiner method and net benefit is moderated by the dataset positive ratio when assuming that the true class of objects is only revealed for objects classified as positive. The results show that IMF outperforms the other combiner methods at low to medium positive ratios, and that there is no significant difference at medium to high positive ratio levels, as theoretically expected. Thus, IMF performs well at all positive ratio levels and outperforms the other combiner methods when it matters the most, i.e., in skewed datasets with low to medium positive ratios for which, given the tested cost-benefit ratios, trivial rules classifying all objects as either positive or negative are likely to be ineffective. For low to medium positive ratios, IMF on average has a 1.01%, 1.31% and 0.71% [8] greater impact on net benefit than AVG, MAJ and WAVG, respectively.

[7] Based on estimation using coefficients and standard errors of: 0.00101 and 0.00154 for AVG, 0.00621 and 0.00154 for IMF, and 0.00568 and 0.00154 for MAJ (Kennedy 1981).

To put this into perspective, assume a fraud classification task where the average cost savings from a fraud detection is $20,000, the average cost of investigation is $500, the positive rate is 1% (a low positive rate), there are 40,000 transactions per year, and using IMF 50% of the positive instances and 98% of the negative instances are classified accurately. The benefit from this classification is $4,000,000 ($20,000*200), the cost is $496,000 (200*$500 + 792*$500), and the net benefit is $3,504,000. In this example, IMF provides an additional benefit per year of $35,390, $45,902 and $24,878 over AVG, MAJ and WAVG, respectively. However, note that IMF also consumes more resources: an average of 6.21 msec of CPU time [9] to classify one object, as compared to 5.07 msec for WAVG, 1.68 msec for AVG, and 1.69 msec for MAJ. The slightly larger CPU time consumed by IMF is minor given that in most settings IMF classifies, on average, 579,415 objects per hour of CPU time on an off-the-shelf PC.
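The arithmetic in the fraud example above can be retraced with a short Python sketch. The variable names are introduced here for illustration; all input values are taken from the example in the text, and integer arithmetic is used to avoid rounding noise.

```python
# Inputs taken from the worked fraud example in the text.
transactions = 40_000
positive_pct = 1          # 1% of transactions are fraudulent
hit_rate_pct = 50         # 50% of positives are classified correctly
specificity_pct = 98      # 98% of negatives are classified correctly
fraud_saving = 20_000     # average cost saving per detected fraud ($)
investigation_cost = 500  # average cost per investigation ($)

positives = transactions * positive_pct // 100
negatives = transactions - positives
tp = positives * hit_rate_pct // 100
fp = negatives * (100 - specificity_pct) // 100

benefit = fraud_saving * tp
cost = investigation_cost * (tp + fp)
net_benefit = benefit - cost

# Relative advantages of IMF reported for low to medium positive ratios,
# expressed in basis points: 1.01%, 1.31% and 0.71%.
advantage_bp = {"AVG": 101, "MAJ": 131, "WAVG": 71}
extra = {name: round(net_benefit * bp / 10_000) for name, bp in advantage_bp.items()}

print(tp, fp, benefit, cost, net_benefit)
print(extra)
```

The sketch yields 200 true positives and 792 false positives, and the three additional benefit amounts follow from applying the reported percentage advantages to the net benefit of $3,504,000.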
[8] Based on estimation using coefficients and standard errors of: 0.002053 and 0.001751 for AVG, 0.0079922 and 0.001751 for IMF, and 0.005092 and 0.001751 for MAJ (Kennedy 1981).

[9] I used GetProcessTimes from Kernel32.lib, which measures CPU time used, rather than actual time to run the algorithms. CPU time excludes time that the process is waiting for other processes to complete. The resource consumption experiment is performed on computers ranging from a desktop computer with a Pentium 4 2.0 GHz processor with 256 MB of RAM to a personal laptop with an AMD Turion 64 X2 Mobile Technology TL-56 processor and 2,048 MB of RAM. The computer is held constant within each treatment group.

IMF also outperforms MAJ, WAVG and aWAVG, and AVG (at a marginal level of significance), when the true classes of objects classified as positive as well as negative are revealed. These results are robust within a wide range of cost-to-benefit ratios, numbers of agents in the ensemble, ensemble diversity, and dataset size, positive ratio and average base-classifier accuracy.

To understand why IMF has superior performance compared to the other combiner methods, we need to understand the workings of IMF. Because of the log utility function, IMF should perform on par with AVG if all the agents have the same amount of funds available for placing bets in the market, i.e., if the aggregation is not wealth weighted (Wolfers and Zitzewitz 2006). However, as more accurate agents become wealthier, these agents end up influencing the market prices to a greater degree than the less accurate agents, and as the equilibrium prices represent the aggregated probabilities of the ensemble, the more accurate agents have a greater impact on the ensemble's decision than the less accurate agents.
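The intuition in the preceding paragraph can be illustrated with a deliberately simplified pari-mutuel market. The sketch below is not the IMF algorithm: it assumes that each agent stakes its entire wealth, splitting it between the two classes in proportion to its belief (the allocation a log-utility bettor would choose in this idealized setting), and it ignores the maximum bet multiplier k, the binary search and the positive-only feedback used in the experiments. The function names and numbers are hypothetical.

```python
def market_price(wealth, beliefs):
    """Share of all money staked on the positive class.

    With each agent staking wealth*belief on positive, this equals the
    wealth-weighted average of the agents' probability estimates.
    """
    staked_on_positive = sum(w * p for w, p in zip(wealth, beliefs))
    return staked_on_positive / sum(wealth)


def settle(wealth, beliefs, outcome_positive):
    """Pari-mutuel payout: winners share the whole pool in proportion to
    the money they staked on the winning class."""
    price = market_price(wealth, beliefs)
    if outcome_positive:
        return [w * p / price for w, p in zip(wealth, beliefs)]
    return [w * (1 - p) / (1 - price) for w, p in zip(wealth, beliefs)]


# Two hypothetical agents with equal initial wealth.
wealth = [1.0, 1.0]
beliefs = [0.8, 0.4]

initial_price = market_price(wealth, beliefs)  # equals the plain average (AVG)
wealth_after = settle(wealth, beliefs, outcome_positive=True)
later_price = market_price(wealth_after, beliefs)
print(initial_price, wealth_after, later_price)
```

With equal wealth the price coincides with the unweighted mean of the estimates; after the first agent is proven right it holds more of the pool, and the same pair of estimates now aggregates to a price closer to that agent's belief.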
Thus the ensemble decision in IMF is a performance-weighted average, which explains why there is a difference between IMF and AVG, and also perhaps why IMF outperforms AVG and, to some extent, MAJ, since MAJ is also a non-weighted combiner method with performance similar to AVG.

When comparing IMF to WAVG, we need to examine three major differences between IMF and WAVG: 1) WAVG assigns weights solely based on the precision of the base-classifiers relative to the precision of the other base-classifiers. IMF, in contrast, places progressively greater weight on better performing agents' decisions, as agents with wealth above what they are allowed to bet hedge their bets to a lesser degree than other agents; 2) In IMF, weights are adjusted based on the degree of agents' performance, as opposed to WAVG where the weights are adjusted solely based on the ratio of an individual classifier's precision to the total precision of all the classifiers in the ensemble. To clarify, in IMF, an agent's wealth increases (decreases) to a greater degree the more (less) accurate the agent is in each bet, as agent bets are increasing in agent probability estimates. Thus, agents that are correct and more certain receive a higher payout than agents that are correct but less certain, since the bets of more certain agents are higher, and vice versa; 3) The weights in IMF, but not in WAVG, are adjusted based on agents' relative contribution to the ensemble diversity. In IMF, agents with correct bets receive a greater payout if the odds are higher for that class, which occurs when the bets are higher for the other class.

2.6.2 Time Lag, IMF Parameters and Base-Classifier Cost-Benefit Retraining

The results do not show that a time lag between object classification and object true class determination impacts the performance of IMF within the range tested (0 to 50% of the records in the dataset).
Thus, there is no evidence to suggest that the performance of IMF deteriorates with time lags between object classifications and object true class determination. The results also do not indicate that the binary search stopping parameter (the tolerance) and the maximum bet parameter k should be set to different values for different classification domains. In the experiments the tolerance was held constant at 0.000001 and k was held constant at 50. I also do not find any evidence that there is a systematic bias in the relative performance of the combiner methods from not retraining the base-classifiers for the different cost-to-benefit ratios.

2.6.3 Combiner Method Design Considerations

For multi-agent system MCC implementations, IMF handles changes in ensemble composition and base-classifier performance. The market mechanism used in IMF functions independently of any specific agents that participate in the market. Furthermore, changes in an agent's relative performance impact the agent's wealth, and therefore also the weight given to the agent's decisions in the decision fusion process. IMF also provides market participants incentives to truthfully provide their private decisions. This is especially useful in multi-agent systems based on competitive agents (Ygge and Akkermans 1999).

2.7. Conclusions and Future Research Directions

In this essay, I present IMF, a novel combiner method based on information markets for multi-classifier combination. I show through extensive experimentation that IMF provides additional utility compared to three benchmark combiner methods: AVG, WAVG and MAJ.

For future research, the effectiveness of IMF can be compared to other combiner methods in other multi-classifier combination architectures, such as bagging and boosting.
Other research extensions include: investigating the performance impacts of other types of agent behavior using utility functions such as Constant Absolute Risk Aversion and Constant Relative Risk Aversion; modeling agents to update their beliefs based on market signals or ensemble consensus; mixing agents with different utility functions; and using a combination of human and software agent experts. IMF can also be extended to the more general k-class classification problem using the pari-mutuel betting mechanism. Finally, future research can explore the possibility of integrating the cost-benefit ratio into IMF itself.

Chapter 3. The Effect of Discretionary Accruals, Earnings Expectations and Unexpected Productivity on Financial Statement Fraud

3.1. Introduction

The Association of Certified Fraud Examiners (ACFE 2006) estimates that occupational fraud totals $652 billion per year in the U.S. Within occupational fraud, financial statement fraud (henceforth fraud) has the highest per case cost and total cost to the defrauded organization, with an estimated total cost of $340 billion per year in the U.S.10 In addition to the direct impact on the defrauded organizations, fraud adversely impacts employees, investors and lenders. Fraud also has broader, indirect negative effects on market participants by undermining the reliability of corporate financial statements, which results in higher risk premiums. Despite recent legislation aimed at reducing fraud, fraud remains a prevalent problem and is considered to have remained at about the same level (ACFE 2006) or to have even risen lately (Oversight 2005).

Accounting professionals are increasingly assuming, through mandates and self-regulation, the responsibility for detecting fraud. Statement on Auditing Standards (SAS) No.
53 did not directly address the auditor's responsibility for providing reasonable assurance that the financial statements are free of material misstatements due to fraud, but did so indirectly through reference to "irregularities" (AICPA 1988). However, starting with SAS No. 82, auditing standards refer to fraud directly; auditors should provide "reasonable assurance about whether the financial statements are free of material misstatement, whether caused by error or fraud" (AICPA 1997, AU 110.02). SAS No. 99 reiterates this responsibility and further requires that analytical procedures be used specifically for the purpose of identifying risks related to fraud (AICPA 2002). Auditing Standard (AS) 2 (PCAOB 2004) specifies that managers should design and implement internal controls to address fraud risk (primarily for fraud prevention and detection), and auditors should evaluate these internal controls. Finally, AS 5 (PCAOB 2007) adopts a top-down audit approach and highlights that a fraud risk assessment should be taken into account when planning and performing the audit of internal control over financial reporting, which in turn impacts the audit. To summarize, the importance of fraud from an audit perspective has shifted from auditing standards only containing implicit reference to fraud, to fraud being one of the primary considerations of auditing standards.

10 The ACFE (2006) report provides estimates of total fraud cost, mean cost per fraud category and number of cases. To derive the estimate for total cost of financial statement fraud, I assumed that the relative difference in mean cost is similar to the relative difference in median cost among the different occupational fraud categories.
Research that adds to our knowledge about fraud antecedents and detection is important to defrauded organizations, their employees, investors, lenders and financial markets in general, as this knowledge can help curb costs associated with fraud and improve market efficiency. This knowledge is also important to auditors when providing reasonable assurance about whether the financial statements are free of material misstatements caused by fraud, especially during client selection and continuation judgments, and audit planning. My research objective is to improve our understanding of antecedents of fraud, and thereby improve our ability to detect fraud. More specifically, I address three research questions not previously examined in the fraud literature: (1) what is the relation between the usage of discretionary accruals in prior years and fraud; (2) are managers that meet or exceed analyst forecasts more likely to have committed fraud; and (3) are firms with unexpected increases in revenue per employee more likely to have committed fraud?

The results of my research confirm that the likelihood of fraud is significantly higher for firms that meet or exceed analyst forecasts, are constrained by prior year earnings management, or have high labor productivity. These findings add to our theoretical understanding of fraud and at the same time make a practical contribution by improving our ability to detect fraud.

The chapter is organized as follows. A brief definition of earnings management, financial statement fraud and earnings manipulation as used in this study is provided in Section 3.2 along with a review of related fraud research. The research hypotheses are developed in Section 3.3. I present the research design, including descriptions of the sample selection, measures and descriptive statistics, in Section 3.4. The results are reported in Section 3.5 and additional analyses are provided in Section 3.6.
Section 3.7 concludes the chapter with a discussion of research contributions, limitations and future research opportunities.

3.2. Related Research

Healy and Wahlen (1999) state that "earnings management occurs when managers use judgment in financial reporting and in structuring transactions to alter financial reports to either mislead some stakeholders about the underlying economic performance of the company or to influence contractual outcomes that depend on reported accounting numbers" (p. 368). While fraud has the same objective as earnings management, i.e., to alter financial reports with the intention of misleading its users, it differs from earnings management in that fraud is outside of generally accepted accounting principles (GAAP), whereas earnings management is within GAAP. While this definition is clear-cut, the distinction in reality is less clear. Rather than defining earnings management and fraud as two distinct classes, I view earnings management and fraud as being on opposite ends of a continuum, where the extremes are represented by earnings alterations that are either within or outside of GAAP. I furthermore use the term earnings manipulation to refer to the entire continuum, i.e., I consider both earnings management and fraud to be subcategories of earnings manipulation. Fraud and earnings management also differ on two other important dimensions: 1) earnings management reverses over time but fraud does not; and 2) there are potential legal costs associated with fraud but not with earnings management. These dimensions will be further discussed in subsequent sections. I now turn to the prior literature.

Because of the importance of understanding fraud antecedents and improving fraud detection, a stream of research has focused on developing new predictors that explain and predict fraud. This research stream has taken either a confirmatory or exploratory approach.
The confirmatory predictor research, the approach followed in this essay, has focused on testing specific fraud hypotheses primarily grounded in the earnings management and corporate governance literature. The exploratory predictor research has taken a large number of variables, for example red flags proposed in SAS No. 53 and No. 82, and financial statement ratios, and either mapped these variables to fraud frameworks and/or tested their explanatory power. There has, however, been relatively little agreement in the results from the exploratory research as to what variables are significant predictors of fraud. To reduce the risk of obtaining statistically significant findings with low generalizability, I follow the confirmatory predictor research approach, and propose and evaluate three novel fraud predictors.

The next section (Section 3.2.1) reviews research examining the impact of earnings management on fraud. This research proposes, but only partially tests, that the act of earnings management increases the likelihood of subsequent fraud (Beneish 1997, Lee et al. 1999). Section 3.2.2 reviews fraud predictor research that leverages earnings management hypotheses, more specifically the debt covenant and the bonus plan hypotheses. This research has examined whether earnings management motivations also provide incentives for managers to commit fraud (Dechow et al. 1996, Beneish 1999). Finally, Section 3.2.3 describes research that has examined predictors related to the revenue account, which is the most commonly manipulated financial statement account (Beneish 1997).

3.2.1 Fraud Motivated by Prior Years' Earnings Management

Prior fraud research has made the argument that as income-increasing accruals at some point reverse (Healy 1985), managers with income-increasing accruals in prior years either have to deal with the consequences of the accrual reversals or commit fraud to offset the reversals (Dechow et al.
1996, Beneish 1997, Beneish 1999, Lee et al. 1999). Prior year income-increasing discretionary accruals might also cause the managers to run out of ways to manage earnings. When faced with these earnings reversals and decreased earnings management flexibility, managers can resort to fraudulent activities to achieve objectives that were earlier accomplished by managing earnings. A positive relation is, therefore, expected between prior discretionary accruals and fraud. I name this relation, and henceforth refer to it as, the earnings reversals hypothesis.

[Figure 3.1: Income-Increasing Discretionary Accruals of Fraud and Non-Fraud Firms. Vertical axis: discretionary accruals. Horizontal axis: years t-3 to t+1 relative to the first fraud year t0. Series: non-fraud firms and fraud firms.]

The earnings reversals hypothesis was graphically depicted in Dechow et al. (1996) (see Figure 3.1). Fraud firms appeared to have greater total and discretionary accruals to assets in the three years, t-3, t-2 and t-1, leading up to the first fraud year, t0, than did non-fraud firms. The statistical analysis in Dechow et al. (1996), however, only examined the relation between total accruals in year t0 and fraud in year t0, rather than in the years prior to t0, as predicted by the earnings reversals hypothesis and indicated graphically by Dechow et al. (1996). Dechow et al. (1996) found a significant positive relation between total accruals in year t0 and fraud in year t0. Contrary to this result, Beneish (1997) found a negative relation between total accruals in year t0 and fraud in year t0. The likelihood of fraud in year t0 was, however, positively related to a dummy variable measuring whether the firm had positive accruals in both year t-1 and t0 (Beneish 1997). A more recent paper by Beneish (1999) reported a positive relation between total discretionary accruals in year t-1 and fraud in year t0. Lee et al.
(1999) found a positive and significant relation between fraud and the difference between operating accruals summed over a three-year time span prior to the fraud being discovered by the SEC. However, the SEC fraud discovery on average lags the first fraud occurrence by 28 months (Beneish 1999). Thus, for the average firm, the discretionary accruals measure used in Lee et al. (1999) was for total accruals summed over years t-1, t0 and t+1. More recently, Dechow et al. (2007) found indications, though not supported by a statistical test, of accruals reversing subsequent to t0, thus providing further support for the earnings reversals hypothesis.

To summarize, prior fraud research examining the earnings reversals hypothesis has primarily used current accruals, but also accruals one year prior to the first year of the fraud. However, the earnings reversals hypothesis used in these studies does not specify a relation between current income-increasing discretionary accruals and fraud. Furthermore, the pressure to commit fraud due to accrual reversals should be higher when the firms have used income-increasing accruals to boost income over multiple years rather than just one year. The graphical analysis in Dechow et al. (1996) (see Figure 3.1 for a similar analysis based on this study's data) indicates that an appropriate time period to measure income-increasing accruals is three years prior to the first fraud year rather than only one year prior to the first fraud year. I extend the fraud literature by validating the previously discussed, but not yet hypothesized and fully examined, relation between positive discretionary accruals in prior years and fraud.

3.2.2 Fraud and Earnings Management Motivations

Given the shared objective of fraud and earnings management, fraud research has examined whether the same incentives that motivate earnings management also motivate fraud.
This research has focused on examining incentives related to the debt covenant hypothesis and the bonus plan hypothesis in a fraud context.

Beneish (1999) and Dechow et al. (1996) examine fraud incentives related to the debt covenant hypothesis. In earnings management, the debt covenant hypothesis predicts that when firms are close to violating debt covenants, managers will use income-increasing discretionary accruals to avoid violating the covenants (Dichev and Skinner 2002). Beneish (1999) and Dechow et al. (1996) hypothesize a positive relation between demand for external financing and fraud, and between incentives related to avoiding debt covenant violations and fraud. Demand for external financing is measured in both studies as whether the difference between cash flow from operations and average capital expenditures to current assets is less than -0.5, and whether securities were issued in the fraud period. Incentives related to avoiding debt covenant violations are measured in both studies using leverage and actual instances of technical default. The results of the studies are mixed, with one study (Dechow et al. 1996) finding support for the hypothesized relationships and the other (Beneish 1999) finding no support.

Beneish (1999), Summers and Sweeney (1998), and Dechow et al. (1996) also examine fraud incentives related to the bonus plan hypothesis. In earnings management, the bonus plan hypothesis predicts that earnings-based bonuses provide managers with an incentive to manage earnings to increase their bonuses over multiple years. More specifically, if bonuses are (not) increasing in earnings then managers will use income-increasing (income-decreasing) discretionary accruals to increase their current (future) bonuses (Healy 1985).
In terms of the bonus plan hypothesis, Dechow et al. (1996) and Beneish (1999) posit that managers have greater incentives to commit fraud when they can benefit from the fraud either through their compensation agreements or through insider trading. Both Dechow et al. (1996) and Beneish (1999) measure compensation agreements using a dummy variable of whether a bonus plan exists. Beneish (1999) also uses stock appreciation rights. Neither study finds support for the hypothesis that the existence of a bonus plan increases the likelihood of fraud. Beneish (1999), however, does find support for a positive relation between fraud likelihood and whether managers redeem stock appreciation rights.

While both Dechow et al. (1996) and Beneish (1999) also examine insider trading incentives, they use different measures for this construct. In Dechow et al. (1996), insider trading, measured as insider sales divided by market value of equity, is not found to be a significant predictor of fraud. Beneish (1999) argues that in addition to insider sales, insider purchases should be included, as fraudsters have incentives to both sell more and purchase less of their company's stock when committing fraud. Beneish (1999), therefore, uses the difference between insider purchases and sales, and divides this difference by total trading activity. Beneish (1999) also uses the percentage of firm security offerings sold by insiders. Unlike Dechow et al. (1996), Beneish (1999) obtains significant results for insider trading. Beneish (1999) also finds support for whether managers redeem stock appreciation rights. In a similar study, Summers and Sweeney (1998) examine insider sales and purchases. In addition to dollar amounts sold and purchased, Summers and Sweeney (1998) measure the number of shares and number of transactions in insider sales and purchases. They find that only the number of shares sold is a moderately significant predictor of fraud.
As shown, prior fraud research focusing on earnings management motivations in a fraud context has examined compensation and debt incentives but not fraud incentives related to capital market expectations; specifically, the relation between analyst forecasts and fraud. In earnings management, capital market expectation hypotheses predict that managers have incentives to manipulate earnings to meet or exceed analyst forecasts when these forecasts would not otherwise have been met or exceeded. These incentives are related to manager performance and compensation, and firm performance in general, which are often evaluated based on meeting or exceeding analyst expectations (Burgstahler and Eames 2006). I extend fraud research by examining fraud incentives related to capital market expectations.

3.2.3 Fraud in the Revenue Account

Prior fraud literature has identified the revenue account as being the primary target for financial statement fraud (Beneish 1997). Given that the revenue account is typically manipulated, unusual revenue levels or changes in revenue might be indicative of revenue fraud. However, considering that revenue varies from year to year and among firms for reasons other than fraud, straight revenue is a relatively noisy measure of fraud. For example, it is very difficult to disentangle differences in revenue due to fraud from differences in revenue due to the size of the firm and the successfulness of the firm. To detect revenue fraud, SAS No. 99 highlights the need to analyze and identify unusual relationships involving revenue, for example between revenue and production capacity.

Prior research has included sales in various ratios that are not, typically, designed for the purpose of detecting revenue fraud. Nevertheless, the results from these studies are largely consistent with fraud firms manipulating the revenue account.
For example, sales growth, used as a proxy for firm growth, has been used as a predictor of fraud based on the idea that high-growth firms have incentives to sustain their high growth levels and that slow-growth firms have incentives to increase growth (Erickson et al. 2006; Brazel et al. 2007). Erickson et al. (2006) found a positive relation between sales growth and fraud. Brazel et al. (2007) examined the relation between performance improvements and fraud in more detail and found a negative relation between sales growth and fraud, and a positive relation between fraud and sales growth minus growth measured using a non-financial measure. Together these results indicate that firms that increase revenue fraudulently are more likely to have abnormally high growth rates, and that poorly performing firms, i.e., firms with low actual growth rates, are more likely to commit fraud.

Both Chen and Sennetti (2005) and Fanning and Cogger (1998) examine the relation between gross profit margin and fraud. Chen and Sennetti (2005) examine this relation to detect inflated sales, while Fanning and Cogger (1998) argue that it is an indication of deflated cost of goods sold. Both these studies find a positive relation between gross profit margin and fraud, providing an indication of revenue fraud or manipulation of cost of goods sold. Chen and Sennetti (2005) also find that fraud firms have lower ratios of research and development expenditures to sales, and sales and marketing expenditures to sales, than non-fraud firms. These results seemingly suggest that fraud firms are more likely to be financially distressed and, therefore, less likely to invest in research and development, and sales and marketing. Alternatively, these relations could indicate revenue manipulation, as revenue fraud decreases both these ratios.
While these studies support the conjecture that managers fraudulently increase sales, both Summers and Sweeney (1998) and Fanning and Cogger (1998) examine ratios of sales that do not show evidence of revenue manipulation. Summers and Sweeney (1998) find a positive relation between change in inventory to sales and fraud, which they interpret to be evidence of fraudulent inventory manipulation. Note that a fraudulent increase in sales would reduce the ratio of inventory to sales in the fraud year. Fanning and Cogger (1998) examine the ratio of sales to assets with the idea that firms with relatively low sales to assets ratios are in financial distress and, therefore, more likely to commit financial statement fraud. As they expect, they find a negative relation between the ratio of sales to assets and fraud. Note that a fraudulent increase in sales would increase sales to assets, assuming that assets are unchanged.

I extend this research by developing a productivity-based measure that is designed specifically for the purpose of detecting financial statement fraud. I use a productivity measure because firms use resources, for example assets, to generate revenue. Thus, some of the noise associated with using revenue as a predictor can be removed by deflating revenue by the resources used to produce the revenue. Because resources are used to generate sales, the relation between sales and resources should be relatively stable over time compared to straight sales. Given the identified importance of the revenue account in fraud, the difficulty in using straight revenue for fraud detection, the inability of sales to assets (capital productivity) to detect fraudulent revenue manipulation (Fanning and Cogger 1998) and the appeal of using a productivity measure to detect revenue fraud, I examine the use of a different productivity measure, labor productivity.
The rationale for using labor productivity rather than capital productivity is provided in the discussion leading up to the third hypothesis.

3.3. Hypotheses Development

3.3.1 Prior Years' Discretionary Accruals and Fraud

Managers can use discretionary accruals to transfer earnings between periods, but over time discretionary accruals sum to zero (Healy 1985). Thus, income-increasing behavior in one period decreases the amount of discretionary accruals that can be used to increase earnings in subsequent periods, as the prior income-increasing discretionary accruals eventually reverse (Dechow et al. 1996; Beneish 1997). For example, managers make judgments about the amount of outstanding accounts receivable that are uncollectible and adjust the allowance for uncollectible accounts based on this judgment by debiting bad debt expense. The manager can manage earnings by deciding to establish the allowance level below the manager's actual estimate, thereby lowering bad debt expense and increasing earnings. However, assuming that the initial judgment about the correct allowance level was more accurate than the established allowance, the allowance account will not be sufficient and has to be increased at some point to cover actual receivables that could not be collected, thereby increasing future bad debt expense and decreasing future earnings.

When confronted with accrual reversals, managers can choose to either face the consequences of net income-decreasing accruals or fraudulently manipulate earnings to offset or more than offset the reversals (Beneish 1997). Given that managers facing accrual reversals can resort to fraudulent activities to achieve objectives similar to those that were earlier accomplished by managing earnings, I expect a positive relation between prior discretionary accruals and fraud. This relation was graphically depicted but not tested in Dechow et al.
(1996), where fraud firms appeared to have higher total and discretionary accruals in the three years leading up to the first fraud year than did non-fraud firms (see Figure 3.1).

Based on this I posit11 that the pressure of accruals reversal is greater and that earnings management flexibility is reduced the more earnings were managed in prior years. The pressure from earnings reversals provides an incentive to manipulate earnings and the earnings management inflexibility increases the likelihood that fraud, rather than earnings management, is used to manipulate earnings. I define total prior discretionary accruals as discretionary accruals summed over three years prior to the first fraud year. In accordance with the earnings reversals hypothesis:

H1: Total prior discretionary accruals is positively related to the likelihood of fraud.

11 Note that firms with strong performance are less likely to resort to fraudulent activities to offset earnings reversals as their strong performance offsets the reversals, and vice versa for firms with poor performance. However, on average, firms facing accrual reversals are more likely to commit fraud than firms that are not facing accrual reversals. Although the posited relation could be further refined by taking into consideration firm performance, I do not hypothesize an interaction between performance and accrual reversals as firms that commit fraud also report higher performance. That is, while firms with low performance are more likely to commit fraud when faced with accrual reversal, firms that commit fraud are also more likely to report better performance.

3.3.2 Capital Market Expectations and Fraud

Firm performance, and consequently market value, is partially determined by the firm's ability to meet or exceed analyst expectations.
Managers, therefore, have incentives to manipulate earnings to meet or exceed analyst forecasts when these forecasts would not otherwise have been met or exceeded (Burgstahler and Eames 2006). Managers can manipulate earnings to meet or exceed analyst forecasts by managing earnings or by committing fraud.

When earnings are manipulated using earnings management, managers are likely to manage earnings to just meet analyst forecasts (Burgstahler and Eames 2006). While there are incremental benefits associated with exceeding forecasts, managers prefer to just meet analyst forecasts as the costs of earnings management also increase when forecasts are exceeded (Burgstahler and Eames 2006). One such cost relates to future earnings being negatively impacted by current earnings management. To be able to meet analyst forecasts in future periods managers are, therefore, likely to manage earnings to meet, rather than exceed, analyst forecasts.

While prior research has not examined the relation between analyst forecasts and fraud, Dechow et al. (2007) show that fraud firms have unusually strong stock price performance prior to committing fraud, and indicate that this may put pressure on the firm to commit fraud to avoid disappointing investors and losing their high stock prices. Additionally, a large number of SEC Accounting and Auditing Enforcement Releases (AAER) provide anecdotal evidence of specific cases where fraud was committed to meet or exceed analyst forecasts. Thus, there are reasons to believe that managers may fraudulently manipulate earnings to meet or exceed analyst forecasts.

As in earnings management, both the incremental benefits from meeting or exceeding analyst forecasts and expected costs associated with fraud are increasing in the magnitude of the fraud.
However, earnings manipulated using fraud, as opposed to earnings manipulated using earnings management, do not reverse in future periods; therefore, it is difficult to predict whether managers prefer to fraudulently manipulate earnings to meet or to exceed forecasts. Since the exact nature of the utility managers derive from meeting or exceeding analyst forecasts when committing fraud is unknown, I define meeting or exceeding analyst forecasts as a dummy variable that equals one if analyst forecasts are met or exceeded rather than attempting to define a cutoff as is done in earnings management research (Burgstahler and Eames 2006). Based on this discussion I hypothesize:

H2: Firms that meet or exceed analyst forecasts are more likely to have committed fraud than firms that fail to meet analyst forecasts.

3.3.3 Unexpected Labor Productivity and Fraud

The revenue account is the most commonly manipulated account in fraud (Beneish 1997). Thus, unusual increases in revenue could be an indication of fraud. To reduce some of the noise associated with this measure, revenue can be deflated by assets (capital productivity). Prior research has found capital productivity to be a significant predictor of fraud (Fanning and Cogger 1998; Kaminski et al. 2004).

However, capital productivity is still a somewhat noisy measure given constant changes in assets that do not directly impact revenue. Furthermore, and more importantly, given that accounting information systems are double-entry based, the utility of this measure in detecting fraud is reduced; for example, fictitious revenue will increase both the numerator (sales) and the denominator (assets) in capital productivity. The direction and magnitude of change in capital productivity resulting from revenue fraud depend on the level of a firm's actual capital productivity and profit margins.
As an illustration, take firm A and firm B that both fraudulently increase sales by $10 million, which in turn increases assets by $5 million. Further assume that: (1) both firms have $100 million in assets before manipulating sales; (2) firm A has pre-manipulation sales of $50 million; and (3) firm B has pre-manipulation sales of $250 million. Under these assumptions, sales to assets increases from 0.5 to 0.57 for firm A and decreases from 2.5 to 2.48 for firm B. Thus, because revenue fraud increases both the numerator and the denominator of capital productivity, the ability of capital productivity to predict revenue manipulations is reduced.

In support of this discussion, Fanning and Cogger (1998) did not find a positive relation between capital productivity and fraud. They instead found a negative relation, which was described as showing that firms in financial distress are more likely to commit fraud. Thus, it is questionable whether it is possible to use sales to assets as evidence of revenue manipulation.

Labor productivity, another form of productivity, is measured as the amount of output per employee. Like capital productivity, labor productivity reduces the noise associated with sales by scaling sales by the input that is used to generate the output. However, unlike capital productivity, the denominator in labor productivity is not impacted by double-entry systems. Therefore, labor productivity should be a less noisy predictor of revenue fraud. A recent working paper by Brazel et al. (2007) provides additional support for use of the number of employees as the denominator. This study examines the efficacy of non-financial measures, including the number of employees, in predicting fraud. They argue that non-financial measures that are strongly correlated to actual performance and at the same time relatively difficult to manipulate, like number of employees, can be used to assess the reasonableness of performance changes.
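The firm A and firm B illustration can be reproduced with a few lines of arithmetic. The following is a minimal sketch and not part of the original analysis: the function names are hypothetical, and the head count of 1,000 employees is an assumed figure chosen only to show that fictitious sales leave the labor productivity denominator untouched.

```python
def capital_productivity(sales, assets):
    """Sales-to-assets ratio (capital productivity)."""
    return sales / assets


def labor_productivity(sales, employees):
    """Revenue per employee (labor productivity)."""
    return sales / employees


# Figures from the illustration, in $ millions.
ASSETS = 100.0
FICTITIOUS_SALES = 10.0  # fraudulent increase in sales
ASSET_INCREASE = 5.0     # resulting increase in assets (double entry)
EMPLOYEES = 1_000        # hypothetical head count, unaffected by the fraud

results = {}
for firm, sales in {"A": 50.0, "B": 250.0}.items():
    results[firm] = {
        "capital_before": capital_productivity(sales, ASSETS),
        "capital_after": capital_productivity(
            sales + FICTITIOUS_SALES, ASSETS + ASSET_INCREASE
        ),
        "labor_before": labor_productivity(sales, EMPLOYEES),
        "labor_after": labor_productivity(sales + FICTITIOUS_SALES, EMPLOYEES),
    }

for firm, r in results.items():
    print(
        f"Firm {firm}: sales/assets {r['capital_before']:.2f} -> "
        f"{r['capital_after']:.2f}; revenue per employee "
        f"{r['labor_before']:.3f} -> {r['labor_after']:.3f}"
    )
```

Under these assumptions the sales to assets ratio moves in opposite directions for the two firms, whereas revenue per employee rises for both, which is the property that motivates the labor productivity measure.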
The results in Brazel et al. (2007) show a positive relation between fraud and the difference between the change in revenue and the change in the nonfinancial measures. Based on this discussion, I propose that firms that have high unexpected labor productivity are more likely to have committed fraud. I measure unexpected labor productivity as the percentage change in firm labor productivity from year t-1 to year t0, minus the percentage change in industry labor productivity from year t-1 to year t0, and hypothesize that:

H3: Unexpected labor productivity is positively related to the likelihood of fraud.

3.4. Research Design

3.4.1 Variable Construction

3.4.1.1 Total Discretionary Accruals

To test H1, a measure of total prior discretionary accruals that captures the pressure of earnings reversals and earnings management inflexibility is needed. I define Total Discretionary Accruals$_{j,t}$ as the total amount of discretionary accruals in the three years prior to the first fraud year, deflated by assets at the beginning of each year:

$$\text{Total Discretionary Accruals}_{j,t} = \sum_{t-3}^{t-1} \frac{DA_{j,t}}{A_{j,t-1}}, \qquad (26)$$

where discretionary accruals, $DA_{j,t}$, are calculated as the difference between total accruals, $TA_{j,t}$, and estimated accruals, typically referred to as nondiscretionary accruals, $\widehat{NDA}_{j,t}$:

$$\frac{DA_{j,t}}{A_{j,t-1}} = \frac{TA_{j,t}}{A_{j,t-1}} - \frac{\widehat{NDA}_{j,t}}{A_{j,t-1}}, \qquad (27)$$

where total accruals, $TA_{j,t}$, is defined as income before extraordinary items (#18) [12] minus cash flow from operations (#308). Nondiscretionary accruals, $NDA_{j,t}$, for firm j in year t0 are estimated using the extended version of the modified Jones model (Jones 1991; Dechow et al. 1995) proposed in Kasznik (1999). To derive $NDA_{j,t}$, the regression parameters in model (28) are estimated for firm j using all firms in J, where J is the two-digit SIC code industry of j. These estimates are then used to calculate estimated $NDA_{j,t}$ for firm j using model (29):

$$\frac{TA_{j,t}}{A_{j,t-1}} = \alpha_0 \frac{1}{A_{j,t-1}} + \alpha_1 \frac{\Delta REV_{j,t} - \Delta REC_{j,t}}{A_{j,t-1}} + \alpha_2 \frac{PPE_{j,t}}{A_{j,t-1}} + \alpha_3 \frac{\Delta CFO_{j,t}}{A_{j,t-1}} \qquad (28)$$

$$\frac{\widehat{NDA}_{j,t}}{A_{j,t-1}} = \alpha_{J,0} \frac{1}{A_{j,t-1}} + \alpha_{J,1} \frac{\Delta REV_{j,t} - \Delta REC_{j,t}}{A_{j,t-1}} + \alpha_{J,2} \frac{PPE_{j,t}}{A_{j,t-1}} + \alpha_{J,3} \frac{\Delta CFO_{j,t}}{A_{j,t-1}}, \qquad (29)$$

where $\Delta REV_{j,t}$ is the change in revenue (#12), $\Delta REC_{j,t}$ is the change in receivables (#2) and $\Delta CFO_{j,t}$ is the change in cash flow from operations of firm j from year t-1 to year t0; $PPE_{j,t}$ is firm j's gross property, plant and equipment (#7) at time t0; and all values are deflated by $A_{j,t-1}$, firm j's assets (#6) at time t-1.

[12] Numbers in parentheses refer to the Compustat number for the variable identified and are provided the first time the variable is used in the essay and in footnotes in tables.

3.4.1.2 Forecast Attainment

I develop a measure of whether firms meet or exceed analyst forecasts to test H2. I define Forecast Attainment$_{j,t}$ as a dummy variable that measures whether or not analyst forecasts were met or exceeded:

$$\text{Forecast Attainment}_{j,t} = \begin{cases} 1, & EPS_{j,t} - AF_{j,t} \geq 0 \\ 0, & EPS_{j,t} - AF_{j,t} < 0, \end{cases} \qquad (30)$$

where, for firm j, $EPS_{j,t}$ is actual earnings per share in year t0 and $AF_{j,t}$ is the first one-year-ahead analyst consensus forecast of earnings per share for firm j in year t0, based on mean I/B/E/S earnings forecasts.

3.4.1.3 Unexpected Revenue per Employee

To test H3 I develop an unexpected labor productivity measure. I define Unexpected Revenue per Employee$_{j,t}$ as the difference in percentage change in revenue per employee between firm j and industry J:

$$\text{Unexpected Revenue per Employee}_{j,t} = \%\Delta RE_{j,t} - \%\Delta RE_{J,t}, \qquad (31)$$

where revenue per employee, RE, defined as total revenue divided by the total number of employees (#29), is measured for firm j and for firm j's industry J in year t0 and year t-1.
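As a minimal sketch of the Unexpected Revenue per Employee measure in (31), the snippet below computes the firm and industry percentage changes in revenue per employee, taken as (RE in t0 minus RE in t-1) divided by RE in t-1, and subtracts one from the other. All input numbers are hypothetical, the function names are illustrative, and treating industry revenue per employee as a single pre-computed figure is an assumption of this sketch rather than a description of how the study aggregates industry data.

```python
def revenue_per_employee(revenue, employees):
    """RE: total revenue divided by the total number of employees."""
    return revenue / employees

def pct_change(current, previous):
    """Percentage change from the previous year (t-1) to the current year (t0)."""
    return (current - previous) / previous

def unexpected_revenue_per_employee(firm_re_t0, firm_re_prev,
                                    industry_re_t0, industry_re_prev):
    """Firm % change in RE minus industry % change in RE, as in (31)."""
    return (pct_change(firm_re_t0, firm_re_prev)
            - pct_change(industry_re_t0, industry_re_prev))

# Hypothetical inputs (revenue in $ millions, head counts in employees).
firm_prev = revenue_per_employee(200.0, 1000)     # firm, year t-1
firm_t0 = revenue_per_employee(260.0, 1000)       # firm, year t0
industry_prev = revenue_per_employee(1000.0, 5000)  # industry, year t-1
industry_t0 = revenue_per_employee(1100.0, 5000)    # industry, year t0

ure = unexpected_revenue_per_employee(firm_t0, firm_prev,
                                      industry_t0, industry_prev)
print(f"Unexpected Revenue per Employee: {ure:.2f}")
```

In this hypothetical case the firm's revenue per employee grows by 30% while the industry's grows by 10%, so the measure is 0.20; a firm that merely tracks its industry would score close to zero.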
The percentage difference between revenue per employee in year t0 and year t-1 is labeled percentage change in revenue per employee:

$$\%\Delta RE_{t} = \frac{RE_{t} - RE_{t-1}}{RE_{t-1}} \qquad (32)$$

3.4.1.4 Control Variables

Confirmatory fraud research typically relies on matching non-fraud firms to fraud firms based on size and year of fraud, and includes measured variables, to control for potential omitted variable bias. However, the use of control variables is not standard. For example, Beneish (1999) and Summers and Sweeney (1998) included additional control variables, while Dechow et al. (1996) did not. Further, the control variables have not been used consistently. Without a theoretical basis or empirical support for the most appropriate set of control variables, I rely on variables that, given my hypotheses, are conceptually omitted variables.

Exploratory fraud research offers a rich set of variables from which to select controls that could conceptually be omitted variables in my study. Fanning and Cogger (1998) investigated the predictive value of 62 potential fraud indicators. Using stepwise logistic regression [13] they derived a model with eight significant fraud predictors: percent of outside directors; non-Big 4 auditor; whether the CFO changed in the last three years; whether LIFO was used; debt to equity; sales to assets; whether accounts receivable was greater than 1.1 times last year's accounts receivable; and whether gross margin percentage was greater than 1.1 times last year's. Bell and Carcello (2000) matched 305 non-fraud cases to the 77 fraud cases in Loebbecke et al. (1989) and evaluated the discriminatory power of the indicators used in Loebbecke et al. (1989). All variables were measured using a survey of auditors with questions asking about the existence of risk factors. [14] Based on univariate results and tests of a number of different logistic regression models, the final model contained five significant risk factors: weak internal control environment, rapid company growth, undue emphasis on meeting earnings projections, management lied or was overly evasive, and whether the company is public. Using univariate tests, Kaminski et al. (2004) found that five out of 20 financial ratios tested were significant predictors of fraud during the year of fraud and one, two and three years prior to the fraud: fixed assets to assets, sales to accounts receivable, inventory to current assets, inventory to sales, and sales to assets. However, for these four years, a total of 84 tests were calculated, thereby greatly increasing the chances of finding some significant relations by chance alone. Further, it is likely that many of these ratios were correlated, possibly giving rise to multicollinearity issues. Based on their findings, Kaminski et al. (2004) concluded that the red-flag approach provides only limited utility in detecting fraud. Chen and Sennetti (2005) examined 17 computer industry specific fraud predictors, selected based on the most common fraud types in the computer industry, and found significance for eight variables: research and development to sales, gross profit margin, net profit margin, sales and marketing to sales, tax benefits from exercising of employee stock options to operating cash flows, changes in free cash flow, accounts receivable turnover, and return on assets.

[13] Fanning and Cogger (1998) also used an artificial neural network and two versions of discriminant analysis in their multivariate analyses. However, statistical significance was not reported for the selected variables for these models.

[14] Many of the examined factors cannot be obtained from public sources and require actual audits to be conducted, for example, the risk factor indicating whether the internal control environment was weak.

In general, these studies find significant predictors of fraud that can provide utility in the detection of financial statement fraud. However, 87% of the tested variables in this literature are insignificant predictors of fraud, and there is very little overlap between studies as to the variables that are identified as significant predictors of fraud. I therefore select variables from one study and supplement these variables with variables that could conceptually be considered omitted variables from this study. I select controls from Fanning and Cogger (1998), who compared a relatively comprehensive set of 62 potential predictors covering a wide range of potential fraud predictor types, from corporate governance to financial ratios. From the eight significant predictors in Fanning and Cogger (1998) I use three variables: CFO Change, Auditor and Sales to Assets. I also add two controls that were not among the 62 variables examined in Fanning and Cogger (1998): Asset Growth and Current Discretionary Accruals.

CFO Change is a dummy variable that measures whether the CFO has changed during the three years leading up to the first fraud year. While Fanning and Cogger (1998) were expecting a positive relation, based on the idea that some CFOs committing fraud will leave their firms to avoid getting caught or are fired because of fraud suspicion, they found a negative relation but do not provide an explanation for this finding.
A possible explanation for the negative relation is that CFOs committing fraud are less likely to leave because by leaving they relinquish control over evidence of the fraud and expose themselves to scrutiny by the incoming CFO. I include CFO Change to control for the possibility that both Total Discretionary Accruals and Fraud are related to ineffective corporate governance. Based on the empirical results in Fanning and Cogger (1998), I expect a negative relation between CFO Change and Fraud. Note that Fanning and Cogger (1998) examined 31 variables related to corporate governance and found in multivariate analysis that only CFO Change, Auditor and the percentage of insiders on the board were significant predictors of fraud. To reduce the number of control variables, I chose to include CFO Change to control for corporate governance effectiveness in the main analysis but not the percentage of insiders on the board. I made this selection after empirically comparing the predictive ability of the two variables: CFO Change (p=0.113) and percentage of outsiders on the board (p=0.267).

Auditor is a dummy variable that measures whether the firm's auditor is a Big 4 auditor. Big 4 auditing firms are believed to provide a higher quality audit, which in turn is expected to increase the effectiveness of the monitoring function provided by the auditors and thereby decrease the likelihood of fraud. Thus, Auditor is expected to be negatively related to Fraud (Fanning and Cogger 1998). Like CFO Change, Auditor is included to provide a measure of a corporate governance mechanism that could conceptually explain the hypothesized relation between Total Discretionary Accruals and Fraud.

Sales to Assets is the ratio of sales to assets (capital productivity). Sales to Assets is expected to be negatively related to fraud given that low Sales to Assets is an indicator of financial distress (Fanning and Cogger 1998).
I include Sales to Assets to examine my argument in H3 that Unexpected Revenue per Employee is a better predictor of revenue fraud than Sales to Assets. The inclusion of Sales to Assets also allows me to examine whether Sales to Assets and Unexpected Revenue per Employee capture different dimensions of productivity that can lead to fraud, with Sales to Assets capturing low productivity and financial distress that drives fraud, and Unexpected Revenue per Employee capturing productivity that is artificially high as a result of revenue fraud (H3).

I add two additional variables, Current Discretionary Accruals and Asset Growth. Current Discretionary Accruals is the discretionary accruals in the first fraud year, t0, calculated using the extended version of the modified Jones model (Jones 1991; Dechow et al. 1995) proposed in Kasznik (1999); see (27), (28) and (29) in Section 3.4.1.1. As an indication of management attitude towards fraud, I expect Current Discretionary Accruals to be positively related to fraud. Attitude (henceforth management character) is difficult to measure and, as in prior fraud research, I have to assume that management character is not an omitted variable. However, Current Discretionary Accruals might proxy for management character given that management character is positively related to management's use of discretionary accruals. This argument is based on the assumption that a manager's attitude towards earnings management is an indication of the manager's attitude towards fraud. I include this management character proxy to control for the possibility that management character, i.e., a poor set of ethical values, explains both Total Discretionary Accruals and Fraud.

Asset Growth is a dummy variable that measures whether assets exceed 110% of the previous year's assets. Asset Growth is expected to be positively related to Fraud given that small growth firms are more likely to be investigated by the SEC (Beneish 1999) than larger, slower growing firms.
I include Asset Growth to control for the possibility that Asset Growth explains the positive relations between Unexpected Revenue per Employee and Fraud or Forecast Attainment and Fraud. Firm age and firm size are controlled in the matching procedure employed in my study.

3.4.2 Model for Hypotheses Testing

Model 33 is used to evaluate the hypotheses. More specifically, H1, H2 and H3 predict that $\beta_1$, $\beta_2$ and $\beta_3$, respectively, are positive and significant.

$$\text{Fraud} = \beta_0 + \beta_1\,\text{Total Discretionary Accruals} + \beta_2\,\text{Forecast Attainment} + \beta_3\,\text{Unexpected Revenue per Employee} + \beta_n\,\text{control variables} + \varepsilon \qquad (33)$$

where Fraud is a dichotomous dependent variable, equal to 1 if the firm was investigated by the SEC for fraud and 0 otherwise; Total Discretionary Accruals is the total discretionary accruals in years t-1, t-2 and t-3; Forecast Attainment is a dummy variable, equal to 1 if analyst forecasts were met or exceeded and 0 otherwise; and Unexpected Revenue per Employee is the difference between a firm and its industry in the percentage change in revenue per employee from year t-1 to t0. Five controls are used: Current Discretionary Accruals, Sales to Assets, Auditor, CFO Change and Asset Growth. Please refer to the previous section (Section 3.4.1.4) for descriptions of the control variables.

3.4.3 Data Sample

3.4.3.1 Experimental Sample

The fraudulent observations were located based on firms investigated by the SEC for fraud and reported in AAERs from the 4th quarter of 1999 through 2005. I searched AAERs for explicit reference to Section 10(b) and Rule 10b-5, or descriptions of fraud. From this search a total of 745 potential observations were obtained (see Table 3.1). This initial selection was then reduced by eliminating: duplicates, financial companies, firms without the first fraud year specified in the SEC release, non-annual fraud, foreign corporations, AAERs focusing on auditors, not-for-profit organizations, and fraud related to registration statements, 10-KSB or IPO.
Financial companies were, as is typically done, excluded from the sample as the rules and regulations governing financial firms are substantially different from those governing other firms. An additional 75 fraud firms [15] from Beasley (1996) were added to the remaining 139 fraud firms, for a total of 214 fraud firms. Finally, 160 firms with missing data in Compustat for the fraud year or four prior years, Compact D/SEC for the fraud year or three prior years, or I/B/E/S for the fraud year, were deleted, for a total of 54 usable fraud firms. Seventy-four of the 75 companies provided by Beasley (1996) were part of the 160 deleted fraud observations. The fraud year for these 75 companies ranged from 1978 to 1990. Governance, analyst forecast, or financial statement data were missing for these firms. The governance data, gathered from Compact D/SEC, were only available from 1988 and forward, and analyst forecast data, obtained from I/B/E/S, were only available from 1980 and forward, and were relatively sparse until 1995. Note that both governance and financial statement data were needed for the three years prior to the first fraud year.

[15] These fraud firms were kindly provided by Mark Beasley. Beasley (1996) collected the data from 348 AAERs released between 1982 and 1991 (67 of the 75 fraud firms) and from the Wall Street Journal Index caption of "Crime-White Collar Crime" between 1980 and 1991 (8 additional fraud firms).

Table 3.2 shows the industry distribution of the fraud firms by one-digit SIC groups. Manufacturing is the largest group, making up 35.19% of the sample, followed by Personal and Business Services (24.07%) and Wholesale and Retail (16.67%). This industry distribution is similar to distributions in prior fraud research (Beneish 1997).

The remaining 54 fraud firms were then matched with 54 non-fraud firms based on two-digit SIC code, firm age group and firm size, as measured by total assets in year t0.
Three age groups (over 10 years, five through 10 years, and four years) were created so that a number of firms would be available for selection when matching on size. Note that the smallest firm age is four as Compustat and Compact D/SEC data were required for the fraud year and the three years leading up to the first fraud year. The matching was based on firm age before firm size, based on Beneish's (1999) finding that matches based on age reduce the potential for omitted variable problems. The SEC typically targets young growth firms for investigation. Thus, an omitted variable problem can be introduced when such a firm is compared to other firms of similar size that are not young growth firms (Beneish 1999). For example, a young growth firm could have both high Unexpected Revenue per Employee and increased fraud likelihood.

Table 3.1 Sample Selection

Panel A: Fraud Firms

| Selection step | Firms |
|---|---|
| Firms investigated by the SEC for fraudulent financial reporting from 4Q 1998 through 3Q 2005 | 745 |
| Less: Financial companies | (35) |
| Less: Not annual (10-K) fraud | (116) |
| Less: Foreign companies | (9) |
| Less: Not-for-profit organizations | (10) |
| Less: Registration, 10-KSB and IPO related fraud | (78) |
| Less: Fraud year missing | (13) |
| Less: Duplicates | (287) |
| Remaining Fraud Observations | 197 |
| Add: Fraud firms from Beasley (1996) | 75 |
| Less: Not in Compustat or Compact D/SEC for first fraud year or four prior years, or I/B/E/S for first fraud year | (218) |
| Usable Fraud Observations | 54 |

Panel B: Non-Fraud Firms

| Selection step | Firms |
|---|---|
| Firms in the same SIC industry as fraud firm in the year the fraud was committed (firms included in count once for each year matched to one or more fraud firms) | 12,423 |
| Less: Firms with missing data in fraud year or in four years prior to the fraud | (2,705) |
| Less: Firms not most similar in age and size to the fraud firms | (9,664) |
| Usable Non-Fraud Observations | 54 |
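The matching procedure described above (same two-digit SIC industry, then the same age group, then the closest total assets) can be sketched as follows. This is an illustrative outline under stated assumptions rather than the procedure actually coded for the study: the dictionary fields, firm names and asset values are hypothetical, the candidate list is assumed to be restricted already to non-fraud firms with complete data in the fraud year, and tie-breaking is left to the order of the candidate list.

```python
def age_group(age_years):
    """Assign one of the three age groups used for matching."""
    if age_years > 10:
        return "over 10"
    if age_years >= 5:
        return "5 to 10"
    return "4"  # four years is the smallest firm age in the sample

def match_nonfraud(fraud_firm, candidates):
    """Pick the candidate in the same industry and age group
    whose total assets are closest to the fraud firm's."""
    pool = [c for c in candidates
            if c["sic2"] == fraud_firm["sic2"]
            and age_group(c["age"]) == age_group(fraud_firm["age"])]
    if not pool:
        return None  # no eligible non-fraud firm
    return min(pool, key=lambda c: abs(c["assets"] - fraud_firm["assets"]))

# Hypothetical data, total assets in $ millions.
fraud = {"name": "F1", "sic2": 35, "age": 7, "assets": 386.0}
candidates = [
    {"name": "N1", "sic2": 35, "age": 12, "assets": 380.0},  # wrong age group
    {"name": "N2", "sic2": 35, "age": 6, "assets": 361.0},
    {"name": "N3", "sic2": 35, "age": 9, "assets": 500.0},
    {"name": "N4", "sic2": 73, "age": 7, "assets": 386.0},   # wrong industry
]

match = match_nonfraud(fraud, candidates)
print(match["name"] if match else "no match")
```

Filtering on age group before minimizing the asset difference mirrors the stated priority of age over size: N1 is closest in assets but is never considered because it falls in a different age group.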
By matching based on age and size, Beneish (1999) found that differences in terms of age, growth and ownership structure between fraud and non-fraud firms were better controlled than when matched on only size, while both types of matches controlled for size, liquidity, leverage, profitability and cash flows. Because young firms are more likely to be growth firms, the pair-wise matching should, at least partially, control for growth in addition to age (Beneish 1999). In addition to matching, I also include Asset Growth to more directly control for growth, as not all high (low) growth firms are young (old).

For the 54 matched pairs, financial statement data for the first year of the fraud and each of the four years leading up to the first fraud year were collected from Compustat. One-year-ahead analyst earnings per share forecasts and actual earnings per share in the fraud year were collected from I/B/E/S and matched to financial statement data collected from Compustat. Finally, CFO Change and percentage of outsiders on the board, collected for use in sensitivity analysis, were collected from Compact D/SEC and manually from proxy statements.

Table 3.2 Industry Distribution of Fraud Firms (a)

| 2-digit SIC | Industry Description | Number of Firms | % |
|---|---|---|---|
| 10-19 | Mining and Construction | 0 | 0.00% |
| 20-29 | Commodity Production | 6 | 11.11% |
| 30-39 | Manufacturing | 19 | 35.19% |
| 40-49 | Transportation and Utilities | 2 | 3.70% |
| 50-59 | Wholesale and Retail | 9 | 16.67% |
| 60-69 | Financial Services (excl. 60-63) | 0 | 0.00% |
| 70-79 | Personal and Business Services | 13 | 24.07% |
| 80-89 | Health and Other Services | 4 | 7.41% |
| 99 | Nonclassifiable Establishments | 1 | 1.85% |
| Total | | 54 | 100.00% |

(a) Table adapted from Beneish (1997); industry names are from the Standard Industrial Classification Manual (1987).

3.4.3.2 Comparing Treatment and Control Samples

Table 3.3 contains descriptive statistics for the two samples. There was no statistical difference between fraud and non-fraud firms for median Age (p=0.347) or Assets (p=0.702). Fraud firms were, however, more likely to have high asset growth; 61% of the fraud firms versus 46% of non-fraud firms had high asset growth (p=0.062). Thus, the matching procedure effectively matched fraud firms with similar non-fraud firms in terms of firm age and size. However, the matching procedure was not as effective at eliminating differences in growth, and I therefore include the variable Asset Growth to control for any possible difference in growth between fraud and non-fraud firms.

There was no statistical difference [16] between fraud and non-fraud firms for median Total Discretionary Accruals (p=0.125), Unexpected Revenue per Employee (p=0.125), Current Discretionary Accruals (p=0.222), Sales to Assets (p=0.222), Auditor (p=1.000) and CFO Change (p=0.110). Forecast Attainment was significant (p=0.010); thus, fraud firms were more likely than non-fraud firms to meet or exceed analyst forecasts. Fifty-two percent of the fraud firms had earnings equal to or greater than consensus forecasts, as opposed to 30% of non-fraud firms. These univariate results provide initial support for H2, Forecast Attainment (p=0.010), but not for H1, Total Discretionary Accruals (p=0.125), and H3, Unexpected Revenue per Employee (p=0.125).

The correlation matrix in Table 3.4 shows positive significant (alpha<0.05) correlations between Fraud and three independent variables: Total Discretionary Accruals (r=0.17), Forecast Attainment (r=0.23) and Unexpected Revenue per Employee (r=0.16); and a marginally significant relation between Asset Growth and Fraud (r=0.15).
Firms are seemingly more likely to have committed fraud if they have high Total Discretionary Accruals, meet or exceed analyst forecasts, have high Unexpected Revenue per Employee or have high Asset Growth.

[16] One-tailed tests reported for estimates in the direction predicted, all others two-tailed, unless noted otherwise.

Table 3.3 Sample Descriptive Statistics for Study Variables

| Variables (b) | Fraud (n=54) Mean | Std | Min | Median | Max | Non-Fraud (n=54) Mean | Std | Min | Median | Max | Diff p-stat (a) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Total Discretionary Accruals | 0.15 | 0.51 | -1.25 | 0.07 | 2.65 | 0.02 | 0.23 | -0.58 | 0.03 | 0.50 | 0.125 |
| Forecast Attainment | 0.52 | 0.50 | 0.00 | 1.00 | 1.00 | 0.30 | 0.46 | 0.00 | 0.00 | 1.00 | 0.010 |
| Unexpected Revenue per Employee | 0.04 | 0.38 | -1.12 | 0.00 | 1.29 | 0.07 | 0.26 | -0.69 | 0.02 | 1.07 | 0.125 |
| Current Discretionary Accruals | 0.00 | 0.20 | -1.03 | 0.01 | 0.67 | 0.00 | 0.13 | -0.32 | 0.00 | 0.53 | 0.222 |
| Sales to Assets | 1.16 | 0.64 | 0.09 | 1.09 | 3.42 | 1.24 | 0.76 | 0.30 | 1.16 | 4.13 | 0.222 |
| Auditor | 0.96 | 0.19 | 0.00 | 1.00 | 1.00 | 0.96 | 0.19 | 0.00 | 1.00 | 1.00 | 1.000 |
| Asset Growth | 0.61 | 0.49 | 0.00 | 1.00 | 1.00 | 0.46 | 0.50 | 0.00 | 0.00 | 1.00 | 0.062 |
| CFO Change | 0.15 | 0.36 | 0.00 | 0.00 | 1.00 | 0.07 | 0.26 | 0.00 | 0.00 | 1.00 | 0.110 |
| Firm Age | 15.3 | 10.1 | 4.00 | 13.0 | 33.0 | 11.1 | 5.76 | 4.00 | 11.5 | 22.0 | 0.347 |
| Assets | 3254 | 6993 | 21.8 | 386 | 33381 | 2595 | 5802 | 25.83 | 361 | 31749 | 0.702 |

(a) Median χ² pair-wise comparison between fraud and non-fraud sample for continuous variables, Pearson χ² for dichotomous variables. One-tailed tests reported for estimates in the direction predicted, all others two-tailed.

(b) Total Discretionary Accruals$_{j,t}$ is the total amount of discretionary accruals, deflated by assets at the beginning of the year, in the three years leading up to the fraud year. Discretionary accruals in year t0 are estimated using the extended version of the modified Jones model (Jones 1991; Dechow et al. 1995; Kasznik 1999). Discretionary accruals, $DA_{j,t}$, are calculated as total accruals minus estimated nondiscretionary accruals. Total accruals is income before extraordinary items (#18) minus cash flow from operations (#308). To obtain nondiscretionary accruals, $NDA_{j,t}$, for firm j in year t0, regression parameters are first estimated in cross section for all firms in the same major industry group J (two-digit SIC):

$$\frac{TA_{j,t}}{A_{j,t-1}} = \alpha_0 \frac{1}{A_{j,t-1}} + \alpha_1 \frac{\Delta REV_{j,t} - \Delta REC_{j,t}}{A_{j,t-1}} + \alpha_2 \frac{PPE_{j,t}}{A_{j,t-1}} + \alpha_3 \frac{\Delta CFO_{j,t}}{A_{j,t-1}}.$$

These parameter estimates are then used to derive estimated nondiscretionary accruals:

$$\frac{\widehat{NDA}_{j,t}}{A_{j,t-1}} = \alpha_{J,0} \frac{1}{A_{j,t-1}} + \alpha_{J,1} \frac{\Delta REV_{j,t} - \Delta REC_{j,t}}{A_{j,t-1}} + \alpha_{J,2} \frac{PPE_{j,t}}{A_{j,t-1}} + \alpha_{J,3} \frac{\Delta CFO_{j,t}}{A_{j,t-1}},$$

where $\Delta REV_{j,t}$ is the change in revenue (#12), $\Delta REC_{j,t}$ is the change in receivables (#2) and $\Delta CFO_{j,t}$ is the change in cash flow from operations from time t-1 to t0; and $PPE_{j,t}$ is gross property, plant and equipment (#7) at time t0. All values are deflated by $A_{j,t-1}$, firm j's assets (#6) at time t-1. Forecast Attainment is a dummy variable, equal to 1 if analyst forecasts were met or exceeded and 0 otherwise (I/B/E/S). Unexpected Revenue per Employee for firm j in industry J is the difference between the % change in revenue per employee, RE = total sales (#12) divided by the number of employees (#29), of j and the % change in revenue per employee of J: Unexpected Revenue per Employee = $(RE_{j,t} - RE_{j,t-1})/RE_{j,t-1} - (RE_{J,t} - RE_{J,t-1})/RE_{J,t-1}$. Current Discretionary Accruals is the discretionary accruals in year t0; see the definition of Total Discretionary Accruals. Sales to Assets = net sales / assets. Auditor is a dummy variable equal to 1 if the auditor was a Big 4 audit firm (#149) and 0 otherwise. Asset Growth is a dummy variable equal to 1 if total assets exceed 110% of the previous year's value and 0 otherwise. CFO Change is a dummy variable equal to 1 if the CFO has changed in the three years leading up to the first fraud year and 0 otherwise. Firm Age is the number of years between t0 and the first year data are reported for the company in Compustat. Assets is total assets of firm j.

Table 3.4 Pearson and Spearman Correlations (a) for Study Variables (b)

| | Fraud | Total Discretionary Accruals | Forecast Attainment | Unexpected Revenue per Employee | Current Discretionary Accruals | Sales to Assets | Auditor | Asset Growth | CFO Change |
|---|---|---|---|---|---|---|---|---|---|
| Fraud | 1.00 | 0.14 (0.14) | 0.23 (0.02) | 0.15 (0.11) | 0.04 (0.69) | 0.06 (0.57) | 0.00 (1.00) | 0.15 (0.13) | 0.12 (0.23) |
| Total Discretionary Accruals | 0.17 (0.08) | 1.00 | 0.05 (0.59) | 0.02 (0.82) | 0.28 (0.00) | 0.01 (0.90) | 0.12 (0.23) | 0.08 (0.43) | 0.07 (0.46) |
| Forecast Attainment | 0.23 (0.02) | 0.02 (0.86) | 1.00 | 0.11 (0.25) | 0.06 (0.55) | 0.09 (0.35) | 0.06 (0.52) | 0.09 (0.36) | 0.05 (0.58) |
| Unexpected Revenue per Employee | 0.16 (0.10) | 0.01 (0.93) | 0.08 (0.43) | 1.00 | 0.05 (0.64) | 0.13 (0.17) | 0.11 (0.24) | 0.03 (0.76) | 0.06 (0.55) |
| Current Discretionary Accruals | 0.01 (0.90) | 0.34 (0.00) | 0.02 (0.82) | 0.01 (0.92) | 1.00 | 0.08 (0.40) | 0.01 (0.92) | 0.13 (0.18) | 0.02 (0.82) |
| Sales to Assets | 0.06 (0.54) | 0.06 (0.52) | 0.04 (0.68) | 0.05 (0.64) | 0.03 (0.75) | 1.00 | 0.03 (0.77) | 0.09 (0.38) | 0.03 (0.77) |
| Auditor | 0.00 (1.00) | 0.04 (0.72) | 0.06 (0.52) | 0.12 (0.21) | 0.01 (0.96) | 0.03 (0.75) | 1.00 | 0.11 (0.25) | 0.24 (0.01) |
| Asset Growth | 0.15 (0.13) | 0.03 (0.79) | 0.09 (0.36) | 0.09 (0.38) | 0.10 (0.32) | 0.02 (0.84) | 0.11 (0.25) | 1.00 | 0.09 (0.38) |
| CFO Change | 0.12 (0.23) | 0.13 (0.19) | 0.05 (0.58) | 0.07 (0.49) | 0.02 (0.81) | 0.01 (0.94) | 0.24 (0.01) | 0.09 (0.38) | 1.00 |

(a) Pearson correlations are below and Spearman correlations are above the diagonal. Two-tailed p-values reported within parentheses.

(b) Please refer to footnotes in Table 3.3 for variable definitions.

3.5. Results

3.5.1 Hypotheses Testing

The dependent variable, whether a firm has committed fraud, is dichotomous; therefore logistic regression is used to evaluate the model. The primary assumptions for logistic regression are as follows. (1) Binomial Distribution: the dependent variable must follow a binomial distribution. As there are only two potential outcomes, fraud and not fraud, this assumption is satisfied. (2) Bernoulli Distribution: the dependent variable classes must be mutually exclusive. This assumption is satisfied, as financial statements are either fraudulent or non-fraudulent, and cannot be both fraudulent and non-fraudulent. (3) Independent Observations: the observations are independent as the order of the observations in the sample is irrelevant, i.e., I do not have time-series data, and there is only one observation per firm.

The model estimates in logistic regression can be sensitive to outliers and multicollinearity. Although the firms were matched based on major industry, age and size, the descriptive statistics indicate the possibility of outliers in the sample; for example, the median Assets for fraud (non-fraud) firms is $386 ($361) as compared to a mean of $3,254 ($2,595), with a minimum of $22 ($26) and a maximum of $33,381 ($31,749). To evaluate the impact of potential outliers I used Pearson residuals. One observation had Pearson residuals above 2. For this observation all continuous measures were truncated at plus or minus two standard deviations. [17] I did not find any evidence of multicollinearity; the highest Variance Inflation Factor (VIF) was 1.11, which is relatively low, and the highest condition index among the continuous variables was 6.00. Based on these findings I did not discard any variables due to multicollinearity.

The results in Table 3.5 show that: (1) Total Discretionary Accruals is positively related to Fraud (p=0.009); (2) Forecast Attainment is positively related to Fraud (p=0.004); and (3) Unexpected Revenue per Employee is positively related to Fraud (p=0.016). The positive relation between total prior discretionary accruals and fraud supports H1, which states that total prior discretionary accruals is positively related to the likelihood of Fraud (p=0.009). H2, hypothesizing that firms that meet or exceed analyst forecasts are more likely to have committed fraud than firms that fail to meet analyst forecasts, is supported by the positive relation between Forecast Attainment and Fraud (p=0.004).
Finally, the results, showing a positive relation between Unexpected Revenue per Employee and Fraud (p=0.016), provide support for H3, stating that unexpected labor productivity is positively related to the likelihood of fraud. Thus, H1, H2 and H3 are supported.

[17] I also examined the hypotheses after: (1) deleting the outlier from the sample; (2) deleting the outlier and the outlier's matched non-fraud firm from the sample; and (3) including the outlier in the model without truncating it. The results obtained from these three sensitivity analyses were equivalent to the reported results.

Table 3.5 The Effect of Total Discretionary Accruals, Forecast Attainment and Unexpected Revenue per Employee on Financial Statement Fraud Likelihood: Logistic Regression Results (a)

| Variable (b) | Prediction | Estimate | Standard Error | χ² | prob>χ² |
|---|---|---|---|---|---|
| Intercept | (?) | 0.065 | 0.746 | 0.01 | 0.931 |
| Tests for Hypotheses 1, 2 and 3 | | | | | |
| Total Discretionary Accruals | (+) | 1.641 | 0.789 | 5.66 | 0.009 |
| Forecast Attainment | (+) | 0.573 | 0.225 | 6.86 | 0.004 |
| Unexpected Revenue per Employee | (+) | 1.445 | 0.713 | 4.55 | 0.016 |
| Control Variables | | | | | |
| Current Discretionary Accruals | (+) | 0.875 | 1.554 | 0.34 | 0.560 |
| Sales to Assets | (-) | -0.327 | 0.319 | 1.09 | 0.148 |
| Auditor | (-) | 0.192 | 0.629 | 0.09 | 0.759 |
| Asset Growth | (+) | 0.359 | 0.217 | 2.80 | 0.047 |
| CFO Change | (+) | 1.304 | 0.726 | 3.49 | 0.031 |
| Pseudo R² | 0.135 | | | | |
| χ²-test of model fit | 20.15 (p=0.010) | | | | |
| n | 108 | | | | |

(a) Effect Likelihood Ratio Tests; one-tailed tests reported for estimates in the direction of the prediction, all others two-tailed.

(b) Dependent variable is financial statement fraud likelihood; please refer to footnotes in Table 3.3 for variable definitions.

The logit estimates for Total Discretionary Accruals, Forecast Attainment and Unexpected Revenue per Employee are 1.641, 0.573 and 1.445, respectively. Thus, as Total Discretionary Accruals increases by one unit, the odds of fraud increase by a factor of 5.16 [18], holding the other variables constant. The odds of fraud are higher by a factor of 1.77 for companies that meet or exceed analyst forecasts than for companies that do not meet or exceed analyst forecasts, holding the other variables constant. As Unexpected Revenue per Employee increases by one unit, the odds of fraud increase by a factor of 4.24, holding the other variables constant. In terms of the control variables, the positive relation between CFO Change and Fraud (p=0.031) indicates that firms that have had CFO turnover in the three years leading up to the first fraud year are more likely to have committed fraud than firms that have had the same CFO during this period. Asset Growth was also positively related to Fraud (p=0.047), which indicates that high growth firms are more likely to have committed fraud than firms that are not high growth firms. Sales to Assets (p=0.148), Current Discretionary Accruals (p=0.560) and Auditor (p=0.759) were insignificant.

[18] Odds ratios of fraud are calculated by applying the exponential function to the logit estimates, i.e., the base of the natural logarithm is raised to the power of the logit estimates. For example, 5.16 = e^1.641, where e ≈ 2.718.

It is interesting to note that Total Discretionary Accruals is positive and significant and that Current Discretionary Accruals is insignificant. This indicates, tentatively, that the relation between Total Discretionary Accruals and fraud is not driven by management character, i.e., a poor set of ethical values, explaining both earnings management and current year fraud. While Sales to Assets is insignificant (p=0.148), the direction is negative as expected. When removing Sales to Assets from the model, Unexpected Revenue per Employee remains positive and significant (p=0.019). When removing Unexpected Revenue per Employee from the model, Sales to Assets remains negative and insignificant (p=0.186). Thus, it appears, as discussed earlier, that Unexpected Revenue per Employee and Sales to Assets capture different aspects of productivity.
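The odds ratios quoted for the three hypothesized variables follow directly from exponentiating the logit estimates in Table 3.5. A minimal check, using only the reported estimates (the dictionary and function names are illustrative), might look like this:

```python
import math

# Logit estimates as reported in Table 3.5.
LOGIT_ESTIMATES = {
    "Total Discretionary Accruals": 1.641,
    "Forecast Attainment": 0.573,
    "Unexpected Revenue per Employee": 1.445,
}

def odds_ratio(logit_estimate):
    """Multiplicative change in the odds of fraud for a one-unit
    increase in the variable, holding the other variables constant."""
    return math.exp(logit_estimate)

# Round to two decimals to match the precision used in the text.
odds_ratios = {name: round(odds_ratio(b), 2)
               for name, b in LOGIT_ESTIMATES.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.2f}")
```

Because Forecast Attainment is a dummy variable, its odds ratio compares firms that met or exceeded forecasts with firms that did not, whereas the other two ratios describe a one-unit change in a continuous measure.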
The positive and significant result for Unexpected Revenue per Employee indicates that Unexpected Revenue per Employee captures fraudulent revenue manipulation, while the negative, though insignificant, result for Sales to Assets provides some tentative indication that Sales to Assets captures the relatively poor actual productivity of fraud firms, which puts pressure on these firms to commit fraud.

3.6. Additional Analyses

I next examine the sensitivity of the reported results (Section 3.6.1) and the appropriateness of variable design choices (Section 3.6.2). The sensitivity of the relation between Total Discretionary Accruals and Fraud to the use of a different discretionary accruals measure and to the inclusion of two real activities manipulation measures is examined in Section 3.6.1.1 and Section 3.6.1.2, respectively. Section 3.6.1.3 evaluates the sensitivity of the results to the inclusion of additional control variables, while Section 3.6.1.4 evaluates the sensitivity of the results to the exclusion of industries. The labor productivity measure used in Unexpected Revenue per Employee is compared to two similar measures that have been proposed in concurrent research in Section 3.6.2.1. Section 3.6.2.2 compares the aggregation period used to calculate Total Discretionary Accruals to two shorter periods. Section 3.6.2.3 provides an evaluation of the appropriateness of using the earliest one-year-ahead analyst consensus forecast for calculating Forecast Attainment.

3.6.1 Sensitivity Analyses

3.6.1.1 Discretionary Accruals

As a sensitivity analysis of the relation between Total Discretionary Accruals and Fraud, I used an alternative, cash flow statement based measure of discretionary accruals from Hribar and Collins (2002). This measure, Total Cash Based Discretionary Accruals, calculates total accruals as net income (#172) minus cash flow from operations.
Discretionary accruals and nondiscretionary accruals are estimated following equations (27), (28) and (29), respectively. The results (see Table 3.6) were qualitatively the same; more specifically, Total Discretionary Accruals derived using this alternative accruals measure remains positively (p=0.014) related to Fraud.

Table 3.6
Alternative Total Discretionary Accruals Measure
Logistic Regression Results[a]

Variable[b]                           Prediction  Estimate  Std. Error  χ²      prob>χ²
Intercept                             (?)         0.081     0.745       0.010   0.913
Cash Based Discretionary Accruals     (+)         1.527     0.785       4.827   0.014
Forecast Attainment                   (+)         0.565     0.224       6.728   0.005
Unexpected Revenue per Employee       (+)         1.422     0.710       4.443   0.018
Current Discretionary Accruals        (+)         0.761     1.485       0.278   0.598
Sales to Assets                       (-)         0.330     0.319       1.120   0.145
Auditor                               (-)         0.187     0.628       0.090   0.765
Asset Growth                          (+)         0.351     0.215       2.706   0.050
CFO Change                            (+)         1.345     0.738       3.609   0.029
Pseudo R2                             0.129
χ²-test of model fit                  19.32 (p=0.013)
n                                     108

[a] Effect Likelihood Ratio Tests; one-tailed tests are reported for estimates in the direction of the prediction, all others are two-tailed.
[b] Dependent variable is financial statement fraud likelihood; Total Cash Based Discretionary Accruals is calculated using a cash based measure of total accruals (TA), net income (#172) minus total cash from operations. This cash based measure of TA is then used to estimate discretionary accruals and nondiscretionary accruals following equations (27), (28) and (29), respectively. Please refer to the footnotes in Table 3.3 for definitions of all other variables.

3.6.1.2 Real Activities Manipulation

Research has shown that, in addition to using discretionary accruals to manage earnings, managers use real activities manipulation (Roychowdhury 2006). Real activities manipulation could, conceptually, be positively related to both Total Discretionary Accruals and Fraud if real activities manipulation is captured in discretionary accruals and if this manipulation is subsequently detected or leads to fraud. Thus, I evaluate whether real activities manipulation is an omitted variable. This evaluation also provides some insight into whether the earnings reversal hypothesis can be applied to real activities manipulation. I add two real activities manipulation measures to model 33, Abnormal Production Costs and Abnormal Discretionary Expenditures (Roychowdhury 2006), each summed over the three years leading up to the first fraud year. Production costs are the sum of cost of goods sold and change in inventory. Abnormal Production Costs is the residual from a regression model estimating normal production costs using current sales, change in sales between t0 and t-1 and change in sales between t-1 and t-2, with all variables deflated by beginning of the period assets. Discretionary Expenditures are the sum of advertising expenses, R&D expenses, and selling, general and administrative expenses. Abnormal Discretionary Expenditures is the residual from a regression model estimating normal discretionary expenditures using sales in t-1, with all variables deflated by t-1 assets. Please refer to Roychowdhury (2006) for details on how to compute these measures.

The results in Table 3.7 show that Total Discretionary Accruals remains positive and significant (p=0.002), and that both Abnormal Discretionary Expenditures (p=0.027) and Abnormal Production Costs (p=0.033) are positive and significant. The positive relation between Abnormal Discretionary Expenditures and fraud was not expected[19]. Based on the idea that managers who manipulate earnings using real activities will reduce discretionary expenditures, I was expecting a negative relation between Abnormal Discretionary Expenditures in prior years and Fraud. Managers will over time run out of ways to manipulate earnings using real activities manipulation, just as they do when they manipulate earnings using discretionary accruals.
For example, if discretionary expenditures, such as research and development, are reduced to increase earnings, then further reductions will eventually become difficult as there are limits to how much these real activities can be manipulated. Furthermore, by manipulating earnings using real activities manipulation the firm does not operate at an optimal level, at least not what management would consider optimal, and the firm becomes less likely to perform well in subsequent years. The deterioration in performance will pressure management to increase revenue, and as the flexibility to manipulate earnings using real activities manipulation is reduced through earlier manipulation, it becomes more likely that the manager will commit fraud to increase revenue. A potential explanation for the unexpected positive relation could be that abnormally high discretionary expenditures in prior years indicate inefficient use of resources in prior years that leads to poor performance in subsequent years, and this poor performance puts pressure on management to commit fraud to manipulate earnings. The relation between Abnormal Production Costs and Fraud is in the direction that the earnings reversal hypothesis would predict, indicating that as managers manipulate earnings using real activities in prior years they are more likely to commit fraud in subsequent years. Finally, note that the relation between Total Discretionary Accruals and Fraud is robust to the inclusion of the two real activities manipulation measures.

[19] I did not find any evidence of multicollinearity; the highest VIF was 1.48, which is relatively low.

3.6.1.3 Additional Control Variables

As a sensitivity analysis I also added the other five variables found to be significant predictors of fraud in Fanning and Cogger (1998), as well as total assets, total sales and sales growth, as controls. The five additional variables in Fanning and Cogger (1998) are as follows.
(1) Accounts Receivable Growth is measured as a dummy variable equal to one if accounts receivable exceeds 110% of the previous year's value and zero otherwise. Given that accounts receivable often increases as a result of fraud, a positive relation is expected between Accounts Receivable Growth and Fraud. Note that this information is captured in Current Discretionary Accruals. (2) Debt to Equity is the ratio of debt to equity (leverage) and is expected to be positively related to Fraud given that higher levels of leverage put more pressure on management to meet debt covenants. (3) Gross Margin Percentage is a dummy variable that is one if the gross margin percentage exceeds 110% of the previous year's value and zero otherwise; assuming that the gross margin percentage improves as a result of fraud, a positive relation is expected between Gross Margin Percentage and Fraud. (4) LIFO is a dummy variable that is one if the last-in-first-out inventory method is used and zero otherwise. Given that prices were generally rising during the examined period, LIFO is expected to be negatively related to Fraud, as the usage of last-in-first-out, relative to first-in-first-out, decreases earnings when prices are rising. (5) The Percentage of Executive Directors on the board of directors is expected to be positively related to Fraud, as the independence and monitoring effectiveness of the board of directors is reduced by including company executives on the board.

Table 3.7
Total Discretionary Accruals, Real Activities Manipulation and Financial Statement Fraud
Logistic Regression Results[a]

Variable[b]                           Prediction  Estimate  Std. Error  χ²      prob>χ²
Intercept                             (?)         0.331     0.832       0.160   0.691
Total Discretionary Accruals          (+)         2.286     0.870       8.766   0.002
Forecast Attainment                   (+)         0.527     0.247       4.793   0.014
Unexpected Revenue per Employee       (+)         1.300     0.744       3.262   0.035
Abnormal Production Costs             (+)         0.802     0.437       3.361   0.033
Abnormal Discretionary Expenditures   (-)         0.885     0.431       4.888   0.027
Current Discretionary Accruals        (+)         0.433     1.660       0.070   0.791
Sales to Assets                       (-)         0.435     0.357       1.574   0.105
Auditor                               (-)         0.077     0.697       0.012   0.912
Asset Growth                          (+)         0.465     0.237       3.993   0.023
CFO Change                            (+)         1.300     0.745       3.274   0.035
Pseudo R2                             0.183
χ²-test of model fit                  25.14 (p=0.005)
n                                     100[c]

[a] Effect Likelihood Ratio Tests; one-tailed tests are reported for estimates in the direction of the prediction, all others are two-tailed.
[b] Dependent variable is financial statement fraud likelihood; Abnormal Production Costs is the residual from a regression model estimating normal production costs using production costs (defined as the sum of cost of goods sold and change in inventory), current sales, change in sales between t0 and t-1 and change in sales between t-1 and t-2, with all variables deflated by t-1 assets. Abnormal Discretionary Expenditures is the residual from a regression model estimating normal discretionary expenditures using discretionary expenditures (defined as the sum of advertising expenses, R&D expenses, and selling, general and administrative expenses) and sales in t-1, with all variables deflated by t-1 assets. Please refer to the footnotes in Table 3.3 for definitions of all other variables.
[c] The sample was reduced by four matched pairs because discretionary expenditures data were not available for all firms.

The other additional control variables in the sensitivity analysis are defined as follows. (1) Assets is expected to be negatively related to Fraud given that small growth firms are more likely to be investigated by the SEC (Beneish 1999) than larger, slower growing firms. (2) Sales is expected to be negatively related to Fraud given that small growth firms are more likely to be investigated by the SEC (Beneish 1999) than larger, slower growing firms.
(3) Sales Growth is a dummy variable that measures whether sales exceeds 110% of the previous year's sales and is expected to be positively related to Fraud given that small growth firms are more likely to be investigated by the SEC (Beneish 1999) than larger, slower growing firms. The results (see Table 3.8) obtained from the sensitivity analysis were equivalent to the reported results, with the exception that Asset Growth became insignificant (p=0.463). Note that both Sales Growth (p=0.157) and Asset Growth are measures of firm growth. In addition, Debt to Equity was positive and marginally significant (p=0.062).

3.6.1.4 Industry Clustering

I next investigate potential industry differences in the effect of Total Discretionary Accruals, Forecast Attainment and Unexpected Revenue per Employee on Fraud. I first add dummy variables for the seven one-digit SIC industries represented in the sample to model 33. The results (Table 3.9) show insignificant industry dummies (p≥0.398) and that the interpretation of the other variables does not change when the industry dummies are included in the main model, except that Sales to Assets becomes marginally significant (p=0.090). Due to sample size limitations I cannot, however, test the significance and direction of Total Discretionary Accruals, Forecast Attainment and Unexpected Revenue per Employee within each industry. I instead create seven subsamples by excluding, from each subsample, all firms belonging to one of the industries. Using model 33, I then examine the significance and direction of Total Discretionary Accruals, Forecast Attainment and Unexpected Revenue per Employee within each subsample. The results (untabulated) show that Total Discretionary Accruals remains positive and significant (p<0.028) in six subsamples, and positive and marginally significant (p=0.054) when Personal and Business Services firms are excluded.
Forecast Attainment remains positive and significant (p<0.008) in six subsamples, and positive but insignificant (p=0.106) when Wholesale and Retail firms are excluded. Unexpected Revenue per Employee remains positive and significant (p<0.034) in six subsamples, and positive but insignificant (p=0.161) when Manufacturing firms are excluded. Note that the results only changed to marginally significant or insignificant in three out of 21 tests, and that the significance levels only dropped when one of the three industries with the largest number of observations was removed from the sample, i.e., when the sample size was the smallest. Thus, the effects of Total Discretionary Accruals, Forecast Attainment and Unexpected Revenue per Employee on Fraud appear to be relatively robust across industries.

Table 3.8
Additional Control Variables
Logistic Regression Results[a]

Variable[b]                           Prediction  Estimate  Std. Error  χ²      prob>χ²
Intercept                             (?)         0.273     1.148       0.060   0.812
Total Discretionary Accruals          (+)         1.619     0.817       5.023   0.013
Forecast Attainment                   (+)         0.486     0.251       3.877   0.025
Unexpected Revenue per Employee       (+)         1.226     0.725       2.993   0.042
Current Discretionary Accruals        (+)         0.851     1.599       0.304   0.581
Sales to Assets                       (-)         0.334     0.385       0.772   0.190
Auditor                               (-)         0.170     0.646       0.071   0.790
Asset Growth                          (+)         0.032     0.339       0.009   0.463
CFO Change                            (+)         1.336     0.777       3.149   0.038
Accounts Receivable Growth            (+)         0.279     0.274       1.040   0.154
Debt to Equity                        (+)         0.135     0.109       2.357   0.062
Gross Margin Percentage               (+)         0.259     0.434       0.356   0.275
LIFO                                  (-)         0.013     0.495       0.001   0.979
Percentage of Executive Directors     (+)         1.200     1.276       0.889   0.173
Assets                                (-)         0.000     0.000       0.001   0.490
Sales                                 (-)         0.000     0.000       0.008   0.929
Sales Growth                          (+)         0.320     0.319       1.013   0.157
Pseudo R2                             0.171
χ²-test of model fit                  25.56 (p=0.061)
n                                     108

[a] Effect Likelihood Ratio Tests; one-tailed tests are reported for estimates in the direction of the prediction, all others are two-tailed.
[b] Dependent variable is financial statement fraud likelihood; Accounts Receivable Growth is a dummy variable equal to one if accounts receivable exceeds 110% of the previous year's value and zero otherwise. Debt to Equity is the ratio of debt to equity. Gross Margin Percentage is a dummy variable that is one if the gross margin percentage exceeds 110% of the previous year's value and zero otherwise. LIFO is a dummy variable that is one if the last-in-first-out inventory method is used and zero otherwise. The Percentage of Executive Directors is the percentage of all directors on the board of directors that are company executives. Sales is total sales. Sales Growth is a dummy variable that measures whether sales exceeds 110% of the previous year's sales. Please refer to the footnotes in Table 3.3 for definitions of all other variables.

Table 3.9
Major Industry, Total Discretionary Accruals, Forecast Attainment and Unexpected Revenue per Employee
Logistic Regression Results[a]

Variable[b]                           Prediction  Estimate  Std. Error  χ²      prob>χ²
Intercept                             (?)         0.514     0.875       0.350   0.557
Total Discretionary Accruals          (+)         1.576     0.773       5.532   0.009
Forecast Attainment                   (+)         0.612     0.232       7.453   0.003
Unexpected Revenue per Employee       (+)         1.781     0.775       5.918   0.008
Current Discretionary Accruals        (+)         1.210     1.664       0.587   0.444
Sales to Assets                       (-)         0.530     0.409       1.793   0.090
Auditor                               (-)         0.047     0.670       0.005   0.944
Asset Growth                          (+)         0.405     0.227       3.297   0.035
CFO Change                            (+)         1.318     0.739       3.437   0.032
Major Industry = 2                    ?           0.221     0.628       0.120   0.724
Major Industry = 3                    ?           0.226     0.447       0.250   0.614
Major Industry = 4                    ?           0.122     0.994       0.020   0.902
Major Industry = 5                    ?           0.398     0.677       0.350   0.556
Major Industry = 7                    ?           0.449     0.531       0.710   0.398
Major Industry = 8                    ?           0.578     0.753       0.590   0.443
Pseudo R2                             0.1475
χ²-test of model fit                  22.08 (p=0.077)
n                                     108

[a] Effect Likelihood Ratio Tests; one-tailed tests are reported for estimates in the direction of the prediction, all others are two-tailed.
[b] Dependent variable is financial statement fraud likelihood; Major Industry are dummy variables for the seven one-digit SIC code industries represented in the sample. Please refer to the footnotes in Table 3.3 for definitions of all other variables.

3.6.2 Alternative Measure Design

3.6.2.1 Revenue Fraud

The measure difference between revenue growth and employee growth (Diffemp), introduced in a concurrent working paper (Brazel et al. 2007), is similar to percentage change in revenue per employee (%ΔRE), which is used to derive Unexpected Revenue per Employee. However, the conceptual basis for using the measures differs, leading to differences in definitions and in what is actually measured. Diffemp, defined as $\frac{rev_t - rev_{t-1}}{rev_{t-1}} - \frac{emp_t - emp_{t-1}}{emp_{t-1}}$, is based on the idea that nonfinancial measures that are highly correlated with performance and that at the same time are difficult to manipulate can be used to evaluate the reasonableness of changes in firm performance. Unexpected Revenue per Employee starts from the idea that revenue manipulation is difficult to detect in the revenue account, as revenue varies for reasons other than fraud, and that some of this variation can be removed by deflating revenue by a production process input variable. The number of employees was selected as the deflator rather than assets because the number of employees is not affected by revenue fraud when the fraudulent revenue is recorded following the double-entry system. The primary difference between the two measures is how they adjust revenue growth using employee growth. Note that both measures are based on the idea that there is a relatively constant relation between the number of employees and revenue. Thus, if the number of employees grows by 10%, both measures assume that revenue should also grow by 10%, and vice versa. The two measures, however, differ in how the difference between expected and actual revenue is measured.
Diffemp is increasing in the absolute difference between actual revenue growth and expected revenue growth, while %ΔRE is increasing in the ratio of actual revenue growth to expected revenue growth. To clarify, take company A, which is growing at a rate of 10%, as indicated by the number of employees growing at a rate of 10%, and company B, which is growing at a rate of 100%. Further assume that both companies fraudulently increase revenue by 30% over what could be expected given prior revenue, prior number of employees and current number of employees. Thus, in absolute terms, company B manipulated revenue more than company A, but as a percentage of expected revenue there was no difference between the two firms. In this situation Diffemp is 0.33 for company A and 0.6 for company B, while %ΔRE is 0.3 for both company A and company B. Assuming a constant percentage manipulation over expected revenue, Diffemp is, while %ΔRE is not, increasing in the percentage change in the number of employees. Based on this discussion I expect that %ΔRE models will provide better fit and predictive ability than Diffemp models when the models do not control for firm growth, and that their performance will be similar when the models control for firm growth using employee growth.

Using model 33 without a control for firm growth (removing Asset Growth from the model) and replacing Unexpected Revenue per Employee with %ΔRE, %ΔRE is in the expected direction and significant (p=0.016), see column 1 in Table 3.10. When replacing Unexpected Revenue per Employee with Diffemp, Diffemp is in the expected direction, but only marginally significant (p=0.067), see column 2 in Table 3.10. Using the same models but adding a control for employee growth, %ΔRE is in the expected direction and significant (p=0.003) and Diffemp is in the expected direction and significant (p=0.003), see Table 3.10 columns 3 and 4, respectively.
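The company A and company B example can be checked numerically. The sketch below is illustrative only: it assumes the definitions of Diffemp and %ΔRE given in the footnote of Table 3.10, and the function names and the starting values of 100 for prior revenue and prior employees are hypothetical choices that do not affect the resulting ratios.

```python
def diffemp(rev_prev, rev_curr, emp_prev, emp_curr):
    """Revenue growth minus employee growth."""
    revenue_growth = (rev_curr - rev_prev) / rev_prev
    employee_growth = (emp_curr - emp_prev) / emp_prev
    return revenue_growth - employee_growth


def pct_change_re(rev_prev, rev_curr, emp_prev, emp_curr):
    """Percentage change in revenue per employee."""
    re_prev = rev_prev / emp_prev
    re_curr = rev_curr / emp_curr
    return (re_curr - re_prev) / re_prev


def manipulated_firm(emp_growth, overstatement, rev_prev=100.0, emp_prev=100.0):
    """Build a firm whose expected revenue grows with employees and whose
    reported revenue is overstated by a fixed percentage of expected revenue."""
    emp_curr = emp_prev * (1 + emp_growth)
    expected_rev = rev_prev * (1 + emp_growth)
    rev_curr = expected_rev * (1 + overstatement)
    return rev_prev, rev_curr, emp_prev, emp_curr


company_a = manipulated_firm(emp_growth=0.10, overstatement=0.30)
company_b = manipulated_firm(emp_growth=1.00, overstatement=0.30)

for label, firm in (("A", company_a), ("B", company_b)):
    print(label, round(diffemp(*firm), 2), round(pct_change_re(*firm), 2))
```

Under these assumptions Diffemp comes out at about 0.33 for company A and 0.6 for company B, while %ΔRE is about 0.3 for both, matching the values stated in the text.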
These results support the analytical discussion by indicating that when controlling for employee growth the two measures are equivalent, but that without a control for employee growth, the percentage change in revenue per employee, %ΔRE, is a better predictor of fraud than Diffemp.

Table 3.10
Comparison of %ΔRE and Diffemp
Logistic Regression Results[a]

                                             (1)               (2)               (3)               (4)
Variable[b]                           Pred.  Estimate prob>χ²  Estimate prob>χ²  Estimate prob>χ²  Estimate prob>χ²
Intercept                             (?)    0.572    0.448    0.635    0.384    0.385    0.614    0.419    0.574
Total Discretionary Accruals          (+)    1.551    0.011    1.524    0.011    1.519    0.016    1.556    0.014
Forecast Attainment                   (+)    0.575    0.004    0.575    0.004    0.518    0.009    0.537    0.007
%ΔRE                                  (+)    1.690    0.016                      2.552    0.003
Diffemp                               (+)                      0.882    0.067                      2.357    0.003
Current Discretionary Accruals        (+)    0.743    0.614    0.690    0.634    0.818    0.624    1.11     0.529
Sales to Assets                       (-)    0.333    0.146    0.312    0.160    0.322    0.151    0.31     0.159
Auditor                               (-)    0.218    0.722    0.200    0.737    0.178    0.773    0.149    0.804
CFO Change                            (+)    0.597    0.041    0.589    0.041    0.644    0.032    0.669    0.028
Employee Growth                       (+)                                        1.282    0.029    1.989    0.008
Pseudo R2                                    0.119             0.103             0.1426            0.1417
χ²-test of model fit                         17.74 (p=0.013)   15.41 (p=0.031)   21.35 (p=0.006)   21.22 (p=0.007)
n                                            108               108               108               108

[a] Effect Likelihood Ratio Tests; one-tailed tests are reported for estimates in the direction of the prediction, all others are two-tailed.
[b] Dependent variable is financial statement fraud likelihood; %ΔRE is defined as $\left(\frac{rev_t}{emp_t} - \frac{rev_{t-1}}{emp_{t-1}}\right) / \left(\frac{rev_{t-1}}{emp_{t-1}}\right)$, Diffemp is defined as $\frac{rev_t - rev_{t-1}}{rev_{t-1}} - \frac{emp_t - emp_{t-1}}{emp_{t-1}}$, and Employee Growth is defined as $\frac{emp_t - emp_{t-1}}{emp_{t-1}}$. Please refer to the footnotes in Table 3.3 for definitions of all other variables.

To further substantiate this claim I perform a nested F-test in which the fit of a reduced model is compared to that of the full models. The reduced model includes all variables reported in Table 3.10 except for %ΔRE or Diffemp, see column 3 in Table 3.11.
The full models include %ΔRE or Diffemp, see columns 1 and 2 in Table 3.11, respectively. The log likelihood values of the full models are then compared to the log likelihood value of the reduced model. The results show that only the %ΔRE model (column 1, Table 3.11) significantly (p=0.032) improves the fit of the reduced model. I also compare the predictive ability of the %ΔRE model to the predictive ability of the Diffemp model. The prediction errors of the %ΔRE model are significantly lower than the prediction errors of the Diffemp model (Wilcoxon, z=0.006; mean difference, p=0.045).

Table 3.11
Comparison of Model Fit and Predictive Ability of %ΔRE and Diffemp
Logistic Regression Results[a]

                                             (1)               (2)               (3)
Variable[b]                           Pred.  Estimate prob>χ²  Estimate prob>χ²  Estimate prob>χ²
Intercept                             (?)    0.572    0.448    0.635    0.384    0.632    0.367
Total Discretionary Accruals          (+)    1.551    0.011    1.524    0.011    1.443    0.013
Forecast Attainment                   (+)    0.575    0.004    0.575    0.004    0.530    0.006
%ΔRE                                  (+)    1.690    0.016
Diffemp                               (+)                      0.882    0.067
Current Discretionary Accruals        (+)    0.743    0.614    0.690    0.634    0.557    0.689
Sales to Assets                       (-)    0.333    0.146    0.312    0.160    0.271    0.182
Auditor                               (-)    0.218    0.722    0.200    0.737    0.171    0.767
CFO Change                            (+)    0.597    0.041    0.589    0.041    0.591    0.040
Pseudo R2                                    0.119             0.103             0.0879
χ²-test of model fit                         17.74 (p=0.013)   15.41 (p=0.031)   13.16 (p=0.041)
Diff. χ²(full) - χ²(reduced)                 4.58 (p=0.032)    2.29 (p=0.134)
n                                            108               108               108

Predictive Ability (Mean Diff.), model (1) versus model (2): p=0.045
Predictive Ability (Wilcoxon), model (1) versus model (2): z=0.006

[a] Effect Likelihood Ratio Tests; one-tailed tests are reported for estimates in the direction of the prediction, all others are two-tailed.
[b] Dependent variable is financial statement fraud likelihood; %ΔRE is defined as $\left(\frac{rev_t}{emp_t} - \frac{rev_{t-1}}{emp_{t-1}}\right) / \left(\frac{rev_{t-1}}{emp_{t-1}}\right)$, and Diffemp is defined as $\frac{rev_t - rev_{t-1}}{rev_{t-1}} - \frac{emp_t - emp_{t-1}}{emp_{t-1}}$. Please refer to the footnotes in Table 3.3 for definitions of all other variables.
The nested F-test evaluation and the predictive ability comparison provide further support that %ΔRE is a better predictor of Fraud than Diffemp. Recall that Unexpected Revenue per Employee is defined as the difference between a firm's %ΔRE and the %ΔRE of the firm's industry. Diffemp is, however, not adjusted for industry differences, and as such I used %ΔRE instead of Unexpected Revenue per Employee in the comparison just described. The specific form of the capital productivity measure and whether the industry adjustment adds value are empirical questions that I leave for future research to explore.

Dechow et al. (2007) also introduce an employee based fraud indicator, Abnormal Change in Employees, defined as the percentage change in employees minus the percentage change in assets. This measure is very similar to the measure introduced in Brazel et al. (2007), though neither study discusses the other measure. I focus on comparing Unexpected Revenue per Employee to Diffemp in this supplemental analysis, as Diffemp is closer in nature than Abnormal Change in Employees to Unexpected Revenue per Employee. I nevertheless provide some brief results from analyses of Abnormal Change in Employees. Since the correlation between Abnormal Change in Employees and Unexpected Revenue per Employee is low (r=0.103), I include these two variables in the same model. The results in Table 3.12 show that Unexpected Revenue per Employee is in the direction expected and significant (p=0.018), while Abnormal Change in Employees is insignificant (p=0.955).

3.6.2.2 Total Discretionary Accruals Aggregation Periods

The results of my research confirm that the likelihood of fraud is significantly (p=0.009) higher for firms that are pressured and constrained by earnings management in prior years (Table 3.5). Total Discretionary Accruals was measured as a firm's total discretionary accruals during the three years leading up to the first fraud year.
I chose to aggregate discretionary accruals over multiple years based on the idea that discretionary accruals reverse over time. Three years was used because it conceptually seems to be an appropriate time frame and based on the graphical depiction of the relation between discretionary accruals and fraud in Dechow et al. (1996). To evaluate the appropriateness of this decision, I examine the relation between fraud likelihood and discretionary accruals aggregated over the two years leading up to the first fraud year, Total Discretionary Accruals2, and discretionary accruals in the year prior to the first fraud year, Total Discretionary Accruals1. Using model 33 and one of the prior year(s) total discretionary accruals measures at a time, Total Discretionary Accruals2 is in the expected direction and marginally significant (p=0.068), and Total Discretionary Accruals1 is also in the expected direction but insignificant (p=0.108), see Table 3.13 columns 2 and 3, respectively. The Total Discretionary Accruals2 and the Total Discretionary Accruals1 models have Pseudo R2 values of 0.110 and 0.107, respectively. The lower p-value of Total Discretionary Accruals (p=0.009) and the higher Pseudo R2 value of the Total Discretionary Accruals model (0.135) (Table 3.13 column 1) indicate that Total Discretionary Accruals is a stronger predictor of Fraud than either Total Discretionary Accruals2 or Total Discretionary Accruals1.

To further substantiate this claim I perform nested F-tests in which the fit of a reduced model is compared to that of a full model. The reduced model excludes the Total Discretionary Accruals variables (column 4, Table 3.13). The full models are the three models described above. The log likelihood of these models (columns 1, 2, and 3, Table 3.13) is then compared to the log likelihood of the reduced model.
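The model comparison just described can be sketched numerically. The example below is an illustration rather than the original computation: it takes the χ²-test of model fit values reported in Table 3.13, differences each full model against the reduced model, and evaluates the difference against a χ² distribution with one degree of freedom. The one degree of freedom is an assumption that follows from each full model adding a single accruals variable; the function and variable names are hypothetical.

```python
import math


def chi2_upper_tail_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Chi-square model fit statistics as reported in Table 3.13.
reduced_model_chi2 = 14.50  # column (4), no total discretionary accruals variable
full_model_chi2 = {
    "three years": 20.15,  # column (1)
    "two years": 16.72,    # column (2)
    "one year": 16.03,     # column (3)
}

results = {}
for label, chi2 in full_model_chi2.items():
    difference = chi2 - reduced_model_chi2
    results[label] = (round(difference, 2), round(chi2_upper_tail_1df(difference), 3))

for label, (difference, p_value) in results.items():
    print(f"{label}: diff chi2 = {difference:.2f}, p = {p_value:.3f}")
```

Under this assumption the differences and p-values agree with the "Diff" row reported in Table 3.13 (5.65, 2.22 and 1.53, with p-values of 0.017, 0.136 and 0.216).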
The results show that only the Total Discretionary Accruals model (column 1, Table 3.13) significantly (p=0.017) improves the fit of the reduced model. I also evaluate the predictive ability of the Total Discretionary Accruals model against the Total Discretionary Accruals2 model and the Total Discretionary Accruals1 model. The prediction errors of the Total Discretionary Accruals model are significantly lower than the prediction errors of the Total Discretionary Accruals2 model (Wilcoxon, z=0.0007; mean difference, p=0.0102) and the Total Discretionary Accruals1 model (Wilcoxon Sign-Rank, z=0.0087; mean difference, p=0.0098).

Table 3.12
Unexpected Revenue per Employee and Abnormal Change in Employees
Logistic Regression Results[a]

Variable[b]                       Prediction  Estimate  Standard Error  χ²      prob>χ²
Intercept                         (?)         0.702     0.800           0.770   0.380
Total Discretionary Accruals      (+)         1.641     0.791           5.600   0.009
Forecast Attainment               (+)         0.571     0.227           6.737   0.005
Unexpected Revenue per Employee   (+)         1.440     0.718           4.398   0.018
Abnormal Change in Employees      (+)         0.042     0.747           0.003   0.955
Current Discretionary Accruals    (+)         0.852     1.612           0.291   0.590
Sales to Assets                   (-)         0.320     0.342           0.901   0.171
Auditor                           (-)         0.192     0.629           0.095   0.758
Asset Growth                      (+)         0.354     0.232           2.376   0.062
CFO Change                        (+)         0.650     0.364           3.441   0.032
Pseudo R²                         0.183
χ² test of model fit              25.14 (p=0.005)
n                                 100[c]

[a] Effect Likelihood Ratio Tests; one-tailed tests reported for estimates in the direction of the prediction, all other two-tailed.
[b] Dependent variable is financial statement fraud likelihood; Abnormal Change in Employees is defined as the percentage change in employees minus the percentage change in assets. Please refer to footnotes in Table 3.3 for definitions of all other variables.

Table 3.13
Three Years, Two Years and One Year Total Discretionary Accruals
Logistic Regression Results[a]

                                                  (1)                 (2)                 (3)                 (4)
Variable[b]                           Prediction  Estimate  prob>χ²   Estimate  prob>χ²   Estimate  prob>χ²   Estimate  prob>χ²
Intercept                             (?)         0.065     0.931     0.034     0.963     0.054     0.941     0.181     0.800
Total Discretionary Accruals          (+)         1.641     0.009
Total Discretionary Accruals2         (+)                             1.390     0.068
Total Discretionary Accruals1         (+)                                                 1.701     0.108
Forecast Attainment                   (+)         0.573     0.004     0.524     0.007     0.509     0.009     0.530     0.006
Unexpected Revenue per Employee       (+)         1.445     0.016     1.393     0.018     1.466     0.015     1.341     0.020
Current Discretionary Accruals        (+)         0.875     0.560     0.923     0.527     0.656     0.655     0.054     0.482
Sales to Assets                       (-)         0.327     0.148     0.280     0.179     0.262     0.193     0.254     0.200
Auditor                               (-)         0.192     0.759     0.207     0.736     0.187     0.762     0.115     0.848
Asset Growth                          (+)         0.359     0.047     0.344     0.051     0.368     0.042     0.332     0.056
CFO Change                            (+)         1.304     0.031     1.165     0.046     1.074     0.057     0.990     0.073
Pseudo R²                                         0.135               0.110               0.107               0.097
χ² test of model fit                              20.15 (p=0.010)     16.72 (p=0.033)     16.03 (p=0.042)     14.50 (p=0.043)
Diff χ² full - χ² reduced                         5.65 (p=0.017)      2.22 (p=0.136)      1.53 (p=0.216)
Predictive Ability (Mean Difference)                                  p=0.0102            p=0.0098
Predictive Ability (Wilcoxon)                                         z=0.0007            z=0.0087
n                                                 108                 108                 108                 108

[a] Effect Likelihood Ratio Tests; one-tailed tests reported for estimates in the direction of the prediction, all other two-tailed.
[b] Dependent variable is financial statement fraud likelihood; Total Discretionary Accruals is the total amount of discretionary accruals deflated by assets in the three years leading up to the fraud year (see Table 3.3 for calculation of discretionary accruals), while Total Discretionary Accruals2 (Total Discretionary Accruals1) is the total amount of discretionary accruals in the two (one) years leading up to the fraud year. Diff χ² full - χ² reduced is twice the difference in negative log-likelihoods between the reduced model in column (4) and the full models in columns (1), (2) and (3). Predictive Ability compares the prediction errors of the full model in column (1) to the prediction errors of the full models in columns (2) and (3). Please refer to footnotes in Table 3.3 for definitions of all other variables.

To summarize, in addition to being supported conceptually and by prior empirical research, the use of Total Discretionary Accruals over Total Discretionary Accruals1 and Total Discretionary Accruals2 is supported by the individual variable statistics, the nested F-test evaluation and the predictive ability comparison.

3.6.2.3 Analyst Forecast Period

I use the first forecast rather than the most current forecast because financial statement fraud can be an ongoing activity occurring throughout the year. To evaluate the appropriateness of this decision I also examined the average of all consensus forecasts made throughout the year leading up to the first fraud year. The results were weaker but in the same direction; more specifically, meeting or exceeding analyst forecasts was positively (p=0.056) related to the likelihood of fraud (Table 3.14). Thus, it appears that using the first forecast in the period is preferable in fraud detection.

Table 3.14
Alternative Analyst Forecast Measure
Logistic Regression Results[a]

Variable[b]                       Prediction  Estimate  Standard Error  χ²      prob>χ²
Intercept                         (?)         0.112     0.720           0.020   0.877
Total Discretionary Accruals      (+)         1.576     0.742           6.001   0.007
Forecast Attainment (average)     (+)         0.336     0.214           2.514   0.056
Unexpected Revenue per Employee   (+)         1.206     0.688           3.318   0.034
Current Discretionary Accruals    (+)         0.875     1.481           0.375   0.540
Sales to Assets                   (-)         0.255     0.305           0.717   0.199
Auditor                           (-)         0.228     0.609           0.141   0.707
Asset Growth                      (+)         0.382     0.213           3.314   0.034
CFO Change                        (+)         1.269     0.719           3.368   0.033
Pseudo R²                         0.106
χ² test of model fit              15.81 (p=0.045)
n                                 108

[a] Effect Likelihood Ratio Tests; one-tailed tests reported for estimates in the direction of the prediction, all other two-tailed.
[b] Dependent variable is financial statement fraud likelihood; Forecast Attainment (average) is a dummy variable equal to 1 if analyst forecasts were met or exceeded and 0 otherwise (I/B/E/S), where the analyst forecast is the average of all consensus forecasts made throughout the year leading up to the first fraud year. Please refer to footnotes in Table 3.3 for definitions of all other variables.

3.7. Conclusions

My research objective was to improve our understanding of antecedents of fraud, and thereby improve our ability to detect fraud. To realize this objective I developed a new measure that aggregates discretionary accruals over the three years leading up to the first fraud year to capture the pressure of earnings reversals and earnings management inflexibility. My results show that firms that are running out of ways to manage earnings, and are facing accruals reversals as a result of earnings management in prior years, are more likely to commit financial statement fraud. I also perform more in-depth analyses of the earnings reversal hypothesis that provide some initial indications that (1) the earnings reversal hypothesis also applies to real activities manipulation, and (2) discretionary accruals should be summed over three years, rather than two years or one year.

This study also adds to fraud research by examining whether capital market expectations provide incentives to commit financial statement fraud. My results show that evidence of a firm meeting or beating analyst forecasts can be used to detect financial statement fraud. Additionally, this study adds to earnings management research investigating capital market expectations, which typically assumes that distributional inconsistencies in reported earnings around analyst forecasts indicate that some firms manipulated earnings to meet analyst forecasts.
The results provide more direct evidence of earnings manipulation incentives related to capital market expectations and corroborate the findings of earnings management research.

I also develop a new productivity based measure designed to capture revenue fraud. The results show that this measure provides utility in fraud prediction and that the inclusion of this measure in fraud detection models improves model fit and model predictive ability more than similar measures being proposed in contemporary research. This measure might also provide insights to the development of new and improved discretionary accrual measures.

These results should, however, be interpreted knowing that the sample of fraud firms was taken from SEC Accounting and Auditing Enforcement Releases. Thus, I am really improving our understanding of fraud firms investigated by the SEC, and this knowledge might not fully generalize to fraud in general.

Future research can extend this study in a number of ways. I proposed that total discretionary accruals increase the likelihood of fraud through two processes: prior earnings management puts pressure on management as the accruals reverse, and prior earnings management constrains current earnings management flexibility. Future research can explore these two dimensions further and examine if only one, or if both, processes increase the likelihood of fraud. Earnings reversals can be viewed as providing an incentive to commit fraud, while earnings management inflexibility is a condition that increases the likelihood of fraud given a set of incentives. Thus, earnings management inflexibility should interact with other incentives. Future research can also examine fraud incentives related to other capital market expectations than analyst forecasts. For example, do firms commit fraud so they do not have to report small losses or small earnings growth declines?
Future research can also develop additional account specific fraud measures, or even focus on specific types of fraud. I developed a measure for detecting revenue fraud because revenue is the most commonly manipulated account. However, other accounts are also fraudulently manipulated. Future research could design new measures for detecting fraud in these accounts as well. These account specific measures could then be combined with non-specific measures that measure general incentives, conditions and management character, to improve the utility of fraud classification models. Different models could be built to identify different types of fraud and the classifications of these individual models could then be combined into an overall classification. Multi-classifier combination is a rich research stream in machine learning that could provide the foundation for this research.

To summarize, the results show that the likelihood of fraud is significantly higher for firms that meet or exceed analyst forecasts, are running out of ways to manage earnings and are facing accrual reversals, or have high labor productivity. These findings can help the SEC, auditors, financial institutions and others improve fraud prevention and detection, and thereby curb costs associated with fraud and improve market efficiency. These findings are also important to auditors that need to provide reasonable assurance about whether the financial statements are free of material misstatements caused by fraud, especially during client selection and continuation judgments, and audit planning.

Chapter 4. Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms

4.1. Introduction

"Undetected financial fraud is one of the greatest risks to an organization's viability and corporate reputation, and it has the capacity to draw into its sphere all associated people, not only the guilty" (KPMG 2006)
- Jeffrey Lucy, Chairman, Australian Securities and Investments Commission

The cost of financial statement fraud to defrauded organizations is estimated at $340 billion[20] per year in the U.S. (ACFE 2006). In addition to these direct costs, employees and investors are impacted negatively by financial statement fraud. Fraud also undermines the trustworthiness of corporate financial statements, which results in higher transaction costs and a less efficient market.

[20] The ACFE (2006) report provides estimates of total fraud cost, mean cost per fraud category and number of cases. To derive the estimate for total cost of financial statement fraud, I assumed that the relative difference in mean is similar to the relative difference in median cost among the different occupational fraud categories.

Auditors have, both through self-regulation and legislation, assumed the responsibility of providing reasonable assurance that the financial statements are free of material misstatement due to error or fraud. Earlier auditing standards, i.e., Statement on Auditing Standards (SAS) No. 53, only indirectly addressed this responsibility through references to "irregularities" (AICPA 1988). However, more recent auditing standards, SAS No. 82 and later, make this responsibility explicit. The auditor must provide "reasonable assurance about whether the financial statements are free of material misstatements, whether caused by error or fraud" (AICPA 1997, AU 110.02).

A stream of accounting research focuses on testing various statistical and data mining models with the goal of improving fraud detection. Data mining research that focuses specifically on financial statement fraud detection is important since the financial statement fraud domain is unique. Distinguishing characteristics that make this domain unique include: (1) the ratio of fraud to non-fraud firms is small (high class imbalance); (2) there are different types of fraud (target class disjunct); (3) the ratio of false positive to false negative classification error cost is small (cost imbalance); (4) the attributes used to detect fraud are relatively noisy, where similar attribute values can signal both fraudulent and non-fraudulent activities; and (5) fraudsters actively attempt to conceal the fraud by making the fraud firms' attribute values look like the attribute values of non-fraud firms. Because of these unique characteristics it is not clear, without empirical evaluation, whether classifiers that perform well in other domains will also perform well in the financial statement fraud domain.

The financial statement fraud research typically uses logistic regression as the baseline model against which data mining models are tested. The data mining algorithm used in this line of research has almost exclusively been artificial neural networks (ANN), for example Green and Choi (1997), Lin et al. (2003), Fanning and Cogger (1998) and Feroz et al. (2000). A number of data mining algorithms that have been found to be good predictors in other domains have not been investigated in financial statement fraud research. Two more recent financial statement fraud studies have included additional algorithms in their comparisons (Kotsiantis et al. 2006; Kirkos et al. 2007). However, these comparisons were based on unrealistic ratios of fraud to non-fraud firms (1:1 and 1:3); assumed equal costs of Type I and Type II errors; and used accuracy and error rate to measure performance, which generally are considered inappropriate for a domain like fraud detection. Furthermore, the Kotsiantis et al. (2006) and Kirkos et al.
(2007) studies included only financial ratios as predictor variables, leaving out many measures developed in confirmatory financial statement fraud research. Thus, we know very little about what algorithms are useful for detecting financial statement fraud, under what specific circumstances one algorithm might be better than another, and what predictors are useful for the different algorithms.

My research objective is to compare the utility of a fairly comprehensive set of data mining algorithms in financial statement fraud prediction. More specifically, my research questions are:

1. What algorithm(s) provide the most utility given different assumptions about classification costs and prior probabilities of fraud?
2. What predictors are useful to these algorithms when detecting financial statement fraud?

The answers to these questions are of practical value to auditors and institutions, like the Securities and Exchange Commission (SEC). The results provide guidance as to what algorithms and predictors to use when creating new models for financial statement fraud detection. Auditors can use these algorithms and predictors to improve client selection, audit planning and analytical procedures, while the SEC can leverage the findings to target audit engagements where the auditee is more likely to have committed financial statement fraud.

I provide an overview of related research in Section 4.2. Section 4.3 contains a description of the experimental variables and data used to evaluate the classification algorithms, while Section 4.4 describes the experimental procedure and reports preprocessing results. The experimental results are reported in Section 4.5. The results are summarized in Section 4.6, along with a discussion of research contributions and limitations, and suggestions for future research.

4.2. Related Research

Research focusing on evaluating the effectiveness of different fraud prediction algorithms has typically introduced different variations of ANN and compared these algorithms to logistic regression. Green and Choi (1997) evaluated the performance of three ANNs with input variables preprocessed in different ways: simple percentage change, plain sum-of-the-years'-digits weighted average, and incremental sum-of-the-years'-digits weighted average. In an experiment with 86 SEC fraud cases matched with 86 non-fraud cases, the ANNs were compared to random guessing, defined as Type I and Type II error rates of 0.5, and summed error rate below 1. The results showed that the ANNs performed better than random guessing on the training sample. On the evaluation sample, however, the ANNs did not perform significantly better in terms of either Type I or Type II errors. The summed error rate comparison did show that the ANNs performed significantly better than random guessing, but this comparison used classification results from a combined sample of both the training and evaluation samples.

Starting with 62 potential fraud predictors, Fanning and Cogger (1998) compared an ANN to quadratic discriminant analysis, linear discriminant analysis and logistic regression. The ANN correctly classified 63% of the cases, as compared to 52% for the highest benchmark algorithm (linear discriminant analysis). However, the ANN had a lower true positive rate, defined as the number of true positive classifications divided by the number of positive instances in the dataset, than all three benchmark algorithms.

In both Green and Choi (1997) and Fanning and Cogger (1998) the experiments assumed that the costs of false positives and false negatives were equal and that the dataset was balanced with a prior probability of 0.5. In reality, the probability of fraud is much smaller than 50%[21], and the cost of false negatives is often much larger than the cost of false positives. The optimal threshold for classifying a financial statement as fraudulent in discriminant analysis and logistic regression analyses is, therefore, likely to be much lower than the default of 0.5 used in these studies. As false positive and false negative rates do not remain constant over different threshold levels and the relative performance of algorithms is often different at different thresholds, the assumptions made in these studies limit our understanding of the performance of these algorithms to one specific scenario that is unrealistic.

[21] In one estimate the probability of fraud is 0.6% (Bell and Carcello 2000).

Using seven SAS No. 53 red flags, Feroz et al. (2000) focused on comparing the utility of an ANN model with logistic regression based on hit rate, overall error rate[22] and estimated relative costs of misclassification.[23] The results did not show that one algorithm was dominant at all treatment levels where prior probabilities[24] and the relative cost of different classification errors[25] were manipulated. Lin et al. (2003) introduced an existing data mining algorithm, fuzzy ANN, into the accounting domain and compared the fuzzy ANN model to logistic regression using hit rate,[26] overall error rate and estimated relative costs of misclassification. The results, as in Feroz et al. (2000), did not show that either of the algorithms dominated. The analysis was performed using seven financial ratios that were related specifically to the revenue cycle. The sample was, however, not adjusted accordingly, i.e., non-revenue based fraud was included in the sample.

In the overall error rate and estimated relative costs of misclassification analyses in Feroz et al. (2000) and Lin et al. (2003), the optimal fraud classification probability cutoff level was not determined for each treatment group, i.e., algorithm, relative error cost and prior probability treatment combination.
The analyses instead used error rates obtained based on optimal cutoffs when assuming no differences in classification costs and without finding the best prior training fraud probabilities for different evaluation prior fraud probability levels.

[22] Overall error rate takes into account differences in prior probabilities of the different outcomes and the type of classification error.
[23] Estimated relative costs of misclassification take into account: prior probabilities, classification costs of different outcomes, and the type of classification error (false positive or false negative). Note, however, that if the prior probabilities and relative costs are not adjusted during model building, i.e., training, then the models might not perform optimally for the different prior probability and relative classification cost combinations. Thus, this performance measure might be misleading if the different models are not rebuilt/retrained for each prior probability and relative classification cost combination examined using the measure.
[24] The prior probability refers to the percentage of fraud firms.
[25] The cost of making a Type I error (false positive) compared to the cost of making a Type II error (false negative).
[26] Hit rate is the percentage of objects accurately classified.

More recently, Kotsiantis et al. (2006) used 41 fraud and 123 non-fraud firms in Greece to examine seven classification algorithms: C4.5 (decision tree), RBF (ANN), K2 (Bayesian networks), 3-NN (nearest neighbor), RIPPER (rule learner), SMO (support vector machine) and logistic regression.
They also examined four ensemble based algorithms[27]: MP stacking (stacking with seven base-classifiers and a tree learner meta-classifier that only learns from true class probabilities), MLR stacking (stacking with seven base-classifiers and a multi-response linear regression meta-classifier that learns from all class probabilities), Grading (IBk base-classifier with 10-NN meta-classifier) and Simple Voting (stacking with seven base-classifiers and a tree learner meta-classifier). The algorithms were trained using 10-fold cross validation using financial statement ratios as input variables. The results in terms of overall accuracy showed that MP stacking provided the best performance (95.1%) followed by MLR stacking (93.9%), and that all ensemble based methods outperformed the best non-ensemble algorithm. The best non-ensemble algorithm was C4.5 (91.2%), followed by RIPPER (86.8%). The accuracy of logistic regression (75.3%) and the ANN (73.4%) was relatively low. Accuracy was also reported for each class, i.e., the equivalent of Type I and Type II error rates. While Kotsiantis et al. (2006) evaluated a relatively comprehensive set of classifiers, the percentage of fraud firms in their dataset (25%) was much higher than estimates indicating that around 0.6% of all organizations are fraud firms (Bell and Carcello 2000). Furthermore, they assumed that costs associated with Type I and Type II errors were equivalent, while in reality it is likely that Type II errors are much more expensive than Type I errors. Based on financial statement fraud effects reported in Beneish (1999), Bayley and Taylor (2007) estimated that the ratio of Type I error classification costs to Type II error classification costs was between 1:20 and 1:40. These assumptions were also reflected in their training data, in that they did not examine the performance of the classifiers at different classification thresholds.

[27] Ensemble based classification algorithms combine the decision output from multiple classifiers, i.e., they use an ensemble of classifiers (see Section 4.3.1.2 for further detail).

Kirkos et al. (2007) compared an ANN, a Bayesian belief network and a decision tree learner using 35 fraud and 38 non-fraud firms. The reported class accuracies (fraud accuracy, non-fraud accuracy) indicated that the Bayesian belief network (91.7%, 88.9%) outperformed the ANN (82.5%, 77.5%) and the decision tree (75%, 72.5%). As in Kotsiantis et al. (2006), the classification costs were assumed to be the same for Type I and II errors, and the dataset contained almost the same number of fraud firms as non-fraud firms.

I extend this literature by evaluating the performance of a relatively extensive set of algorithms selected based on their classification performance in both prior financial statement fraud research and in domains similar to the financial statement fraud domain. I also examine under what specific, realistic circumstances these algorithms perform well and what predictors provide utility to these algorithms in terms of improving classification performance.

4.3. Experimental Variables and Data

To answer my research questions, I ran experiments to evaluate the performance of a relatively comprehensive and representative set of classification algorithms. The classification algorithms were used to classify firms as either fraud or non-fraud firms based on firm attributes derived from the financial statements, analyst forecasts and proxy statements of the firms. Classifier performance was measured using estimated relative cost under different assumptions of relative costs of Type I and Type II errors and prior probability of fraud and non-fraud (Dopuch et al. 1987). The performance of each classifier configuration and training scenario combination was measured 10 times using 10-fold cross validation for each combination of classification costs and prior fraud probability.
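As a rough illustration of how such a cost measure behaves across treatment combinations, the sketch below evaluates one set of confusion counts under several assumed priors and cost ratios. It assumes the commonly used form of estimated relative cost, in which each error rate is weighted by its cost and by the assumed prior of its class; the measure actually used in this study is defined in Section 4.3.4. The counts, priors and cost ratios are hypothetical values chosen for illustration, not results from the experiments.

```python
from itertools import product


def estimated_relative_cost(n_fn, n_fp, n_pos, n_neg, prior_fraud, cost_fn, cost_fp):
    """Expected misclassification cost per firm under an assumed prior and costs.

    Assumed form: the false negative rate is weighted by the cost of a false
    negative and the fraud prior; the false positive rate is weighted by the
    cost of a false positive and the non-fraud prior.
    """
    fn_rate = n_fn / n_pos  # share of fraud firms missed (Type II errors)
    fp_rate = n_fp / n_neg  # share of non-fraud firms flagged (Type I errors)
    return fn_rate * cost_fn * prior_fraud + fp_rate * cost_fp * (1.0 - prior_fraud)


# Hypothetical confusion counts from a single evaluation run.
n_pos, n_neg, n_fn, n_fp = 50, 500, 20, 40

# Evaluate the same counts under every prior and cost-ratio combination.
for prior, fn_cost in product((0.003, 0.006, 0.012), (1, 20, 40)):
    erc = estimated_relative_cost(n_fn, n_fp, n_pos, n_neg, prior, fn_cost, 1)
    print(f"prior={prior:.3f}  FP:FN cost=1:{fn_cost:<2d}  ERC={erc:.5f}")
```

Because the prior and the costs enter as weights on the two error rates, a classifier with few false positives can look best at one combination and worst at another, which is why performance is measured separately for each combination.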
Sections 4.3.1, 4.3.2 and 4.3.3 describe the three factors that were manipulated in the experiment: classification algorithms, classification costs and prior probability of fraud, respectively. The dependent measure, estimated relative cost, is described in Section 4.3.4. The dataset, which includes objects (fraud and non-fraud firms) and object features (fraud predictors), is described in Section 4.3.5.

4.3.1 Classification Algorithms

4.3.1.1 Overview

One of the goals of this research is to examine classification algorithm performance in fraud detection. The primary experimental factor of interest was, therefore, classification algorithms. The classification algorithms were obtained from Weka, an open source data mining tool that contains components for (1) preprocessing data, including data loading, data filtering and transforming object attributes, (2) object classification, clustering and association rule mining, (3) object attribute evaluation, and (4) result analysis and visualization. Using an open source tool facilitates the replication and extension of this study by future financial statement fraud data mining research. Weka implements a relatively complete set of classification algorithms, including many of the most popular algorithms. Based on prior financial statement fraud research and prior data mining research in domains with imbalanced datasets, I selected six algorithms from Weka: J48, SMO, MultilayerPerceptron, Logistic, stacking and bagging. J48 is a decision tree learner and Weka's implementation of C4.5. SMO is a support vector machine (SVM) and Logistic is Weka's logistic regression implementation; both of these classifiers are linear functions. MultilayerPerceptron is Weka's backpropagation ANN implementation, and stacking and bagging are two ensemble based methods. Section 4.3.1.3 provides more in-depth descriptions of these algorithms.
In addition to the algorithms implemented in Weka, I also included Information Market based Fusion (IMF), described in Essay I.

Logistic regression, ANN and stacking were included as they had performed well in prior fraud research (Feroz et al. 2000; Lin et al. 2003; Kotsiantis et al. 2006). Note that it was, however, not clear if these algorithms would perform well under realistic conditions and relative to not yet examined classifiers. Bagging, J48, SMO and IMF were included because prior data mining research (Fries et al. 1998; Phua et al. 2004; West et al. 2005) and the research in Chapter 2 found that these algorithms performed well in domains with imbalanced data, i.e., where the majority class was larger than the minority class, which is true in the financial statement fraud domain. It was, however, not known how these classifiers would perform in fraud detection.

In the next section, I describe in greater detail why these specific algorithms were selected (Section 4.3.1.2). I then provide more in-depth descriptions of each algorithm and experimental classifier parameter settings (Section 4.3.1.3).

4.3.1.2 Algorithm Selection

Of the classifiers I selected, four were individual classifiers: J48 (C4.5), SMO (SVM), MultilayerPerceptron (ANN) and Logistic (logistic regression). I included logistic regression and ANN algorithms to allow for comparisons with prior financial statement fraud research (Green and Choi 1997; Fanning and Cogger 1998; Feroz et al. 2000). I included SVM and C4.5 as these algorithms were found, in domains other than fraud, to provide good classification performance (Fries et al. 1998; Fan and Palaniswami 2000; Phua et al. 2004). Fries et al. (1998) examined the performance of SVM on a dataset with a 34.5% prior probability of patients having breast cancer. The accuracy of SVM (99.5%) was higher than that reported previously for CART (94.2%), RBF ANN (95.9%), linear discriminant (96.0%) and multi-layered ANN (96.6%).
Fan and Palaniswami (2000) compared an SVM to an ANN, MDA and learning vector quantization in bankruptcy prediction, with a 49.4% prior probability of bankruptcy, and found that the SVM outperformed the other classifiers. In another bankruptcy prediction evaluation, Shin et al. (2005) used a balanced dataset of 1160 bankrupt firms and 1160 non-bankrupt firms and found that the SVM outperformed the ANN.

Data mining research has also investigated the efficacy of various algorithms in the medical field. Weiss and Kapouleas (1990) found that two rule-based classifiers, CART (0.0064[28]) and PVM (0.0067), outperformed ANN (0.0146), Bayes Independence (0.0394), Nearest Neighbor (0.0473), DA linear (0.0615), Bayes 2nd order (0.0756), and DA Quadratic (0.1161) on a thyroid disease dataset with a 7.7% prior probability of thyroid disease. They also found similar results on a breast cancer dataset with a 70% prior probability of the existence of cancer. CART (0.229) and PVM (0.229) again had the best overall performance and outperformed ASSISTANT Tree (0.280), Bayes Independence (0.282), ANN (0.285), DA linear (0.294), Bayes 2nd order (0.344), DA Quadratic (0.344), and Nearest Neighbor (0.347). Based on these results they concluded that rule-based classifiers provided the best performance, especially on imbalanced data. In addition to performance benefits, rule-based classifiers also generate output that is interpretable by humans.

Of the rule-based classifiers, and perhaps of all machine learning algorithms, C4.5 and its commercial successor C5 have arguably become the most commonly used off-the-shelf classifiers (Witten and Frank 2005). Using a 1:3 unbalanced dataset of Greek financial statement fraud firms, Kotsiantis et al. (2006) found that C4.5 (91.2%), followed by RIPPER (86.8%), two rule-based classifiers, were the best non-ensemble based algorithms in their experiment.
Their experiment also included five other individual classifiers: an ANN (RBF), a Bayesian network (K2), a nearest-neighbor classifier (3-NN), an SVM (SMO) and logistic regression. Phua et al. (2004) provided further support for including the C4.5 algorithm in this study. They found that C4.5 performed relatively well in a dataset with a 6% prior probability of auto insurance claims fraud. Using a cost sensitive performance measure (cost savings within parentheses), C4.5 ($165) performed better than bagging ($127), stacking ($104), Naive Bayesian ($94) and ANN ($89). Furthermore, the performance of C4.5 was relatively robust with respect to the sampling method.

[28] Error rates, defined as the number of false positive and false negative classifications divided by all instances in the dataset, are reported within the parentheses.

In addition to these four individual classifiers, I also examined three ensemble based methods[29]: bagging, stacking and IMF. By combining the results of a group or ensemble of individual classifiers, classification performance can be improved over the best individual classifier in the ensemble. The basic idea behind ensemble based methods is that different classifiers have different strengths and weaknesses and therefore provide complementary information about the classification problem. These differences can be leveraged to improve classification performance by combining the individual classifiers' decisions (Kittler et al. 1998).

[29] As the name indicates, an ensemble based method is a type of classification method that uses a group (ensemble) of individual classifiers, so called base-classifiers, to classify objects. Different ensemble based methods include different base-classifiers, train the base-classifiers differently and use different algorithms to combine base-classifier decision or probability outputs into an overall ensemble decision or probability.
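To make the idea of combining base-classifier outputs concrete, the following sketch implements two simple combiners, majority voting and probability averaging, which are chosen here only for illustration. It is a minimal sketch under stated assumptions, not the implementation used in the experiments: each base-classifier is assumed to output a fraud probability for a firm, and the function names and example probabilities are hypothetical.

```python
def majority_vote(probabilities, cutoff=0.5):
    """Return 1 (fraud) if more than half of the base-classifiers vote fraud.

    Each base-classifier votes fraud when its own fraud probability is at or
    above the cutoff; the size of the margin is ignored.
    """
    votes = sum(1 for p in probabilities if p >= cutoff)
    return 1 if votes > len(probabilities) / 2 else 0


def average_probability(probabilities):
    """Return the mean fraud probability across the base-classifiers."""
    return sum(probabilities) / len(probabilities)


# Hypothetical fraud probabilities for one firm from three base-classifiers.
base_outputs = [0.90, 0.45, 0.40]

vote = majority_vote(base_outputs)
mean = average_probability(base_outputs)
print(f"majority vote: {vote}, average probability: {mean:.3f}")
```

In this example the two combiners disagree: only one of three classifiers votes fraud, but its confidence pulls the average above 0.5. Voting discards how confident each classifier is, whereas averaging keeps that information, so the choice of combiner method can change the ensemble decision even when the base-classifiers are held fixed.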
Ensemble research has primarily focused on two areas: (1) ensemble architecture, examining what classifiers to include in the ensemble and how to train these classifiers; and (2) combiner method, examining how to combine the base-classifiers' decisions.

Using an ensemble of ANN base-classifiers, West et al. (2005) compared the performance of cross-validation (CV), bagging and boosting, three popular methods from ensemble architecture research. Three datasets were used in this comparison: Credit Rating Australian (307 no credit given and 383 credit given), Credit Rating German (300 no credit given and 700 credit given), and Bankruptcy (93 bankrupt and 236 non-bankrupt companies). The results (error rates) for the three datasets showed that bagging performed relatively well. More specifically, in the Australian Credit Rating dataset, bagging (0.128) outperformed CV (0.131), the single best base-classifier (0.132) and boosting (0.148). In the German Credit Rating dataset, CV (0.242) outperformed bagging (0.251), but these two ensemble methods both performed better than the single best base-classifier (0.253) and boosting (0.255). In the Bankruptcy dataset, bagging (0.126) outperformed boosting (0.128), CV (0.129) and the single best base-classifier (0.131). Note that in addition to performing well, bagging was the best performing classifier in the Bankruptcy dataset. This dataset is the most similar of the three to the fraud domain; it is highly imbalanced with object attributes derived largely from financial data and with a classification objective similar to that in the fraud domain. Based on these findings I included bagging in my experiment. I additionally included stacking, since prior fraud research using Greek fraud firms found stacking to perform well (see the literature review in Section 4.2, in particular Kotsiantis et al. 2006).
Ensemble combiner method research has found that relatively simple methods like Majority Vote and Average perform either at the same level as or significantly better than more complex methods (Duin and Tax 2000), but that Information Market based Fusion (IMF) outperforms these two combiner methods and Weighted Average when the true classes of all objects are revealed (see Chapter 2). Further, Chapter 2 shows that when only the true classes of objects classified as positive are known, which is a more realistic assumption in certain domains such as fraud detection, IMF outperforms Majority Vote, Average and Weighted Average for datasets with low prior probabilities of the minority class (defined as datasets with prior probabilities below 40%). Based on these findings I included IMF in my experiment.

4.3.1.3 Algorithm Overview and Tuning

J4.8 is Weka's implementation of C4.5 version 8. C4.5 generates a decision tree, which is a divide and conquer classification method. The algorithm examines the information gain provided by each attribute and splits the data using the attribute that provides the highest information gain. The created branches are then split further by again examining the information gain, within each branch, provided by the different attributes. If an attribute creates a split with a branch with no or only a small number of instances, then this attribute is not used. The minimum number of instances permissible at a leaf is set by default to two, but can be changed. To avoid overfitting, the branches are pruned using subtree raising, where internal decision nodes (branch splits) are subsumed by lower nodes, thereby eliminating one node. The pruning is performed based on estimates of classification errors established using a confidence value that is set by default to 25% (Witten and Frank 2005). The reader is referred to Quinlan (1993) for further detail on C4.5.
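The attribute selection step described above can be illustrated with a minimal Python sketch. It follows the text literally and scores categorical attributes by plain information gain; C4.5 itself normalizes this quantity into a gain ratio and also handles numeric attributes, so this is an illustration of the idea and not the J4.8 implementation. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the branches created by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Return the attribute with the highest information gain, plus all gains."""
    gains = {name: information_gain([row[name] for row in rows], labels)
             for name in rows[0]}
    return max(gains, key=gains.get), gains


# Toy data (hypothetical): attribute "a" separates the classes, "b" does not.
rows = [{"a": "x", "b": "p"}, {"a": "x", "b": "q"},
        {"a": "y", "b": "p"}, {"a": "y", "b": "q"}]
labels = [1, 1, 0, 0]
chosen, gains = best_attribute(rows, labels)
```

In this toy example the split on "a" removes all uncertainty about the class, while the split on "b" removes none, so "a" is selected for the first branch.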
Witten and Frank (2005) suggest tuning C4.5 by testing lower confidence values and higher minimum number of instances. I examined three confidence values, 15%, 20% and 25%, and three minimum number of instances at a leaf, 2, 3 and 5, for a total of nine C4.5 configurations.

Logistic regression is a statistical algorithm that estimates the probability of a certain event occurring by applying maximum likelihood estimation after transforming the dependent variable into the natural log of the odds of the firm being fraudulent. ANNs are non-linear machine learning algorithms designed based on biological neural networks with interconnected input, hidden and output nodes [30]. Both of these classifiers have been used extensively in prior accounting and fraud research. Following prior research, logistic regression was not tuned, meaning that logistic regression was used with parameters set to their default values.

[30] See Green and Choi (1997) for a good description of ANNs.

I manipulated the learning time (epochs), learning rate, the number of hidden nodes and momentum for the ANN (Feroz et al. 2000; Green and Choi 1997). A good learning time was first determined without manipulating the other settings, which were set at their default Weka values (learning rate = 0.3; momentum = 0.2; the number of hidden nodes = the sum of the number of attributes and the number of classes divided by 2). The learning time was determined by comparing the performance of ANNs created using different learning times, starting with 500 epochs (iterations through the training data) and then increasing the number of epochs by 500 in each new evaluation round. The learning time evaluation was terminated when the performance did not improve over three consecutive learning time levels. At this point the learning time was set to the lowest learning time generating the highest performance, and the other settings were manipulated.
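The learning time search just described can be sketched as a simple stopping rule. The following Python sketch is only an illustration under stated assumptions: `evaluate` is a hypothetical callback that trains an ANN for the given number of epochs and returns a performance score where higher is better, and the scores in the usage example are made up.

```python
def choose_learning_time(evaluate, start=500, step=500, patience=3):
    """Increase epochs by `step` until `patience` consecutive levels fail to
    improve; return the lowest epoch count that reached the best score."""
    best_epochs = start
    best_score = evaluate(start)
    epochs = start
    rounds_without_gain = 0
    while rounds_without_gain < patience:
        epochs += step
        score = evaluate(epochs)
        if score > best_score:
            # Strict improvement: ties keep the earlier (lower) epoch count.
            best_epochs, best_score = epochs, score
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
    return best_epochs, best_score


# Hypothetical scores per learning time, for illustration only.
made_up_scores = {500: 0.70, 1000: 0.74, 1500: 0.74, 2000: 0.73, 2500: 0.72}
epochs, score = choose_learning_time(lambda e: made_up_scores[e])
```

With these made-up scores the search stops after 2,500 epochs, because 1,500, 2,000 and 2,500 epochs do not improve on 1,000 epochs, and 1,000 epochs is retained as the lowest learning time with the highest score.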
The learning rate and momentum were both manipulated at three levels: 0.1, 0.3 and 0.5. The number of hidden nodes was manipulated at four levels: 4, 8, 12 and 16. Thus, after the learning time was determined, a total of 27 ANN configurations were included in the experiment.

SVM algorithms classify data points by learning hyperplanes (linear models) that provide the best separation of a set of positive instances from a set of negative instances. In a classification context with n object features, a hyperplane is a linear model with n-1 dimensions that divides a space of n dimensions into two parts. For example, in a two-dimensional space the hyperplane is a line, and in a three-dimensional space the hyperplane is a plane. The objective is to find the hyperplane that maximizes the separation of the data points of the different classes. While the hyperplane is a linear model, the input data can be transformed before the hyperplane is constructed. The effect of learning the hyperplane on transformed data is a non-linear decision boundary in the original space (Witten and Frank 2005). The hyperplane that provides the best separation is found by solving a large quadratic programming optimization problem. To improve training speed, sequential minimal optimization (SMO) solves the quadratic programming problem by breaking it up into a series of smaller problems that are then solved analytically (Platt 1999). Weka implements SVM using SMO. Following Shin et al. (2005), the complexity parameter C was manipulated at five values: 1, 10, 50, 75 and 100. Shin et al. (2005) also manipulated a radial basis kernel parameter, but since I used a polynomial kernel function I instead manipulated the exponent of the polynomial kernel at five values: 0.5, 1, 2, 5 and 10. Thus, 25 potential SVM configurations were included in the experiment. Furthermore, buildLogisticModels was set to true.
This enabled proper probability estimates by fitting logistic regression models to the SVM outputs.

Stacking is an ensemble based method that combines the output of heterogeneous base-classifiers, i.e., different types of classifiers, trained on the same data. The base-classifier output is combined using a meta-classifier. The meta-classifier can be any classification algorithm, but is typically a relatively simple linear model or decision tree. To avoid overfitting the training data, the meta-classifier is trained on base-classifiers' evaluation output generated using test data rather than training data. This is typically accomplished using k-fold cross-validation (Wolpert 1992; Witten and Frank 2005). In the experiment, stacking was configured using the default Weka setting for the number of cross-validation folds (set at 10). In selecting base-classifiers, I followed prior research (Kotsiantis et al. 2006) and used all the other experimental classifiers, including bagging, but excluding IMF. Note that IMF was implemented using all the other classifiers in the experiment except for stacking. Thus, the base-classifiers selected were C4.5, SVM, ANN, logistic regression and bagging. I included all classifier configurations that provided the best performance for a given classifier at one or more experimental treatment levels. Based on recommendations to use a relatively simple meta-classifier (Wolpert 1992), and experiments performed by Chan et al. (1999) and Prodromidis et al. (2000), I used a Bayesian classifier as the meta-classifier. In a dataset with a 0.2 prior probability of credit card fraud, Chan et al. (1999) evaluated the performance of four individual classifiers, C4.5, Ripper, CART and Bayesian, and a meta-classifier, Bayesian. They found that the Bayesian meta-classifier provided the best performance followed by CART. In a similar credit card fraud detection study, Prodromidis et al.
(2000) evaluated C4.5, Ripper, CART, Bayes and ID3, and a Bayesian meta-classifier. In addition to the dataset with the 20% fraud cases from Chan et al. (1999), they also included a dataset with 15% fraud cases. As in Chan et al. (1999), Prodromidis et al. (2000) found that the Bayesian meta-classifier performed better than the other classifiers. I, therefore, used Bayesian as the meta-classifier in the stacking implementation. In Weka, NaiveBayes can be configured to use either kernel estimation or a single normal distribution for modeling numeric attributes. There is also an option to use supervised discretization to process numeric attributes. I manipulated these parameter settings in the experiment for a total of four stacking configurations.

Bagging is an ensemble based method that combines the output of homogeneous base-classifiers, i.e., all classification algorithms are of the same type, trained using different data. The training data for the base-classifiers are generated by sampling with replacement from the original training data. Thus, the base-classifiers learn on different subsets of the original training data and, therefore, predict test cases differently. By combining multiple base-classifiers trained on different data subsets, bagging reduces the variance of the individual base-classifiers, which is especially beneficial for unstable base-classifier algorithms (Witten and Frank 2005). Bagging combines the base-classifiers by averaging their outputs. I based my bagging implementation on Breiman (1996) and used decision trees as the base-classifiers, more specifically C4.5, and set the number of sampling iterations to 50. I then manipulated the size of each bag at 75%, 100% and 125% (the default, 100%, was used in Breiman 1996), and whether to calculate the out-of-bag error (yes or no), for a total of six bagging configurations.

IMF is an ensemble combiner method that combines the base-classifier output using an information market based approach.
The base-classifiers are implemented as software agents that participate in an information market. The information market uses a pari-mutuel betting market mechanism. In this market agents place bets on the true class of objects. The bets are based on agents' private probability estimates of object class membership and market odds, which specify the potential payout for winning bets. These market odds are a function of the total bets placed in the market; thus IMF has to solve a recursive problem where agents place their bets based on market odds and where market odds are updated based on agents' bets. This problem is solved in IMF using binary search. The reciprocals of the market odds that minimize the difference between the total betting amount and the potential payouts for different classes represent the ensemble probability class estimates (see Chapter 2). Following Chapter 2, IMF was implemented using all the other experimental classifiers, including bagging, but excluding stacking.

4.3.2 Classification Cost

Given a binary problem like fraud, there are four potential classification outcomes: (1) True Positive (TP), a fraud firm is correctly classified as a fraud firm; (2) False Negative (FN), a fraud firm is incorrectly classified as a non-fraud firm; (3) True Negative (TN), a non-fraud firm is correctly classified as a non-fraud firm; and (4) False Positive (FP), a non-fraud firm is incorrectly classified as a fraud firm. Different classification costs are associated with TP, FN, TN and FP. TP and FP classifications have investigation costs Ci that are incurred in order to find out whether the firm was actually fraudulent. FN classifications have fraud costs Cf, i.e., we missed some fraudulent activity and the fraud is costly. FP classifications might have Cw costs related to wrongfully accusing a firm of fraud. All classifications have overhead costs, for example computer equipment, data loading, running the classification algorithm, etc.
The ratio of these costs impacts training and evaluation of classifiers (Provost et al. 1998). Two classifiers based on the same algorithm can produce different classification results if they are trained on data with different costs. The classifier configurations described in Section 4.3.1 were, therefore, further tuned by manipulating the relative error cost used when training the classifiers. However, as the Weka implementations of the classifiers examined in this essay were not cost-sensitive, I undersampled the class with the lower relative error cost. That is, I included fewer non-fraud firms in the sample to achieve this objective.

When evaluating classifiers using specific assumptions about relative costs, the results might not generalize to the population of interest. The relative error cost used in the evaluation, therefore, has to be estimated to reflect the relative error costs in the population. These costs are very difficult to estimate accurately. Researchers, therefore, typically examine the classification performance over a wide range of relative error costs (Feroz et al. 2000; Lin et al. 2003), which reduces the risk of cost misspecification and provides richer information to other researchers and practitioners. I followed prior research (Lin et al. 2003) and evaluated the classification performance over a wide range of relative error costs, specifically from 1:1 through 1:100. I also performed a focused analysis of classification performance using relative error cost estimates from Bayley and Taylor (2007). Bayley and Taylor (2007) estimated that relative error costs were on average from 1:20 through 1:40. They based this estimate on an analysis of market reactions to fraud announcements reported by Beneish (1999). In the focused analysis, I used these estimates and examined the performance of the classifiers on three relative error costs: 1:20, 1:30 and 1:40.
Thus, this analysis provides insights into the relative performance of the classifiers under what is estimated to be realistic circumstances.

4.3.3 Prior Probability of Fraud

Like classification costs, the prior probability of fraud impacts both classifier training and evaluation. Two homogeneous classifiers can produce different results if they are trained on data with different prior probabilities, and the performance of a trained classifier can change if the prior probabilities change. The classifiers, therefore, have to be tuned by using different prior probabilities in the training sample. The classifier configurations described in Section 4.3.1 were, therefore, further tuned by manipulating the prior probability of fraud used when training the classifiers. Furthermore, to generalize to the population of interest, the prior probability of fraud in the evaluation sample should reflect the prior probability of fraud in the population. Prior financial statement fraud research has typically assumed that P(fraud) is 0.5 for training, evaluation or both. In reality, most organizations do not commit financial statement fraud. Bell and Carcello (2000) estimate that only around 0.6% of all firm years are fraudulent, i.e., P(fraud) = 0.006. This estimate is, however, likely to change over time and to differ across the samples examined. I, therefore, manipulated the prior probability of fraud in the evaluation sample at three levels: low, medium and high. I defined medium as P(fraud) = 0.006, i.e., the estimate from Bell and Carcello (2000), low as 50% of medium or P(fraud) = 0.003, and high as 200% of medium or P(fraud) = 0.012.

4.3.4 Dependent Variable

Consistent with prior financial statement fraud research (Feroz et al. 2000; Lin et al. 2003), performance was measured using estimated relative cost (ERC). This measure was selected instead of net benefit (used in Essay I) in order to stay consistent with prior financial statement fraud research.
The sum of ERC and net benefit per classified firm is equal to the average fraud cost of the classified firms (see Appendix 6), which is constant in a given dataset. Thus, as net benefit per classified firm increases ERC decreases by the same amount, and vice versa. Given specific estimates of prior fraud probability and relative error costs for evaluation purposes, and specific classification results, ERC is calculated as:

\[ ERC = \frac{n_{FN}}{n_{P}} \times C_{FN} \times P(Fraud) + \frac{n_{FP}}{n_{N}} \times C_{FP} \times P(NonFraud) \qquad (34) \]

where P(Fraud) and P(NonFraud) are the evaluation prior fraud and non-fraud probabilities, respectively; C_FP is the cost of false positive classifications (wrongful accusation costs, Cw, plus investigation cost, Ci) and C_FN is the cost of false negative classifications (fraud costs, Cf, minus investigation cost, Ci), both deflated by the lower of C_FP or C_FN; n_FP is the number of false positive classifications, n_FN is the number of false negative classifications, n_P is the number of positive instances in the dataset and n_N is the number of negative instances in the dataset. ERC is derived for each classification algorithm at the threshold that minimizes the ERC at a specific evaluation prior fraud probability and relative error cost.

4.3.5 Data

4.3.5.1 Classification Objects: Fraud and Non-Fraud Firms Data

The fraudulent observations were located based on firms investigated by the SEC for financial statement fraud and reported in Accounting and Auditing Enforcement Releases (AAER) from the fourth quarter of 1998 through the fourth quarter of 2005. A total of 745 potential observations were obtained from this initial search (see Table 4.1). The data set was then reduced by eliminating: duplicates; financial companies; firms without the first fraud year specified in the SEC release; non-annual financial statement fraud; foreign corporations; releases related to auditors; not-for-profit organizations; and fraud related to registration statements, 10-KSB or IPO.
Financial companies were excluded from the sample as the rules and regulations governing financial firms are substantially different from other firms. Firms committing non-annual financial statement fraud were excluded as quarterly financial statements report financial information covering shorter time periods. Fraud related to registration statements was excluded as the purpose of these statements is different from that of annual financial statement reporting, and they are thus likely to provide different incentives to commit fraud and to commit different types of fraud. An additional 75 fraud firms from Beasley (1996) were added to the remaining 197 fraud firms, for a total of 272 fraud firms. From these 272 fraud firms, 221 firms with missing data in Compustat or Compact D/SEC for the fraud year or four prior years, or with missing data in I/B/E/S for the fraud year, were deleted from the sample because these data were needed to create the measures described in Section 4.3.5.2. For example, total discretionary accruals require data for five years, the current year and the four prior years, to calculate discretionary accruals for the current year and three prior years.

To these remaining 51 fraud firms, I added 15,934 non-fraud firm years [31] to obtain P(fraud) ≈ 0.003 (0.00319). I used data from three sources to construct the object features: (1) financial statement data for the current year t and each of the four years leading up to the current year, t-1, t-2, t-3 and t-4, were collected from Compustat; (2) one-year-ahead analyst earnings per share forecasts and actual earnings per share in the fraud year were collected from I/B/E/S; and (3) executive and director names, titles and company holdings were collected from Compact D/SEC.

4.3.5.2 Object Features: Financial Statement Fraud Predictors

Financial statement fraud predictor research has either been confirmatory or exploratory in nature.
Confirmatory predictor research has focused on testing specific financial statement fraud hypotheses by developing and evaluating fraud predictors. The exploratory predictor research has taken a large number of variables, for example red flags proposed in SAS No. 53 and No. 82, and/or financial statement ratios, and either mapped these variables to fraud frameworks and/or tested their explanatory power. These two research streams have evaluated a large number of potential financial statement fraud predictors and found a number of significant financial statement fraud predictors, as shown in Table 4.2. I leveraged these findings and included in my experiment those predictors that had been found to be significant and that were easily available from electronic sources. Other variables were excluded since they were less likely to be used in practice due to the difficulty in obtaining them (these variables are italicized in Table 4.2). See Table 4.3 for the final selection of the 41 predictors included in the experiment and how these predictors were calculated.

[31] Note that matching is typically used to increase internal validity by controlling for variables not manipulated or measured in the experiment. My goal in this research is not to improve our understanding of factors that explain financial statement fraud, but rather to establish what classification algorithms are useful in predicting financial statement fraud. I, therefore, attempt to create a dataset that allows me to examine the performance impact of changing the prior probability of fraud during classifier training. Assuming that a lower prior probability in the training dataset than what is used for evaluation purposes will not improve performance when the minority class is already sparse, the minimum number of non-fraud firms is equal to the number of fraud firms divided by the lowest prior probability tested minus the number of fraud firms, i.e., (51/0.003) - 51 = 16,949.
Higher prior probabilities can then be obtained for training purposes by undersampling the majority class, i.e., eliminating non-fraud firms from the sample.

Table 4.1 Sample Selection

Panel A: Fraud Firms
Firms investigated by the SEC for fraudulent financial reporting from 4Q 1998 through 3Q 2005 | 745
Less: Financial companies | (33)
Less: Not annual (10-K) fraud | (116)
Less: Foreign companies | (9)
Less: Not-for-profit organizations | (10)
Less: Registration, 10-KSB and IPO related fraud | (78)
Less: Fraud year missing | (13)
Less: Duplicates | (287)
Remaining Fraud Observations | 197
Add: Fraud firms from Beasley (1996) | 75
Less: Not in Compustat or CompactD for first fraud year or four prior years or I/B/E/S for first fraud year | (221)
Usable Fraud Observations | 51

Panel B: Non-Fraud Firms
Non-Fraud Observations | 15,934

Table 4.2 Prior Research Financial Statement Fraud Predictors

Author | Dataset [a] | Determinants in Final Model [b] | Algorithm [c]
Beasley (1996) | 75 SEC fraud cases matched with 62 non-fraud cases | % outside directors | logit
Dechow et al. (1996) | 92 SEC GAAP violators matched with 92 non-violators | value of issued securities to market value; total debt to total assets; demand for financing (ex ante); whether securities were issued; % insiders on board; insider holdings to total board holdings; whether the board has an audit committee; whether board has over 50% inside directors; whether the CEO is the founder; total accruals in year of manipulation | paired, logit
Beneish (1997) | 64 SEC fraud cases; 2,658 (1,989) aggressive accruers (with increasing sales) | days in receivables index; total accruals to total assets; positive accruals dummy | probit
Gerety and Lehn (1997) | 62 SEC fraud cases matched with 62 non-fraud cases | Results did not show any significant determinants | paired
Green and Choi (1997) | 86 SEC fraud cases matched with 86 non-fraud cases | AFDA to net sales; AFDA to accounts receivable; net sales to accounts receivable; gross margin to net sales; accounts receivable to total assets; net sales; accounts receivable; AFDA | ANN
Fanning and Cogger (1998) | 102 SEC fraud cases matched with 102 non-fraud cases | % of outside directors; non-big X auditor; whether CFO changed in last three years; whether LIFO; debt to equity; sales to total assets; whether accounts receivable > 1.1 of last year's; whether gross margin % > 1.1 of last year's | ANN, DA, logit
Summers and Sweeney (1998) | 51 WSJ fraud cases matched with 51 non-fraud cases | current minus prior year inventory to sales; prior year ROA to total assets current year | logit
Beneish (1999) | 49 SEC fraud cases matched with 49 non-fraud cases | insider trading; whether managers redeem stock appreciation rights; holding period return in the violation period; discretionary accruals in violation period | probit
Lee et al. (1999) | 56 SEC fraud cases matched with 60,453 non-fraud cases | total accruals to total assets; total debt to total assets; whether new securities were issued; whether firm was listed on AMEX; whether SIC code >2999 and <4000 | logit
Beasley (2000) | 66 SEC fraud cases matched with unknown number of industry benchmark companies from National Association of Corporate Directors | whether technology company and board has an audit committee; whether health care and audit committee has 100% outside directors | univariate
Bell and Carcello (2000) | 77 PRP fraud cases matched with 305 PRP non-fraud cases | weak internal control environment; rapid company growth; undue emphasis on meeting earnings projections; management lied or was overly evasive; whether company is public | logit
Feroz et al. (2000) | 42 SEC fraud cases matched with 90 non-fraud cases | industry ROE minus firm ROE; times interest earned; accounts receivable to sales; Altman Z Score; the number of CEO turnovers; the number of CFO turnovers; the number of auditor turnovers | ANN, logit
Lin et al. (2003) | 40 SEC fraud cases matched with 160 non-fraud cases | net sales; accounts receivable; AFDA; AFDA to net sales; AFDA to accounts receivable; accounts receivable to net sales; accounts receivable to total assets; gross margin to net sales | ANN, logit
Dunn (2003) | 113 SEC and WSJ fraud cases matched with 113 non-fraud cases | control philosophy*structure; motivation | Logit
Kaminski et al. (2004) | 79 SEC fraud cases matched with 79 non-fraud cases | fixed assets to total assets; sales to accounts receivable; inventory to current assets; inventory to sales; sales to total assets | DA
Uzun et al. (2004) | 133 WSJ fraud cases matched with 133 non-fraud cases | % of outside directors; % of gray audit committee directors; % of gray compensation committee directors; % of gray nominating committee directors | Logit
Chen and Sennetti (2005) | 52 SEC fraud cases matched with 52 non-fraud cases | research and development to sales; gross profit margin; net profit margin; sales and marketing to sales; tax benefits from exercising of employee stock options to operating cash flows; changes in free cash flow; accounts receivable turnover; return on assets | Logit

[a] SEC = Dataset obtained from SEC releases; WSJ = Dataset obtained from Wall Street Journal news releases; and PRP = Dataset derived from proprietary sources.
[b] Listed are variables that were: (1) significant in the primary multivariate analysis (p<0.05); or (2) included in the primary model if p-values were not reported in the multivariate analysis and the focus was on evaluating models or significant in univariate analyses if p-values were not reported in the multivariate analysis and the focus was on evaluating predictors. Variables in italics were relatively difficult to obtain and are, therefore, less likely to be used in actual, real world analyses. AFDA = Allowance for Doubtful Accounts; ROA = Return on Assets; and ROE = Return on Equity.
[c] ANN = Artificial Neural Network; DA = Discriminant Analysis; and Paired = Paired t-test.

Table 4.3 Experimental Financial Statement Fraud Predictors [a]

In the definitions, a bracketed suffix such as [t-1] denotes the value of the item one year before the current year t.

Variable | Definition [f] | Data source
accounts receivable | (data2) | CompuSTAT
accounts receivable to sales | (data2/data12) | CompuSTAT
accounts receivable to total assets | (data2/data6) | CompuSTAT
AFDA | (data67) | CompuSTAT
AFDA to accounts receivable | (data67/data2) | CompuSTAT
AFDA to net sales | (data67/data12) | CompuSTAT
Altman Z score | 3.3*(data18+data15+data16)/data6 + 0.999*data12/data6 + 0.6*data25*data199/data181 + 1.2*data179/data6 + 1.4*data36/data6 | CompuSTAT
Big 4 auditor | IF 0 < data149 < 9 THEN 1 ELSE 0 | CompuSTAT
current minus prior year inventory to sales | (data3)/(data12) - (data3[t-1])/(data12[t-1]) | CompuSTAT
days in receivables index | (data2/data12)/(data2[t-1]/data12[t-1]) | CompuSTAT
debt to equity | (data181/data60) | CompuSTAT
demand for financing (ex ante) | IF ((data308 - (data128[t-3] + data128[t-2] + data128[t-1])/3) / (data4)) < 0.5 THEN 1 ELSE 0 | CompuSTAT
evidence of CEO change [b] | IF CEO_Name <> CEO_Name[t-1] OR CEO_Name[t-1] <> CEO_Name[t-2] OR CEO_Name[t-2] <> CEO_Name[t-3] THEN 1 ELSE 0 | CompactD
evidence of CFO change [c] | IF CFO_Name <> CFO_Name[t-1] OR CFO_Name[t-1] <> CFO_Name[t-2] OR CFO_Name[t-2] <> CFO_Name[t-3] THEN 1 ELSE 0 | CompactD
fixed assets to total assets | data7/data6 | CompuSTAT
four year geometric sales growth rate | (data12/data12[t-3])^(1/4) - 1 | CompuSTAT
gross margin to net sales | (data12 - data41)/data12 | CompuSTAT
holding period return in the violation period | (data199 - data199[t-1]) / data199 | CompuSTAT
industry ROE minus firm ROE | data172 / data60 | CompuSTAT
insider holdings to total board holdings | SUM(IF relationship code = CB, D, DO, H, OD THEN Insider_Holdings ELSE 0) / SUM(Insider_Holdings) | CompactD
inventory to sales | data3/data12 |
net sales | data12 | CompuSTAT
positive accruals dummy | IF (data18 - data308) > 0 and (data18[t-1] - data308[t-1]) > 0 THEN 1 ELSE 0 | CompuSTAT
percentage officers on the board of directors [d] | SUM(IF Executive_Name = Director_Name THEN 1 ELSE 0) / Number_Of_Directors | CompactD
prior year ROA to total assets current year | (data172[t-1] / data6[t-1]) / data6 | CompuSTAT
property plant and equipment to sales | data8/data12 | CompuSTAT
sales to total assets | data12/data6 | CompuSTAT
the number of auditor turnovers | IF data149 <> data149[t-1] THEN 1 ELSE 0 + IF data149[t-1] <> data149[t-2] THEN 1 ELSE 0 + IF data149[t-2] <> data149[t-3] THEN 1 ELSE 0 | CompuSTAT
times interest earned | (data18+data15+data16) / data15 | CompuSTAT
total accruals to total assets [e] | (data18 - data308) / data6 | CompuSTAT
total debt to total assets | data181/data6 | CompuSTAT
total discretionary accrual | DA[t-1] + DA[t-2] + DA[t-3], where DA = TA/A - estimated(NDA); TA/A = (data18 - data308)/data6[t-1]; NDA = 1/data6[t-1] + (data12 - data12[t-1] - data2 + data2[t-1])/data6[t-1] + (data308 - data308[t-1])/data6[t-1] + data7/data6[t-1] | CompuSTAT
unexpected employee productivity | FIRM((data12/data29 - data12[t-1]/data29[t-1])/(data12[t-1]/data29[t-1])) - INDUSTRY((data12/data29 - data12[t-1]/data29[t-1])/(data12[t-1]/data29[t-1])) | CompuSTAT
value of issued securities to market value | IF data396 > 0 THEN data396*data199/(data25*data199) ELSE IF (data25 - data25[t-1]) > 0 THEN ((data25 - data25[t-1])*data199) / (data25*data199) ELSE 0 | CompuSTAT
whether accounts receivable > 1.1 of last year's | IF (data2/data2[t-1]) > 1.1 THEN 1 ELSE 0 | CompuSTAT
whether firm was listed on AMEX | IF ZLIST = 5, 15, 16, 17, 18 THEN 1 ELSE 0 | CompuSTAT
whether gross margin % > 1.1 of last year's | IF ((data12 - data41)/data12) / ((data12[t-1] - data41[t-1])/data12[t-1]) > 1.1 THEN 1 ELSE 0 | CompuSTAT
whether LIFO | IF data59 = 2 THEN 1 ELSE 0 | CompuSTAT
whether meeting or beating analyst forecast | IF EPS - Analyst_Forecast >= 0 THEN 1 ELSE 0 | I/B/E/S
whether new securities were issued | IF (data25 - data25[t-1]) > 0 OR data396 > 0 THEN 1 ELSE 0 | CompuSTAT

4.4. Experimental Procedures and Preprocessing

4.4.1 Preprocessing

Before comparing the classifiers I determined the following: 1) prior fraud probability to use when training the classifiers; 2) method to use to filter the input data; 3) fraud predictors or attributes to include when training and evaluating the classifiers; and 4) how to tune the classifiers. Note that all these preprocessing steps were performed independently for each classifier. Thus, different training prior fraud probabilities, filtering methods and attributes could be selected for different classifiers. I did not need to perform these steps for IMF since this technique took the output from the tuned classifiers at the evaluation stage and simply combined these outputs.

4.4.1.1 Training Data Prior Fraud Probability

In order to determine prior fraud probabilities for training the classifiers, the performance of the classifiers was compared at ten different prior fraud probability levels in the training set: 0.32%, 0.6%, 1%, 1.5%, 2.5%, 5%, 10%, 20%, 40% and 60%. Note that 0.3% was the lowest prior probability used in the evaluation data adjusted for a relative classification error cost of 1:1 and that 60% was the highest prior probability used in the evaluation, i.e., 1.2%, adjusted for a relative classification error cost of 1:100. At this initial step the classifiers were not tuned and were instead implemented using their default settings as described in the classifier tuning section.
For each classifier, the prior fraud probability of the training dataset that produced the lowest classification error cost was selected in each evaluation prior fraud probability and relative error cost treatment group. Note that optimal decision thresholds were used when calculating the ERC for each classifier and treatment group. This threshold was determined empirically by calculating the ERC for each classifier at each treatment group 101 times as the threshold was changed from 0 to 1 in 0.01 increments. Thus, in this experiment a total of 13,332 ERC were derived (101 decision thresholds times 11 relative error cost treatment levels times three evaluation prior fraud probabilities times four classifiers).

As seen in Table 4.4, ANN minimized ERC at a training prior fraud probability of 0.6% for evaluation cost ratios from 1:1 through 1:50, 1:1 through 1:20 and 1:1 through 1:10 at evaluation prior fraud probabilities of 0.3%, 0.6% and 1.2%, respectively. For the remaining evaluation cost ratio and prior fraud probability levels, ANN minimized ERC using a training set with 60% prior fraud probability. All the other algorithms also minimized ERC using two or more prior fraud probabilities in the training set, as shown in Table 4.4. In general, the results show, as expected, that the optimal prior fraud probability level in the training set increased as the evaluation relative cost and prior probability of fraud treatments increased.
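The threshold search described above can be sketched as follows. This is a minimal illustration, not the code used in the study. It assumes that ERC is the false negative rate times the FN cost times the evaluation prior fraud probability, plus the false positive rate times the FP cost times one minus that prior; this form is not defined in this section, but it reproduces the table entries for a classifier that flags no firm (for example, 1 * 10 * 0.003 = 0.030 at a cost ratio of 1:10 and a prior of 0.003). The function names and the toy scores are hypothetical.

```python
from typing import List, Tuple


def estimated_relative_cost(probs: List[float], labels: List[int], threshold: float,
                            prior_fraud: float, cost_fn: float,
                            cost_fp: float = 1.0) -> float:
    """ERC under the assumed form: FN rate * C_FN * P(fraud) + FP rate * C_FP * P(nonfraud)."""
    n_fraud = sum(labels)
    n_nonfraud = len(labels) - n_fraud
    # A firm is flagged as fraud when its predicted probability reaches the threshold.
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    return (fn / n_fraud) * cost_fn * prior_fraud \
        + (fp / n_nonfraud) * cost_fp * (1.0 - prior_fraud)


def best_threshold(probs: List[float], labels: List[int],
                   prior_fraud: float, cost_fn: float) -> Tuple[float, float]:
    """Evaluate ERC at the 101 thresholds 0.00, 0.01, ..., 1.00 and return (ERC, threshold)."""
    thresholds = [i / 100 for i in range(101)]
    return min((estimated_relative_cost(probs, labels, t, prior_fraud, cost_fn), t)
               for t in thresholds)


# Hypothetical scores: one fraud firm (label 1) and three nonfraud firms (label 0).
toy_probs = [0.9, 0.2, 0.1, 0.4]
toy_labels = [1, 0, 0, 0]
erc, threshold = best_threshold(toy_probs, toy_labels, prior_fraud=0.003, cost_fn=10)

# Number of ERC values in the experiment: thresholds * cost levels * priors * classifiers.
total_erc_values = 101 * 11 * 3 * 4
```

Ties between thresholds are broken here in favor of the lowest threshold, which is a choice made for the sketch only.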
Table 4.3 (continued)
whether SIC code larger (smaller) than 2999 (4000) | IF 2999

Table 4.4 Training Prior Fraud Probabilities: Selected Training Prior Fraud Probabilities for each Classifier at Different Levels of Evaluation Prior Fraud Probability and Evaluation Relative Error Cost^a

Evaluation Factors                Classifiers
Relative     Prior Fraud
Error Cost   Probability    ANN     SVM   C4.5   Logistic   Bagging   Stacking
1:1          0.003          0.006   0.2   0.05   0.015      0.6       0.6
1:10         0.003          0.006   0.2   0.05   0.015      0.6       0.6
1:20         0.003          0.006   0.2   0.05   0.015      0.6       0.6
1:30         0.003          0.006   0.2   0.05   0.015      0.6       0.6
1:40         0.003          0.006   0.2   0.05   0.015      0.6       0.6
1:50         0.003          0.006   0.2   0.05   0.015      0.6       0.6
1:60         0.003          0.6     0.2   0.05   0.015      0.6       0.6
1:70         0.003          0.6     0.2   0.05   0.015      0.6       0.6
1:80         0.003          0.6     0.2   0.05   0.015      0.6       0.6
1:90         0.003          0.6     0.2   0.05   0.015      0.6       0.6
1:100        0.003          0.6     0.6   0.05   0.015      0.6       0.6
1:1          0.006          0.006   0.2   0.05   0.015      0.6       0.6
1:10         0.006          0.006   0.2   0.05   0.015      0.6       0.6
1:20         0.006          0.006   0.2   0.05   0.015      0.6       0.6
1:30         0.006          0.6     0.2   0.05   0.015      0.6       0.6
1:40         0.006          0.6     0.2   0.05   0.015      0.6       0.6
1:50         0.006          0.6     0.6   0.05   0.015      0.6       0.6
1:60         0.006          0.6     0.6   0.1    0.015      0.6       0.6
1:70         0.006          0.6     0.6   0.1    0.015      0.6       0.6
1:80         0.006          0.6     0.6   0.1    0.015      0.6       0.6
1:90         0.006          0.6     0.6   0.1    0.015      0.6       0.6
1:100        0.006          0.6     0.6   0.4    0.015      0.6       0.6
1:1          0.012          0.006   0.2   0.1    0.015      0.6       0.6
1:10         0.012          0.006   0.2   0.1    0.015      0.6       0.6
1:20         0.012          0.6     0.2   0.1    0.015      0.6       0.6
1:30         0.012          0.6     0.6   0.1    0.015      0.6       0.6
1:40         0.012          0.6     0.6   0.1    0.015      0.6       0.6
1:50         0.012          0.6     0.6   0.4    0.015      0.6       0.6
1:60         0.012          0.6     0.6   0.4    0.1        0.6       0.6
1:70         0.012          0.6     0.6   0.4    0.1        0.6       0.6
1:80         0.012          0.6     0.6   0.4    0.1        0.6       0.6
1:90         0.012          0.6     0.6   0.6    0.2        0.6       0.6
1:100        0.012          0.6     0.6   0.6    0.2        0.6       0.6

a For each evaluation treatment group (two columns to the left) the classifiers were evaluated using different prior fraud probabilities in the training dataset. The training dataset prior fraud probability that generated the lowest ERC for each classifier in each evaluation treatment group was then selected.
This probability is shown for each classifier and treatment group (six columns to the right).

4.4.1.2 Data Filtering

I continued the preprocessing by evaluating whether filtering the data, using one of three filtering methods that transformed the continuous fraud predictors, improved classifier performance. These methods normalized, discretized and standardized the data. The utility of these methods and of no filter was compared for each classifier at the training prior fraud probabilities that minimized ERC (see Table 4.4) at a cost ratio of 1:50 and a prior fraud probability of 0.3%, i.e., the median treatment level of the two evaluation factors. To discretize the attributes, I used the PKIDiscretize procedure in Weka, which implements equal frequency binning with the number of bins set to the square root of the number of non-missing values. This approach has been shown to produce improved classification results (Witten and Frank 2005). The standardized data were obtained by subtracting attribute means from instance values and then dividing this difference by the standard deviation of the attribute. To normalize the data, the difference between each instance value and the minimum instance value was divided by the range of the attribute values, i.e., maximum minus minimum value. The standardized attributes had a mean of zero and a standard deviation of one, while the normalized attributes had values that were between zero and one.

The results reported in Table 4.5 show a relatively clear trend indicating that classifiers trained with data that were normalized or not filtered produced lower ERC than classifiers trained with data that were standardized, which in turn produced lower ERC than classifiers trained with data that were discretized.
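The three filters described above can be illustrated with a short sketch. It is written in plain Python rather than with the Weka filters used in the study, and several details are assumptions made for illustration: the standard deviation is the sample standard deviation, the number of bins is the square root of the number of values rounded down, ties are split by sort order, and missing values are not handled. The function names are hypothetical.

```python
import math
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Rescale to [0, 1]: (value - minimum) / (maximum - minimum)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant attribute has no range; map it to 0.0 (assumption for this sketch).
    return [(v - lo) / span if span else 0.0 for v in values]


def standardize(values: List[float]) -> List[float]:
    """Subtract the attribute mean and divide by the (sample) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def equal_frequency_bins(values: List[float]) -> List[int]:
    """Equal frequency binning with floor(sqrt(n)) bins; returns a bin index per value."""
    n = len(values)
    k = max(1, int(math.sqrt(n)))
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        # Values are assigned to bins by rank so each bin holds about n / k values.
        bins[i] = min(rank * k // n, k - 1)
    return bins
```

For example, `normalize([2, 4, 6])` gives `[0.0, 0.5, 1.0]` and `standardize([1, 2, 3])` gives `[-1.0, 0.0, 1.0]`.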
More specifically, the performance benefits, measured using ERC, of normalization, standardization and no filter were the same for both ANN and logistic regression, and were better than discretization in 23 and 27 out of 33 comparisons, respectively. For C4.5, no filter was better than or as good as the other methods in 31 out of 33 comparisons, while normalization was better than or as good as the other methods at all evaluation levels for SVM. Normalization was also the best approach for stacking, for which normalization was better than or as good as the other methods in 30 out of 33 comparisons. Finally, for bagging, no filter was superior in 22 out of 33 comparisons and inferior to normalization and standardization in the remaining 11 comparisons.

4.4.1.3 Fraud Predictor Utility

One of my research objectives was to improve our understanding of which predictors provide utility to the different classifiers. Answering this question can facilitate more efficient data collection, as predictors that provide little or no utility to the classifiers do not have to be collected.
Furthermore, this knowledge can provide the foundation for reducing the dataset dimensionality (reducing the number of attributes), which can improve the performance of the classifiers. To reduce the dimensionality, I used a Wrapper attribute selection technique, which has been shown to be effective (Hall and Holmes 2003). The Wrapper approach examines the utility of the different attributes to a specific algorithm, as opposed to attribute selection techniques that examine the attributes without considering the specific classifier that will use the attributes.

Table 4.5 Data Filtering: ERC for each Combination of Classifier and Data Filtering Method at Different Levels of Evaluation Prior Fraud Probability and Evaluation Relative Cost^a

Column key: Cost = Relative Error Cost; Prior = Prior Fraud Probability; Norm. = Normalized; Stand. = Standardized; Discr. = Discretized; None = No Filter.

                ANN                            Logistic Regression
Cost   Prior    Norm.  Stand. Discr. None     Norm.  Stand. Discr. None
1:1    0.003    0.003  0.003  0.003  0.003    0.003  0.003  0.003  0.003
1:10   0.003    0.030  0.030  0.030  0.030    0.030  0.030  0.030  0.030
1:20   0.003    0.060  0.060  0.060  0.060    0.060  0.060  0.060  0.060
1:30   0.003    0.090  0.090  0.090  0.090    0.088  0.088  0.090  0.088
1:40   0.003    0.120  0.120  0.120  0.120    0.115  0.115  0.120  0.115
1:50   0.003    0.150  0.150  0.150  0.150    0.142  0.142  0.150  0.142
1:60   0.003    0.168  0.168  0.180  0.168    0.169  0.169  0.180  0.169
1:70   0.003    0.186  0.186  0.203  0.186    0.193  0.193  0.210  0.193
1:80   0.003    0.205  0.205  0.223  0.205    0.217  0.217  0.240  0.217
1:90   0.003    0.223  0.223  0.244  0.223    0.241  0.241  0.270  0.241
1:100  0.003    0.241  0.241  0.265  0.241    0.265  0.265  0.300  0.265
1:1    0.006    0.006  0.006  0.006  0.006    0.006  0.006  0.006  0.006
1:10   0.006    0.060  0.060  0.060  0.060    0.060  0.060  0.060  0.060
1:20   0.006    0.120  0.120  0.120  0.120    0.115  0.115  0.120  0.115
1:30   0.006    0.168  0.168  0.180  0.168    0.169  0.169  0.180  0.169
1:40   0.006    0.204  0.204  0.223  0.204    0.217  0.217  0.240  0.217
1:50   0.006    0.241  0.241  0.264  0.241    0.265  0.265  0.300  0.265
1:60   0.006    0.277  0.277  0.306  0.277    0.313  0.313  0.360  0.313
1:70   0.006    0.314  0.314  0.347  0.314    0.362  0.362  0.420  0.362
1:80   0.006    0.350  0.350  0.388  0.350    0.409  0.409  0.480  0.409
1:90   0.006    0.387  0.387  0.429  0.387    0.454  0.454  0.540  0.454
1:100  0.006    0.411  0.411  0.470  0.411    0.500  0.500  0.600  0.500
1:1    0.012    0.012  0.012  0.012  0.012    0.012  0.012  0.012  0.012
1:10   0.012    0.120  0.120  0.120  0.120    0.115  0.115  0.120  0.115
1:20   0.012    0.204  0.204  0.223  0.204    0.217  0.217  0.240  0.217
1:30   0.012    0.277  0.277  0.305  0.277    0.313  0.313  0.360  0.313
1:40   0.012    0.350  0.350  0.388  0.350    0.408  0.408  0.480  0.408
1:50   0.012    0.410  0.410  0.470  0.410    0.500  0.500  0.600  0.500
1:60   0.012    0.457  0.457  0.552  0.457    0.590  0.590  0.720  0.590
1:70   0.012    0.504  0.504  0.635  0.504    0.680  0.680  0.831  0.680
1:80   0.012    0.551  0.551  0.717  0.551    0.769  0.769  0.936  0.769
1:90   0.012    0.598  0.598  0.775  0.598    0.858  0.858  1.040  0.858
1:100  0.012    0.645  0.645  0.812  0.645    0.947  0.947  1.143  0.947

Table 4.5 (Continued)

                C4.5                           Support Vector Machines
Cost   Prior    Norm.  Stand. Discr. None     Norm.  Stand. Discr. None
1:1    0.003    0.003  0.003  0.003  0.003    0.003  0.003  0.003  0.003
1:10   0.003    0.030  0.030  0.030  0.030    0.030  0.030  0.030  0.030
1:20   0.003    0.060  0.060  0.060  0.060    0.060  0.060  0.060  0.060
1:30   0.003    0.090  0.090  0.090  0.090    0.090  0.090  0.090  0.090
1:40   0.003    0.120  0.120  0.120  0.120    0.120  0.120  0.120  0.120
1:50   0.003    0.150  0.150  0.150  0.150    0.150  0.150  0.150  0.150
1:60   0.003    0.180  0.180  0.180  0.180    0.180  0.180  0.180  0.180
1:70   0.003    0.210  0.210  0.210  0.210    0.210  0.210  0.210  0.210
1:80   0.003    0.240  0.240  0.240  0.240    0.239  0.240  0.240  0.239
1:90   0.003    0.270  0.270  0.270  0.270    0.257  0.270  0.270  0.257
1:100  0.003    0.300  0.300  0.300  0.300    0.276  0.300  0.300  0.276
1:1    0.006    0.006  0.006  0.006  0.006    0.006  0.006  0.006  0.006
1:10   0.006    0.060  0.060  0.060  0.060    0.060  0.060  0.060  0.060
1:20   0.006    0.120  0.120  0.120  0.120    0.120  0.120  0.120  0.120
1:30   0.006    0.180  0.180  0.180  0.180    0.180  0.180  0.180  0.180
1:40   0.006    0.240  0.240  0.240  0.240    0.238  0.240  0.240  0.238
1:50   0.006    0.300  0.300  0.300  0.300    0.276  0.300  0.300  0.276
1:60   0.006    0.360  0.360  0.360  0.357    0.314  0.360  0.360  0.314
1:70   0.006    0.420  0.418  0.420  0.415    0.348  0.420  0.420  0.348
1:80   0.006    0.480  0.476  0.480  0.472    0.380  0.480  0.480  0.380
1:90   0.006    0.540  0.533  0.540  0.530    0.413  0.540  0.540  0.413
1:100  0.006    0.600  0.591  0.600  0.588    0.446  0.600  0.600  0.446
1:1    0.012    0.012  0.012  0.012  0.012    0.012  0.012  0.012  0.012
1:10   0.012    0.120  0.120  0.120  0.120    0.120  0.120  0.120  0.120
1:20   0.012    0.240  0.240  0.240  0.240    0.238  0.240  0.240  0.238
1:30   0.012    0.360  0.360  0.360  0.357    0.313  0.360  0.360  0.313
1:40   0.012    0.480  0.475  0.480  0.472    0.380  0.480  0.480  0.380
1:50   0.012    0.599  0.591  0.600  0.588    0.446  0.600  0.600  0.446
1:60   0.012    0.717  0.706  0.720  0.703    0.512  0.720  0.686  0.512
1:70   0.012    0.830  0.817  0.840  0.814    0.577  0.787  0.752  0.577
1:80   0.012    0.943  0.928  0.960  0.924    0.628  0.824  0.818  0.643
1:90   0.012    1.004  0.966  0.988  0.981    0.677  0.862  0.884  0.709
1:100  0.012    1.006  0.966  0.988  0.981    0.727  0.900  0.949  0.761

Table 4.5 (Continued)

                Bagging                        Stacking
Cost   Prior    Norm.  Stand. Discr. None     Norm.  Stand. Discr. None
1:1    0.003    0.003  0.003  0.003  0.003    0.003  0.003  0.003  0.003
1:10   0.003    0.029  0.030  0.029  0.029    0.030  0.030  0.030  0.030
1:20   0.003    0.059  0.060  0.059  0.058    0.060  0.060  0.060  0.060
1:30   0.003    0.088  0.090  0.088  0.086    0.090  0.090  0.090  0.090
1:40   0.003    0.118  0.120  0.118  0.115    0.120  0.120  0.120  0.120
1:50   0.003    0.147  0.150  0.147  0.144    0.150  0.150  0.150  0.150
1:60   0.003    0.176  0.180  0.176  0.173    0.180  0.180  0.180  0.180
1:70   0.003    0.202  0.206  0.206  0.202    0.210  0.210  0.210  0.210
1:80   0.003    0.227  0.230  0.235  0.220    0.235  0.235  0.240  0.234
1:90   0.003    0.244  0.244  0.265  0.236    0.250  0.250  0.270  0.252
1:100  0.003    0.258  0.258  0.294  0.253    0.264  0.264  0.300  0.270
1:1    0.006    0.006  0.006  0.006  0.006    0.006  0.006  0.006  0.006
1:10   0.006    0.059  0.060  0.059  0.058    0.060  0.060  0.060  0.060
1:20   0.006    0.118  0.120  0.118  0.115    0.120  0.120  0.120  0.120
1:30   0.006    0.176  0.180  0.176  0.173    0.180  0.180  0.180  0.180
1:40   0.006    0.227  0.230  0.235  0.219    0.235  0.235  0.240  0.234
1:50   0.006    0.258  0.258  0.293  0.252    0.264  0.264  0.300  0.270
1:60   0.006    0.286  0.286  0.329  0.285    0.293  0.293  0.360  0.307
1:70   0.006    0.315  0.315  0.364  0.318    0.323  0.323  0.420  0.343
1:80   0.006    0.343  0.343  0.399  0.351    0.352  0.352  0.480  0.379
1:90   0.006    0.369  0.369  0.435  0.384    0.382  0.382  0.540  0.416
1:100  0.006    0.393  0.393  0.470  0.417    0.411  0.411  0.593  0.452
1:1    0.012    0.012  0.012  0.012  0.012    0.012  0.012  0.012  0.012
1:10   0.012    0.118  0.120  0.118  0.115    0.120  0.120  0.120  0.120
1:20   0.012    0.227  0.229  0.235  0.219    0.234  0.234  0.240  0.233
1:30   0.012    0.286  0.286  0.328  0.285    0.293  0.293  0.360  0.306
1:40   0.012    0.342  0.342  0.399  0.351    0.352  0.352  0.480  0.379
1:50   0.012    0.392  0.392  0.469  0.417    0.410  0.410  0.591  0.452
1:60   0.012    0.442  0.442  0.540  0.482    0.469  0.469  0.657  0.525
1:70   0.012    0.491  0.491  0.610  0.548    0.525  0.528  0.723  0.574
1:80   0.012    0.541  0.541  0.681  0.614    0.563  0.582  0.789  0.619
1:90   0.012    0.590  0.590  0.700  0.672    0.600  0.622  0.854  0.664
1:100  0.012    0.639  0.639  0.717  0.714    0.638  0.662  0.920  0.709

a For each evaluation treatment group (two columns to the left) the classifiers were evaluated using different data filtering methods. The data filtering method that generated the lowest ERC for each classifier in each evaluation treatment group was then selected. The ERC are displayed for each classifier, data filtering method and evaluation treatment group (eight columns to the right). The lowest ERC for each classifier and treatment group is highlighted in grey.
To evaluate the utility of the attributes, the Wrapper uses internal cross-validation iterations to compare the accuracy of a classifier using different sets of attributes. However, as discussed earlier, accuracy is not a good measure of performance in the fraud domain unless the training prior fraud probability is altered to take into account the actual prior fraud probability and relative costs in the domain. Assuming an average prior fraud probability of 0.006 (Bell and Carcello 2000) and an average relative cost of 1:30 (Bayley and Taylor 2007), I used a dataset with 51 fraud firms and 283 non-fraud firms for a prior fraud probability of 0.18 (calculated as 0.006*30; 51/283 is approximately 0.18). A genetic search algorithm was used within the Wrapper to search for the optimal attribute set. To evaluate the robustness of this search, I used 10-fold cross-validation to examine the selected attributes (note that the Wrapper also uses 5-fold cross-validation internally). In this examination I normalized the dataset attributes for SVM, logistic regression, stacking and ANN, and used non-filtered dataset attributes for C4.5 and bagging.

For logistic regression, nine variables were selected in at least 40% of the folds: the number of auditor turnovers, total discretionary accruals, Big 4 auditor, accounts receivable, allowance for doubtful accounts, whether meeting or beating analyst forecasts, inventory to sales, unexpected employee productivity and value of issued securities to market value (see Table 4.6). Note that an additional twelve variables would have been added to this selection if variables selected in at least 30% of the folds had been included. The SVM Wrapper selected six of the variables selected for logistic regression: the number of auditor turnovers, total discretionary accruals, Big 4 auditor, accounts receivable, allowance for doubtful accounts and whether meeting or beating analyst forecasts.
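The fold-based selection rule described above (keep a predictor if the Wrapper selected it in at least 40% of the external folds) can be sketched as follows. The Wrapper and genetic search themselves are not reproduced; the sketch only assumes that each external fold yields a list of selected predictor names. The function names and the example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def selection_frequency(fold_selections: List[List[str]]) -> Dict[str, float]:
    """Share of folds in which each predictor was selected."""
    # set() guards against a predictor being listed twice within one fold.
    counts = Counter(name for fold in fold_selections for name in set(fold))
    n_folds = len(fold_selections)
    return {name: count / n_folds for name, count in counts.items()}


def select_predictors(fold_selections: List[List[str]],
                      min_share: float = 0.4) -> List[str]:
    """Predictors selected in at least min_share of the folds, in alphabetical order."""
    freq = selection_frequency(fold_selections)
    # The small tolerance avoids dropping a predictor because of floating point rounding.
    return sorted(name for name, share in freq.items() if share >= min_share - 1e-12)


# Hypothetical outcome of ten external folds for one classifier.
example_folds = (
    [["auditor turnover", "Big 4 auditor"]] * 4
    + [["auditor turnover"]] * 3
    + [["debt to equity"]] * 3
)
selected = select_predictors(example_folds)  # ['Big 4 auditor', 'auditor turnover']
```

Lowering `min_share` to 0.3 would also admit "debt to equity" in this example, which mirrors the remark that a 30% cutoff adds further variables.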
The SVM Wrapper additionally selected Altman Z score, percentage of executives on the board of directors, property plant and equipment to sales, fixed assets to total assets, allowance for doubtful accounts to accounts receivable, and total debt to total assets in at least 40% of the folds (10 additional variables in at least 30% of the folds). For C4.5, auditor turnover, Big 4 auditor and whether meeting or beating analyst forecasts were again selected, as well as accounts receivable to total assets, accounts receivable to sales, gross margin to net sales, property plant and equipment to sales, industry ROE minus firm ROE, and positive accruals dummy in at least 40% of the folds (10 additional variables in at least 30% of the folds). The results for ANN, bagging and stacking also showed some overlap with the variables selected for logistic regression, SVM and C4.5.

Table 4.6 Attribute Selection: The Percentage of Folds in which Predictor was Selected for Each Classifier^a

Predictor                                                SVM   Log^b  ANN   C4.5  Bag^c  Stack^d  Avg^e
the number of auditor turnovers                          70%   40%    10%   70%   70%    0%       46%
total discretionary accruals                             60%   60%    40%   30%   30%    0%       39%
Big 4 auditor                                            50%   40%    30%   40%   50%    0%       38%
accounts receivable                                      70%   50%    30%   0%    10%    100%     35%
allowance for doubtful accounts                          60%   80%    10%   20%   20%    0%       34%
accounts receivable to total assets                      30%   30%    60%   40%   20%    0%       33%
accounts receivable to sales                             20%   20%    20%   60%   50%    0%       31%
whether meeting or beating forecast                      50%   40%    20%   40%   10%    0%       29%
evidence of CEO change                                   20%   30%    30%   30%   10%    100%     28%
sales to total assets                                    30%   30%    20%   30%   10%    100%     28%
inventory to sales                                       30%   50%    10%   30%   0%     100%     28%
unexpected employee productivity                         20%   40%    30%   20%   30%    0%       26%
Altman Z score                                           60%   30%    0%    20%   20%    0%       24%
percentage of executives on the board of directors      40%   30%    10%   30%   20%    0%       24%
demand for financing (ex ante)                           30%   30%    20%   30%   10%    0%       23%
if account receivable grew by more than 10%              20%   30%    50%   0%    20%    0%       23%
allowance for doubtful accounts to net sales             20%   20%    10%   30%   0%     100%     21%
current minus prior year inventory to sales              0%    30%    10%   10%   30%    100%     21%
gross margin to net sales                                20%   10%    0%    40%   10%    100%     21%
evidence of CFO change                                   20%   20%    40%   30%   0%     0%       21%
holding period return in the violation period            30%   30%    40%   10%   0%     0%       21%
property plant and equipment to sales                    40%   10%    20%   40%   0%     0%       21%
value of issued securities to market value               30%   50%    20%   0%    10%    0%       21%
fixed assets to total assets                             60%   30%    0%    0%    10%    0%       19%
days in receivables index                                0%    20%    20%   20%   0%     100%     18%
four year geometric sales growth rate                    30%   30%    20%   10%   0%     0%       18%
industry ROE minus firm ROE                              0%    0%     20%   40%   30%    0%       18%
positive accruals dummy                                  20%   10%    10%   50%   0%     0%       18%
times interest earned                                    30%   10%    10%   30%   10%    0%       18%
if firm was listed on AMEX                               20%   20%    20%   10%   10%    0%       16%
if gross margin grew by more than 10%                    20%   10%    30%   10%   10%    0%       16%
if new securities were issued                            30%   10%    10%   10%   20%    0%       16%
allowance for doubtful accounts to accounts receivable   40%   10%    0%    20%   10%    0%       16%
debt to equity                                           20%   10%    10%   30%   10%    0%       16%
total debt to total assets                               40%   10%    20%   10%   0%     0%       16%

a The percentage of folds in which the Wrapper included the predictor in the final set of predictors. The percentages in bold show which predictors were selected to be included in the final dataset for each classifier.
b Log = logistic regression
c Bag = bagging
d Stack = stacking. Stacking is relatively computationally expensive as it uses all the other classifiers as base-classifiers. When using the Wrapper with a genetic search algorithm, the Wrapper runs stacking using an external, in addition to the internal, cross-validation with genetic search in each fold. This procedure becomes very computationally expensive and external cross-validation was, therefore, not performed for stacking.
e Average percentage of folds in which the Wrapper included the predictor in the final set of predictors. To compute this average, selected stacking predictors were assigned a percentage of 0.17 if they were not selected in the tested fold and 0.47 if they were selected in the tested fold. These percentages were the averages of all the selected and not selected predictors for the other classifiers.

It is interesting to note that no variables were selected by all algorithms. Whether a firm had a Big 4 auditor was selected by all classifiers but stacking, while auditor turnover, total discretionary accruals and accounts receivable were selected by four out of six classifiers. Meeting or beating analyst forecasts and unexpected employee productivity were selected by three out of six classifiers. All the other predictors were selected by fewer than 50% of the classifiers. In terms of the average number of folds, auditor turnover was selected by the Wrapper most consistently and was on average selected in about half the folds examined (46%). Total discretionary accruals and Big 4 auditor followed closely at 39% and 38%, respectively.

Henceforth, I used the training prior fraud probabilities that minimized ERC at the different evaluation prior fraud probability and relative error cost treatment levels (see Table 4.4). For the different classifiers I filtered the data as follows: 1) normalized the data for SVM, logistic regression and ANN; 2) standardized and used no filter for C4.5; and 3) normalized and used no filter for stacking and bagging. Finally, for each classifier the data dimensionality was reduced by only using the classifier-specific Wrapper-selected attributes. These preprocessing results are summarized in Table 4.7.

4.4.2 Classifier Evaluation

For classifier tuning purposes, I examined the C4.5, SVM, ANN, logistic regression, stacking, bagging and IMF classifier configurations described in Section 4.3.1.1. The different configurations were evaluated using ten-fold stratified cross-validation, with the preprocessing dataset stratified and then randomly split into ten mutually exclusive folds of approximately equal size that each contained approximately the same prior class probabilities as the original dataset. The classifiers were then trained and evaluated ten times, each time using a different fold for evaluation and the nine remaining folds for training. The classifier tuning result set is a combination of the results from all ten evaluation folds. For each classifier type, I compared the different classifier configurations using these result sets and selected the configuration with the lowest ERC for each relative error cost and evaluation prior fraud probability combination.

Using the selected configurations, the ten-fold stratified cross-validation was repeated ten times. The cross-validation results from the ten iterations were then used to calculate ten ERC scores for each classifier configuration, relative error cost and evaluation prior fraud probability combination. The final result set used for classifier evaluation was generated by taking the ten ERC measures of each classifier type, relative error cost and evaluation prior fraud probability combination generated by the configuration selected for this specific experimental manipulation.
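The repeated ten-fold stratified cross-validation described above can be sketched in plain Python. This is an illustrative sketch, not the Weka procedure used in the study: folds are built by shuffling each class separately and dealing its members round-robin across the folds, and each repetition simply uses a different random seed. The function names are hypothetical, and the 51/283 class counts are borrowed from the attribute selection dataset purely as an example.

```python
import random
from typing import Dict, Iterator, List, Tuple


def stratified_folds(labels: List[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split instance indices into k mutually exclusive folds with similar class shares."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the class members round-robin, continuing where the last class stopped,
        # so that fold sizes stay approximately equal.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def repeated_cv_splits(labels: List[int], k: int = 10,
                       repeats: int = 10) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repetition, fold, training indices, evaluation indices)."""
    for rep in range(repeats):
        folds = stratified_folds(labels, k, seed=rep)
        for i, evaluation in enumerate(folds):
            training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield rep, i, training, evaluation


# Example with 51 fraud (1) and 283 non-fraud (0) firms.
example_labels = [1] * 51 + [0] * 283
example_splits = list(repeated_cv_splits(example_labels))
```

Each repetition yields one ERC score per treatment combination once the ten evaluation folds are pooled, which gives the ten observations per combination mentioned in the text.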
Thus, I did not necessarily take the configuration with the best final results for a specific combination of relative error cost and evaluation prior fraud probability, but instead used the results from the preselected classifier configurations. The final result set contained ten observations per classifier type, relative error cost and prior fraud probability treatment group.

Table 4.7 Preprocessing Result Overview: Selected Training Prior Fraud Probabilities, Data Filtering Methods and Predictors

Classifier   Training Prior Fraud Probability   Data Filtering           Predictors^a
ANN          0.2, 0.6                           Normalize                1, 2, 3, 4, 5, 8, 13, 14, 22, 24, 33, 35
SMO          0.006, 0.6                         Normalize                2, 3, 4, 6, 9, 12, 16, 20, 21, 31
C4.5         0.05, 0.1, 0.4, 0.6                No Filter, Standardize   1, 3, 6, 7, 8, 19, 22, 27, 28
Logistic     0.015, 0.1, 0.2                    Normalize                1, 2, 3, 4, 5, 8, 11, 12, 23
Bagging      0.6                                No Filter, Normalize     1, 2, 3, 7, 12, 18, 27
Stacking     0.6                                No Filter, Normalize     4, 9, 10, 11, 17, 18, 19, 25

a Predictor numbers represent the following predictors: 1=the number of auditor turnovers, 2=total discretionary accruals, 3=Big 4 auditor, 4=accounts receivable, 5=allowance for doubtful accounts, 6=accounts receivable to total assets, 7=accounts receivable to sales, 8=whether meeting or beating forecast, 9=evidence of CEO change, 10=sales to total assets, 11=inventory to sales, 12=unexpected employee productivity, 13=Altman Z score, 14=percentage of executives on the board of directors, 15=demand for financing (ex ante), 16=whether account receivable grew by more than 10%, 17=allowance for doubtful accounts to net sales, 18=current minus prior year inventory to sales, 19=gross margin to net sales, 20=evidence of CFO change, 21=holding period return in the violation period, 22=property plant and equipment to sales, 23=value of issued securities to market value, 24=fixed assets to total assets, 25=days in receivables index, 26=four year geometric sales growth rate, 27=industry ROE minus firm ROE, 28=positive accruals dummy, 29=times interest earned, 30=whether firm was listed on AMEX, 31=whether gross margin grew by more than 10%, 32=whether new securities were issued, 33=allowance for doubtful accounts to accounts receivable, 34=debt to equity, and 35=total debt to total assets.

4.5. Results

Table 4.8 reports descriptive classifier performance statistics. The reported estimated relative cost is the average for each classifier at all treatment levels. Thus, it is not surprising that the range of ERC is high and that the standard deviation is almost as high as the mean. For example, the standard deviation and mean ERC for logistic regression are 0.2367 and 0.2916, respectively. The descriptive statistics provide an initial indication that logistic regression, bagging and SVM perform well. Logistic regression performs particularly well, performing significantly^32 better (p<0.05) than ANN, IMF, C4.5 and stacking. It is, however, important to remember that these are descriptive statistics that report on the performance of the classifiers on average. Thus, we do not know under what specific evaluation prior fraud probabilities and relative cost conditions logistic regression, bagging and SVM outperform the other algorithms, or whether these other algorithms perhaps perform better under certain conditions.

32 Significance determined using Tukey-Kramer HSD and blocking for the effect of evaluation prior fraud probability and relative cost on estimated relative cost.

Table 4.8 Descriptive Statistics of Classifier Estimated Relative Cost^a

Classifier   Min      Median   Max      Mean     Standard Deviation   Connecting Letters Report^b
Logistic     0.0026   0.2167   0.9100   0.2916   0.2367               D
Bagging      0.0028   0.2400   0.8858   0.2978   0.2275               D C
SVM          0.0025   0.2306   0.8946   0.2989   0.2453               D C
ANN          0.0030   0.2400   0.8912   0.3046   0.2320               C
IMF          0.0026   0.2400   0.9880   0.3053   0.2463               C
C45          0.0028   0.2400   1.0614   0.3301   0.2734               B
Stacking     0.0030   0.2400   0.9880   0.3414   0.2905               A

a Classifier performance is measured using Estimated Relative Cost. Note that lower values are preferred over higher values.
b Levels not connected by the same letter are significantly different at a p-value of 0.05 using Tukey-Kramer HSD and blocking for the effect of evaluation prior fraud probability and relative cost on estimated relative cost.

To determine whether the differences noticed in the descriptive statistics depend on the level of evaluation prior fraud probabilities or relative cost, I examined the interactions between prior fraud probability and classification algorithm, and between relative error cost and classification algorithm, using the following regression model:

$$\mathrm{ERC} = \beta_0 + \beta_1\,\text{Classification Algorithm} + \beta_2\,\text{Prior Fraud Probability} + \beta_3\,\text{Relative Error Cost} + \beta_4\,\text{Classification Algorithm} \times \text{Prior Fraud Probability} + \beta_5\,\text{Classification Algorithm} \times \text{Relative Error Cost} + \varepsilon \qquad (35)$$

The interaction between prior fraud probability and classification algorithm (p<0.001) and the interaction between relative error cost and classification algorithm (p<0.001) were both significant. Thus, the relative performance of the classifiers depends on the level of evaluation prior fraud probability and on the level of relative error cost. The parameter estimates reported in Table 4.9 show that as prior fraud probability increases, the performance of bagging (p=0.005), ANN (p=0.039) and logistic regression (p=0.039) improves relative to the other classifiers (the magnitude of the performance improvement is in the order listed), the relative performance of IMF (p=0.347) and SVM (p=0.175) does not change, while the relative performance of stacking (p<0.001) and C4.5 (p<0.001) deteriorates.
The change in relative performance is similar when relative error cost increases, i.e., as the cost of FN errors becomes higher relative to the cost of FP errors. It is also interesting to note that the intercepts of bagging (p=0.007), logistic regression (p<0.001) and SVM (p=0.007) are lower than those of C4.5 and stacking. These results indicate that bagging, logistic regression and SVM outperform C4.5 and stacking at all levels and that the performance advantage of bagging and logistic regression is increasing in both evaluation prior fraud probability and relative error cost.

Panel A in Figure 4.1 shows the relative performance of the classification algorithms at different levels of relative cost when the evaluation prior fraud probability is 0.003, Panel B at 0.006, and Panel C at 0.012. Figure 4.1 corroborates the statistical findings, showing that the relative performance of stacking and C4.5 deteriorates as the relative error cost and prior fraud probability increase. While the other results are also supported, SVM appears to perform slightly better than what was indicated by the linear regression results. At both the low (0.003) and middle (0.006) evaluation prior fraud probability levels (Figure 4.1, Panel A and Panel B), logistic regression appears to dominate the other classifiers except for SVM at all relative error costs except for at very low relative error costs (high FN cost), where all classifiers appear to perform similarly. Note that the lowest relative error cost level examined assumed that the cost of not detecting a financial statement fraud is the same as the cost of wrongfully suspecting that a financial statement is fraudulent, a relatively unlikely scenario. When the evaluation prior probability level is high (Figure 4.1, Panel C), logistic regression still performs well, especially at the non-extreme relative error costs, i.e., in the error cost range that is the most realistic.
Furthermore, at high prior fraud probability levels bagging performs either on par with logistic regression or better, at all relative error costs. Thus, bagging appears to provide the best overall performance when the prior fraud probability is high. At the high evaluation prior fraud probability level, ANN also performs relatively well, but not better than bagging at any relative error cost level. Finally, IMF is consistently a robust middle performer that outperforms the worst-performing classifiers but is outperformed by the best-performing classifiers, regardless of which specific classifiers perform well or poorly.

Table 4.9 Regression Results for Testing Interactions between Classifier and Prior Fraud Probability, and Classifier and Relative Error Cost^a

Variable^b                                    Estimate   Std Error   t-ratio   Prob>|t|
Intercept                                     0.256      0.005       52.66     <0.001
Classifier [ANN]                              0.005      0.004       1.19      0.236
Classifier [Bagging]                          0.012      0.004       2.72      0.007
Classifier [C45]                              0.020      0.004       4.48      <0.001
Classifier [IMF]                              0.005      0.004       1.03      0.303
Classifier [Logistic]                         0.018      0.004       4.09      <0.001
Classifier [SVM]                              0.012      0.004       2.71      0.007
Classifier [Stacking]                         0.031      0.004       7.02      <0.001
Prior Fraud Probability                       39.565     0.489       80.86     <0.001
Relative Error Cost                           0.006      0.000       99.09     <0.001
Classifier [ANN]*Prior Fraud Prob.            0.000      0.000       2.07      0.039
Classifier [Bagging]*Prior Fraud Prob.        0.000      0.000       2.78      0.005
Classifier [C45]*Prior Fraud Prob.            0.000      0.000       3.40      0.001
Classifier [IMF]*Prior Fraud Prob.            0.000      0.000       0.94      0.347
Classifier [Logistic]*Prior Fraud Prob.       0.000      0.000       2.06      0.039
Classifier [SVM]*Prior Fraud Prob.            0.000      0.000       1.36      0.175
Classifier [Stacking]*Prior Fraud Prob.       0.001      0.000       5.63      <0.001
Classifier [ANN]*Relative Error Cost          3.340      1.199       2.79      0.005
Classifier [Bagging]*Relative Error Cost      4.143      1.199       3.46      0.001
Classifier [C45]*Relative Error Cost          4.019      1.199       3.35      0.001
Classifier [IMF]*Relative Error Cost          1.334      1.199       1.11      0.266
Classifier [Logistic]*Relative Error Cost     2.021      1.199       1.69      0.092
Classifier [SVM]*Relative Error Cost          0.640      1.199       0.55      0.585
Classifier [Stacking]*Relative Error Cost     7.236      1.199       6.04      <0.001
Adjusted R^2                                  0.878
RMSE                                          0.088
n                                             2310

a Two-tailed tests reported as directional predictions are not made.
b Dependent variable is Estimated Relative Cost.

These results corroborate what was noted earlier in the regression analysis, and additionally indicate that SVM performs relatively well at non-extreme values of relative error cost when the prior fraud probability is either low or medium, and that IMF provides consistent, robust performance. Note that while SVM appears to perform relatively well at medium treatment levels, it appears to lose ground at high relative error costs, especially when the prior fraud probability is high. This finding explains why the interaction estimates are lower for SVM than for ANN, bagging and logistic regression. Based on these results it appears that logistic regression is a robust performer that often performs better than and rarely falls far behind the other classifiers. SVM appears to provide good performance over relevant ranges, but even so does not appear to provide any performance advantage when compared to logistic regression. Finally, bagging and ANN appear to perform relatively well at certain, though perhaps less relevant, ranges, which explains why bagging and ANN overall performed relatively well.

To validate these observations I created three relative error cost groups: low (1:1, 1:10, and 1:20), middle (1:30, 1:40, 1:50, 1:60, and 1:70) and high (1:80, 1:90, and 1:100). Using the three relative error cost groups and the three original prior fraud probability levels, nine treatment groups were created.
I examined an ANOVA model where the only main effect was the classifier algorithm within each of these nine treatment groups:

\[ ERC = \beta_0 + \beta_1 \, \text{Classification Algorithm} + \varepsilon \tag{36} \]

The post-hoc analysis using Tukey-Kramer HSD, reported in Table 4.10, shows that SVM significantly outperforms all other classifiers and that logistic regression significantly outperforms C4.5, stacking and ANN when the relative error costs and the prior fraud probability are low. Logistic regression and SVM significantly outperform all the other classifiers at: (1) middle and high relative error costs when the prior fraud probability is 0.003, (2) low and middle relative error costs when the prior fraud probability is 0.006 and (3) low relative error cost when the prior fraud probability is 0.012. When the prior fraud probability is 0.006, logistic regression significantly outperforms all the other classifiers except for bagging when the relative error cost is high. SVM significantly outperforms all the other classifiers except for stacking at middle relative error costs and a prior fraud probability of 0.012. At high relative error cost and high prior fraud probability, stacking and C4.5 perform significantly worse than all the other classifiers. Overall, logistic regression and SVM perform well at all relative error cost and prior fraud probability levels.
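As a rough illustration of the grouping and of model (36), the sketch below assigns observations to the nine treatment groups and computes the one-way ANOVA F statistic for the classifier effect within one group. It is a minimal sketch and not the analysis reported here: the record layout, the function names and the ERC values in the usage lines are hypothetical, and the Tukey-Kramer post-hoc comparison is not implemented.

```python
from collections import defaultdict

# Relative error cost groups from the text, keyed by the FN side of the 1:x ratio.
COST_GROUPS = {
    "low": (1, 10, 20),
    "middle": (30, 40, 50, 60, 70),
    "high": (80, 90, 100),
}


def cost_group(fn_cost):
    """Return 'low', 'middle' or 'high' for a relative error cost of 1:fn_cost."""
    for name, levels in COST_GROUPS.items():
        if fn_cost in levels:
            return name
    raise ValueError(f"unexpected relative error cost 1:{fn_cost}")


def treatment_groups(records):
    """Group (prior, fn_cost, classifier, erc) records (hypothetical layout).

    Returns {(prior, cost_group): {classifier: [erc, ...]}}.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for prior, fn_cost, classifier, erc in records:
        groups[(prior, cost_group(fn_cost))][classifier].append(erc)
    return groups


def one_way_anova_f(samples):
    """F statistic for a model with classifier as the only main effect.

    samples maps classifier name -> list of ERC values.
    Returns (F, df_between, df_within). Assumes some within-group variation.
    """
    values = [x for xs in samples.values() for x in xs]
    grand_mean = sum(values) / len(values)
    means = {k: sum(xs) / len(xs) for k, xs in samples.items()}
    ss_between = sum(len(xs) * (means[k] - grand_mean) ** 2 for k, xs in samples.items())
    ss_within = sum((x - means[k]) ** 2 for k, xs in samples.items() for x in xs)
    df_between = len(samples) - 1
    df_within = len(values) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Made-up ERC values, for illustration only.
records = [
    (0.006, 30, "Logistic", 0.10), (0.006, 40, "Logistic", 0.12),
    (0.006, 30, "C45", 0.16), (0.006, 40, "C45", 0.18),
    (0.006, 90, "Logistic", 0.20),
]
groups = treatment_groups(records)
f_stat, df_b, df_w = one_way_anova_f(groups[(0.006, "middle")])
print(f_stat, df_b, df_w)
```

A significant F statistic only says that at least one classifier differs within the treatment group; a post-hoc procedure such as Tukey-Kramer HSD is still needed to say which ones.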
Figure 4.1: Classifier Comparison, Estimated Relative Cost
[Line plots of Mean(ERC) against FN_Cost for ANN, Bagging, C45, IMF, Logistic, Stacking and SVM. Panel A: Prior Fraud Probability = 0.003. Panel B: Prior Fraud Probability = 0.006. Panel C: Prior Fraud Probability = 0.012. Panel D: Prior Fraud Probability = 0.006, FN_Cost from 20 to 40.]

Table 4.10
Comparison of Treatment Groups: Tukey-Kramer HSD Connected Letters Report

Panel A: Prior Fraud Probability = 0.003
Relative Error Cost Range
Low^a                  Middle^a               High^a
Classifier  Letters    Classifier  Letters    Classifier  Letters
ANN         A          ANN         A          Stacking    A
Stacking    A          Stacking    A          ANN         A
C45         A          C45         A B        C45         A
Bagging     A B        Bagging     B C        Bagging     A B
IMF         A B        IMF         C          IMF         B
Logistic    B          Logistic    D          SVM         C
SVM         C          SVM         D          Logistic    C

Panel B: Prior Fraud Probability = 0.006
Relative Error Cost Range
Low^a                  Middle^a               High^a
Classifier  Letters    Classifier  Letters    Classifier  Letters
ANN         A          Stacking    A          Stacking    A
Stacking    A          C45         A B        C45         A
C45         A          ANN         A B        IMF         B
Bagging     A          Bagging     B C        ANN         B
IMF         A          IMF         C          SVM         B
Logistic    B          SVM         D          Bagging     B C
SVM         B          Logistic    D          Logistic    C

Panel C: Prior Fraud Probability = 0.012
Relative Error Cost Range
Low^a                  Middle^a               High^a
Classifier  Letters    Classifier  Letters    Classifier  Letters
Stacking    A          ANN         A          Stacking    A
ANN         A B        Bagging     A          C45         B
C45         A B        C45         B          Logistic    C
Bagging     A B        IMF         B          SVM         C
IMF         B          Logistic    B          IMF         C
SVM         C          Stacking    B C        ANN         C
Logistic    C          SVM         C          Bagging     C

^a Levels not connected by the same letter are significantly
different at a p-value of 0.05 using Tukey-Kramer HSD and blocking for the effect of evaluation prior fraud probability and relative cost on estimated relative cost.

As described earlier, relative error costs between 1:20 and 1:40, and a prior fraud probability of 0.006, are believed to be good estimates of actual costs and prior probabilities associated with financial statement fraud (Bell and Carcello 2000; Bayley and Taylor 2007). Figure 4.1, Panel D shows the estimated relative cost of the classifiers at an evaluation prior fraud probability of 0.006 and at relative error costs of 1:20, 1:30 and 1:40. It appears that logistic regression and SVM are superior when compared to the other classifiers over this relative error cost range. To validate this observation, I examined model (36), where the only main effect was classifier algorithm, within each of these three treatment groups.

The post-hoc analysis using Tukey-Kramer HSD and pairwise t-tests, reported in Table 4.11, confirms that logistic regression and SVM consistently outperform the other classifiers at what are believed to be good estimates of actual real world prior fraud probability and relative error cost.

4.6. Discussion

My experiments show that logistic regression, a relatively well-known and established classifier, and SVM outperform or perform as well as a relatively comprehensive set of data mining algorithms. This result is somewhat surprising considering that prior fraud research typically found ANN to either outperform or perform on par with logistic regression. However, this study differs from prior fraud studies in that it evaluates the classifiers using a highly imbalanced dataset, i.e., the minority class has a low prior probability, where the prior minority class probability is manipulated in both the training and the evaluation data.
It also differs from most prior fraud research by examining the performance using optimal classification threshold levels for the different classifiers given a specific evaluation manipulation. Finally, this study differs from prior fraud research that compares classification algorithms by not only including a relatively complete set of attributes, but also using a Wrapper method to select attributes for each classifier. Thus, while the result that logistic regression and SVM outperform or perform as well as the other classifiers is somewhat surprising, it does not necessarily contradict these prior findings. Rather, the results show that when taking these additional factors into account, logistic regression and SVM perform well in the fraud domain.

Table 4.11
Classifier Average Estimated Relative Cost at Best Estimates of Relative Error Cost and Prior Fraud Probability Levels

Panel A: Prior Fraud Probability = 0.006 and Relative Error Cost = 1:20
                                Pairwise t-tests
Classifier  Tukey-Kramer HSD^a  Logistic            SVM                 IMF                 Bagging             C4.5                Stacking
ANN         A                   0.0100 (p<0.0001)   0.0113 (p<0.0001)   0.0037 (p=0.009)    0.0028 (p=0.0438)   0.0009 (p=0.4954)   0.0000 (p=1.000)
Stacking    A                   0.0100 (p<0.0001)   0.0113 (p<0.0001)   0.0037 (p=0.009)    0.0028 (p=0.0438)   0.0009 (p=0.4954)
C45         A                   0.009 (p<0.0001)    0.0104 (p<0.0001)   0.0028 (p=0.0488)   0.0019 (p=0.1751)
Bagging     A                   0.0072 (p<0.0001)   0.0085 (p<0.0001)   0.0009 (p=0.5263)
IMF         A                   0.0063 (p<0.0001)   0.0076 (p<0.0001)
SVM         B                   0.0013 (p=0.3435)
Logistic    B

Panel B: Prior Fraud Probability = 0.006 and Relative Error Cost = 1:30
                                Pairwise t-tests
Classifier  Tukey-Kramer HSD^a  SVM                 Logistic            IMF                 Bagging             C4.5                Stacking
ANN         A                   0.0169 (p<0.001)    0.0169 (p<0.001)    0.0065 (p=0.004)    0.0042 (p=0.059)    0.0015 (p=0.490)    0.0000 (p=1.000)
Stacking    A                   0.0169 (p<0.001)    0.0169 (p<0.001)    0.0065 (p=0.004)    0.0042 (p=0.059)    0.0015 (p=0.490)
C45         A                   0.0154 (p<0.001)    0.0154 (p<0.001)    0.005 (p=0.027)     0.0027 (p=0.225)
Bagging     A                   0.0127 (p<0.001)    0.0127 (p<0.001)    0.0023 (p=0.304)
IMF         A                   0.0104 (p<0.001)    0.0104 (p<0.001)
Logistic    B                   0.0000 (p=0.994)
SVM         B

A potential explanation as to why logistic regression performs well in this study is that logistic regression produces relatively accurate probability estimates (Perlich et al. 2003). Since the probability estimates generated by the different classifiers are compared in this study to various thresholds to find the threshold that minimizes ERC, the relative performance of logistic regression will be better than if performance is measured using classification results based on the default threshold of 0.5, which has been used in a majority of prior fraud research (Fanning and Cogger 1998; Feroz et al. 2000; Lin et al. 2003; Kotsiantis et al. 2006; Kirkos et al. 2007). Another potential explanation as to why logistic regression performs well in this study is that logistic regression performs relatively well when it is difficult to separate signal from noise (Perlich et al. 2003). The area under the curve for logistic regression (AUC = 0.823), the measure of signal separability used in Perlich et al. (2003), is, however, between the low- and high-separability groups found in their study.
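The threshold search described above can be sketched as follows. This is a minimal illustration under stated assumptions: the ERC formula in the code (false negative rate times FN cost times prior fraud probability, plus false positive rate times FP cost times prior non-fraud probability) is not restated in this section and is assumed here, and the function names, scores and labels are hypothetical.

```python
def estimated_relative_cost(scores, labels, threshold, prior_fraud, fn_cost, fp_cost=1.0):
    """ERC at one threshold (assumed formula, see the lead-in).

    scores: fraud probability estimates from a classifier.
    labels: 1 for fraud, 0 for non-fraud.
    An observation is classified as fraud when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one fraud and one non-fraud observation")
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return (fn / n_pos) * fn_cost * prior_fraud + (fp / n_neg) * fp_cost * (1.0 - prior_fraud)


def best_threshold(scores, labels, prior_fraud, fn_cost, fp_cost=1.0):
    """Return (threshold, ERC) minimizing ERC over the observed scores.

    float('inf') is included so that 'classify nothing as fraud' is a candidate.
    Ties are resolved in favor of the lower threshold.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    costs = [
        (estimated_relative_cost(scores, labels, t, prior_fraud, fn_cost, fp_cost), t)
        for t in candidates
    ]
    erc, threshold = min(costs)
    return threshold, erc


# Made-up scores and labels, for illustration only.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]

threshold, erc = best_threshold(scores, labels, prior_fraud=0.006, fn_cost=30)
default_erc = estimated_relative_cost(scores, labels, 0.5, prior_fraud=0.006, fn_cost=30)
print(threshold, erc, default_erc)
```

In this toy data the ERC-minimizing threshold differs from 0.5 and moves lower as the FN cost rises, which is why a classifier with well-calibrated probability estimates can look better under this evaluation than under a fixed default threshold.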
Although the results are somewhat surprising, the experimental findings are encouraging since neither logistic regression nor SVM requires extensive tuning or a lot of resources for training and evaluation purposes. Furthermore, logistic regression is widely used and accepted, and produces results that are relatively easy to interpret and understand.

Table 4.11 (continued)

Panel C: Prior Fraud Probability = 0.006 and Relative Error Cost = 1:40
                                Pairwise t-tests
Classifier  Tukey-Kramer HSD^a  Logistic            SVM                 IMF                 Bagging             C4.5                Stacking
ANN         A                   0.0243 (p<0.001)    0.0219 (p<0.001)    0.010 (p=0.006)     0.0056 (p=0.112)    0.0026 (p=0.461)    0.0007 (p=0.833)
Stacking    A                   0.0235 (p<0.001)    0.0212 (p<0.001)    0.0093 (p=0.010)    0.0049 (p=0.166)    0.0019 (p=0.598)
C45         A                   0.0217 (p<0.001)    0.0193 (p<0.001)    0.0074 (p=0.038)    0.0031 (p=0.387)
Bagging     A                   0.0186 (p<0.001)    0.0163 (p<0.001)    0.0044 (p=0.216)
IMF         A                   0.0143 (p=0.001)    0.0119 (p=0.001)
SVM         B                   0.0024 (p=0.501)
Logistic    B

^a Levels not connected by the same letter are significantly different at a p-value of 0.05 using Tukey-Kramer HSD and blocking for the effect of evaluation prior fraud probability and relative cost on estimated relative cost.

The experiment shows that out of 41 variables that have been found to be good predictors in prior fraud research, logistic regression uses a subset of only nine variables: auditor turnover, total discretionary accruals, Big 4 auditor, accounts receivable, allowance for doubtful accounts, meeting or beating analyst forecasts, inventory to sales, unexpected employee productivity and value of issued securities to market value. Across all classifiers only six variables are selected by three or more classifiers: auditor turnover, total discretionary accruals, Big 4 auditor, accounts receivable, meeting or beating analyst forecasts, and unexpected employee productivity.
These results, and the results reported for each classifier (see Table 4.7), can be used by practitioners as guidance for selecting variables to be included in fraud detection models. Another implication of this finding is that research developing new fraud predictors needs to examine the utility of the fraud predictor using more than one classification algorithm, i.e., in addition to using logistic regression, other classifiers like SVM and C4.5 should be used when examining the utility of fraud predictors.

The findings in this essay need to be corroborated by future research using different datasets to evaluate the generalizability of the results. However, to increase the generalizability of the results I used 10-fold cross validation where the classification performance was measured on data not used for training. This 10-fold cross validation was repeated ten times to reduce the possibility of the results only pertaining to a specific 10-fold cross validation seed. Thus, the results should be generalizable to the population represented by the sample. However, datasets with different fraud firms can be used to validate that the sample is a good representative sample of fraud firms.

A natural extension of this research is to examine additional classification algorithms. While I select classification algorithms based on findings in prior research, it is possible that other classification algorithms will provide relatively good performance in financial statement fraud detection. Related to this extension is the possibility of not only tuning classification algorithms for the fraud domain, but designing novel classifiers for the specific purpose of detecting fraud. Finally, data mining research focusing on the class imbalance problem has proposed a number of sampling techniques, such as SMOTE, to improve classification performance (Chawla et al. 2002). The utility of these techniques in predicting fraud needs to be evaluated.

Chapter 5. Dissertation Conclusion

The first essay, "Information Market Based Decision Fusion," introduces a novel combiner method based on theoretical and empirical findings in information market research. The results show that when the true classes of objects are only revealed for objects classified as positive and the positive ratio is low, IMF outperforms Majority, Average and Weighted Average. IMF outperforms Majority and performs on par with Average and Weighted Average when the true classes of objects are only revealed for objects classified as positive and the positive ratio is high. Furthermore, IMF outperforms Weighted Average and Majority and, at a marginal level of significance, outperforms Average when the true classes of all objects are revealed. This research contributes to multi-classifier combination combiner method research and, thereby, also to the broader research stream of ensemble-based classification and to classification algorithm research in general.

The second essay, "The Effect of Discretionary Accruals, Earnings Expectations and Unexpected Productivity on Financial Statement Fraud: An Empirical Analysis," develops three novel fraud predictors: total discretionary accruals, meeting or beating analyst forecasts and unexpected revenue per employee. The results show that the three variables are significant predictors of fraud. This research contributes to the confirmatory fraud predictor research stream, which is part of a broader research area that focuses on developing and testing financial statement fraud predictors.

The third essay, "Financial Statement Fraud Detection Using Data Mining: An Empirical Analysis," takes artifacts from the broader research streams to which the first two essays contribute, i.e., classification algorithm research and financial statement fraud predictor research, and compares the utility of artifacts developed in these research streams in detecting financial statement fraud.
I find that logistic regression and SVM perform well relative to the other classification algorithms tested, i.e., C4.5, ANN, stacking, bagging and IMF. Logistic regression and SVM also provide the best performance under what is believed to be the most relevant prior probability and relative cost estimates. The results additionally show that out of 41 variables that have been found to be good predictors in prior fraud research, only six variables are selected by three or more classifiers: auditor turnover, total discretionary accruals, Big 4 auditor, accounts receivable, meeting or beating analyst forecasts, and unexpected employee productivity. While other predictors are used by the classifiers, their use is limited to only one or two classifiers. Thus, the utility of a given predictor, other than the six listed above, is dependent on the specific classifier used.

The results from Essay I in combination with the results from Essay III show that IMF performs better than existing combiner methods and better than stacking, an ensemble-based classification algorithm. Stacking is similar to IMF in that both methods use all the individual classifiers in the experiment as base-classifiers and then combine the results of these classifiers into an overall ensemble decision: stacking uses a meta-learner and IMF uses an information market based combiner method. Thus, the information market based combiner method developed in Essay I aggregates the base-classifiers' decisions more effectively than the meta-learner used in stacking. The results also show that IMF performs on par with bagging, another ensemble-based classification algorithm. Bagging uses the combiner method AVG. Given that the positive ratio in the fraud domain is low and that the first essay shows that IMF outperforms AVG when the positive ratio is low, I expected IMF to outperform bagging.
However, the homogeneous base-classifiers in bagging are trained using different data samples than the heterogeneous base-classifiers in IMF, which might explain why bagging performs on par with IMF even though bagging uses AVG. Assuming that IMF provides better performance than AVG and that the ensemble in bagging provides better performance than the ensemble used in IMF, it might be possible to improve the performance over bagging and IMF by combining the two algorithms. Future research can investigate the effectiveness of using IMF to combine the decisions of the base-classifiers in bagging and determine if bagging with IMF performs better than bagging with AVG.

The results from Essay II in combination with the results from Essay III show that the three predictors created in Essay II, unexpected revenue per employee, total discretionary accruals and meeting or beating analyst forecasts, are significant predictors of fraud and provide utility to classification algorithms. The three predictors provide insights into (1) conditions under which fraud is more likely to occur (total discretionary accruals is high), (2) incentives for fraud (firms' desire to meet or beat analyst forecasts), and (3) how fraud is committed and can be detected (detection of revenue fraud using unexpected employee productivity). These three predictors are also among the group of six predictors selected by 50 percent or more of the classification algorithms. These results indicate that in a group of 41 fraud predictors that prior research has found to be significant predictors of fraud, the three predictors developed in Essay II are among the top six variables in terms of utility provided to the classification algorithms in fraud prediction. Thus, the predictors developed in Essay II provide new knowledge about financial statement fraud and are useful in financial statement fraud classification.
To conclude, IMF performs well relative to existing combiner methods over a range of different domains, as shown in Essay I. In the fraud detection task, IMF is a robust performer and shows some promise when compared to other ensemble-based methods. The three variables developed in Essay II were statistically significant predictors of fraud and these were shown to be robust. These variables made up half of the six variables selected from a group of 41 by 50 percent or more of the classification algorithms. I finally provide guidance for future fraud detection efforts by showing that logistic regression and SVM generally provide the best performance and specifically provide the best performance under what is believed to be the most realistic conditions. I also identified which predictors are overall most useful to the different classification algorithms. Six variables were selected by 50 percent or more of the classification algorithms: auditor turnover, total discretionary accruals, Big 4 auditor, accounts receivable, meeting or beating analyst forecasts, and unexpected employee productivity.

Chapter 6. References

ACFE, 2006, Report to the Nation on Occupational Fraud and Abuse, Association of Certified Fraud Examiners, Austin, TX.
AICPA, 1988, Statement on Auditing Standards (SAS) No. 53: The Auditor's Responsibility to Detect and Report Errors and Irregularities, American Institute of Certified Public Accountants, New York, NY.
AICPA, 1997, Statement on Auditing Standards (SAS) No. 82: Consideration of Fraud in a Financial Statement Audit, American Institute of Certified Public Accountants, New York, NY.
AICPA, 2002, Statement on Auditing Standards (SAS) No. 99: Consideration of Fraud in a Financial Statement Audit, American Institute of Certified Public Accountants, New York, NY.
Bayley, L., S. Taylor, 2007, Identifying Earnings Management: A Financial Statement Analysis (Red Flag) Approach, Working Paper, January 20, 2007.
Beasley, M., 1996, An Empirical Analysis of the Relation between the Board of Director Composition and Financial Statement Fraud, The Accounting Review 71(4), pp. 443-465.
Bell, T., J. Carcello, 2000, A Decision Aid for Assessing the Likelihood of Fraudulent Financial Reporting, Auditing: A Journal of Practice & Theory 19(1), pp. 169-184.
Beneish, M., 1997, Detecting GAAP Violation: Implications for Assessing Earnings Management among Firms with Extreme Financial Performance, Journal of Accounting and Public Policy 16, pp. 271-309.
Beneish, M., 1999, Incentives and Penalties Related to Earnings Overstatements That Violate GAAP, The Accounting Review 74(4), pp. 425-457.
Berg, J.E., T.A. Rietz, 2003, Prediction Markets as Decision Support Systems, Information Systems Frontiers 5(1), pp. 79-93.
Breiman, L., 1996, Bagging Predictors, Machine Learning 24(2), pp. 123-140.
Burgstahler, D., M. Eames, 2006, Management of Earnings and Analysts' Forecasts to Achieve Zero and Small Positive Earnings Surprises, Journal of Business Finance & Accounting 33(5-6), pp. 633-652.
Carlsson, P., F. Ygge, A. Andersson, 2001, Extending Equilibrium Markets, IEEE Intelligent Systems 16(4), pp. 18-26.
Chan, P.K., W. Fan, A.L. Prodromidis, S.J. Stolfo, 1999, Distributed Data Mining in Credit Card Fraud Detection, IEEE Intelligent Systems and Their Applications 14(6), pp. 67-74.
Chawla, N.V., K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer, 2002, SMOTE: Synthetic Minority Over-sampling Technique, Journal of Artificial Intelligence Research 16, pp. 321-357.
Chen, C., J. Sennetti, 2005, Fraudulent financial reporting characteristics of the computer industry under a strategic-systems lens, Journal of Forensic Accounting 6(1), pp. 23-54.
Dechow, P., R. Sloan, A. Sweeney, 1995, Detecting Earnings Management, The Accounting Review 70(2), pp. 193-225.
Dechow, P., R. Sloan, A.
Sweeney, 1996, Causes and consequences of earnings manipulations: An analysis of firms subject to Enforcement Actions by the SEC, Contemporary Accounting Research 13(1), pp. 1-36.
Dichev, I., D. Skinner, 2002, Large-sample evidence on the debt covenant hypothesis, Journal of Accounting Research 40, pp. 1091-1123.
Dopuch, N., R. Holthausen, R. Leftwich, 1987, Predicting Audit Qualifications with Financial and Market Variables, The Accounting Review 62(3), pp. 431-454.
Drummond, C., R.C. Holte, 2006, Cost Curves: An Improved Method for Visualizing Classifier Performance, Machine Learning 65(1), pp. 95-130.
Duin, P.W.R., M.J.D. Tax, 2000, Experiments with Classifier Combining Rules, International Workshop on Multiple Classifier Systems 2000.
Fama, E., 1970, Efficient Capital Markets: A Review of Theory and Empirical Work, Journal of Finance 25(2), pp. 383-417.
Fan, A., M. Palaniswami, 2000, Selecting bankruptcy predictors using a support vector machine approach, Neural Networks 6, pp. 354-359.
Fanning, K., K. Cogger, 1998, Neural network detection of management fraud using published financial data, International Journal of Intelligent Systems in Accounting, Finance and Management 7(1), pp. 21-41.
Feroz, E., T. Kwon, V. Pastena, K. Park, 2000, The Efficacy of Red-Flags in Predicting the SEC's Targets: An Artificial Neural Networks Approach, International Journal of Intelligent Systems in Accounting, Finance & Management 9(3), pp. 145-157.
Fries, T., N. Cristianini, C. Campbell, 1998, The kernel adatron algorithm: a fast and simple learning procedure for support vector machines, In the Proceedings of the 15th International Conference on Machine Learning, Madison, WI.
Green, B.P., J.H. Choi, 1997, Assessing the Risk of Management Fraud Through Neural Network Technology, Auditing: A Journal of Practice & Theory 16(1), pp. 14-28.
Hall, M., G.
Holmes, 2003, Benchmarking Attribute Selection Techniques for Discrete Class Data Mining, IEEE Transactions on Knowledge and Data Engineering 15(3), pp. 1-16.
Hanson, R., 2003, Combinatorial Information Market Design, Information Systems Frontiers 5(1), pp. 107-119.
Hayek, F.A., 1945, The Use of Knowledge in Society, The American Economic Review 35(4), pp. 519-530.
Healy, P., 1985, The effect of bonus schemes on accounting decisions, Journal of Accounting and Economics 7, pp. 85-107.
Healy, P.M., J.M. Wahlen, 1999, A review of the earnings management literature and its implications for standard setting, Accounting Horizons 13(4), pp. 365-383.
Hribar, P., D.W. Collins, 2002, Errors in Estimating Accruals: Implications for Empirical Research, Journal of Accounting Research 40(1), pp. 105-134.
Jaccard, J., C.K. Wan, 1996, LISREL approaches to interaction effects in multiple regression, Sage Publications, Thousand Oaks, CA.
Jain, A.K., R.P.W. Duin, J. Mao, 2000, Statistical Pattern Recognition: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1), pp. 4-37.
Jones, J., 1991, Earnings management during import relief investigations, Journal of Accounting Research 29, pp. 193-228.
Kaminski, K., S. Wetzel, L. Guan, 2004, Can financial ratios detect fraudulent financial reporting, Managerial Auditing Journal 19(1), pp. 15-28.
Kasznik, R., 1999, On the Association between Voluntary Disclosure and Earnings Management, Journal of Accounting Research 37(1), pp. 57-81.
Kelly, J., 1956, A New Interpretation of Information Rate, IEEE Transactions on Information Theory 2(3), pp. 185-189.
Kennedy, P.E., 1981, Estimation with Correctly Interpreted Dummy Variables in Semilogarithmic Equations, American Economic Review 71(4), p. 801.
Kirkos, E., C. Spathis, Y. Manolopoulos, 2007, Data Mining techniques for the detection of fraudulent financial statements, Expert Systems with Applications 32(4), pp. 995-1003.
Kittler, J., M. Hatef, R.P.W. Duin, J.
Matas, 1998, On Combining Classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20(3), pp. 226-239.
Kotsiantis, S., E. Koumanakos, D. Tzelepis, V. Tampakas, 2006, Forecasting Fraudulent Financial Statements using Data Mining, International Journal of Computational Intelligence 3(2), pp. 104-110.
KPMG, 2006, KPMG Forensics 2006 survey of fraud in Australia and New Zealand, KPMG Forensic, Melbourne, Australia.
Lam, L., 2000, Classifier Combinations: Implementations and Theoretical Issues, Multiple Classifier Systems in Lecture Notes in Computer Science 1857, pp. 77-86.
Lee, T.A., R.W. Ingram, T.P. Howard, 1999, The difference between earnings and operating cash flow as an indicator of financial reporting fraud, Contemporary Accounting Research 16(4), pp. 749-786.
Lee, W., S.J. Stolfo, K.W. Mok, 2000, Adaptive Intrusion Detection: A Data Mining Approach, Artificial Intelligence Review 14(6), pp. 533-567.
Lin, J., M. Hwang, J. Becker, 2003, A Fuzzy Neural Network for Assessing the Risk of Fraudulent Financial Reporting, Managerial Auditing Journal 18(8), pp. 657-665.
Loebbecke, J.K., M.M. Eining, J.J. Willingham, 1989, Auditors' experience with material irregularities: Frequency, nature, and detectability, Auditing: A Journal of Practice and Theory 9(1), pp. 1-28.
Newman, D.J., S. Hettich, C.L. Blake, C.J. Merz, 1998, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, Irvine, CA, University of California, Department of Information and Computer Science.
Nissen, M.E., K. Sengupta, 2006, Incorporating Software Agents into Supply Chains: Experimental Investigation with a Procurement Task, MIS Quarterly 30(1), pp. 145-166.
Oversight, 2005, The 2005 Oversight System Report on Corporate Fraud, Oversight, Atlanta, GA.
PCAOB, 2004, Auditing Standard No.
2: An Audit of Internal Control Over Financial Reporting Performed in Conjunction With an Audit of Financial Statements, Public Company Accounting Oversight Board, Washington, DC.
PCAOB, 2007, Auditing Standard No. 5: An Audit of Internal Control Over Financial Reporting That Is Integrated with An Audit of Financial Statements, Public Company Accounting Oversight Board, Washington, DC.
Pennock, M.D., 2004, A Dynamic Pari-Mutuel Market for Hedging, Wagering, and Information Aggregation, In the Proceedings of the 5th ACM Conference on E-Commerce, New York, NY.
Perlich, C., F. Provost, J. Simonoff, 2003, Tree Induction vs. Logistic Regression: A Learning-Curve Analysis, Journal of Machine Learning Research 4, pp. 211-255.
Phua, C., D. Alahakoon, V. Lee, 2004, Minority Report in Fraud Detection: Classification of Skewed Data, SIGKDD Explorations 6(1), pp. 50-59.
Platt, J., 1999, Fast Training of support vector machines using Sequential Minimal Optimization, Advances in Kernel Methods: Support Vector Learning, pp. 185-208.
Plott, C.R., J. Wit, W.C. Yang, 2003, Parimutuel Betting Markets as Information Aggregation Devices: Experimental Results, Economic Theory 22(2), pp. 311-351.
Prodromidis, A., P. Chan, S. Stolfo, 2000, Meta-learning in distributed data mining systems: Issues and approaches, Advances in Distributed and Parallel Knowledge Discovery (eds.) Kargupta, H. and Chan, P., Chapter 3, AAAI/MIT.
Provost, F., T. Fawcett, 2001, Robust Classification for Imprecise Environments, Machine Learning 42(3), pp. 203-231.
Provost, F., T. Fawcett, R. Kohavi, 1998, The Case Against Accuracy Estimation for Comparing Induction Algorithms, In the Proceedings of the Fifteenth International Conference on Machine Learning, Madison, WI.
Quinlan, J.R., 1993, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers.
Rubinstein, M., 1976, The Strong Case for the Generalized Logarithmic Utility Model as the Premier Model of Financial Markets, The Journal of Finance 31(2), pp. 551-571.
Saar-Tsechansky, M., F. Provost, 2004, Active Sampling for Class Probability Estimation and Ranking, Machine Learning 54(2), pp. 153-178.
Shin, K.S., T. Lee, H.J. Kim, 2005, An application of support vector machines in bankruptcy prediction model, Expert Systems with Application 28, pp. 127-135.
Stolfo, S., A.L. Prodromidis, S. Tselepis, W. Lee, D.W. Fan, P.K. Chan, 1997, JAM: Java agents for meta-learning over distributed databases, In the Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA.
Suen, C.Y., L. Lam, 2000, Multiple classifier combination methodologies for different output levels, Lecture Notes in Computer Science 1857, pp. 52-66.
Summers, S.L., J.T. Sweeney, 1998, Fraudulently Misstated Financial Statements and Insider Trading: An Empirical Analysis, The Accounting Review 73(1), pp. 131-146.
Weiss, S.M., I. Kapouleas, 1989, An empirical comparison of pattern recognition, neural nets, and machine learning classification methods, Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI.
West, D., S. Dellana, J. Qian, 2004, Neural network ensemble strategies for decision applications, Computer & Operations Research 32, pp. 2543-2559.
Witten, I.H., E. Frank, 2005, Data Mining: Practical machine learning tools and techniques, San Francisco, CA.
Wolfers, J., E. Zitzewitz, 2006, Interpreting Prediction Market Prices as Probabilities, In the Proceedings of the Allied Social Science Association Annual Meeting, Boston, MA.
Wolpert, D., 1992, Stacked generalization, Neural Networks 5(2), pp. 241-259.
Ygge, F., J.M. Akkermans, 1999, Decentralized Markets versus Central Control: A Comparative Study, Journal of Artificial Intelligence Research 11, pp. 301-333.
Yule, G.U., 1900, On the Association of Attributes in Statistics: With Illustrations from the Material of the Childhood Society, &c., Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 194, pp. 257-319.
Zhao, H., S. Ram, 2004, Constrained Cascade Generalization of Decision Trees, IEEE Transactions on Knowledge and Data Engineering 16(6), pp. 727-739.
Zheng, Z., B. Padmanabhan, 2007, Constructing Ensembles from Data Envelopment Analysis, INFORMS Journal on Computing 19(4), pp. 486-496.

Chapter 7. Appendices

Appendix 1: Proof Lemma 1

Lemma 1: The optimal bets of agent i in P3 while classifying t is: $q_{itj} = p_{itj}(w_{it}+m)$, $\forall j \in J$.

Proof Lemma 1
$$Z_3 = \max_{q_{itj}} \; p_{it1}\ln(q_{it1}O_{t1}) + p_{it2}\ln(q_{it2}O_{t2}) \tag{37}$$
S.T.
$$q_{it1} + q_{it2} = w_{it} + m \tag{38}$$
$$q_{itj} \ge 0 \tag{39}$$
Using Lagrangian multipliers $\lambda_1$, $\lambda_2$ and $\lambda_3$, we get
$$L_3 = p_{it1}\ln(q_{it1}O_{t1}) + p_{it2}\ln(q_{it2}O_{t2}) + \lambda_1(w_{it} + m - q_{it1} - q_{it2}) + \lambda_2(q_{it1} - 0) + \lambda_3(q_{it2} - 0) \tag{40}$$
$$\partial L_3/\partial q_{it1} = 0 \Rightarrow p_{it1}O_{t1}/(q_{it1}O_{t1}) - \lambda_1 = 0 \tag{41}$$
$$\partial L_3/\partial q_{it2} = 0 \Rightarrow p_{it2}O_{t2}/(q_{it2}O_{t2}) - \lambda_1 = 0 \tag{42}$$
$$\partial L_3/\partial \lambda_1 = 0 \Rightarrow w_{it} + m - q_{it1} - q_{it2} = 0 \tag{43}$$
$$\lambda_2(q_{it1} - 0) = 0 \Rightarrow \lambda_2 q_{it1} = 0 \tag{44}$$
$$\lambda_3(q_{it2} - 0) = 0 \Rightarrow \lambda_3 q_{it2} = 0 \tag{45}$$
simplify (41) and (42):
$$p_{it1}/q_{it1} - \lambda_1 = 0 \tag{46}$$
$$p_{it2}/q_{it2} - \lambda_1 = 0 \tag{47}$$
combine (46) and (47):
$$p_{it1}/q_{it1} = p_{it2}/q_{it2} \tag{48}$$
combine (43) and (48):
$$p_{it1}/q_{it1} = p_{it2}/(w_{it} + m - q_{it1}) \tag{49}$$
$$p_{it2}/q_{it2} = p_{it1}/(w_{it} + m - q_{it2}) \tag{50}$$
simplify (49) and (50) (note that $p_{it1} = 1 - p_{it2}$):
$$p_{it1}(w_{it} + m) - p_{it1}q_{it1} = q_{it1} - p_{it1}q_{it1} \tag{51}$$
$$p_{it2}(w_{it} + m) - p_{it2}q_{it2} = q_{it2} - p_{it2}q_{it2} \tag{52}$$
simplify (51) and (52):
$$q_{it1} = p_{it1}(w_{it} + m) \tag{53}$$
$$q_{it2} = p_{it2}(w_{it} + m) \tag{54}$$
Use the Hessian matrix for $p_{it1}\ln(q_{it1}O_{t1}) + p_{it2}\ln(q_{it2}O_{t2}) + \lambda_1(w_{it} + m - q_{it1} - q_{it2}) + \lambda_2(q_{it1} - 0) + \lambda_3(q_{it2} - 0)$ to verify that $L_3$ has a relative maximum at the critical point obtained in (53) and (54):
$$\begin{bmatrix} \partial^2 L_3/\partial q_{it1}^2 & \partial^2 L_3/(\partial q_{it1}\partial q_{it2}) \\ \partial^2 L_3/(\partial q_{it2}\partial q_{it1}) & \partial^2 L_3/\partial q_{it2}^2 \end{bmatrix}, \text{ where} \tag{55}$$
$$\partial^2 L_3/\partial q_{it1}^2 = -p_{it1}/q_{it1}^2 \tag{56}$$
$$\partial^2 L_3/\partial q_{it2}^2 = -p_{it2}/q_{it2}^2 \tag{57}$$
$$\partial^2 L_3/(\partial q_{it1}\partial q_{it2}) = 0 \tag{58}$$
$$\partial^2 L_3/(\partial q_{it2}\partial q_{it1}) = 0 \tag{59}$$
The determinant of (55) is:
$$D_3 = (\partial^2 L_3/\partial q_{it1}^2)(\partial^2 L_3/\partial q_{it2}^2) - (\partial^2 L_3/(\partial q_{it1}\partial q_{it2}))(\partial^2 L_3/(\partial q_{it2}\partial q_{it1}))$$
$$D_3 = (-p_{it1}/q_{it1}^2)(-p_{it2}/q_{it2}^2) \tag{60}$$
Simplify (60):
$$D_3 = p_{it1}p_{it2}/(q_{it1}^2 q_{it2}^2) \tag{61}$$
$\forall j \in J$, when $0 < p_{itj} < 1$, then $0 < q_{itj} < (w_{it}+m)$ as per (53) and (54), therefore $D_3 > 0$. Further, since $\partial^2 L_3/\partial q_{it1}^2 < 0$ and $\partial^2 L_3/\partial q_{it2}^2 < 0$ (see (56), (57)), the critical point is a relative maximum. When $p_{itj} = 0$ or 1, then $D_3 = 0$, i.e., the Hessian is indeterminate. It can be seen from (38), (45) and (53) that when $p_{it1} = 0$ then $q_{it1} = 0$, $q_{it2} = (w_{it}+m)$ and $\lambda_3 = 0$. It can similarly be verified that when $p_{it2} = 0$ then $q_{it2} = 0$, $q_{it1} = (w_{it}+m)$ and $\lambda_2 = 0$.

Appendix 2: Proof Lemma 2

Lemma 2: The optimal bets of agent i in P4 while classifying t is:
Solution a: $q_{it1} = p_{it1}km + a_{it}\frac{p_{it1}O_{t1} - p_{it2}O_{t2}}{O_{t1}O_{t2}}$ and $q_{it2} = p_{it2}km + a_{it}\frac{p_{it2}O_{t2} - p_{it1}O_{t1}}{O_{t1}O_{t2}}$, when $0 < p_{it1}km + a_{it}\frac{p_{it1}O_{t1} - p_{it2}O_{t2}}{O_{t1}O_{t2}} < km$ and $0 < p_{it2}km + a_{it}\frac{p_{it2}O_{t2} - p_{it1}O_{t1}}{O_{t1}O_{t2}} < km$;
Solution b: $q_{it1} = km$ and $q_{it2} = 0$, when $km \le p_{it1}km + a_{it}\frac{p_{it1}O_{t1} - p_{it2}O_{t2}}{O_{t1}O_{t2}}$; and
Solution c: $q_{it2} = km$ and $q_{it1} = 0$, when $km \le p_{it2}km + a_{it}\frac{p_{it2}O_{t2} - p_{it1}O_{t1}}{O_{t1}O_{t2}}$.

Proof Lemma 2
$$Z_4 = \max_{q_{itj}} \; p_{it1}\ln(q_{it1}O_{t1} + a_{it}) + p_{it2}\ln(q_{it2}O_{t2} + a_{it}) \tag{62}$$
S.T.
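$$q_{it1} + q_{it2} = km \tag{63}$$
$$q_{itj} \ge 0 \tag{64}$$
Using Lagrangian multipliers $\lambda_1$, $\lambda_2$ and $\lambda_3$, we get
$$L_4 = p_{it1}\ln(q_{it1}O_{t1} + a_{it}) + p_{it2}\ln(q_{it2}O_{t2} + a_{it}) + \lambda_1(km - q_{it1} - q_{it2}) + \lambda_2(q_{it1} - 0) + \lambda_3(q_{it2} - 0) \tag{65}$$
$$\partial L_4/\partial q_{it1} = 0 \Rightarrow p_{it1}O_{t1}/(q_{it1}O_{t1} + a_{it}) - \lambda_1 + \lambda_2 = 0 \tag{66}$$
$$\partial L_4/\partial q_{it2} = 0 \Rightarrow p_{it2}O_{t2}/(q_{it2}O_{t2} + a_{it}) - \lambda_1 + \lambda_3 = 0 \tag{67}$$
$$\partial L_4/\partial \lambda_1 = 0 \Rightarrow km - q_{it1} - q_{it2} = 0 \tag{68}$$
$$\lambda_2(q_{it1} - 0) = 0 \Rightarrow \lambda_2 q_{it1} = 0 \tag{69}$$
$$\lambda_3(q_{it2} - 0) = 0 \Rightarrow \lambda_3 q_{it2} = 0 \tag{70}$$
given (68), (69) and (70), $Z_4$ has three possible solutions:
$$\text{Solution a: } 0 < q_{it1} < km, \; 0 < q_{it2} < km, \; \lambda_2 = 0, \; \lambda_3 = 0 \tag{71}$$
$$\text{Solution b: } q_{it1} = 0, \; q_{it2} = km, \; \lambda_3 = 0 \tag{72}$$
$$\text{Solution c: } q_{it2} = 0, \; q_{it1} = km, \; \lambda_2 = 0 \tag{73}$$
Agent i determines the optimal solution of (62) given only the constraint in (63) as given below. To solve for Solution a combine (66), (67) and (71):
$$p_{it1}O_{t1}/(q_{it1}O_{t1} + a_{it}) - \lambda_1 = p_{it2}O_{t2}/(q_{it2}O_{t2} + a_{it}) - \lambda_1 \tag{74}$$
simplify (74):
$$p_{it1}q_{it2}O_{t1}O_{t2} + p_{it1}O_{t1}a_{it} = p_{it2}q_{it1}O_{t1}O_{t2} + p_{it2}O_{t2}a_{it} \tag{75}$$
combine (68) and (75):
$$km\,p_{it1}O_{t1}O_{t2} - p_{it1}q_{it1}O_{t1}O_{t2} + p_{it1}O_{t1}a_{it} = p_{it2}q_{it1}O_{t1}O_{t2} + p_{it2}O_{t2}a_{it} \tag{76}$$
$$km\,p_{it2}O_{t1}O_{t2} - p_{it2}q_{it2}O_{t1}O_{t2} + p_{it2}O_{t2}a_{it} = p_{it1}q_{it2}O_{t1}O_{t2} + p_{it1}O_{t1}a_{it} \tag{77}$$
simplify (76) and (77) (note that $p_{it1} + p_{it2} = 1$):
$$q_{it1}O_{t1}O_{t2} = km\,p_{it1}O_{t1}O_{t2} + p_{it1}O_{t1}a_{it} - p_{it2}O_{t2}a_{it} \tag{78}$$
$$q_{it2}O_{t1}O_{t2} = km\,p_{it2}O_{t1}O_{t2} + p_{it2}O_{t2}a_{it} - p_{it1}O_{t1}a_{it} \tag{79}$$
simplify (78) and (79):
$$q_{it1} = p_{it1}km + a_{it}(p_{it1}O_{t1} - p_{it2}O_{t2})/(O_{t1}O_{t2}) \tag{80}$$
$$q_{it2} = p_{it2}km + a_{it}(p_{it2}O_{t2} - p_{it1}O_{t1})/(O_{t1}O_{t2}) \tag{81}$$
If $q_{it1} > 0$ and $q_{it2} > 0$ then Solution a is given by (80) and (81). When $q_{it1} \le 0$, then agent i will bet as per Solution b, else when $q_{it2} \le 0$, then agent i will bet as per Solution c. Use the Hessian matrix for $p_{it1}\ln(q_{it1}O_{t1} + a_{it}) + p_{it2}\ln(q_{it2}O_{t2} + a_{it}) + \lambda_1(km - q_{it1} - q_{it2}) + \lambda_2(q_{it1} - 0) + \lambda_3(q_{it2} - 0)$ to verify that $L_4$ has a relative maximum at the critical point obtained in (80) and (81):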
$$\begin{bmatrix} \partial^2 L_4/\partial q_{it1}^2 & \partial^2 L_4/(\partial q_{it1}\partial q_{it2}) \\ \partial^2 L_4/(\partial q_{it2}\partial q_{it1}) & \partial^2 L_4/\partial q_{it2}^2 \end{bmatrix}, \text{ where} \tag{82}$$
$$\partial^2 L_4/\partial q_{it1}^2 = -p_{it1}O_{t1}^2/(q_{it1}O_{t1} + a_{it})^2 \tag{83}$$
$$\partial^2 L_4/\partial q_{it2}^2 = -p_{it2}O_{t2}^2/(q_{it2}O_{t2} + a_{it})^2 \tag{84}$$
$$\partial^2 L_4/(\partial q_{it1}\partial q_{it2}) = 0 \tag{85}$$
$$\partial^2 L_4/(\partial q_{it2}\partial q_{it1}) = 0 \tag{86}$$
The determinant of (82) is:
$$D_4 = (\partial^2 L_4/\partial q_{it1}^2)(\partial^2 L_4/\partial q_{it2}^2) - (\partial^2 L_4/(\partial q_{it1}\partial q_{it2}))(\partial^2 L_4/(\partial q_{it2}\partial q_{it1}))$$
$$D_4 = \left(-p_{it1}O_{t1}^2/(q_{it1}O_{t1} + a_{it})^2\right)\left(-p_{it2}O_{t2}^2/(q_{it2}O_{t2} + a_{it})^2\right) \tag{87}$$
Simplify (87):
$$D_4 = p_{it1}p_{it2}O_{t1}^2O_{t2}^2/\left((q_{it1}O_{t1} + a_{it})^2(q_{it2}O_{t2} + a_{it})^2\right) \tag{88}$$
$\forall j \in J$, when $0 < p_{itj} < 1$, then $D_4 > 0$. Note that $O_{tj} \ge 1$ by definition, and $a_{it} > 0$ given $w_{it} > (k-1)m$. Further, since $\partial^2 L_4/\partial q_{it1}^2 < 0$ and $\partial^2 L_4/\partial q_{it2}^2 < 0$ (see (83), (84)), the critical point is a relative maximum. When $p_{itj} = 0$ or 1, then $D_4 = 0$, i.e., the Hessian is indeterminate. It can be seen from (63) and (80) that when $p_{it1} = 0$ then $q_{it1} \le 0$, $q_{it2} \ge km$, i.e., $p_{it2}km + a_{it}\frac{p_{it2}O_{t2} - p_{it1}O_{t1}}{O_{t1}O_{t2}} \ge km$ as per (81). It can similarly be verified that when $p_{it2} = 0$ then $q_{it2} \le 0$, $q_{it1} \ge km$, i.e., $p_{it1}km + a_{it}\frac{p_{it1}O_{t1} - p_{it2}O_{t2}}{O_{t1}O_{t2}} \ge km$ as per (80). When $q_{it1} \le 0$ then constraint (64) becomes binding and $q_{it1}$ is set to 0 (Solution b). Similarly, when $q_{it2} \le 0$ then constraint (64) becomes binding and $q_{it2}$ is set to 0 (Solution c).
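The betting rules from Lemma 1 and Lemma 2 can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the implementation used in the experiments: the function and argument names are hypothetical, the two-class case is assumed, $a_{it}$ is computed as $w_{it} + m - km$, and the corner cases simply clip the interior bet from (80) to the interval $[0, km]$.

```python
def lemma1_bets(p1, wealth, m):
    """Bets per Lemma 1: q_itj = p_itj * (w_it + m) for j = 1, 2."""
    total = wealth + m
    return p1 * total, (1.0 - p1) * total


def lemma2_bets(p1, wealth, m, k, o1, o2):
    """Bets per Lemma 2: interior bets from (80) and (81), clipped at the corners.

    Assumes a_it = w_it + m - km > 0, i.e. w_it > (k - 1) * m.
    """
    p2 = 1.0 - p1
    km = k * m
    a = wealth + m - km
    if a <= 0.0:
        raise ValueError("requires w_it > (k - 1) * m so that a_it > 0")
    # Unconstrained optimum for class 1, equation (80).
    q1 = p1 * km + a * (p1 * o1 - p2 * o2) / (o1 * o2)
    if q1 <= 0.0:
        # Non-negativity constraint (64) binds for class 1.
        return 0.0, km
    if q1 >= km:
        # Non-negativity constraint (64) binds for class 2.
        return km, 0.0
    # Interior solution; q2 = km - q1 coincides with equation (81).
    return q1, km - q1


if __name__ == "__main__":
    print(lemma1_bets(0.7, wealth=10.0, m=1.0))
    print(lemma2_bets(0.52, wealth=10.0, m=1.0, k=2.0, o1=2.0, o2=2.0))
```

Because the interior bets in (80) and (81) always sum to $km$, the sketch computes only $q_{it1}$ and derives $q_{it2}$ from constraint (63).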
Appendix 3: Proof Lemma 3

Lemma 3: Given any combination of betting behaviors as per Lemma 1 and Lemma 2, equilibrium exists, and the equilibrium odd for j=1 is:
$$O_{t1} = \frac{\sum_{i \in D_1 \cup D_{2a}} p_{it2}(w_{it}+m) + \sum_{i \in D_{2c}} km}{\sum_{i \in D_1 \cup D_{2a}} p_{it1}(w_{it}+m) + \sum_{i \in D_{2b}} km} + 1$$

Proof Lemma 3
In IMF, for each object t, the house manipulates the market odds $O_{t1}$ and $O_{t2}$ to establish the equilibrium odds that occur when:
$$O_{t1}Q_{t1} = O_{t2}Q_{t2} \tag{89}$$
Using Lemma 1 and Lemma 2, the LHS and RHS of (89) are:
$$O_{t1}Q_{t1} = O_{t1}\sum_{i \in D_1} p_{it1}(w_{it}+m) + O_{t1}\sum_{i \in D_{2a}}\left(p_{it1}km + a_{it}\frac{p_{it1}O_{t1} - p_{it2}O_{t2}}{O_{t1}O_{t2}}\right) + O_{t1}\sum_{i \in D_{2b}}(km) + O_{t1}\sum_{i \in D_{2c}}(0); \text{ and} \tag{90}$$
$$O_{t2}Q_{t2} = O_{t2}\sum_{i \in D_1} p_{it2}(w_{it}+m) + O_{t2}\sum_{i \in D_{2a}}\left(p_{it2}km + a_{it}\frac{p_{it2}O_{t2} - p_{it1}O_{t1}}{O_{t1}O_{t2}}\right) + O_{t2}\sum_{i \in D_{2b}}(0) + O_{t2}\sum_{i \in D_{2c}}(km), \tag{91}$$
where all agents that bet per Lemma 1 are defined as $i \in D_1$ and all agents that bet per Lemma 2, solutions a, b and c are defined as $i \in D_{2a}$, $i \in D_{2b}$ and $i \in D_{2c}$, respectively. On substituting the LHS of (89) with the RHS of (90) and the RHS of (89) with the RHS of (91):
$$\begin{aligned} &O_{t1}\sum_{i \in D_1} p_{it1}(w_{it}+m) + O_{t1}\sum_{i \in D_{2a}}\left(p_{it1}km + a_{it}\frac{p_{it1}O_{t1} - p_{it2}O_{t2}}{O_{t1}O_{t2}}\right) + O_{t1}\sum_{i \in D_{2b}}(km) + O_{t1}\sum_{i \in D_{2c}}(0) \\ &= O_{t2}\sum_{i \in D_1} p_{it2}(w_{it}+m) + O_{t2}\sum_{i \in D_{2a}}\left(p_{it2}km + a_{it}\frac{p_{it2}O_{t2} - p_{it1}O_{t1}}{O_{t1}O_{t2}}\right) + O_{t2}\sum_{i \in D_{2b}}(0) + O_{t2}\sum_{i \in D_{2c}}(km) \end{aligned} \tag{92}$$
On simplifying (92):
$$\begin{aligned} &O_{t1}\sum_{i \in D_1} p_{it1}(w_{it}+m) + O_{t1}\sum_{i \in D_{2a}} (p_{it1}km) + \sum_{i \in D_{2a}} \left(a_{it}p_{it1}\frac{O_{t1}}{O_{t2}}\right) - \sum_{i \in D_{2a}} (a_{it}p_{it2}) + O_{t1}\sum_{i \in D_{2b}} (km) \\ &= O_{t2}\sum_{i \in D_1} p_{it2}(w_{it}+m) + O_{t2}\sum_{i \in D_{2a}} (p_{it2}km) + \sum_{i \in D_{2a}} \left(a_{it}p_{it2}\frac{O_{t2}}{O_{t1}}\right) - \sum_{i \in D_{2a}} (a_{it}p_{it1}) + O_{t2}\sum_{i \in D_{2c}} (km) \end{aligned} \tag{93}$$
On simplifying (93) by substituting $O_{t2}$ for $O_{t1}/(O_{t1}-1)$:
$$\begin{aligned} &O_{t1}\sum_{i \in D_1} p_{it1}(w_{it}+m) + O_{t1}\sum_{i \in D_{2a}} (p_{it1}km) + \sum_{i \in D_{2a}} \left(a_{it}p_{it1}(O_{t1}-1)\right) - \sum_{i \in D_{2a}} (a_{it}p_{it2}) + O_{t1}\sum_{i \in D_{2b}} (km) \\ &= \frac{O_{t1}}{O_{t1}-1}\sum_{i \in D_1} p_{it2}(w_{it}+m) + \frac{O_{t1}}{O_{t1}-1}\sum_{i \in D_{2a}} (p_{it2}km) + \sum_{i \in D_{2a}} \left(a_{it}p_{it2}\frac{1}{O_{t1}-1}\right) - \sum_{i \in D_{2a}} (a_{it}p_{it1}) + \frac{O_{t1}}{O_{t1}-1}\sum_{i \in D_{2c}} (km) \end{aligned} \tag{94}$$
On simplifying (94):
$$\begin{aligned} &O_{t1}\left(\sum_{i \in D_1} p_{it1}(w_{it}+m) + \sum_{i \in D_{2a}} (p_{it1}km) + \sum_{i \in D_{2a}} (a_{it}p_{it1}) + \sum_{i \in D_{2b}} (km)\right) - \frac{O_{t1}}{O_{t1}-1}\left(\sum_{i \in D_1} p_{it2}(w_{it}+m) + \sum_{i \in D_{2a}} (p_{it2}km) + \sum_{i \in D_{2c}} (km)\right) \\ &= \sum_{i \in D_{2a}} \left(a_{it}p_{it2}\frac{1}{O_{t1}-1}\right) + \sum_{i \in D_{2a}} (a_{it}p_{it2}) \end{aligned} \tag{95}$$
On simplifying (95):
$$\begin{aligned} &(O_{t1}-1)\,O_{t1}\left(\sum_{i \in D_1} p_{it1}(w_{it}+m) + \sum_{i \in D_{2a}} (p_{it1}km) + \sum_{i \in D_{2a}} (a_{it}p_{it1}) + \sum_{i \in D_{2b}} (km)\right) - O_{t1}\left(\sum_{i \in D_1} p_{it2}(w_{it}+m) + \sum_{i \in D_{2a}} (p_{it2}km) + \sum_{i \in D_{2c}} (km)\right) \\ &= \sum_{i \in D_{2a}} (a_{it}p_{it2}) + (O_{t1}-1)\sum_{i \in D_{2a}} (a_{it}p_{it2}) \end{aligned} \tag{96}$$
On simplifying (96):
$$(O_{t1}-1)\,O_{t1}\left(\sum_{i \in D_1} p_{it1}(w_{it}+m) + \sum_{i \in D_{2a}} (p_{it1}km) + \sum_{i \in D_{2a}} (a_{it}p_{it1}) + \sum_{i \in D_{2b}} (km)\right) - O_{t1}\left(\sum_{i \in D_1} p_{it2}(w_{it}+m) + \sum_{i \in D_{2a}} (p_{it2}km) + \sum_{i \in D_{2a}} (a_{it}p_{it2}) + \sum_{i \in D_{2c}} (km)\right) = 0 \tag{97}$$
On simplifying (97):
$$(O_{t1}-1) = \frac{\sum_{i \in D_1} p_{it2}(w_{it}+m) + \sum_{i \in D_{2a}} (p_{it2}km) + \sum_{i \in D_{2a}} (a_{it}p_{it2}) + \sum_{i \in D_{2c}} (km)}{\sum_{i \in D_1} p_{it1}(w_{it}+m) + \sum_{i \in D_{2a}} (p_{it1}km) + \sum_{i \in D_{2a}} (a_{it}p_{it1}) + \sum_{i \in D_{2b}} (km)} \tag{98}$$
On simplifying (98):
$$O_{t1} = \frac{\sum_{i \in D_1} p_{it2}(w_{it}+m) + \sum_{i \in D_{2a}} p_{it2}(km + a_{it}) + \sum_{i \in D_{2c}} (km)}{\sum_{i \in D_1} p_{it1}(w_{it}+m) + \sum_{i \in D_{2a}} p_{it1}(km + a_{it}) + \sum_{i \in D_{2b}} (km)} + 1 \tag{99}$$
On simplifying (99), note that $a_{it} = w_{it} + m - km$:
$$O_{t1} = \frac{\sum_{i \in D_1} p_{it2}(w_{it}+m) + \sum_{i \in D_{2a}} p_{it2}(w_{it}+m) + \sum_{i \in D_{2c}} (km)}{\sum_{i \in D_1} p_{it1}(w_{it}+m) + \sum_{i \in D_{2a}} p_{it1}(w_{it}+m) + \sum_{i \in D_{2b}} (km)} + 1 \tag{100}$$
On simplifying (100):
$$O_{t1} = \frac{\sum_{i \in D_1 \cup D_{2a}} p_{it2}(w_{it}+m) + \sum_{i \in D_{2c}} (km)}{\sum_{i \in D_1 \cup D_{2a}} p_{it1}(w_{it}+m) + \sum_{i \in D_{2b}} (km)} + 1 \tag{101}$$
Since $O_{tj} = 1/P_{itj} \ge 1$, equilibrium odds exist when
$$\frac{\sum_{i \in D_1 \cup D_{2a}} p_{it2}(w_{it}+m) + \sum_{i \in D_{2c}} (km)}{\sum_{i \in D_1 \cup D_{2a}} p_{it1}(w_{it}+m) + \sum_{i \in D_{2b}} (km)} \ge 0.$$
Thus, since $0 \le p_{itj} \le 1$, $(w_{it} + m) > 0$, and $km > 0$, equilibrium odds exist when agents bet as per Lemma 1 and Lemma 2.

Appendix 4: Empirical Experiments of Equilibrium Odds

Discontinuous Agent Bets in Odds
If agent bets are discontinuous over $O_{tj}$ then the existence of equilibrium odds cannot be guaranteed (Carlsson et al. 2001).
To provide some insight into the utility of IMF in such situations, when equilibrium odds might not exist (agents not betting as per Lemma 1 and Lemma 2), we run an experiment with risk-neutral agents that bet their entire wealth on only one event j that satisfies $p_{itj}O_{tj} > 1$, i.e., the bets are discontinuous and equilibrium odds do not always exist (verified empirically). The combiner method main effect is evaluated and the results are statistically equivalent to the results in the main experiment.

Existence of Equilibrium Odds
Equilibrium odds are defined in IMF as the odds that give $Q_{tj}O_{tj} = Q_t$, where $Q_{tj}$ and $Q_t$ are functions of $O_{tj}$. The proof in Appendix 3 shows that the equilibrium odd $O_{t1}$ is equal to
$$\frac{\sum_{i \in D_1 \cup D_{2a}} p_{it2}(w_{it}+m) + \sum_{i \in D_{2c}} km}{\sum_{i \in D_1 \cup D_{2a}} p_{it1}(w_{it}+m) + \sum_{i \in D_{2b}} km} + 1.$$
However, this cannot be used directly to determine the existence of equilibrium odds $O_{tj}$ for a given object t because of the recursive nature of $O_{t1}$ and $Q_{t1}$ (note that $q_{itj}$ is used to determine if $i \in D_1$, $i \in D_{2a}$, $i \in D_{2b}$ or $i \in D_{2c}$, and that the agents use $O_{t1}$ to determine $q_{itj}$). To empirically validate the existence of equilibrium odds (defined as odds such that $|Q_{tj}O_{tj} - Q_t|/Q_t < 0.0000000001$), we use binary search with the search parameter set to 0.000000000000001. Based on this setting, we find equilibrium odds for 98.32% of the objects.

Appendix 5: Equivalence of Net Benefit and Cost Savings

In a given classification context, Cost Savings (CS) is defined as the difference between the costs that would result if no classification system is used and the costs that result when a classification system is used (Chan et al. 1999). In the fraud context CS is defined as:
$$CS = P \times \text{Fraud Cost} - (FN \times \text{Fraud Cost} + (TP+FP) \times \text{Investigation Cost}), \tag{102}$$
where P is the number of fraud instances and Fraud Cost is the cost of one fraud instance.
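As a numeric illustration of (102), the following Python sketch computes CS from confusion-matrix counts and compares it with the net benefit form, Fraud Cost*TP minus Investigation Cost*(TP+FP), that this appendix arrives at. The function names and the example counts and costs are hypothetical and chosen only for illustration; the sketch assumes P = TP + FN.

```python
def cost_savings(tp, fp, fn, fraud_cost, investigation_cost):
    """CS per (102): cost without a classifier minus cost with one."""
    positives = tp + fn  # P, the number of fraud instances
    cost_without = positives * fraud_cost
    cost_with = fn * fraud_cost + (tp + fp) * investigation_cost
    return cost_without - cost_with


def net_benefit(tp, fp, fraud_cost, investigation_cost):
    """Net benefit form: Fraud Cost*TP - Investigation Cost*(TP+FP)."""
    return fraud_cost * tp - investigation_cost * (tp + fp)


if __name__ == "__main__":
    # Hypothetical counts and costs, for illustration only.
    cs = cost_savings(tp=30, fp=20, fn=10, fraud_cost=100, investigation_cost=5)
    nb = net_benefit(tp=30, fp=20, fraud_cost=100, investigation_cost=5)
    print(cs, nb)
```

With integer inputs the two functions return the same value for any counts, which is the equivalence the appendix establishes algebraically.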
simplify (102):
$$CS = (P - FN) \times \text{Fraud Cost} - (TP+FP) \times \text{Investigation Cost} \tag{103}$$
simplify (103):
$$CS = \text{Fraud Cost} \times TP - \text{Investigation Cost} \times (TP+FP) \tag{104}$$
CS is the same as Net Benefit, since Fraud Cost is the same as FN cost avoidance.

Appendix 6: Relation between Estimated Relative Cost and Net Benefit

Estimated Relative Cost (ERC) is defined as the cost per classified firm of undetected instances of financial statement fraud plus the cost of investigating non-fraudulent firms:
$$ERC = n_{FN}/n_P \times C_{FN} \times P(\text{Fraud}) + n_{FP}/n_N \times C_{FP} \times P(\text{Non-Fraud}), \tag{105}$$
where P(Fraud) and P(Non-Fraud) are the assumed population fraud and non-fraud probabilities, i.e., $n_P/(n_P+n_N)$ and $n_N/(n_P+n_N)$, respectively; $C_{FP}$ is the cost of false positive classifications, and $C_{FN}$ is the cost of false negative classifications; $n_{FP}$ is the number of false positive classifications, $n_{FN}$ is the number of false negative classifications, $n_P$ is the number of positive instances in the dataset and $n_N$ is the number of negative instances in the dataset.
simplify (105):
$$ERC = n_{FN}/n_P \times C_{FN} \times n_P/(n_P+n_N) + n_{FP}/n_N \times C_{FP} \times n_N/(n_P+n_N) \tag{106}$$
simplify (106):
$$ERC = (n_{FN} \times C_{FN} + n_{FP} \times C_{FP})/(n_P+n_N) \tag{107}$$
substitute $n_{FN}$ in (107) with $n_P - n_{TP}$ (note that $n_{FN} + n_{TP} = n_P$):
$$ERC = ((n_P - n_{TP}) \times C_{FN} + n_{FP} \times C_{FP})/(n_P+n_N) \tag{108}$$
simplify (108):
$$(n_P \times C_{FN})/(n_P+n_N) = (n_{TP} \times C_{FN} - n_{FP} \times C_{FP})/(n_P+n_N) + ERC \tag{109}$$
substitute $C_{FN}$ in the RHS of (109) with $C_P - C_i$ and $C_{FP}$ with $C_i$, where $C_P$ is the cost of fraud and $C_i$ is the cost of investigation:
$$(n_P \times C_{FN})/(n_P+n_N) = (n_{TP} \times (C_P - C_i) - n_{FP} \times C_i)/(n_P+n_N) + ERC \tag{110}$$
simplify (110):
$$(n_P \times C_{FN})/(n_P+n_N) = (C_P \times n_{TP} - C_i \times (n_{TP} + n_{FP}))/(n_P+n_N) + ERC \tag{111}$$
substitute $C_P \times n_{TP} - C_i \times (n_{TP} + n_{FP})$ in (111) with NB, i.e., Net Benefit, given that FN cost avoidance and investigation cost in Net Benefit are equivalent to the cost of fraud, $C_P$, and the cost of investigation, $C_i$, in ERC:
$$(n_P \times C_{FN})/(n_P+n_N) = NB/(n_P+n_N) + ERC \tag{112}$$
substitute $(n_P \times C_{FN})/(n_P+n_N)$ in (112) with constant a, the
average fraud cost of all firms. Note that $(n_P \times C_{FN})/(n_P+n_N)$ is constant in any given dataset:
$$a = NB/(n_P+n_N) + ERC \tag{113}$$
ERC plus net benefit per classified firm is equal to the constant a. In a given dataset, when net benefit per classified firm increases as a result of the classification effort, ERC decreases by the same amount, and vice versa.

Identifier: E14SFE0002486
OCLC: (OCoLC)318989119
Call number: HF71 (Online)
Author: Perols, Johan L.
Title: Detecting financial statement fraud: three essays on fraud predictors, multiclassifier combination and fraud detection using data mining [electronic resource] / by Johan L. Perols.
Publication: [Tampa, Fla]: University of South Florida, 2008.
Notes: Title from PDF of title page. Document formatted into pages; contains 134 pages. Includes vita. Dissertation (Ph.D.), University of South Florida, 2008. Includes bibliographical references. Text (Electronic dissertation) in PDF format.
ABSTRACT: The goal of this dissertation is to improve financial statement fraud detection using a cross-functional research approach. The efficacy of financial statement fraud detection depends on the classification algorithms and the fraud predictors used and how they are combined. Essay I introduces IMF, a novel combiner method classification algorithm. The results show that IMF performs well relative to existing combiner methods over a wide range of domains. This research contributes to combiner method research and, thereby, to the broader research stream of ensemble-based classification and to classification algorithm research in general.
Essay II develops three novel fraud predictors: total discretionary accruals, meeting or beating analyst forecasts and unexpected employee productivity. The results show that the three variables are significant predictors of fraud. Hence, Essay II provides insights into (1) conditions under which fraud is more likely to occur (total discretionary accruals are high), (2) incentives for fraud (firms' desire to meet or beat analyst forecasts), and (3) how fraud is committed and can be detected (revenue fraud detection using unexpected employee productivity). This essay contributes to confirmatory fraud predictor research, which is a substream of research that focuses on developing and testing financial statement fraud predictors. Essay III compares the utility of artifacts developed in the broader research streams to which the first two essays contribute, i.e., classification algorithm and fraud predictor research, in detecting financial statement fraud. The results show that logistic regression and SVM perform well, and that out of 41 variables found to be good predictors in prior fraud research, only six variables are selected by three or more classifiers: auditor turnover, Big 4 auditor, accounts receivable and the three variables introduced in Essay II. Together, the results from Essay I and Essay III show that IMF performs better than existing combiner methods in a wide range of domains and better than stacking, an ensemble-based classification algorithm, in fraud detection. The results from Essay II and Essay III show that the three predictors created in Essay II are significant predictors of fraud and, when evaluated together with 38 other predictors, provide utility to classification algorithms.
Mode of access: World Wide Web. System requirements: World Wide Web browser and PDF reader.
Keywords: Earnings management; Discretionary accruals; Unexpected productivity; Information markets; Combiner methods
USF Electronic Theses and Dissertations.
URL: http://digital.lib.usf.edu/?e14.2486