
Risky predictions, damn strange coincidences, and theory appraisal


Material Information

Title:
Risky predictions, damn strange coincidences, and theory appraisal: a multivariate corroboration index for path analytic models
Physical Description:
Book
Language:
English
Creator:
Hogarty, Kristine Y
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla.
Publication Date:
2003

Subjects

Subjects / Keywords:
verisimilitude
path analysis
model fit
theory testing
precision
Dissertations, Academic -- Measurement and Evaluation -- Doctoral -- USF   ( lcsh )
Genre:
government publication (state, provincial, territorial, dependent)   ( marcgt )
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Summary:
ABSTRACT: The empirical testing of theories is an important component of research in any field. Yet despite the long history of science, the extent to which theories are supported or contradicted by the results of empirical research remains ill defined. Quite commonly, support or contradiction is based solely on the "reject" or "fail to reject" decisions that result from tests of null hypotheses derived from aspects of theory. Decisions and recommendations based on this forced and often artificial dichotomy have been scrutinized, and such an overly simplified approach to theory testing has been vigorously challenged. Theories differ in the extent to which they provide precise predictions about observations. The precision of predictions derived from theories is proportional to the strength of support that may be provided by empirical evidence congruent with the prediction. However, the notion of precision linked to strength of support is surprisingly absent from many discussions regarding the appraisal of theories. In the early 1990s, Meehl presented an index of corroboration to summarize the extent to which empirical tests of theories support or contradict those theories. This index comprises a closeness component and an estimate of precision. The purpose of this study was to evaluate the utility of this index of corroboration and its behavior when employing path analytic methods in the context of social science research. The performance of a multivariate extension of Meehl's Corroboration Index (Ci) was evaluated using Monte Carlo methods by simulating traditional path analysis. Five factors were included in the study: model complexity, level of intolerance, verisimilitude, sample size, and level of collinearity. Results were evaluated in terms of the mean and standard error of the resulting multivariate Ci values.
Of the five central design factors investigated, the level of intolerance was observed to be the strongest influence on mean Ci. Verisimilitude and model complexity were not observed to be strong determinants of the mean Ci. The lack of sensitivity of the index to the other design factors led to a proposed alternative conceptualization of the multivariate corroboration index to guide future research efforts.
Thesis:
Thesis (Ph.D.)--University of South Florida, 2003.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Kristine Y. Hogarty.
General Note:
Includes vita.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 158 pages.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001441468
oclc - 53961815
notis - AJM5908
usfldc doi - E14-SFE0000141
usfldc handle - e14.141
System ID:
SFS0024837:00001




Full Text
MARC Record
Cataloging Source:
FHM
Call Number:
LB3051
Type of File:
Text (Electronic thesis) in PDF format.
Adviser:
Kromrey, Jeffrey D.
Host Item:
USF Electronic Theses and Dissertations.
Electronic Location:
http://digital.lib.usf.edu/?e14.141



PAGE 1

Risky Predictions, Damn Strange Coincidences, and Theory Appraisal: A Multivariate Corroboration Index for Path Analytic Models

by

KRISTINE Y. HOGARTY

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Educational Measurement and Research
College of Education
University of South Florida

Major Professor: Jeffrey D. Kromrey, Ph.D.
Robert Dedrick, Ph.D.
John Ferron, Ph.D.
Kathleen McNelis, Ph.D.

Date of Approval: October 31, 2003

Keywords: theory testing, model fit, path analysis, verisimilitude, precision

Copyright 2003, Kristine Y. Hogarty

PAGE 2

Acknowledgement

At a time such as this, it is quite liberating to take the time to reflect on the many influences that came to bear not only on the production of this research but, perhaps more importantly, on the shaping of thoughts and ideas during this most challenging, yet stimulating journey. After a rather circuitous route, I was most fortunate to find a home in the Department of Educational Measurement and Research. During all of my years studying business and criminology I never would have predicted (albeit risky or otherwise) that I would find my true passion in statistics and research. I have been truly overwhelmed by the nurturing nature of the faculty in this department, and it is with the deepest appreciation that I thank my committee members who are part of this most dedicated faculty. Dr. Bob Dedrick and Dr. John Ferron, I thank you for your thoughts, your recommendations and suggestions, and your enthusiasm and support for my research. I thank Dr. Kathy McNelis for keeping me grounded in the real world and for giving me something to think about in the elevator. And, as my major professor, Dr. Jeff Kromrey, you have helped me grow in so many ways. I am so very thankful for your insight and guidance, your firm yet kind words, and of course, all of those gentle nudges along the way. Your drive and your passion for the profession are truly inspirational. On a closing note, I would certainly be remiss if I did not acknowledge the fortitude of my family: to Mom and Kevin, we finally did it!

PAGE 3

Dedication

For her unwavering support and enthusiasm for everything that I have ever aspired to accomplish, it is with loving memories that I dedicate this dissertation to Ethel.

PAGE 4

Table of Contents

List of Tables iv
List of Figures vi
Abstract ix
Chapter One Introduction 1
  Background 1
  Need for Another Tool 3
  Meehl's Index of Corroboration 4
  Statement of the Problem 7
  Purpose 7
  Four Research Questions and Three Research Hypotheses 9
  Research Questions 9
  Research Hypotheses 9
  Delimitations and Limitations 10
  Organization of the Study 10
  Definitions 12
Chapter Two Review of the Literature 15
  Overview 15
  Philosophy of Science 17
  Theory Testing 18
  Path Analysis 22
  Fit Indices 26
  Absolute Indices of Fit 27
  Incremental Fit Indices 29
  Binomial Index of Model Fit 30
  Theoretical and Empirical Fit 31
  How Persuasive is Good Fit 33
  Meehl's Ci 34
  An Example 34
  Amalgam 38
  Reactions to Meehl's Ci 40

PAGE 5

  Past Research on Ci 42
  Importance of the Study 48
Chapter Three Method 51
  Organization 51
  Purpose 51
  Four Research Questions and Three Hypotheses 52
  Research Questions 52
  Research Hypotheses
  Monte Carlo Studies 53
  Research Design 54
  Multivariate Extension of Ci 66
  Conduct of the Monte Carlo Study 77
  Data Generation Strategy 80
  Data Analysis 82
Chapter Four Results 83
  Organization 83
  Four Research Questions and Three Hypotheses 83
  Research Questions 83
  Research Hypotheses 84
  Relationship Between Mean Ci and the Central Design Factors 84
  Probing Deeper: The Influence of Verisimilitude, Model Complexity, Collinearity, and Sample Size after Controlling for Intolerance 85
  Estimates of Mean Ci 89
  Mean Ci by Precision of Prediction and Level of Intolerance 100
  Relationship between the Standard Deviation of Ci and Model Complexity, Collinearity, and Sample Size 106
  Probing Deeper: An Examination of the Variability in Path Coefficients 110
  Relationship Between Mean Ci, Precision of Prediction and Verisimilitude 112
  Relationship Between Mean Ci, Precision of Prediction and Model Complexity 113
  Probing Deeper: An Examination of Bias Evidence in Expected Path Coefficients 117
  Relationship Between Mean Ci, Precision of Prediction, Collinearity and Sample Size 119
  Summary 121

PAGE 6

Chapter Five Conclusions, Implications, and Recommendations 122
  Organization 122
  Statement of the Problem 123
  Purpose 123
  Method 124
  Relationship Between Mean Ci, Verisimilitude, Intolerance, Model Complexity, Collinearity, and Sample Size 125
  Relationship Between the Standard Deviation of Ci, Model Complexity, Collinearity, and Sample Size 126
  Relationship Between Mean Ci, Precision of Prediction, Model Complexity, and Level of Collinearity 127
  Relationship Between Mean Ci, Precision of Prediction, and Verisimilitude 128
  Relationship Between Mean Ci and Precision of Prediction 128
  Implications for Theory and Practice 129
  Recommendations for Future Research 132
References 137
About the Author End Page

PAGE 7

List of Tables

Table 1 Values of Ci under Four Levels of Precision 36
Table 2 Obtained Value of Mean Ci for Three Levels of Tuning Multivariate Intolerance by Level of Verisimilitude, Precision of Prediction: Six Variable Model, Low Collinearity, Sample Size=100 75
Table 3 Population Correlation Matrix for 4 Variable Model, (VIF = 1.5) 77
Table 4 Population Correlation Matrix for 4 Variable Model, (VIF = 3.0) 77
Table 5 Population Correlation Matrix for 6 Variable Model, (VIF = 1.5) 78
Table 6 Population Correlation Matrix for 6 Variable Model, (VIF = 3.0) 78
Table 7 Population Correlation Matrix for 8 Variable Model, (VIF = 1.5) 79
Table 8 Population Correlation Matrix for 8 Variable Model, (VIF = 3.0) 79
Table 9 Estimated DF, SS, and Omega Squared by Design Factor 86
Table 10 Estimated DF, SS, and Omega Squared, Intolerance=Non Null Prediction 87
Table 11 Estimated DF, SS, and Omega Squared, Intolerance=Directional Prediction 88
Table 12 Estimated DF, SS, and Omega Squared, Intolerance=Interval Prediction 89
Table 13 Model Complexity, Verisimilitude, and Number of Estimated Paths 90
Table 14 Mean Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity=Low 101
Table 15 Mean Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity=Moderate 102

PAGE 8

Table 16 Mean Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity=High 103
Table 17 Standard Deviation of Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity=Low 107
Table 18 Standard Deviation of Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity=Moderate 108
Table 19 Standard Deviation of Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity=High 109
Table 20 Expected Multivariate Closeness by Verisimilitude, Intolerance, Model Complexity and Collinearity 118

PAGE 9

List of Figures

Figure 1 Continuum of Precision 4
Figure 2 Mediated Causal Model 25
Figure 3 Predicted Values of Ci under Varying Levels of Precision 37
Figure 4 Truth for Four Variable Model, Low Collinearity 55
Figure 5 Truth for Four Variable Model, Moderate Collinearity 55
Figure 6 Truth for Six Variable Model, Low Collinearity 56
Figure 7 Truth for Six Variable Model, Moderate Collinearity 57
Figure 8 Truth for Eight Variable Model, Low Collinearity 58
Figure 9 Truth for Eight Variable Model, Moderate Collinearity 59
Figure 10 Six Variable Exclusionary Model, High Level of Verisimilitude 61
Figure 11 Six Variable Exclusionary Model, Moderate Level of Verisimilitude 62
Figure 12 Six Variable Exclusionary Model, Low Level of Verisimilitude 62
Figure 13 Six Variable Supplementary Model, Moderate Level of Verisimilitude 63
Figure 14 Six Variable Supplementary Model, Low Level of Verisimilitude 63
Figure 15 Alternative Methods for Computing Closeness, One, Two, Three, and Four Parameter Models 71
Figure 16 Alternative Methods for Computing Intolerance, One and Two Parameter Models 73

PAGE 10

Figure 17 Tuning Multivariate Intolerance, Intolerance=.50 76
Figure 18 Data Generation Strategy 81
Figure 19 Four Variable Exclusionary Model, Moderate Level of Verisimilitude (MVD) 92
Figure 20 Four Variable Exclusionary Model, Low Level of Verisimilitude (LVD) 92
Figure 21 Four Variable Supplementary Model, Moderate Level of Verisimilitude (MVA) 93
Figure 22 Four Variable Supplementary Model, Low Level of Verisimilitude (LVA) 93
Figure 23 Six Variable Exclusionary Model, Moderate Level of Verisimilitude (MVD) 94
Figure 24 Six Variable Exclusionary Model, Low Level of Verisimilitude (LVD) 94
Figure 25 Six Variable Supplementary Model, Moderate Level of Verisimilitude (MVA) 95
Figure 26 Six Variable Supplementary Model, Low Level of Verisimilitude (LVA) 95
Figure 27 Eight Variable Exclusionary Model, Moderate Level of Verisimilitude (MVD) 96
Figure 28 Eight Variable Exclusionary Model, Low Level of Verisimilitude (LVD) 97
Figure 29 Eight Variable Supplementary Model, Moderate Level of Verisimilitude (MVA) 98
Figure 30 Eight Variable Supplementary Model, Low Level of Verisimilitude (LVA) 99
Figure 31 Mean Ci by Level of Verisimilitude, Non Null Prediction 104
Figure 32 Mean Ci by Level of Verisimilitude, Directional Prediction 104
Figure 33 Mean Ci by Level of Verisimilitude, Interval Prediction 105

PAGE 11

Figure 34 Mean Ci by Level of Intolerance and Number of Estimated Parameters 105
Figure 35 Box and Whisker Plot of Estimated Standard Deviations 110
Figure 36 Stem and Leaf Plot of the Standard Errors 111
Figure 37 Mean Ci by Level of Intolerance and Verisimilitude 113
Figure 38 Mean Ci by Level of Verisimilitude and Model Complexity 114
Figure 39 Mean Ci by Level of Intolerance and Verisimilitude, Model Complexity=Low 115
Figure 40 Mean Ci by Level of Intolerance and Verisimilitude, Model Complexity=Moderate 116
Figure 41 Mean Ci by Level of Intolerance and Verisimilitude, Model Complexity=High 116
Figure 42 Mean Ci by Level of Intolerance, Collinearity, and Sample Size 120

PAGE 12

Risky Predictions, Damn Strange Coincidences, and Theory Appraisal: A Multivariate Corroboration Index for Path Analytic Models

Kristine Y. Hogarty

ABSTRACT

The elucidation and empirical testing of theories are important components of research in any field. Yet despite the long history of science, the extent to which theories are supported or contradicted by the results of empirical research remains ill defined. Quite commonly, support or contradiction is based solely on the "reject" or "fail to reject" decisions that result from tests of null hypotheses that are derived from aspects of theory. Decisions and recommendations based on this forced and often artificial dichotomy have been scrutinized in the past. In recent years, such an overly simplified approach to theory testing has been challenged on logical grounds (Meehl, 1997, 1990, 1978; Serlin & Lapsley, 1985). Theories differ in the extent to which they provide precise predictions about observations. The precision of predictions derived from theories is proportional to the strength of support that may be provided by empirical evidence congruent with the prediction. However, the notion of precision linked to strength of support is surprisingly absent from many discussions regarding the appraisal of theories.

PAGE 13

Meehl (1990a) has presented a logically sound index of corroboration to summarize the extent to which empirical tests of theories provide support or contradiction of theories. The purpose of this study was to evaluate the utility of this index of corroboration and its behavior when employing path analytic methods in the context of social science research. The performance of a multivariate extension of Meehl's Corroboration Index (Ci) was evaluated using Monte Carlo methods. Correlational data were simulated to correspond to tests of theories via traditional path analysis. Five factors were included in the study: number of variables in the path model, level of intolerance of the theory, correspondence of the theory to the true path model used for data generation, sample size, and level of collinearity. Results were evaluated in terms of the mean and standard error of the resulting multivariate Ci values. The level of intolerance was observed to be the strongest influence on mean Ci. Verisimilitude and model complexity were not observed to be strong determinants of the mean Ci. Sample size and collinearity evidenced small relationships with the mean value of Ci but were more closely related to the sampling error. Implications for theory and practice include alternatives and complements to tests of statistical significance, a shift from comparing findings to the null hypothesis to the comparison of alternative theories and models, and the inclusion of additional logical components besides the theory itself. Lastly, an alternative conceptualization of the multivariate corroboration index is advanced to guide future research efforts.

PAGE 14

Chapter One
Introduction

Background

The elucidation and empirical testing of theories are important components of research in any field. Kerlinger (1964) suggested that these components are fundamental distinctions between science and common sense. Yet despite the long history of science, the extent to which theories are supported or contradicted by the results of empirical research remains ill defined. Often such support or contradiction is reduced to the "reject" or "fail to reject" decisions resulting from tests of null hypotheses that are derived from aspects of theory. That is, a theory is supported by empirical evidence if null hypotheses are rejected when the theory suggests they should be rejected. Conversely, a theory is contradicted (and may be considered refuted, cf. Popper, 1959) if such theoretically derived null hypotheses are not rejected. The limitations of null hypothesis testing are well known (e.g., Harlow, Mulaik, & Steiger, 1997), and such testing has been the subject of much criticism and controversy (Kirk, 1972; Morrison & Henkel, 1970). Over the years, a considerable amount of doubt has been cast on the merit of null hypothesis testing as a theoretical tool. The use of this approach in the testing of theories presents unique conceptual challenges and interpretational dangers. According to Thompson (2002) the field of psychology has witnessed a lengthy deliberation about the utility of statistical significance, questioning whether these tests should be banned from journals of the American

PAGE 15

Psychological Association (APA). A task force charged with examining this issue did not endorse a ban on these tests, but rather articulated a wide-ranging set of recommendations for improved inquiry and reporting (Wilkinson & APA Task Force on Statistical Inference, 1999). Recommendations that were adopted and included in the recent fifth edition of the APA (2001) Publication Manual inform potential authors to include "information about the obtained magnitude or value of the test statistic, the degrees of freedom, the probability of obtaining a value as extreme or more extreme than the one obtained, and the direction of the effect" (p. 22). Additionally, the reporting of confidence intervals (for parameter estimates, for functions of parameters, and for effect sizes) is strongly recommended. According to Thompson (2002), this move represents a positive step forward with respect to improved intellectual inquiry that is less reliant on null hypothesis significance tests (NHST) and requires a heightened sense of responsibility on the part of the research community. Still others question why reform has proceeded further in some other disciplines, including medicine, than in psychology. A few researchers contend that what has happened in psychology was not inevitable. "We leave to historians and sociologists of science the fascinating and important question of why psychology has persisted for so long with poor statistical practice" (Finch, Cummings, & Thomason, 2001, pp. 205-206). The persistence of poor statistical practices in a broad range of disciplines in the social sciences is particularly vexing. This conundrum suggests that it would be profitable to explore

PAGE 16

alternatives to traditional approaches to theory testing and consider underutilized, different, or yet to be developed statistical tools.

Need for Another Tool

In recent years, such an overly simplified approach to theory testing has been challenged on logical grounds (Meehl, 1997, 1990, 1978; Serlin & Lapsley, 1985). Theories differ in the extent to which they provide precise predictions about observations. The precision of predictions derived from theories is proportional to the strength of support that may be provided by empirical evidence congruent with the prediction. That is, a precise prediction that is supported by data warrants more logical evidence in support of the theory than does a weak prediction supported by data. This relationship between the precision of prediction and the strength of logical support is rooted in the relative rarity of the data, absent the theory. That is, without the theory, would we expect to see such data anyway? The extent to which we would not expect to see such data is what Salmon (1984) refers to as a "damn strange coincidence," and the extent to which a theory predicts such otherwise rare data is a "risky prediction" (Meehl, 1978). The degree to which theories differ in their precision of prediction can be illustrated with a simple example. Consider one of the most basic predictions, that is, predictions about population mean differences. A simple prediction that men and women will have different means on a given variable is a relatively weak prediction. A prediction that the mean of women will be greater than that of

PAGE 17

men is somewhat stronger, a prediction that the means will differ by some value between five and ten points is stronger yet, and a prediction that the means will differ by exactly 7 points is even more precise. The precision of prediction can be conceptualized as existing along the continuum presented in Figure 1.

Less Precise                                                          More Precise
$\mu_W \neq \mu_M$      $\mu_W > \mu_M$      $5 < \mu_W - \mu_M < 10$      $\mu_W - \mu_M = 7$

Figure 1. Continuum of Precision.

Naturally, this basic example can be extended if a researcher desires to make predictions about the direction and magnitude of relationships among variables, and with additional adjustment, can be modified to extend beyond univariate analyses to include multivariate contexts.

Meehl's Index of Corroboration (Ci)

Meehl (1997, 1990a) has proposed an index of corroboration (Ci) that may provide a standardized means of expressing the extent to which empirical research supports or contradicts a theory:

$$C_i = (Cl)(In)$$

Where Cl = the closeness of the data to the theoretical prediction (verisimilitude or truth likeness), and

PAGE 18

In = the intolerance of the theory (e.g., a standardized precision of prediction).

These terms are further defined as follows:

$$Cl = 1 - (D/S)$$

Where D = the deviation of observed data from the tolerance interval of the theory
S = Spielraum (the range of data values that are expected whether or not the theory is true)

$$In = 1 - (I/S)$$

Where I = the interval tolerated by the theory (e.g., the raw precision of prediction).

As it is quite common to encounter the use of "good fit" to support theories in the research community and a host of these indices already exist, one may ask, why do we need another fit index? While the proposed index may appear to resemble methods commonly used to establish goodness of fit, the additional component that represents the precision of prediction (i.e., the intolerance of the prediction) makes this index somewhat unique. As conceptualized previously, the proposed corroboration index combines both a traditional measure of fit, represented by the deviation of observed data from the tolerance interval of the theory, and the degree of precision with which the prediction is advanced. Further, the sheer logic of appraising a scientific theory is often more complicated than some would believe (Meehl, 1997). In addition to the

PAGE 19

aforementioned argument regarding the precision of prediction (that is, a precise prediction that is supported by the data warrants more logical evidence of support than does a weak prediction supported by the data), the movement from theory into an empirical test necessitates the incorporation of many logical components besides the theory itself. Meehl (1997) presents these components as elements of an equation:

$$(T \cdot A_x \cdot C_p \cdot A_i \cdot C_n) \vdash (O_1 \supset O_2)$$

Where T = the theory being tested,
$A_x$ = Auxiliary theories relied upon during the conduct of the research,
$C_p$ = Ceteris paribus (all other things being equal),
$A_i$ = Instrumental theories related to measures and controls employed,
$C_n$ = Realized particulars (the extent to which the research was actually conducted as we think it was), and
$O_1 \supset O_2$ = the material conditional "if you observe $O_1$ you will observe $O_2$."

That which is subject to empirical test is not the theory alone, but the amalgam of these elements. Data that appear to contradict a theory may arise because of errors anywhere in this combination of elements (e.g., the theory may be correct but the groups we thought were equivalent were actually systematically different from each other on an important, confounding variable). In the following chapter, the elements of this amalgam are further elucidated through the use of an illustrative example representing these components in the conduct of disciplined inquiry.
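As a hedged illustration of the index introduced earlier in this chapter, the following Python sketch computes Ci = Cl × In for the mean-difference predictions of Figure 1. The function name, the Spielraum of 40 points (differences assumed to range from -20 to 20), and the observed difference of 8 points are hypothetical values chosen only for illustration; they do not come from Meehl or from this study.

```python
def corroboration_index(observed, lower, upper, spielraum):
    """Return (Cl, In, Ci) for a single prediction, using Ci = Cl * In.

    observed     -- observed value (hypothetical data in this sketch)
    lower, upper -- bounds of the interval tolerated by the theory
    spielraum    -- S, the range of values expected whether or not
                    the theory is true
    """
    if spielraum <= 0:
        raise ValueError("spielraum must be positive")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    # D: distance from the observed value to the tolerance interval
    # (zero when the observation falls inside the interval).
    deviation = max(lower - observed, 0.0, observed - upper)
    # I: width of the interval tolerated by the theory.
    interval = upper - lower
    closeness = 1 - deviation / spielraum    # Cl = 1 - (D/S)
    intolerance = 1 - interval / spielraum   # In = 1 - (I/S)
    return closeness, intolerance, closeness * intolerance


# Hypothetical setting (an assumption of this sketch): S = 40 and an
# observed mean difference (women minus men) of 8 points.
S = 40.0
observed_difference = 8.0

directional_result = corroboration_index(observed_difference, 0.0, 20.0, S)
interval_result = corroboration_index(observed_difference, 5.0, 10.0, S)
point_result = corroboration_index(observed_difference, 7.0, 7.0, S)

for label, (cl, intol, ci) in [("directional", directional_result),
                               ("interval", interval_result),
                               ("point", point_result)]:
    print(f"{label:12s} Cl={cl:.3f} In={intol:.3f} Ci={ci:.3f}")
```

Under these assumed values the supported directional prediction yields Ci = 0.500, the supported interval prediction yields Ci = 0.875, and the point prediction, although it misses by one point, yields Ci = 0.975. This mirrors the argument that a riskier prediction warrants more support when the data fall close to it.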

PAGE 20

Statement of the Problem

Meehl (1997, 1990a) has presented a logically sound index of corroboration to summarize the extent to which empirical tests of theories provide support or contradiction of those theories. However, the numerical properties of this index have not been investigated beyond some of the most basic predictions about population mean differences, zero-order correlations, and first-order partial correlations (Hogarty & Kromrey, 2002, 2001, 2000). Monte Carlo methods were used in the previous studies to examine the behavior of the corroboration index. These methods currently remain the most feasible way to study this index, and thus a similar approach was followed in this study.

Purpose

The purpose of this study was to build upon the previous research by advancing to the next logical step: evaluating the utility of the corroboration index and its behavior when appraising theories employing path analytic methods in the context of social science research. A simulation study was used to evaluate the behavior of the index of corroboration under a variety of conditions that are typically encountered in the conduct of empirical research. Many researchers approach path analysis by beginning with a model in which there is substantial confidence. This confidence may stem from either theoretical or substantive reasoning about the linkages between the variables under investigation. Less attention, however, is typically given to estimating the

PAGE 21

magnitude of such linkages. Most areas of psychology do not permit a high degree of precision. According to Blaich (1998), quasi-quantitative predictions of rough magnitudes of effects could help advance the field. Therefore, the primary focus of this investigation was on the precision in the prediction of the magnitude of effects and an examination of factors that moderate the relationship between corroboration and precision. An extension of this nature required modifications of the index as it was originally conceptualized. For example, when considering more than one parameter estimate, the formula used in the calculation of closeness was:

$$CL = 1 - \frac{1}{J}\sum_{j=1}^{J}\frac{D_j}{S_j}$$

Where J is equal to the number of parameters (i.e., path coefficients) being estimated. Additionally, an earlier exploration of variations in the calculation of intolerance resulted in the following formula deemed to be most appropriate:

$$In = 1 - \prod_{j=1}^{J}\frac{I_j}{S_j}$$

For a complete discussion of the rationale behind the selection of these estimates of multivariate closeness and intolerance, consult the section entitled Multivariate Extension of Ci provided in Chapter Three.
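The multivariate extension can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure used in the study: it assumes that multivariate closeness is one minus the average of the ratios D_j/S_j and that multivariate intolerance is one minus the product of the ratios I_j/S_j. The function names and all numeric values (three path coefficients, each with a Spielraum of 2.0 and a directional prediction tolerating half of that range) are hypothetical.

```python
from math import prod


def _check_inputs(values, spielraums):
    """Validate that both sequences describe the same J >= 1 parameters."""
    if len(values) == 0 or len(values) != len(spielraums):
        raise ValueError("need one value and one Spielraum per parameter")
    if any(s <= 0 for s in spielraums):
        raise ValueError("every Spielraum must be positive")


def multivariate_closeness(deviations, spielraums):
    """Assumed form: 1 minus the mean of D_j / S_j over the J parameters."""
    _check_inputs(deviations, spielraums)
    ratios = [d / s for d, s in zip(deviations, spielraums)]
    return 1 - sum(ratios) / len(ratios)


def multivariate_intolerance(intervals, spielraums):
    """Assumed form: 1 minus the product of I_j / S_j over the J parameters."""
    _check_inputs(intervals, spielraums)
    return 1 - prod(i / s for i, s in zip(intervals, spielraums))


def multivariate_ci(deviations, intervals, spielraums):
    """Ci as the product of multivariate closeness and intolerance."""
    return (multivariate_closeness(deviations, spielraums)
            * multivariate_intolerance(intervals, spielraums))


# Hypothetical three-path example: each coefficient is assumed to range
# over a Spielraum of 2.0, and each prediction is directional (I_j = 1.0).
spielraums = [2.0, 2.0, 2.0]
intervals = [1.0, 1.0, 1.0]
deviations = [0.0, 0.1, 0.2]   # distances outside the tolerated intervals

closeness = multivariate_closeness(deviations, spielraums)
intolerance = multivariate_intolerance(intervals, spielraums)
ci = multivariate_ci(deviations, intervals, spielraums)
print(f"Cl={closeness:.4f} In={intolerance:.4f} Ci={ci:.4f}")
```

Under the assumed product rule, three directional predictions taken jointly are more intolerant (0.875) than a single directional prediction (0.5), so in this reading intolerance grows with the number of predicted paths.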

Four Research Questions and Three Research Hypotheses

Research Questions

1. What is the relationship between mean Ci and the main effects examined in the study (i.e., verisimilitude, intolerance, model complexity, collinearity, and sample size)?
2. What is the relationship between the standard deviation of Ci and model complexity, collinearity, and sample size?
3. To what extent is the relationship between mean Ci and the precision of prediction (i.e., intolerance) influenced by the complexity of the model (i.e., the number of variables in the model)?
4. To what extent is the relationship between mean Ci and the precision of prediction (i.e., intolerance) influenced by the level of collinearity?

Research Hypotheses

1. The relationship between mean Ci and the precision of prediction (i.e., intolerance) will be slightly influenced by the closeness of the data to the theory (verisimilitude).
2. The relationship between mean Ci and the precision of prediction (i.e., intolerance) will not be substantively influenced by sample size.
3. The relationship between mean Ci and precision of prediction will be substantively stronger than the relationship between mean Ci and verisimilitude, model complexity, collinearity, and sample size.

Delimitations and Limitations

This examination is limited to an exploration of the aforementioned relationships through the use of traditional path analyses employing least squares regression analysis. The focus of this inquiry is on the magnitude of path coefficients obtained by examining the relationships between observed variables, rather than latent variables, which would necessitate the employment of more sophisticated methods, such as structural equation modeling techniques. Additionally, this study includes only models in which the causal flow is unidirectional; that is, it investigates a series of recursive models.

Organization of the Study

In the first chapter, the reader is acquainted with the ongoing controversy and some of the past and current thinking with respect to tests of null hypotheses that are derived from aspects of theory. The rationale and purpose for the study are outlined and the research questions and hypotheses are advanced. Included in this chapter are the delimitations, limitations, and important definitions central to the study.

Chapter Two is devoted to a review of the relevant literature that coheres around the central theme of theory testing in the social sciences, focusing on the central issues related to inquiry of this nature. Although little research to date has been conducted on the behavior of Meehl's index of corroboration, related literatures that required exploration included the philosophy of science, theory testing, path analysis, and an examination of fit indices in the context of structural
equation modeling. The index of corroboration is operationally defined in this chapter and an illustrative example is provided. Reactions to the notion of a corroboration index and past research are also discussed. Lastly, the importance of the study is explicated.

Chapter Three outlines the method employed and describes the central design factors, the procedures, and the data analysis strategy employed in this study. The procedures outlined here include the selection of the components of the multivariate corroboration index, the conduct of the study, and the data generation strategy.

Chapter Four presents the results of the study. The results are organized with respect to each of the research questions and hypotheses. In addition, within each section, once the primary research questions and hypotheses have been addressed, supplementary analyses and results are examined in order to further elucidate some of the more subtle relationships evidenced in the data. The chapter concludes with a summary of the key findings.

Chapter Five provides a set of conclusions grounded in the results of the study, the findings of past empirical research, and the body of literature that coheres around the central theme of theory testing in the social sciences. Following this recapitulation of the major findings of the study, important implications for practice and theory are advanced. The chapter concludes with recommendations for future research.

Definitions

Closeness. The closeness of the data to the theoretical prediction (verisimilitude or truth-likeness), defined as 1 minus the deviation divided by the Spielraum: Cl = 1 - (D/S) (Meehl, 1990a).

Collinearity. Refers to correlations among independent variables. Literally, collinearity refers to the case of data vectors representing two variables falling on the same line, that is, two variables that are perfectly correlated. However, most researchers use the term to imply near collinearity among a set of independent variables (Pedhazur, 1997).

Corroboration index (Ci). A standardized measure of the extent to which empirical research supports or contradicts a theory. The index is defined as closeness (Cl) multiplied by intolerance (In) (Meehl, 1990a).

Deviation (D). The deviation of the observed data from the tolerance interval of the theory (Meehl, 1990a).

Endogenous variable. In a causal model, an endogenous variable is one whose variation is explained by exogenous or other endogenous variables in the model (Pedhazur, 1997, p. 770).

Empirical Fit. The degree of congruence (or fit) between the hypothesized model and the observed data (Hu & Bentler, 1999).

Exogenous variable. In a causal model, an exogenous variable is one whose variation is assumed to be determined by causes outside the
hypothesized model (Pedhazur, 1997, p. 770). That is, a variable that lacks hypothesized causes in the path analysis model.

Interval (I). The interval tolerated by the theory; the unstandardized precision of prediction (Meehl, 1990a).

Intolerance (In). The standardized precision of prediction. Intolerance is defined as 1 minus the interval divided by the Spielraum: In = 1 - (I/S) (Meehl, 1990a).

Model Misspecification. For this study, model misspecification can occur when true paths are omitted or ancillary paths are included in the model. In this context, model misspecification is reflected in the level of verisimilitude.

Path coefficient. A standardized regression coefficient indicating the direct effect of one variable on another variable in path analysis. For each independent variable in the equation, there is a path coefficient indicating the amount of expected change in the dependent variable as a result of a unit change in the independent variable (Pedhazur, 1997, p. 772).

Path model. A diagram that graphically displays the hypothesized pattern of causal relations among a set of variables (Pedhazur, 1997, p. 770).

Recursive model. A model that considers only unidirectional causal relationships; that is, reciprocal causation between variables is ruled out (Pedhazur, 1997, p. 771).

Spielraum (S). The range of data values that are expected whether or not the theory is true (Meehl, 1990a).

Theoretical Fit. The degree of isomorphism between a theoretical model and a true model (Olsson, Troye, & Howell, 1999, p. 31).

Theory. A set of interrelated constructs, definitions, and propositions that present a systematic view of phenomena by specifying relations among variables, with the purpose of explaining phenomena (Kerlinger, 1964).

Variance Inflation Factor (VIF). This factor indicates the inflation of the variance of b (i.e., the estimated path coefficient) resulting from the correlation between two or more independent variables (Pedhazur, 1997).

Verisimilitude. The closeness of the observed data to the theoretical prediction (truth-likeness). Verisimilitude is an ontological concept; that is, it refers to the relationship between theory and the real world (Meehl, 1990a, p. 133).
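
The definitions of deviation, interval, Spielraum, closeness, intolerance, and the corroboration index can be tied together in a small worked sketch for a single prediction. The function name `corroboration_index`, the tolerated interval of 0.3 to 0.5, and the Spielraum of -1 to 1 are illustrative assumptions, not values taken from the study; the deviation is taken as the distance from the observed value to the nearest bound of the tolerated interval, and zero when the observation falls inside it.

```python
def corroboration_index(observed, lower, upper, spielraum_min, spielraum_max):
    """Hypothetical helper: return (Cl, In, Ci) for a single prediction."""
    if spielraum_min >= spielraum_max or not (spielraum_min <= lower <= upper <= spielraum_max):
        raise ValueError("the tolerated interval must lie inside a non-empty Spielraum")
    spielraum = spielraum_max - spielraum_min          # S
    interval = upper - lower                           # I
    # D: distance from the observed value to the tolerated interval (0 if inside it).
    deviation = max(lower - observed, observed - upper, 0.0)
    closeness = 1.0 - deviation / spielraum            # Cl = 1 - (D/S)
    intolerance = 1.0 - interval / spielraum           # In = 1 - (I/S)
    return closeness, intolerance, closeness * intolerance  # Ci = Cl * In


# Illustrative only: a theory tolerates a path coefficient between 0.3 and 0.5.
print(corroboration_index(0.40, 0.3, 0.5, -1.0, 1.0))  # observation inside the interval
print(corroboration_index(0.56, 0.3, 0.5, -1.0, 1.0))  # observation just outside it
```

Narrowing the tolerated interval raises intolerance, so the same observed value earns a larger Ci under a riskier prediction, provided it still falls inside (or close to) the interval.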

Chapter Two

Review of the Literature

Overview

The review of the literature coheres around the central theme of theory testing in the social sciences. This chapter is divided into six major sections: Philosophy of Science, Theory Testing, Path Analysis, Fit Indices, Meehl's Index of Corroboration (Ci), and Importance of the Study. The chapter is organized in this manner to facilitate communication of the central issues by eliciting insight from the extant literature, and to develop a balanced landscape for the presentation of competing methodologies. Although the chapter is physically divided into major and minor subsections, at times no true conceptual boundaries may exist.

The review of the literature begins with a broad overview, recounting the history of the philosophy of science by tracing the evolution of thought and practice that have characterized theory appraisal over the past few decades. Once the contemporary origins of this discipline have been explored, the review is extended through an examination of traditional methods employed in theory testing. In this section, common approaches are described, methodological obstacles and objections are advanced, and observed deficiencies inherent in hypothesis testing approaches are uncovered. In addition, there is a brief review of common features of theories that are often deemed desirable. This section invites readers to extend their thinking beyond
the consideration of a single or traditional indicator of verisimilitude, suggesting a broader milieu that might include supplementary indices and/or other variables.

Following this section, attention is directed to a précis of path analysis, one of the most widely employed methods in the testing of theories. This section examines some of the methodological nuances of this popular method and considers certain conditions that need to be satisfied when utilizing this type of statistical technique. To further elucidate this approach, a brief review of structural equation modeling is presented. A central element of this statistical modeling method, the determination of empirical fit, is introduced via a commentary on the similarities and differences, as well as the inherent strengths and weaknesses, of some of the more commonly employed indices of goodness of fit. This section concludes by addressing the question, "How persuasive is a [...]"

In the fifth section of this chapter, the corroboration index is reintroduced and an argument for the incorporation or adoption of an index of corroboration is advanced (Meehl, 1997, 1990a). This discussion is augmented with some of the comments and criticisms presented by contemporary scholars. This major section of the literature review concludes with a summary of the recent empirical research that establishes a firm foundation for the current research endeavor.

The review of the literature is brought to a close by addressing the importance of the current study and thus reveals the potential utility of a corroboration index in the evaluation of theories across a vast array of domains and disciplines.

Philosophy of Science

According to Kuhn (1962), the history of any science can be described by a succession of incommensurable paradigms. In this view, competing paradigms do not agree on what constitutes knowledge or the meaning of truth, with empirical work done in one paradigm having little importance or relevance to another. Beliefs constituting a paradigm are so fundamental that they are immune from empirical testing. In this regard, experimental failures may lead to the rejection of specific theories, but the paradigm remains untouched, directing the construction of new theories.

The work of other philosophers of science such as Lakatos and Laudan stands in contrast to the views held by Kuhnians, suggesting that research programs are not incommensurable and evolve in ways not predicted by Kuhn (Gholson & Barker, 1985). Lakatos (1970) replaced the Kuhnian paradigm with a research program that involves a succession of theories. A hard core of shared commitments links theories, each successive theory introducing a new and more detailed articulation of these commitments. Accordingly, a protective belt of dispensable hypotheses provides shelter from immediate empirical refutation. Dispensable features are modified by successive theories while the core assumptions remain intact. The ability to stimulate the development of complex and adequate theories is viewed as an objective feature and important characteristic of any research program.

Laudan (1977) replaced the notion of a research program with a research tradition. This extension involves families of theories sharing a
common ontology and methodology, that is, a shared vision of reality and agreement regarding appropriate ways to investigate that reality. In addition to empirical factors, conceptual factors are viewed as important in theory appraisal, independent of experimental success or failure. Laudan also offered a solution to Lakatos's unrealistic requirement that core commitments pass unchanged through successor theories, contending that core principles are not functionally metaphysical and can be modified in response to empirical testing.

Theory Testing

For nearly three quarters of a century, statistical significance testing has been the most widely used method of analysis in psychological experiments (Nickerson, 2000). In many areas of psychology, refutation of the null hypothesis has been the sole theory testing procedure employed (Meehl, 1990a). Over the years, a considerable amount of doubt has been cast on the merit of null hypothesis testing as a theoretical tool. Commenting on the slow progress in soft psychology, Dar (1987) stated that null hypothesis tests are destructive to theory building. According to Lykken (1968), theory corroboration requires testing of multiple predictions because the chance of getting statistically significant results for the wrong reasons in any given case is surprisingly high (p. 158). For example, in the social sciences we are typically concerned with many variables, some of which are within our control whereas others are not. Many of these variables, although not of direct interest or central to a study, have been shown to be nuisance variables, variables that may have a significant influence, or may
interact with each other (Meehl, 1978). Additionally, the well-known influence of sample size on statistical significance tests may well be in itself cause for skepticism of unlikely statistically significant results. Analyses need to be designed to shed light on whether a model is consistent with the data; if it is not, doubt is cast on the theory from which the model was derived. Consistency, however, does not constitute proof; it merely lends support. According to Popper (1959), all research can accomplish is falsification of theory; those theories that survive are not disconfirmed. Gigerenzer (1998) has argued that the institutionalization of null hypothesis significance testing has permitted surrogates for theories to flourish, resulting in one-word explanations, redescriptions, vague dichotomies, and data fitting.

Arguments against the use of tests of statistical significance abound. According to Meehl (1990a), any null hypothesis of zero correlation between two variables or of zero difference between two sample means may confidently be set up by an investigator as a straw man, which often can be refuted even when conceptually meaningless predictors are chosen at random. Carver (1978) contends, "statistical significance tells us nothing directly relevant to whether the results we found are large or small, and it tells us nothing with respect to whether the sampling error is large or small" (p. 291).

Over the years, there has been a concerted effort aimed at encouraging researchers to routinely provide some indication of effect size along with, or in place of, the results of statistical significance tests. Effect sizes have been viewed as consistent with null hypothesis significance testing and as an important complement. This move,
alluded to in the introductory chapter, represents a positive step forward with respect to improved intellectual inquiry, that is, less reliance on null hypothesis significance tests (Thompson, 2002).

Meehl (1990a) contends that the way in which a theory accumulates "money in the bank" is by passing several stiff tests, claiming that "the main way a theory gets money in the bank is by predicting facts that, absent the theory, would be antecedently improbable" (p. 115). Theoretical support depends on a variety of factors, including the relative uniqueness of the prediction, how surprising the prediction is, the precision of prediction, and the degree of correspondence between the prediction and the observed data (Nickerson, 2000). The role of theory in the formation of causal models was perhaps most forcefully expressed by Hanson:

"Causes are connected with effects; but this is because our theories connect them, not because the world is held together by cosmic glue. The world may be glued together by imponderables, but that is irrelevant for understanding causal explanations. The notions behind 'the cause x' and 'the effect y' are intelligible only against a pattern of theory, namely one which puts guarantees on inferences from x to y. Such guarantees distinguish truly causal sequences from mere coincidence." (1958, p. 64)

Faust and Meehl (2002) contend that in the evaluation of theories, researchers need to develop predictors of the success of theories or their
longtime fate. This can be accomplished by starting with a list of properties or indices, and by beginning to explore the relationship between these variables or features and theory success.

There are a number of features of theories that are deemed desirable by researchers. Among the commonly advanced characteristics there is much overlap but little agreement regarding which features should assume the greatest importance, which should prevail over other features when inconsistencies or contradictions arise, or which should prevail when different features favor competing theories (Faust & Meehl, 2002). These authors advance an abbreviated list of desirable features that includes parsimony, which itself can be divided into a number of characteristics, such as simplicity of explanation or the fewest postulates per observation statement. The list might also include novelty in relation to numerical precision, that is, some variation of Popperian risk or Salmonian damn strange coincidence, to which one could add rigor, qualitative diversity or breadth, reducibility upward or downward, and elegance or mathematical beauty (p. S187). Further, these authors contend that no credible philosopher of science has ever claimed that any one of these features is a guarantee of truth or that any one feature is always superior to another.

As an aid to the future evaluation of theories, Faust and Meehl (2002) reintroduced an index of predictive accuracy in relation to risk (a.k.a. Meehl's Ci) and proposed that additional indices could be developed to rate qualitative diversity and parsimony. By working with a host of potential variables related to theory status, perhaps some of those traditionally advanced combined with
others less traditional or yet to be developed, we can get a better sense of predictive power, how variables are best combined, and how we might begin to cope with inconsistencies.

A contemporary debate with respect to theory appraisal and theory development can be found within the pages of a recent issue of Psychological Methods. Within these pages, scholars revisited this vital issue and offered a host of recommendations and suggestions. These include yet another statistical approach to strong appraisal of truth or verisimilitude that involved a class of path diagrams (Meehl & Waller, 2002); an evaluation of tests of statistical significance (Markus, 2002); a treatise on just-identified, recursive models as compared to the delete-one and add-one models proposed by Meehl and Waller (Reichardt, 2002); and commentaries on the proposed Meehl and Waller approach to path analysis and verisimilitude (see, for example, MacCallum, Browne, & Preacher, 2002; Mulaik, 2002). In sum, it appears that the controversy is still alive and well and continues to be on the minds of prominent scholars in the field.

Path Analysis

Causal modeling is a tremendously popular method and a valuable analytical tool used extensively in the social sciences. This method is not intended to discover causes, but to shed light on the tenability of causal models. Causal models must both specify the relationships between independent and dependent variables and explicitly state the relationships among all
variables considered. Each link between the variables under investigation implicitly represents a hypothesis that would be tested by estimating the magnitude of the relationship (Asher, 1983). Predicated on the assumptions of valid design and execution, this method holds greater promise of increasing awareness and understanding of complex phenomena than simple examinations of correlations between variables without attention to mediating or spurious relationships. Causal relationships, that is, the cause x and the effect y, are intelligible only against a pattern of theory, one that puts guarantees on inferences from x to y. As stated previously, such guarantees distinguish truly causal sequences from mere coincidence (Hanson, 1958).

Path analysis is an extremely popular statistical method, and there has been a substantial increase in the use of this type of modeling technique over the years by social and behavioral scientists. Path analysis falls within the general category of methods referred to as structural equation modeling or covariance structure analysis. This method is commonly used for analyzing systems of structural equations and allows researchers to shed light on whether or not a proposed causal model is consistent with the data. One advantage of this technique is that it allows a researcher to investigate the utility of a proposed theoretical framework. Accordingly, a proposed theory is represented by a mathematical model. This mathematical model conveys the nature of the relationships among the variables under investigation. Consistency of a model with the data does not, however, constitute proof of a theory, but rather provides support for a particular theory.
Competing or equivalent models can also be consistent with the same data. For any given covariance structure model, there will often be alternative models that are indistinguishable from the suggested model in terms of goodness of fit (MacCallum, Wegener, Uchino, & Fabrigar, 1993). The decision regarding the tenability of such models rests not on the data but on the theory from which the model was generated.

According to Bollen (1989), contemporary applications of path analysis emphasize three components: the path diagram, the decomposition of covariances and correlations in terms of model parameters, and the distinction between direct, indirect, and total effects of one variable on another. Pedhazur (1997) claims that although a path diagram is not essential for the numerical analysis employed in path analysis, it provides a useful venue for visibly displaying hypothesized patterns of causal relationships among a set of variables. Estimates of model parameters, the path coefficients, provide information with respect to the magnitude of the direct effect, or the expected amount of change in a dependent variable resulting from a unit change in the independent variable, holding all others constant (Pedhazur, 1997). Path coefficients represent the individual components that result when we decompose the correlation between two endogenous variables or between an exogenous and an endogenous variable.

The distinction between total, direct, and indirect effects arises from the relationships represented by the causal model. Consider the simple model depicted in Figure 2.

Figure 2. Mediated Causal Model. [Path diagram: X1 -> X2 (p21), X1 -> X3 (p31), X2 -> X3 (p32), X3 -> X4 (p43).]

Consider the total effect of X1 on X3. In this model we observe that X1 has a direct effect on X3, as described by p31. We also observe that X1 has an indirect effect on X3 as mediated by X2 (p21 and p32). This results in a two-part decomposition, that is, the direct effect of X1 on X3 and the indirect effect of X1 on X3 via X2. The total effect of one variable on another is the sum of both direct and indirect effects (Alwin & Hauser, 1975; Duncan, 1975; Finney, 1972). Therefore, the total effect of X1 on X3 is calculated by adding the direct effect (p31) to the product of the paths constituting the indirect effect (p21*p32). Additionally, direct, indirect, and total effects can be produced for other variables in the model, for example, the total effect of X1 on X4 if we add X4 to the model. Because X1 has no direct path to X4, the total effect of X1 on X4 would be obtained by summing the products of the paths along each indirect route (i.e., p21*p32*p43 + p31*p43).
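
The decomposition described above can be sketched in a few lines of code. The example below is a minimal illustration, not part of the study's procedures: the function names `total_effect` and `decompose` and the numeric path coefficients are hypothetical, and the recursion assumes a recursive (acyclic) model so that it terminates. The total effect is computed as the sum, over every directed route from one variable to another, of the product of the path coefficients along that route.

```python
def total_effect(paths, source, target):
    """Sum over all directed routes from source to target of the product of path coefficients.

    Assumes a recursive (acyclic) model; a feedback loop would recurse without end.
    """
    if source == target:
        return 1.0
    return sum(
        coef * total_effect(paths, effect, target)
        for (cause, effect), coef in paths.items()
        if cause == source
    )


def decompose(paths, source, target):
    """Return (direct, indirect, total) effects of source on target."""
    direct = paths.get((source, target), 0.0)
    total = total_effect(paths, source, target)
    return direct, total - direct, total


# Illustrative coefficients only; keys are (cause, effect) pairs.
paths = {
    ("X1", "X2"): 0.5,  # p21
    ("X1", "X3"): 0.2,  # p31
    ("X2", "X3"): 0.4,  # p32
    ("X3", "X4"): 0.3,  # p43
}
print(decompose(paths, "X1", "X3"))  # direct p31, indirect p21*p32
print(decompose(paths, "X1", "X4"))  # no direct path; p21*p32*p43 + p31*p43
```

Representing the model as a dictionary keyed by (cause, effect) pairs keeps the sketch independent of any particular estimation routine; estimated coefficients from a regression could be dropped into the same structure.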

Fit Indices

One benefit of a structural equation modeling approach is that fit indices are available for assessing the tenability of the estimated model. Because models are only approximations of reality, they can be expected to fit only approximately. If the relationships implied by the model (as evidenced in the variance/covariance matrix) are not consistent with those observed in the data, it becomes clear that the model is inadequate and that the effect estimates should be questioned. Hu and Bentler (1999) contend that the two most commonly employed methods of evaluating model fit are those that involve the χ² goodness-of-fit statistic and fit indices.

There are a variety of goodness-of-fit measures and a host of methods used in determining the goodness of fit of a proposed theoretical model. These indices generally fall into two broad categories, absolute and incremental fit indices (Bollen, 1989; Gerbing & Anderson, 1993; Hu & Bentler, 1995; Marsh, Balla, & McDonald, 1988; Tanaka, 1993). Absolute measures of goodness of fit assign a numerical value to the degree to which the proposed model reproduces the sample data (i.e., the variance/covariance structure), that is, the degree of misspecification of a particular model. According to Hu and Bentler (1999), reference models are not typically used to assess the amount of increment in model fit; however, it is possible to make a comparison to a saturated model, one that exactly reproduces the sample covariance structure. Incremental indices, on the other hand, are used to compare the fit of the proposed model to the fit of a null model. If goodness of fit is adequate, it is
considered evidence for the plausibility of the model; that is, the theoretical model may accurately represent the true model. Again, for any given covariance structure model, there will often be alternative models that fit the data equally well (MacCallum et al., 1993).

The chi-square goodness-of-fit index is frequently used to assess the fit between the variance/covariance matrix implied by the model and the observed covariance matrix of the sample. Early investigations of the behavior of χ² (Boomsma, 1982) revealed that this index was too dependent on sample size to be useful in many situations. There is now general acceptance of the unsatisfactory behavior of the χ² statistic for the assessment of model fit (Thompson & Daniel, 1996). It is important to note, however, that although sample size will not cause a good model to have poor fit, with larger sample sizes minor deviations in parameter estimates are often detected.

Absolute Indices of Fit

A host of fit indices have been designed to avoid some of the pitfalls involved with sample size, violations of distributional assumptions, and model misspecification, hazards that have long plagued the traditional overall test of fit (i.e., the χ² statistic). Absolute indices of goodness of fit include the Goodness of Fit Index (GFI) and the Adjusted Goodness of Fit Index (AGFI; Bentler, 1983; Jöreskog & Sörbom, 1984; Tanaka & Huba, 1985), Steiger's Gamma Hat, a rescaled version of
Akaike's information criterion (CAK; Cudeck & Browne, 1983), a cross-validation index (CK; Browne & Cudeck, 1989), McDonald's (1989) Centrality Index (Mc), Hoelter's (1983) Critical N (CN), a standardized version of Jöreskog and Sörbom's (1981) root mean squared residual (SRMR; Bentler, 1995), and the root mean square error of approximation (RMSEA; Steiger & Lind, 1980) (Hu & Bentler, 1999, p. 2).

There is little empirical support for the claim that these other fit indices can point to model accuracy more unambiguously than the χ² test (Hu & Bentler, 1999). Further, in their investigation of the effects of sample size, estimation method, and model specification, Fan, Thompson, and Wang (1999) raised concerns about the behavior of certain fit indices and the information that they provide with respect to misspecified models, specifically their noncomparable nature and the strong influence of estimation method. Additionally, often-cited problems exist with various measures: they are affected by sample size, they may indicate good overall model fit even when one or more of the parameters in the model is poorly determined, and they fail to provide information regarding what is wrong with the model (Fraas & Newman, 1994).
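
The sample-size sensitivity of the χ² statistic, and the way an absolute index such as RMSEA rescales it, can be seen in a small numerical sketch. Two things here are assumptions not stated in the text: the test statistic is taken to be (N - 1) times the minimized discrepancy function value, as is standard under maximum likelihood estimation, and RMSEA is computed with one common point-estimate formula, sqrt(max(χ² - df, 0) / (df (N - 1))); some software uses N in place of N - 1. The discrepancy value of 0.05 and the 5 degrees of freedom are purely illustrative.

```python
from math import sqrt


def chi_square_statistic(f_min, n):
    """Model chi-square for a minimized discrepancy value f_min and sample size n."""
    return (n - 1) * f_min


def rmsea(chi_square, df, n):
    """Point estimate of the root mean square error of approximation."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Illustrative only: the same degree of misfit evaluated at increasing sample sizes.
F_MIN, DF = 0.05, 5
for n in (100, 400, 1600):
    chi2 = chi_square_statistic(F_MIN, n)
    print(n, round(chi2, 2), round(rmsea(chi2, DF, n), 3))
```

Holding the misfit constant, the χ² value grows in proportion to N - 1, so a fixed minor misspecification eventually yields a statistically significant test, whereas the RMSEA estimate levels off as the sample grows.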

Incremental Fit Indices

In contrast to absolute fit indices, incremental fit indices measure the improvement in fit by comparing a target model with a more restricted, nested baseline model (Hu & Bentler, 1999). Most typically, a null model, one in which there are no correlations between the observed variables, is considered (Bentler & Bonett, 1980). Examples of incremental fit indices include the Normed Fit Index (NFI; Bentler & Bonett, 1980), Bollen's fit index (BL86; 1986), the Tucker-Lewis Index (TLI; 1973), another index developed by Bollen (BL89; 1989), Bentler's (1989, 1990) and McDonald and Marsh's (1990) Relative Noncentrality Index (RNI), and Bentler's Comparative Fit Index (CFI). The formulas for some of the aforementioned indices are provided in Hu and Bentler (1999).

Hu and Bentler (1999) claim that there are two pressing issues that must be considered in the proper application of fit indices for model evaluation. These issues are important considerations for applied researchers and methodologists. The first issue concerns the behavior of fit indices under various data and model conditions, including a host of commonly encountered situations in general practice. These conditions include "sensitivity of fit index to model misspecification, small sample bias, estimation effects, effects of violations of normality and independence, and bias of fit indexes resulting from model complexity" (p. 4). The second issue involves the judicious application of rules of thumb. As with many rules of thumb, little consensus exists with respect to conventional cutoff criteria, and recommendations are often diverse and/or inconsistent. In light of the lack of empirical evidence, questions remain with
30 res pect to the adequacy of these conventionally advanced cutoffs. A recent examination of this issue (Hu & Bentler, 1999) revealed that for some of the recommended fit indices, the cutoff criterion was evidenced to be greater (or smaller) than conventional r ules of thumb required for model evaluation or selection. Binomial Index of Model Fit In contrast to some of the more traditional indices, Fraas and Newman (1994) proposed a binomial test of model fit as an alternative method for determining the goodness of fit of a proposed theoretical model. This method, employing an index referred to as the binomial index of model fit value, is based on the application of the binomial distribution to the number of paths in a model that are supported by the data. This approach requires that an event be classified into one of two categories according to certain criteria, that is, the determination of whether the data provide support for a given path. This determination can be made in a number of ways. For example, the decision for support can be based on (a) the parameter estimate for a path exceeds an a priori effect size, (b) the parameter estimate is statistically significant, and (c) the parameter estimate reflects a hypothesized algebraic sign or any combination of these. After criteria have been established to determine whether a given path is supported by the data, the second step involves testing the actual number of paths supported by the data. Using a binomial test, the probability of obtaining at least the n umber of paths supported by the data is calculated. If the calculated

probability is less than the alpha level (i.e., less than would be expected to occur by chance), the conclusion drawn is that the data are supportive of the model. These authors contend that the use of a binomial test to estimate how well data support a theoretical model differs conceptually from other goodness of fit approaches. They purport that this method is better described as the degree to which the paths support the nomological net of a theory, rather than being based on the reproduction of a variance-covariance matrix. Concerns and criticisms surrounding this approach include the lack of independence between events, the effect of sample size when employing statistical significance as a criterion, and the limited capacity of the index to provide insight regarding path misspecification. However, the most salient problem for this line of inquiry is the differential application of criteria in determining support for a given path. Freedom to adjust these criteria will likely result in contradictory conclusions regarding model fit, leading to inconsistency across studies and thus failing to provide a standardized estimate of the precision of prediction.

Theoretical and Empirical Fit

In an investigation designed to compare the performance of different maximum likelihood and generalized least squares estimation techniques, Olsson, Troye, and Howell (1999) examined both measures of theoretic fit and empirical fit. According to these authors, in research the goal is often to construct models that reflect the structures and parameters of some

unobservable causal mechanism. The degree of isomorphism between such a theoretic model and a true model can be labeled theoretic fit (p. 31). Alternatively, and most commonly, measures of empirical fit are employed because they serve as the only available evidence of the adequacy of the theoretical structure and the accuracy of the parameter estimates, and hence provide indirect support for a theory. However, in a Monte Carlo study, when the true population parameter values are known, the discrepancy between the true values and the estimated parameter values can be calculated. For example, in a simulation study we can construct theoretical models that reflect the structures (M_true) and parameters (P_true) of some unobservable true model of the underlying causal mechanisms assumed to generate the empirical observations to achieve theoretical fit. However, in realistic settings, M_true and P_true are unknown and there is no direct evidence of theoretical fit. Therefore, researchers make use of indicants of the theoretic model's ability to account for the structures of the data, employing indices of overall fit (chi-square, RMSEA, etc.) in addition to significance tests of the parameters. If the goodness of fit is adequate, it is considered as evidence for the plausibility of the model; that is, the theoretic model M_theory may accurately represent M_true. To the extent that M_theory (i.e., the theoretic model) is wrong, an ideal estimation procedure would provide an accurate estimate of model error (Olsson et al., 1999, pp. 34-35).
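
In a simulation, the comparison between a known population value and its estimate can be made directly. The following minimal Python sketch illustrates the idea for a single standardized path. The population coefficient (0.50), the sample size, the seed, and the function names are illustrative assumptions; they are not values or procedures taken from Olsson et al. (1999) or from the present study.

```python
import math
import random


def simulate_sample(p_true, n, rng):
    """Draw n (x, y) pairs from a standardized one-path model y = p_true*x + e."""
    resid_sd = math.sqrt(1.0 - p_true ** 2)  # keeps Var(y) equal to 1
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [p_true * x + resid_sd * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def estimate_path(xs, ys):
    """Ordinary least squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2024)  # hypothetical seed, fixed for reproducibility
P_TRUE = 0.50              # hypothetical population value (assumption)
xs, ys = simulate_sample(P_TRUE, n=5000, rng=rng)
p_hat = estimate_path(xs, ys)
discrepancy = p_hat - P_TRUE  # computable only because P_TRUE is known
print(f"estimate = {p_hat:.3f}, discrepancy from truth = {discrepancy:+.3f}")
```

With real data, the last subtraction is not available because the population value is unknown, which is why empirical fit serves as indirect evidence.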

How Persuasive Is a Good Fit?

According to Roberts and Pashler (2000), it is a mistake to allow good fits to provide substantive support for a theory. These authors contend that the practice of using good fits to support theories is flawed in several ways, and they advance several possible reasons for its continued use. These reasons include a desire to imitate physics, the presence of confirmation bias, theory complexity, neglect of basic principles, and a popularity based at least partly on repetition and inertia. A good fit reveals nothing about the flexibility of the theory (how much it cannot fit), the variability of the data (how firmly the data rule out what the theory cannot fit), or the likelihood of other outcomes (perhaps the theory could have fit any plausible result) (p. 358). In order to determine how much the fit should increase our belief in a proposed theory, one must employ all three of the aforementioned pieces of information. Showing that a theory fits data is not only not enough; it is nearly meaningless. These authors also contend that it is necessary to compare plausible alternative outcomes with what the tested theory could explain, through an examination of both the flexibility of the tested theory and the variability of the actual results. Further, the resultant evidence will not be very convincing if either is large compared to the range of plausible outcomes (Roberts & Pashler, 2000).

Meehl's C_i

Meehl (1990a, 1997) has proposed an index of corroboration (C_i) that may provide a standardized means of expressing the extent to which empirical research supports or contradicts a theory:

$$C_i = (\mathit{Cl})(\mathit{In})$$

where
Cl = the closeness of the data to the theoretical prediction (verisimilitude or truth-likeness), and
In = the intolerance of the theory (e.g., a standardized precision of prediction).

These terms are further explicated as follows:

$$\mathit{Cl} = 1 - (D/S)$$

where
D = deviation of observed data from the tolerance interval of the theory, and
S = Spielraum (the range of data values that are expected whether or not the theory is true);

$$\mathit{In} = 1 - (I/S)$$

where
I = the interval tolerated by the theory (e.g., the raw precision of prediction).

An Example

To build on the simple example presented in Chapter One (i.e., a prediction about population mean differences), let us now consider a theory that

posits a relationship between two variables. Recall that large values of C_i should result from strong theories making tight predictions in which data are very similar to predicted values. Let's suppose a researcher has made a prediction that a positive correlation exists between attitude toward computers and integration of computers in the classroom. This prediction is somewhat stronger than a simple prediction that a correlation exists, because a directional relationship is predicted. However, the prediction is less precise than a prediction that posits a positive relationship between .5 and 1.0. Further, knowing that the plausible values of a correlation range from -1.0 to +1.0, whether or not the theory is true, the Spielraum (S) is thus 2. In this example, the simple directional prediction of a positive correlation between attitude and integration suggests a tolerance interval of 1.0 (any correlation greater than zero is consistent with this flabby prediction) and an intolerance (In) of 1 - 1/2 or 0.50. If the sample correlation between attitude and integration is found to be .50, the data do not deviate from the prediction (Cl = 1.0) and Meehl's C_i = (Cl)(In) = (1.0)(.50) = .50. If the prediction was not simply a positive correlation but a positive correlation between .5 and 1.0, then the tolerance interval is .5 and In = 1 - .5/2 or .75. The same observed data (a correlation of .50) are also consistent with this prediction, but C_i = (1.0)(.75) = .75. The latter theory receives more corroboration from the data because it made a riskier prediction that was consistent with the observations. Suppose the observed data evidenced a correlation of -.5, indicating an inverse relationship between attitude and integration. Such data are not

consistent with the predictions of either theory. For the theory providing a directional prediction only, the data deviate (D) from the lower bound of the tolerance interval by .5 and Cl = 1 - D/S = .75. These data provide a corroboration index value of (Cl)(In) = (.75)(.50) = .375. For the riskier prediction of a positive correlation between .50 and 1.0, the data deviate by 1.0 and Cl = 1 - D/S = .50. For this theory, the data also provide a corroboration index value of (Cl)(In) = (.50)(.75) = .375. Although the observed data deviate to a greater extent from the prediction of the latter theory, the corroboration is the same, in this particular case, because the prediction was more precise.

Table 1 and Figure 3 present the values of C_i that would be realized for the values of sample correlation under four levels of precision of prediction. Note that as predictions become more accurate (the observed correlation is closer to the predicted correlation), higher values of C_i are obtained with more precise predictions. When the prediction is far from the observed value, higher values were observed from looser predictions. Further, the intolerance of the theoretical prediction presents an upper bound for C_i (i.e., precision of prediction limits the degree of corroboration regardless of the magnitude of the observed correlation).

Table 1. Values of C_i under Four Levels of Precision

Prediction              S    I     In    r    C_i
r < -.10 or r > .10     2    1.8   .10   .5   .10
r > 0                   2    1     .5    .5   .50
.50 < r =< 1.00         2    .5    .75   .5   .75
.50 < r < .70           2    .2    .9    .5   .90
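
The arithmetic of the worked example can be retraced with a few lines of Python. This is a minimal sketch, not the procedure used in the study: the function name and argument names are hypothetical, the prediction is assumed to be a single interval, and an observation on the boundary of the interval is assumed to count as consistent with the prediction (as the example treats r = .50 for the prediction of a correlation between .5 and 1.0).

```python
def corroboration_index(observed, lower, upper, spielraum):
    """Return (Cl, In, Ci) for a prediction that tolerates lower..upper.

    Assumption: interval endpoints are treated as inside the interval.
    """
    # D: distance from the observed value to the tolerance interval (0 if inside)
    if observed < lower:
        deviation = lower - observed
    elif observed > upper:
        deviation = observed - upper
    else:
        deviation = 0.0
    tolerance = upper - lower                   # I, the raw tolerance interval
    closeness = 1.0 - deviation / spielraum     # Cl = 1 - D/S
    intolerance = 1.0 - tolerance / spielraum   # In = 1 - I/S
    return closeness, intolerance, closeness * intolerance


S = 2.0  # Spielraum for a correlation: -1.0 to +1.0
for r in (0.5, -0.5):
    directional = corroboration_index(r, 0.0, 1.0, S)  # prediction: r > 0
    riskier = corroboration_index(r, 0.5, 1.0, S)      # prediction: .5 to 1.0
    print(f"r = {r:+.1f}: directional Ci = {directional[2]:.3f}, "
          f"riskier Ci = {riskier[2]:.3f}")
```

For r = .5 the sketch gives .500 and .750, and for r = -.5 it gives .375 for both predictions, matching the values worked through in the text.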

Figure 3. Predicted Values of C_i under Varying Levels of Precision. [Line plot of Meehl's C_i (vertical axis, 0.00 to 1.00) against the observed sample correlation (horizontal axis, -1 to 1) for four predictions: Precise Prediction (.50 < r < .70), Moderate Prediction (.50 < r =< 1.00), Directional Prediction (r > 0), and Non-directional Prediction (r < -.10 or r > .10).]

Amalgam

Recall, from the previous chapter, that movement from theory into an empirical test necessitates the incorporation of many logical components besides the theory itself. That which is subject to empirical test is not the theory alone, but the amalgam of these elements. Meehl (1990a, 1997) presents these components as elements of an equation:

$$(T \cdot A_x \cdot C_p \cdot A_i \cdot C_n) \vdash (O_1 \supset O_2)$$

where
T = the theory being tested,
A_x = auxiliary theories relied upon during the conduct of the research,
C_p = ceteris paribus (all other things being equal),
A_i = instrumental theories related to measures and controls employed,
C_n = realized particulars (the extent to which the research was actually conducted as we think it was), and
O_1 \supset O_2 = the material conditional "if you observe O_1, you will observe O_2."

Auxiliary theories (A_x) lie at the periphery of the theory being tested and are somewhat distinct from the hard core concepts or postulates of the theory under investigation. Although central portions of a particular theory may not be rigorously defined, there will likely exist key critical components as well as non-central elements. These tangential components (although not central to the theory being explored) are still, in fact, a part of the theory.

For example, in an investigation of the relationship between nutrition and anxiety in which anxiety is measured using responses to Likert-type items written in English, the use of participants whose primary language is not English necessitates an auxiliary theory that the inferences from the scores derived from the anxiety instrument retain their validity in such a population. If data obtained from such research fail to support theoretical predictions, the failure may be attributable to the core theory being incorrect or simply to the auxiliary theory not holding.

The concept of verisimilitude (truth-likeness) is closely related to this core-peripheral distinction. Meehl (1990a) suggests that a theory that is false in its core postulates has lower verisimilitude than one that, while correct in its core concepts, is incorrect in several of its peripheral ones. As even the best theories are likely to be approximations of the true state of reality, verisimilitude, then, refers to the relationship between the theory and the real world.

Ceteris paribus does not mean that all factors not mentioned are equal for all participants, but rather that there are no systematic factors left unmentioned. This clause amounts to a very strong and highly improbable negative assertion that nothing else is at work except factors that are totally random and therefore subject to being dealt with by our statistical methods (Meehl, 1990a, p. 111).

The instrumental auxiliary theories (A_i) are related to measures and controls employed by the researcher. These are distinguished from A_x in that they do not contain any psychological constructs. Thus, if anxiety is measured by changes in

galvanic skin response rather than by a Likert instrument, the auxiliary theory at work is within A_i rather than A_x.

The realized particulars (C_n) represent the extent to which the research was actually conducted as we think it was. This element of the amalgam represents treatment integrity. For example, if we plan to manipulate participant nutritional status to examine its relationship with anxiety, but the participants do not adhere to their dietary treatment, then the variable actually applied in the research is not what we think it is. Data that contradict our theory may arise because of this perturbation in C_n.

Reaction to Meehl's C_i

As might be expected, the mere mention of an index of corroboration, or an attempt to quantify metatheory, inspired considerable debate. Campbell (1990) contended that verisimilitude needs to be considered in the context of a pattern of predictions, one that can be matched to a pattern referred to as the s-c facts. The s-c facts represent the focal collective scientific consensual facts, allowing for the connotation "so-called facts" (Campbell, 1990, p. 144). The so-called facts can arise from earlier tests of theories, or from theoretical sources such as exploratory experimentation or refined folk observations. In Campbell's view, the incorporation of the so-called facts results in reducing the exaggerated role of theory. Competing theories would thus be compared based on their goodness of fit to the shared s-c facts for which they provide predictions. Campbell (1990) suggested a simple correlation might be provided as evidence of

verisimilitude. It would seem that comparing correlation coefficients (a standardized measure) would indeed be similar to comparing the component of C_i that represents closeness (Cl), as this estimate represents a standardized measure of verisimilitude.

Chow (1990) asserted that Meehl's argument (i.e., theory appraisal based on numerical predictions in various situations) is only appropriate when a theory is being tested with non-experimental methods or in an ex post facto manner (p. 147). He further argued that, given the inherent difference between experimental and non-experimental studies, the proposed corroboration index would not be appropriate when a theory is being tested experimentally.

In agreement with Meehl, Humphreys (1990) contended that the target article not be restricted to courses and seminars on psychological theory, but rather that it should be required reading for every graduate course in quantitative methods. Additionally, this researcher asserted that substantive advances in psychological research would occur if psychologists were to plan their research, analyze their data, and discuss their findings in congruence with the current target article (p. 155).

In response to the aforementioned commentary, Meehl (1990b) addressed each of the commentators and advanced a more focused discussion of the corroboration index and verisimilitude. In general, these comments served to clarify certain references and specific claims, and to underscore the intended purpose of the corroboration index. In particular, Meehl (1990b) noted the nearly wholesale lack of enthusiasm for the proposed index despite agreement with the

critical aspects of his overall position. In a concluding remark, Meehl (1990b) offered the following notion: "In employing any useful numerification of an open concept in the social sciences, one is properly alert to the caveats, but not frightened into cognitive paralysis by them" (p. 177). Clearly, disapproving responses were likely anticipated, as we know from the history of science that radically novel ideas regularly meet with resistance, and statisticising metatheory is certainly a new and radical idea (p. 177).

Past Research on C_i

An initial examination of the utility of the index and its behavior in theory testing was conducted in the context of a simple theory, the core of which predicted a difference in means between two groups (Hogarty & Kromrey, 2000). This effort was aimed toward illuminating the relationship between the closeness of the observed data, or verisimilitude, and the precision of prediction. The relationships explored in this study included factors related to the nature of the theory being tested (i.e., predicted mean difference between groups, the raw tolerance interval of the theory, and the Spielraum), the degree of correspondence of the theory to the actual populations simulated (i.e., population difference in means and variance ratios between the two populations), and research design factors (i.e., sample size, reliability of the dependent variable, and the confounding effect of an extraneous variable). An important limitation of this research, however, was that the investigation considered only relationships

for the most basic of predictions, that is, predictions about population mean differences. Under these very limited circumstances, the mean index of corroboration was seemingly unaffected by sample size, and notably more influenced by the level of verisimilitude and the level of intolerance specified by the theory. In addition, the reliability of the dependent measure was shown to have but a slight influence on the mean C_i, and only when predictions were very close to truth. Although sample size and measurement reliability were not important determinants of mean C_i, both factors were related to the variability of this statistic, with larger samples and more reliable measures providing greater stability across samples. Although such sampling variability is important, one would anticipate that the degree of support for a theoretical prediction that was tested with a large sample should be greater than that provided by a small sample. This finding clearly illuminated the need for additional work aimed at incorporating a sample size component into an index such as C_i.

A second study conducted by Hogarty and Kromrey (2001) was designed to investigate the relationship between theoretical predictions and empirical results through a consideration of Meehl's index of corroboration in the context of hypothetical theories that made relatively simple predictions (magnitude of a zero-order correlation) and those that made more statistically complex predictions (magnitude of a first-order partial correlation). This investigation served to advance knowledge about the behavior of Meehl's index of corroboration beyond the most basic theoretical predictions of differences in

population means. The relationships explored in this study included factors related to the nature of the theory being tested (i.e., predicted magnitude and direction of correlation, the raw tolerance interval of the theory, and the type of correlation), the degree of correspondence of the theory to the actual populations simulated (i.e., true population correlation, both zero-order and partial, and the magnitude of correlation between the two focal variables and the variable being partialed), and research design factors (i.e., sample size and reliability).

Surprisingly, nearly identical values of the statistic were obtained for both types of prediction across the various levels of the design factors that were employed. As with the evaluation of mean differences, the major influence on C_i was the precision of the prediction. This factor far outweighed the impact of closeness (verisimilitude), with theories that made tight predictions obtaining notably higher values of C_i than those making loose predictions, even with extreme differences between the prediction and the true population parameter. Verisimilitude was less influential in determining C_i than was observed in the assessment of mean differences (Hogarty & Kromrey, 2000).

In addition to the building evidence regarding the influence of the precision of prediction and verisimilitude, insight was gained about the impact of measurement reliability when employing zero-order and first-order partial correlations. In our earlier investigation, less reliable measures evidenced slightly smaller values of C_i. In this study, the influence of measurement reliability was found to depend on the relationship between the true population correlation and the prediction. When a theory's prediction was precisely correct, the largest mean value of Meehl's C_i

resulted from using measures with the highest reliability, with progressively smaller values resulting from the use of successively less reliable measures. Similarly, if the theory predicted a correlation greater than the true value, more reliable measures produced larger values of Meehl's C_i, with the difference in values becoming somewhat greater as the verisimilitude decreased (i.e., a greater difference between the prediction and the reality). However, when the predicted correlation was less than the true correlation, less reliable measures provided larger values of Meehl's C_i, with the difference increasing as the predicted value approached zero. The observed result, that theories with lower verisimilitude may obtain greater corroboration than theories with higher verisimilitude if the measurement of the relevant variables is not reliable, is a function of the attenuation of the sample correlation (Pedhazur, 1997). These results suggest that caution is needed regarding the interpretation of the magnitude of C_i without regard to the context of the application.

Although once again sample size was not deemed an important determinant of mean C_i, it was seen to influence the variability of this statistic. Similar to the results obtained by Hogarty and Kromrey (2000) in the investigation of prediction of mean differences, larger samples evidenced less variability in C_i across samples. Again, this suggests additional efforts should be aimed at incorporating a sample size component into an index such as C_i. Despite the obvious need to reduce the emphasis upon statistical significance and null hypothesis testing, sample size requirements remain important considerations in the interpretation of research evidence. Therefore,

the most recent work by Hogarty and Kromrey (2002) included a sample size adjustment to the calculation of C_i. For this study, the sample size requirement was conceptualized as the smallest sample size at which a researcher would no longer be substantively concerned with sampling error (the smallest size at which sampling error may be considered trivial). This was considered the fail-safe N. In this context, a weight for C_i was computed as the square root of the ratio of the study's sample size to the fail-safe N. That is,

$$\text{Weight} = \text{Relative Size} = \sqrt{\frac{N_{\text{study}}}{N_{\text{failsafe}}}}$$

Incorporating this weight in Meehl's C_i provides the Weighted C_i:

$$\text{Weighted } C_i = (\mathit{Cl})(\mathit{In})(\mathit{RS}) = \left(1 - \frac{D}{S}\right)\left(1 - \frac{I}{S}\right)\sqrt{\frac{N_{\text{study}}}{N_{\text{failsafe}}}}$$

Through the incorporation of a sample size component into Meehl's index of corroboration, a statistic that more closely approximated the desired behavior was suggested.

In this study, six factors were manipulated. First, three factors related to the theory being tested were included. The predicted mean difference between groups was examined at five levels, the raw tolerance interval of the theory was examined at four levels, and the Spielraum was examined at three levels. These values of raw tolerance and Spielraum yield intolerance (In) values ranging from 0.50 (the value of intolerance for a simple directional prediction of effects) to 0.98 (reflecting a tight, risky prediction). Second, two factors related to the true populations simulated were manipulated. The population difference in means

was examined at five levels, and variance ratios between the two populations were manipulated at four levels. These population mean differences, crossed with the theory's predictions, provided conditions ranging from those in which the theory's prediction exactly represented the true populations (perfect verisimilitude) to those in which the theory deviated from the true population conditions by effect sizes as large as two standard deviations. Finally, the sample size of each study, a characteristic of research design, was investigated at four levels.

Once again, these findings shed light upon the relationships of these components in the context of only the most basic of predictions, that is, predictions about population mean differences. Under these very limited circumstances, the Weighted C_i index of corroboration was profoundly affected by sample size, only slightly influenced by the level of verisimilitude, and severely limited by the level of intolerance. These findings suggested that the major influence on C_i was the precision of the predictions. This factor far outweighed the impact of closeness (verisimilitude), with theories that make tight predictions obtaining notably higher values of C_i than those making loose predictions, even when the predictions were substantially wrong. As anticipated, the Weighted C_i (in contrast to Meehl's original formulation of C_i) provides a greater degree of support for a theoretical prediction that was tested with a large sample than that provided by a small sample. The importance of theoretical intolerance as a determinant of degree of corroboration highlights the need for the development of precise theories in the

social sciences. Additionally, the results of the analyses focusing on more traditional approaches to theory appraisal underscore the need to extend our thinking beyond the common "reject" or "fail to reject" decisions resulting from tests of null hypotheses that are derived from aspects of theory. Jointly, these results suggest that efforts to develop theories in the social sciences that enjoy greater precision of prediction may concomitantly provide critical tests with greater potential for corroboration. A theory's merit is a matter of degree, rather than a yes or no question, as it is treated in null hypothesis testing (Meehl, 1990a).

A natural extension of this previous line of research should involve the examination of these relationships when making more complex predictions from theories. An extension of the components of C_i to multivariable problems, such as those encountered in path analysis, is worthy of investigation.

Importance of the Study

The use of path analysis in the appraisal of theories was most recently debated in the pages of a topical issue of Psychological Methods (2002). In fact, the entire issue was devoted to a conversation regarding theory appraisal, causal models, tests of statistical significance, empirical fit, and verisimilitude (see, for example, Markus, 2002; MacCallum et al., 2002; Meehl & Waller, 2002; Mulaik, 2002; Reichardt, 2002; Waller & Meehl, 2002). The present study is designed to contribute to this conversation by building upon the previous research conducted in the context of a simple theory through an exploration of

the utility and behavior of the corroboration index when testing more complex predictions, such as applications of path analysis. It is anticipated that the results will support the incorporation of Meehl's C_i in the planning of empirical studies as well as the interpretation of research results. It is also important to note that the intention is not to encourage abandonment of other supplementary approaches or tools, or to use the corroboration index in isolation; rather, the index is advanced with the understanding that it be employed in an auxiliary or complementary role. It is hoped that the use of a corroboration index may help in reducing the hypnotic fascination with null hypothesis significance testing (Meehl, 1990a). Its use should serve to move the arguments surrounding theory testing away from the testing of null hypotheses and into a consideration of the complexity of the research context, the degree of risk entailed by the theory's predictions, and the extent to which the obtained data (absent the theory) represent a damn strange coincidence.

The index of corroboration is unique in that it combines both a measure of the closeness (or verisimilitude) of the data and the precision with which the prediction is made. Additionally, unlike some of the other indices that are typically employed in the conduct of research, the index of corroboration is not context bound or discipline specific. In this vein, it might be viewed as behaving like an effect size that is computed differently given different circumstances or situations. The univariate corroboration index is available if that is the type of measure that is appropriate (i.e., in testing population mean differences). This multivariate extension expands the utility of the index to the next logical level, by

exploring the versatility of the index beyond the limited applications previously examined.

Chapter Three

Method

Organization

The purpose of this chapter is to elucidate the method for this study. The chapter opens with a restatement of the purpose of the study and the research questions and hypotheses. A brief overview of the utility of Monte Carlo studies follows. A description of common applications and uses of simulation methods is then presented. After the efficacy of this approach has been established, the research design is described. Illustrations of the models under consideration and population correlation matrices are included to demonstrate two of the central design factors, that is, model complexity and collinearity. The justification for the multivariate extension of the corroboration index is advanced and supported by results from a series of data simulations. The conduct of the Monte Carlo study is then explained through an illustration of the data generation strategy. The chapter concludes with a discussion of the interpretational framework that guides the reporting of the results.

Purpose

Meehl (1990a, 1997) has presented a logically sound index of corroboration to summarize the extent to which empirical tests of theories provide support or contradiction of those theories. However, the numerical properties of this index have not been investigated beyond some of the most basic predictions about population mean differences, zero-order correlations, and first-order partial

correlations. This study is the next logical step, providing an evaluation of the utility of the index and its behavior in the testing of theories employing path analysis in the context of social science research.

Four Research Questions and Three Research Hypotheses

Research Questions

1. What is the relationship between mean C_i and the main effects examined in the study (i.e., verisimilitude, intolerance, model complexity, collinearity, and sample size)?
2. What is the relationship between the standard deviation of C_i and model complexity, collinearity, and sample size?
3. To what extent is the relationship between mean C_i and the precision of prediction (i.e., intolerance) influenced by the complexity of the model (i.e., the number of variables in the model)?
4. To what extent is the relationship between mean C_i and the precision of prediction (i.e., intolerance) influenced by the level of collinearity?

Research Hypotheses

1. The relationship between mean C_i and the precision of prediction (i.e., intolerance) will be slightly influenced by the closeness of the data to the theory (verisimilitude).
2. The relationship between mean C_i and the precision of prediction (i.e., intolerance) will not be substantively influenced by sample size.

3. The relationship between mean C_i and precision of prediction will be substantively stronger than the relationship between mean C_i and verisimilitude, model complexity, collinearity, and sample size.

Monte Carlo Studies

The behavior of Meehl's C_i was evaluated using Monte Carlo methods. A series of simulations was conducted that related theoretical predictions to empirical results. The use of simulation methods allows the control and manipulation of research design factors and the incorporation of sampling error into the analyses. The study was designed in the context of hypothetical theories, the cores of which predict a single outcome from various configurations of exogenous and endogenous variables. The resulting path coefficients were the parameter estimates of primary interest.

The utility of Monte Carlo studies is derived, in large part, from their ability to evaluate the properties of statistical procedures and help researchers select appropriate analytical procedures under varying design conditions. Monte Carlo studies have been employed to investigate the behavior of a variety of parameter estimates of interest to researchers, as well as the Type I and Type II error rates of statistical tests, coverage probabilities of confidence intervals, the bias and variability of IRT item parameter estimates, factor loadings, path coefficients, and goodness of fit indices (Serlin, 2000). In this type of study, the conditions that researchers are likely to encounter in the conduct of applied research are manipulated and the properties of the estimates are examined under each of the


varied scenarios. The results serve to inform researchers of reasonable approaches and proper cautions to exercise when particular conditions are confronted.

Research Design

The choice of characteristics of the sampled populations (or factors) in Monte Carlo studies is typically determined by examining conditions that are likely to be encountered by researchers working in applied settings. Five factors were manipulated in these simulations: factors related to the theory being tested, the degree of correspondence of the theory to the actual populations simulated, and research design factors. The two factors related to the theory being tested were the number of variables in the model and the size of the tolerance interval or level of intolerance. The number of variables in the model was examined at three levels: the simplest model containing four variables, a more sophisticated model with six variables, and the most complex model containing eight variables. The set of true models is fully illustrated in Figures 4-9.


[Figure 4. Truth for Four Variable Model, Low Collinearity. Path diagram, variables 1-4. Path coefficients: p21 = .75, p31 = .75, p42 = .30, p43 = .30. Residuals: e2 = .66, e3 = .66, e4 = .85.]

[Figure 5. Truth for Four Variable Model, Moderate Collinearity. Path diagram, variables 1-4. Path coefficients: p21 = .91, p31 = .91, p42 = .30, p43 = .30. Residuals: e2 = .41, e3 = .41, e4 = .82.]


[Figure 6. Truth for Six Variable Model, Low Collinearity. Path diagram, variables 1-6. Correlation: r12 = .58. Path coefficients: p31 = .37, p32 = .37, p41 = .28, p43 = .34, p52 = .28, p53 = .34, p63 = .31, p64 = .31, p65 = .31. Residuals: e3 = .76, e4 = .84, e5 = .84, e6 = .69.]


[Figure 7. Truth for Six Variable Model, Moderate Collinearity. Path diagram, variables 1-6. Correlation: r12 = .82. Path coefficients: p31 = .45, p32 = .45, p41 = .25, p43 = .97, p52 = .25, p53 = .97, p63 = .31, p64 = .31, p65 = .31. Residuals: e3 = .52, e4 = .63, e5 = .63, e6 = .57.]


[Figure 8. Truth for Eight Variable Model, Low Collinearity. Path diagram, variables 1-8. Correlation: r12 = .41. Path coefficients: p31 = .30, p32 = .30, p41 = .30, p42 = .30, p51 = .30, p52 = .30, p63 = .31, p64 = .28, p65 = .28, p74 = .30, p75 = .30, p85 = .31, p86 = .28, p87 = .29. Residuals as printed: e3 = .86, e4 = .86, e5 = .86, e6 = .78, e6 = .74 [sic], e7 = .88.]


[Figure 9. Truth for Eight Variable Model, Moderate Collinearity. Path diagram, variables 1-8. Correlation: r12 = .82. Path coefficients: p31 = .30, p32 = .30, p41 = .30, p42 = .30, p51 = .30, p52 = .30, p63 = .36, p64 = .29, p65 = .38, p74 = .31, p75 = .31, p85 = .33, p86 = .29, p87 = .32. Residuals as printed: e3 = .82, e4 = .86, e5 = .82, e6 = .64, e6 = .66 [sic], e7 = .86.]


The complexity of the analysis is reflected, in large part, by the number of variables under investigation. For example, an examination of a model with eight variables involves exploring $k(k-1)/2$, or 28, possible pairwise correlations among the variables. The Spielraum for each standardized path coefficient in all models was necessarily fixed, ranging from -1.0 to +1.0. The raw tolerance interval of the theory was examined at three levels of precision: a non null condition with $\beta \neq 0$; a directional condition employing one half of the Spielraum, with $\beta < 0$ or $\beta > 0$; and an interval prediction equal to one quarter of the Spielraum, that is, $.05 \le \beta \le .55$.

When examining moderate verisimilitude, one path was deleted, representing 1/6 or approximately 17% of the potential paths in the model. For the lowest level of verisimilitude, two paths were deleted, representing 1/3 or approximately 33% of the total number of potential paths in the model. Figures 10-12 provide an illustration for the six variable exclusionary model. For this six variable model, moderate verisimilitude reflects the deletion or addition of two paths, that is, approximately 13% of the potential paths. Low verisimilitude required the deletion or addition of five paths, again approximately 33% of the total number of potential paths. Further, the levels of verisimilitude for the six variable supplementary model are illustrated in Figures 13 and 14. For the most complex model, four paths were added or deleted to represent moderate verisimilitude (14% of the paths), and a total of nine paths were added or deleted for the lowest level of verisimilitude, reflecting approximately 32% of the potential paths.

[Figure 10. Six Variable Exclusionary Model, High Level of Verisimilitude. Path diagram, variables 1-6.]


[Figure 11. Six Variable Exclusionary Model, Moderate Level of Verisimilitude. Path diagram, variables 1-6.]

[Figure 12. Six Variable Exclusionary Model, Low Level of Verisimilitude. Path diagram, variables 1-6.]


[Figure 13. Six Variable Supplementary Model, Moderate Level of Verisimilitude. Path diagram, variables 1-6.]

[Figure 14. Six Variable Supplementary Model, Low Level of Verisimilitude. Path diagram, variables 1-6.]


Lastly, two factors related to the design of empirical research were included in the simulations. Sample size was examined at three levels (100, 200, and 500 observations) and the correlation between variables was examined at two levels. The level of correlation between variables is an important consideration, and when designs involve more than two independent variables it is necessary to look beyond zero order correlations to diagnose this relationship. In regression analyses this particular issue is referred to as collinearity. Collinearity may have devastating effects on regression statistics, manifesting in imprecise estimates of regression coefficients (Pedhazur, 1997), and thus is an important consideration given the nature of this investigation. This influence is illustrated by examining the formula for the standard error of a regression coefficient for the case of two independent variables. The standard error for $b_1$ is given by:

\[ s_{b_{y1.2}} = \sqrt{\frac{s^2_{y.12}}{\sum x_1^2 \, (1 - r_{12}^2)}} \]

where $s^2_{y.12}$ = variance of estimate; $\sum x_1^2$ = sum of squares of $X_1$; and $r_{12}^2$ = squared correlation between independent variables $X_1$ and $X_2$. One method commonly employed in the diagnosis of collinearity focuses on the variance of $b$, which is the square of the formula provided:

\[ s^2_{b_{y1.2}} = \frac{s^2_{y.12}}{\sum x_1^2 \, (1 - r_{12}^2)} = \frac{s^2_{y.12}}{\sum x_1^2} \left[ \frac{1}{1 - r_{12}^2} \right] \]


In the preceding formula, the term in the brackets is called the variance inflation factor (VIF). This component indicates the inflation of the variance of $b$ resulting from the correlation between the two independent variables. The lower bound of the VIF is one, that is, when $r_{12}^2 = .00$. The VIF gets larger (and the variance of $b$ more inflated) as the correlation between independent variables increases. Further, when standardized variables are used (i.e., correlations), the following equation illustrates the relationship between the regression coefficients and the correlation matrix:

\[ \beta = R^{-1} r \]

where $\beta$ is a column vector of standardized coefficients; $R^{-1}$ is the inverse of the correlation matrix of regressors; and $r$ is a column vector of correlations between each independent variable and the dependent variable. The inverted $R$ matrix ($R^{-1}$) will contain the VIF values along the principal diagonal. For the two variable case, this can be seen as:

\[ R^{-1} = \begin{bmatrix} \dfrac{1}{1 - r_{12}^2} & \dfrac{-r_{12}}{1 - r_{12}^2} \\[2ex] \dfrac{-r_{12}}{1 - r_{12}^2} & \dfrac{1}{1 - r_{12}^2} \end{bmatrix} \]

There does not appear to be a single rule of thumb that has been widely accepted with respect to varying levels of VIF, that is, how big is too big? Belsley (1984) contended that the value of 10 is offered frequently, yet without meaningful foundation. This suggests that the VIF needs to be considered with respect to the factors operating within a particular study or context.
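The two variable case described above can be checked numerically. The following is a minimal sketch, not the study's SAS/IML code: it inverts the 2 x 2 correlation matrix in closed form, reads the VIF from the diagonal, and applies $\beta = R^{-1} r$. The function names are hypothetical. The values .58 and .82 are the $r_{12}$ values shown for the six variable models, and .56 and .47 are taken from the four variable, low collinearity population matrix (Table 3).

```python
def inverse_2x2_correlation(r12):
    """Closed-form inverse of the correlation matrix [[1, r12], [r12, 1]]."""
    det = 1.0 - r12 ** 2
    return [[1.0 / det, -r12 / det],
            [-r12 / det, 1.0 / det]]


def vif(r12):
    """VIF for two predictors: the diagonal element of the inverted matrix."""
    return inverse_2x2_correlation(r12)[0][0]


def standardized_coefficients(r12, r_y1, r_y2):
    """Apply beta = R^{-1} r for a criterion regressed on two predictors."""
    inv = inverse_2x2_correlation(r12)
    beta1 = inv[0][0] * r_y1 + inv[0][1] * r_y2
    beta2 = inv[1][0] * r_y1 + inv[1][1] * r_y2
    return beta1, beta2


if __name__ == "__main__":
    # VIF grows as the correlation between the two predictors grows.
    for r12 in (0.0, 0.58, 0.82):
        print(f"r12 = {r12:.2f}  VIF = {vif(r12):.2f}")

    # X4 regressed on X2 and X3, using correlations from Table 3.
    print(standardized_coefficients(0.56, 0.47, 0.47))
```

Under these assumptions, correlations of .58 and .82 between two predictors give VIF values close to the 1.5 and 3.0 targets used in the study.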


The levels of collinearity examined in this study included a low level of collinearity, which would not tend to indicate a deleterious influence with respect to the analyses and results, and a moderate level of collinearity. The levels of VIF in this study were chosen for investigation based upon levels that would likely be encountered in applied research. VIF was set to 1.5 for the low collinearity condition and 3.0 for the moderate collinearity condition. More extreme values were examined but not selected for inclusion due to the likelihood of redundancy among the variables that would not be well suited to this type of statistical analysis.

Multivariate Extension of $C_i$

Recall that Meehl (1997) initially proposed an index of corroboration ($C_i$) that provides a standardized means of expressing the extent to which empirical research supports or contradicts a theory:

\[ C_i = (Cl)(In) \]

where Cl = the closeness of the data to the theoretical prediction (verisimilitude or truth likeness), and In = the intolerance of the theory (e.g., a standardized precision of prediction). These terms are further explicated as follows:

\[ Cl = 1 - (D/S) \]

where D = deviation of observed data from the tolerance interval of the theory


S = Spielraum (the range of data values that are expected whether or not the theory is true)

\[ In = 1 - (I/S) \]

where I = the interval tolerated by the theory (e.g., the raw precision of prediction). For this study, a multivariate extension of this index was required. The multivariate extension of $C_i$ investigated in the context of path analyses through the use of multiple regression analysis is defined as follows:

\[ \text{Intolerance} = 1 - \prod_{j=1}^{J} \frac{I_j}{S_j} \]

\[ \text{Closeness} = \left[ \prod_{j=1}^{J} \left( 1 - \frac{D_j}{S_j} \right) \right]^{1/J} \]

where $j$ indexes the set of relationships being tested (i.e., $I_j$ and $S_j$ are the tolerance interval and Spielraum for path coefficient $j$, and $D_j$ is the distance between the theoretical value and the observed value). As the proposed corroboration index has not previously been employed in a multivariate context, alternative approaches were explored in order to determine the most effective method to employ. There initially appeared to be two alternative approaches to the composite Meehlian corroboration index, considering a multivariate situation. The first method involves the computation of $C_i$ separately for each path coefficient and then multiplying the obtained values for an overall index:


\[ C_i = \prod_{j=1}^{J} \left( 1 - \frac{D_j}{S_j} \right) \left( 1 - \frac{I_j}{S_j} \right) \qquad [C_i = 1] \]

where the $D_j$, $S_j$, and $I_j$ are treated for each path independently. An alternative approach that was considered involved the calculation of the product of the distances for each variable and the product of the standardized tolerances. These products would then be subtracted from the value 1.

\[ C_i = \left( 1 - \prod_{j=1}^{J} \frac{D_j}{S_j} \right) \left( 1 - \prod_{j=1}^{J} \frac{I_j}{S_j} \right) \qquad [C_i = 2] \]

where the $D_j$, $S_j$, and $I_j$ are treated for each variable independently. It was apparent that both of these formulae will necessarily have problems at the extreme values. For example, in the first formula, if $I_1 = S_1$ then $C_i = 0$ regardless of the status of the other variables. In the second formula, if $D_1 = 0$ then the product of the distances is 0 regardless of the other variables. Discounting these extreme conditions, however, they appeared worth pursuing. Further consideration of the two approaches revealed that when using the first approach, the obtained index of corroboration was reduced, in most cases, as additional parameters were included in the specified model. Accordingly, the second approach was chosen for this investigation. The next logical step was to investigate possible representations of the two components of the multivariate $C_i$, that is, an estimate of multivariate closeness (or verisimilitude) and intolerance (i.e., precision of prediction). Two


methods of calculating closeness were examined. The first method was developed based on Pythagorean thinking, and is represented by:

\[ CL = 1 - \left[ \frac{1}{J} \sum_{j=1}^{J} \left( \frac{D_j}{S_j} \right)^2 \right]^{1/2} \qquad [CL = 1] \]

The second method involved the computation of individual closeness estimates for each of the obtained path coefficients, calculation of the product of these terms, and taking the $J$th root of the product term (with $J$ equal to the number of path coefficients).

\[ CL = \left[ \prod_{j=1}^{J} \left( 1 - \frac{D_j}{S_j} \right) \right]^{1/J} \qquad [CL = 2] \]

To illuminate the difference in these two formulations of closeness, let us consider a simple model with four variables and a tolerance interval of .5. For this example let us assume a lower limit of .25 and an upper limit of .75. For a given sample, the following path coefficients are obtained: p21 = .30, p31 = .04, p42 = .10, and p43 = .85. For each estimated path coefficient we must first calculate the deviation ($d_j$) from the lower or upper limit of the tolerance interval (e.g., $d_4$ = .85 - .75 = .10). Therefore the obtained deviations would be $d_1$ = 0, as .30 falls within the interval that ranges from .25 to .75; $d_2$ = .21; $d_3$ = .15; and $d_4$ = .10. Given the first formulation of closeness (CL = 1), the obtained value of multivariate closeness was estimated to be:

\[ CL = 1 - \left[ \frac{1}{4} \left( \frac{0^2}{2^2} + \frac{.21^2}{2^2} + \frac{.15^2}{2^2} + \frac{.10^2}{2^2} \right) \right]^{1/2} = .93 \]

For the second formulation of closeness (CL = 2), the obtained value of multivariate closeness was calculated as:

\[ CL = \left[ \left( 1 - \frac{0}{2} \right) \left( 1 - \frac{.21}{2} \right) \left( 1 - \frac{.15}{2} \right) \left( 1 - \frac{.10}{2} \right) \right]^{1/4} = .94 \]

Although the obtained values of closeness for these two calculations appear very similar in magnitude, the results of a small simulation that was conducted to evaluate the utility of the two proposed methods highlighted the difference in the formulations across a broader range of conditions. The results of this investigation are illustrated in Figure 15. Examination of this figure revealed the superior performance of the second method. For the first method, there appeared to be a sharp decline in the resultant value of $C_i$ with small departures from truth. The more gradual, linear decline, consistent with the behavior of the univariate $C_i$, was deemed to be more representative of how this component of the corroboration index should contribute to the calculation of multivariate corroboration.
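The worked example above can be reproduced with a short script. This is a minimal sketch under two assumptions: the first formulation is read as one minus the root mean square of the standardized deviations, and the second as the $J$th root of the product of the individual closeness terms. The function names are hypothetical; the path coefficients, interval limits, and Spielraum of 2 come from the example in the text.

```python
from math import prod, sqrt


def deviation(estimate, lower, upper):
    """Distance from an estimate to the nearest limit of the tolerance interval
    (zero when the estimate falls inside the interval)."""
    if estimate < lower:
        return lower - estimate
    if estimate > upper:
        return estimate - upper
    return 0.0


def closeness_rms(deviations, spielraums):
    """Assumed reading of formulation [CL = 1]: 1 - sqrt(mean((D_j / S_j)^2))."""
    squared = [(d / s) ** 2 for d, s in zip(deviations, spielraums)]
    return 1.0 - sqrt(sum(squared) / len(squared))


def closeness_geometric(deviations, spielraums):
    """Assumed reading of formulation [CL = 2]: J-th root of prod(1 - D_j / S_j)."""
    terms = [1.0 - d / s for d, s in zip(deviations, spielraums)]
    return prod(terms) ** (1.0 / len(terms))


# Values from the four variable example: p21, p31, p42, p43.
estimates = [0.30, 0.04, 0.10, 0.85]
devs = [deviation(p, 0.25, 0.75) for p in estimates]
spiel = [2.0] * len(estimates)  # Spielraum from -1.0 to +1.0

if __name__ == "__main__":
    print(round(closeness_rms(devs, spiel), 2))
    print(round(closeness_geometric(devs, spiel), 2))
```

Both readings return the rounded values reported in the text (.93 and .94), which lends some support to these reconstructions of the two formulas.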


[Figure 15. Alternative Methods for Computing Closeness, One, Two, Three, and Four Parameter Models. Line plot of Closeness Value (0 to 1) against Distance (0 to 10). Series: Closeness 1 Parameter; Closeness 2, 3, and 4 Parameters, each under Formula #1 and Formula #2.]


Following the examination of these two approaches to closeness, alternative strategies for calculating intolerance were explored. The formula initially considered was simply the product of the standardized intolerances for the individual path coefficients:

\[ In = \prod_{j=1}^{J} \left( 1 - \frac{I_j}{S_j} \right) \qquad [In = 1] \]

This approach was not deemed profitable because the intolerance was found to get smaller as more variables were added to the model. For example, with a four variable model and a tolerance interval of 1, the application of this formula would yield an intolerance = .0625, as calculated by

\[ \left( 1 - \tfrac{1}{2} \right) \left( 1 - \tfrac{1}{2} \right) \left( 1 - \tfrac{1}{2} \right) \left( 1 - \tfrac{1}{2} \right) = .0625 \]

An alternative approach was to obtain the product of the tolerances for all of the parameters and then subtract that value from one.

\[ In = 1 - \prod_{j=1}^{J} \frac{I_j}{S_j} \qquad [In = 2] \]

If we apply the same example to the second formulation of multivariate intolerance, we see that we obtain a very different answer. Applying the second approach to intolerance yields an intolerance = .9375, calculated as

\[ 1 - \left( \tfrac{1}{2} \right) \left( \tfrac{1}{2} \right) \left( \tfrac{1}{2} \right) \left( \tfrac{1}{2} \right) = .9375 \]

Both methods of calculating intolerance are illustrated in Figure 16.
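The two intolerance calculations above can be sketched as follows. This is an illustrative reading of the two formulations, with hypothetical function names, using the four variable example from the text (a tolerance interval of 1 and a Spielraum of 2 for every path).

```python
from math import prod


def intolerance_product_of_intolerances(intervals, spielraums):
    """Assumed reading of [In = 1]: product of the per-path intolerances."""
    return prod(1.0 - i / s for i, s in zip(intervals, spielraums))


def intolerance_one_minus_product(intervals, spielraums):
    """Assumed reading of [In = 2]: one minus the product of standardized tolerances."""
    return 1.0 - prod(i / s for i, s in zip(intervals, spielraums))


if __name__ == "__main__":
    # Show how each formulation behaves as paths are added (I = 1, S = 2).
    for n_paths in range(1, 5):
        intervals = [1.0] * n_paths
        spiel = [2.0] * n_paths
        print(n_paths,
              intolerance_product_of_intolerances(intervals, spiel),
              intolerance_one_minus_product(intervals, spiel))
```

With one path the two formulations agree; as paths are added the first shrinks toward zero while the second climbs toward one, which is the contrast the text describes.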


[Figure 16. Alternative Methods for Computing Intolerance, One and Two Parameter Models. Line plot of Intolerance against Tolerance Interval. Series: 1 Parameter Intolerance; Two Parameters Intolerance, Formula #1; Two Parameters Intolerance, Formula #2.]


However, a preliminary analysis of the application of the chosen formula for intolerance revealed that this statistic approached its upper limit of 1.0 much too quickly as the number of paths evaluated increased. Therefore, an approach to tuning the level of intolerance was investigated. In this approach, a root operation was performed on the product term:

\[ In = 1 - \left[ \prod_{j=1}^{J} \frac{I_j}{S_j} \right]^{\frac{1}{J - (J-1)X}} \qquad [In = 3] \]

where $X$ = some constant between 0 and 1. If we consider the limits of such an exponent, if $X = 0$ then the exponent will reduce to $1/J$, or the $J$th root. This most extreme case will not allow the multivariate intolerance to increase as the number of parameters increases. At the other extreme ($X = 1$), the exponent will reduce to 1.0 regardless of the number of parameters, which is the multivariate intolerance formula that was originally proposed. To help guide the selection of an appropriate level of adjustment, another small simulation study was conducted. These results are illustrated in Figure 17 for an intolerance level = .50 (i.e., directional prediction). Examination of this figure reveals the incremental influence of various tuning adjustments to multivariate intolerance as parameters are added to a model. The calculation of intolerance, and hence mean $C_i$, was then submitted to a series of tuning adjustments to explore the influence of tuning on the multivariate index. Variability across three of the levels examined appeared relatively insignificant. A small sample of these values is provided in Table 2. To avoid the appearance of either overly downward or upward bias, it was decided


that a tuning factor of .50 would provide the appropriate correction to the index of multivariate intolerance.

Table 2
Obtained Value of Mean Ci for Three Levels of Tuning Multivariate Intolerance by Level of Verisimilitude, Precision of Prediction. Six Variable Model, Low Collinearity, Sample Size = 100

                                               Tune Level
Precision of Prediction   Verisimilitude      .4     .5     .6
Non null                  High               .08    .09    .10
                          Moderate (MVD)     .08    .09    .10
                          Low (LVD)          .07    .08    .09
                          Moderate (MVA)     .08    .09    .11
                          Low (LVA)          .08    .09    .11
Directional               High               .66    .71    .77
                          Moderate (MVD)     .65    .70    .76
                          Low (LVD)          .63    .67    .72
                          Moderate (MVA)     .66    .72    .78
                          Low (LVA)          .66    .72    .79
Interval                  High               .88    .92    .95
                          Moderate (MVD)     .88    .91    .94
                          Low (LVD)          .86    .89    .92
                          Moderate (MVA)     .88    .92    .95
                          Low (LVA)          .88    .91    .94
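The tuning adjustment can be sketched in a few lines. This is a hedged illustration, not the study's code: it assumes the tuned intolerance takes the form $In = 1 - [\prod_j I_j/S_j]^{1/(J-(J-1)X)}$, a reading chosen because its exponent equals $1/J$ at $X = 0$ and 1 at $X = 1$, matching the limits described in the text. The function name is hypothetical. The example of nine paths with a standardized tolerance of .50 assumes that the six variable true model has nine estimated paths (the number of path coefficients listed for Figure 6).

```python
from math import prod


def tuned_intolerance(intervals, spielraums, tune):
    """Assumed form of [In = 3]: a root with exponent 1 / (J - (J - 1) * tune)
    is applied to the product of standardized tolerances before subtracting
    the result from one."""
    n_paths = len(intervals)
    exponent = 1.0 / (n_paths - (n_paths - 1) * tune)
    product = prod(i / s for i, s in zip(intervals, spielraums))
    return 1.0 - product ** exponent


if __name__ == "__main__":
    # Directional prediction: each path tolerates half of the Spielraum.
    intervals = [1.0] * 9
    spiel = [2.0] * 9
    for tune in (0.4, 0.5, 0.6):
        print(tune, round(tuned_intolerance(intervals, spiel, tune), 2))
```

Under these assumptions the three tuning levels give intolerance values close to the mean $C_i$ values in the Directional, High row of Table 2, which is what one would expect when closeness is near 1.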


[Figure 17. Tuning Multivariate Intolerance, Intolerance = .50. Line plot of Multivariate Intolerance (0.30 to 1.00) against Number of Parameters (1 to 21), with one series per tuning level from Tune = .1 to Tune = 1.]


Conduct of the Monte Carlo Study

This research was conducted using SAS/IML version 8.2. Conditions for the study were run under Windows 2000. For this study, six population correlation matrices were constructed based on a specified number of variables, level of collinearity, and true path model. The true population correlation matrices are exhibited in Tables 3-8.

Table 3
Population Correlation Matrix, 4 Variable Model (VIF ≈ 1.5)

        X1     X2     X3     X4
X1    1.00
X2     .75   1.00
X3     .75    .56   1.00
X4     .45    .47    .47   1.00

Table 4
Population Correlation Matrix, 4 Variable Model (VIF ≈ 3.0)

        X1     X2     X3     X4
X1    1.00
X2     .91   1.00
X3     .91    .82   1.00
X4     .54    .55    .55   1.00
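One way to see how a population matrix follows from a true path model is to apply the standard tracing rules of path analysis. The sketch below is an illustration only, assuming the four variable low collinearity model has the structure implied by its coefficients (variable 1 causes 2 and 3; variables 2 and 3 cause 4). Variable names are hypothetical; the coefficients come from Figure 4.

```python
from math import sqrt

# True path coefficients for the four variable, low collinearity model.
p21, p31, p42, p43 = 0.75, 0.75, 0.30, 0.30

# Tracing rules: a correlation is the sum of the products of coefficients
# along every permissible route connecting the two variables.
r12 = p21
r13 = p31
r23 = p21 * p31                # X2 and X3 are linked only through X1
r24 = p42 + p43 * r23          # direct path plus the route through X3
r34 = p43 + p42 * r23          # direct path plus the route through X2
r14 = p42 * r12 + p43 * r13    # X1 reaches X4 only through X2 and X3

implied = {"r12": r12, "r13": r13, "r23": r23,
           "r14": r14, "r24": r24, "r34": r34}

# Residual path for a standardized endogenous variable: sqrt(1 - R^2).
e2 = sqrt(1.0 - p21 ** 2)
e4 = sqrt(1.0 - (p42 * r24 + p43 * r34))

if __name__ == "__main__":
    for name, value in implied.items():
        print(name, round(value, 2))
    print("e2", round(e2, 2), "e4", round(e4, 2))
```

The implied correlations agree with the entries of Table 3 to two decimals, and the residual terms agree with the values shown for Figure 4.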


Table 5
Population Correlation Matrix, 6 Variable Model (VIF ≈ 1.5)

        X1     X2     X3     X4     X5     X6
X1    1.00
X2     .58   1.00
X3     .58    .58   1.00
X4     .47    .36    .50   1.00
X5     .36    .47    .50    .27   1.00
X6     .43    .43    .61    .54    .54   1.00

Table 6
Population Correlation Matrix, 6 Variable Model (VIF ≈ 3.0)

        X1     X2     X3     X4     X5     X6
X1    1.00
X2     .82   1.00
X3     .54    .54   1.00
X4     .54    .48    .54   1.00
X5     .48    .54    .54    .36   1.00
X6     .49    .49    .65    .59    .59   1.00


Table 7
Population Correlation Matrix, 8 Variable Model (VIF ≈ 1.5)

        X1     X2     X3     X4     X5     X6     X7     X8
X1    1.00
X2     .41   1.00
X3     .42    .42   1.00
X4     .42    .42    .25   1.00
X5     .42    .42    .25    .25   1.00
X6     .37    .37    .46    .43    .43   1.00
X7     .25    .25    .15    .38    .38    .26   1.00
X8     .31    .31    .26    .32    .54    .49    .48   1.00

Table 8
Population Correlation Matrix, 8 Variable Model (VIF ≈ 3.0)

        X1     X2     X3     X4     X5     X6     X7     X8
X1    1.00
X2     .82   1.00
X3     .54    .54   1.00
X4     .54    .54    .33   1.00
X5     .54    .54    .33    .33   1.00
X6     .56    .56    .58    .53    .59   1.00
X7     .34    .34    .21    .42    .42    .35   1.00
X8     .45    .45    .34    .39    .64    .60    .56   1.00


Data Generation Strategy

Multivariate normal samples were generated from each population R matrix and a sample correlation matrix was computed for each sample. Each of the sample correlation matrices was then analyzed using a series of regression equations. These regression equations were determined by the desired level of verisimilitude that was being examined (i.e., the path model implied by the theory). It was the theoretical model that determined the appropriate regression equations to employ. The series of regression equations was applied to each sample and the resulting parameter estimates were used in the calculation of the closeness component of the corroboration index. In the final computation of the corroboration indices, the size of the tolerance interval was manipulated. The program code was verified by hand checking results from benchmark datasets. The data resulting from each path analysis were pooled and the average value of $C_i$ was evaluated in the context of the central design factors. The method for data simulation is illustrated in Figure 18.

For each population matrix, 10,000 samples were generated. The use of 10,000 samples provided adequate precision of estimates of the sampling behavior of the corroboration index. For example, 10,000 samples provide a maximum 95% confidence interval of ± .0098 around an observed proportion (Robey & Barcikowski, 1992).
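The generation step can be illustrated with a small standard-library sketch. This is not the SAS/IML program used in the study: it assumes a Cholesky factorization is used to induce the population correlations, and all function names are hypothetical. The population matrix is the four variable, low collinearity matrix (Table 3), and the last helper shows the arithmetic behind the ± .0098 precision figure, assuming a normal approximation at a proportion of .50.

```python
import random
from math import sqrt

# Four variable, low collinearity population correlation matrix (Table 3).
POP_R = [
    [1.00, 0.75, 0.75, 0.45],
    [0.75, 1.00, 0.56, 0.47],
    [0.75, 0.56, 1.00, 0.47],
    [0.45, 0.47, 0.47, 1.00],
]


def cholesky(a):
    """Lower triangular factor L with L * L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_sample(chol, n_obs, rng):
    """Draw n_obs multivariate normal rows with the correlations implied by chol."""
    n = len(chol)
    rows = []
    for _ in range(n_obs):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        rows.append([sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return rows


def correlation_matrix(rows):
    """Sample correlation matrix of the columns of rows."""
    n_obs, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n_obs for j in range(n)]
    dev = [[r[j] - means[j] for j in range(n)] for r in rows]
    ss = [[sum(d[i] * d[j] for d in dev) for j in range(n)] for i in range(n)]
    return [[ss[i][j] / sqrt(ss[i][i] * ss[j][j]) for j in range(n)] for i in range(n)]


def two_predictor_paths(r, y, x1, x2):
    """Standardized coefficients for variable y regressed on x1 and x2."""
    r12 = r[x1][x2]
    det = 1.0 - r12 ** 2
    return ((r[y][x1] - r12 * r[y][x2]) / det,
            (r[y][x2] - r12 * r[y][x1]) / det)


def max_ci_half_width(n_reps, z=1.96):
    """Largest 95% half-width for a proportion estimated from n_reps replications."""
    return z * sqrt(0.25 / n_reps)


if __name__ == "__main__":
    rng = random.Random(2024)  # arbitrary seed, for repeatability only
    sample_r = correlation_matrix(draw_sample(cholesky(POP_R), 100, rng))
    print(two_predictor_paths(sample_r, y=3, x1=1, x2=2))  # estimates of p42, p43
    print(max_ci_half_width(10_000))
```

In a full replication of the design, the sampling and estimation steps would be repeated 10,000 times per condition and the resulting $C_i$ values averaged; the sketch stops at a single sample to stay short.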


[Figure 18. Data Generation Strategy. Flowchart: (1) construct the population correlation matrix for the number of variables (k = 4, 6, 8) and level of VIF (VIF = 1.5, 3.0); (2) generate a multivariate normal sample from the population (N = 100, 200, 500); (3) estimate path coefficients for the theoretical model (low, moderate, and high verisimilitude); (4) compute intolerance and closeness for each model (I = low, moderate, high); (5) compute the sample value of $C_i$ for each model; (6) repeat for 10,000 samples; (7) compute the mean and variance of the sample value of $C_i$ across the 10,000 samples.]


Data Analysis

To guide the interpretation of the simulation results, $C_i$ was treated as a dependent variable and a factorial ANOVA was conducted. The independent variables for this ANOVA were the five Monte Carlo design factors: (a) the number of variables in the model (i.e., model complexity), (b) level of intolerance, (c) level of verisimilitude, (d) sample size, and (e) level of collinearity. In addition to the main effects, the interactions of these factors were examined. For each of these analyses, an effect size estimate, omega squared ($\omega^2$), was used to estimate the proportion of variance accounted for in the population by each effect (Maxwell & Delaney, 1990). For the calculation of this effect, within cell variability was provided by the variance of the 10,000 replications of each condition. The estimate of $\omega^2$ is given by

\[ \hat{\omega}^2 = \frac{SS_{effect} - df_{effect}\,(MS_{wg})}{SS_T + MS_{wg}} \]

The results of this research are presented in tables and graphs that address each of the research questions and hypotheses through an illustration of the relationship between the central design facets and the resultant mean $C_i$ and standard deviation of $C_i$. Further, a series of tables and graphs are employed to illustrate the extent to which the relationship between mean $C_i$ and level of intolerance was influenced by verisimilitude, collinearity, and sample size. When deemed appropriate, supplementary analyses and results are presented to further explain some of the more unexpected relationships evidenced in the data.
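The omega squared estimate described in this section can be written as a one-line function. The sketch below assumes the usual form $(SS_{effect} - df_{effect} \cdot MS_{wg}) / (SS_T + MS_{wg})$; the function name and the input values are hypothetical and are not taken from the study's tables.

```python
def omega_squared(ss_effect, df_effect, ms_within, ss_total):
    """Estimated proportion of population variance attributable to an effect."""
    return (ss_effect - df_effect * ms_within) / (ss_total + ms_within)


if __name__ == "__main__":
    # Hypothetical sums of squares, chosen only to show the arithmetic.
    print(omega_squared(ss_effect=30.0, df_effect=2, ms_within=0.5, ss_total=60.0))
```

Because the within-group mean square is subtracted in the numerator, the estimate is always somewhat smaller than the simple ratio of the effect sum of squares to the total.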


Chapter Four
Results

Organization

The purpose of this chapter is to present the results of this study. As the results are organized with respect to the research questions and hypotheses, the chapter opens with a restatement of the proposed research questions and hypotheses. Within each section, after each of the primary research questions and hypotheses has been addressed, supplementary analyses and results are examined in order to further elucidate some of the more subtle relationships evidenced in the data. At the end of the chapter, key findings are underscored and summarized.

Four Research Questions and Three Research Hypotheses

Research Questions

1. What is the relationship between mean $C_i$ and the main effects examined in the study (i.e., verisimilitude, intolerance, model complexity, collinearity, and sample size)?
2. What is the relationship between the standard deviation of $C_i$ and model complexity, collinearity, and sample size?
3. To what extent is the relationship between mean $C_i$ and the precision of prediction (i.e., intolerance) influenced by the complexity of the model (i.e., the number of variables in the model)?


4. To what extent is the relationship between mean $C_i$ and the precision of prediction (i.e., intolerance) influenced by the level of collinearity?

Research Hypotheses

1. The relationship between mean $C_i$ and the precision of prediction (i.e., intolerance) will be slightly influenced by the closeness of the data to the theory (verisimilitude).
2. The relationship between mean $C_i$ and the precision of prediction (i.e., intolerance) will not be substantively influenced by sample size.
3. The relationship between mean $C_i$ and precision of prediction will be substantively stronger than the relationship between mean $C_i$ and verisimilitude, model complexity, collinearity, and sample size.

Relationship Between Mean $C_i$ and the Central Design Factors

As stated earlier, in order to guide the interpretation of the simulation results, the average $C_i$ was treated as a dependent variable and a factorial ANOVA was conducted. The independent variables for this ANOVA were the five Monte Carlo design factors: (a) the number of variables in the model (i.e., model complexity), (b) level of intolerance, (c) level of verisimilitude, (d) sample size, and (e) level of collinearity. In addition to these main effects, the interactions of these factors were also examined. For each of these analyses, an effect size estimate, omega squared ($\omega^2$), was used to estimate the proportion of variance accounted for in the population by each effect. The results of these


analyses are presented in Table 9. An examination of the obtained effect sizes revealed that only a single effect evidenced considerable influence on the average $C_i$. As anticipated, the level of intolerance was the most salient influence on average $C_i$, with an estimated $\omega^2$ = .55. The remaining main effects and interaction effects exercised negligible influence on mean $C_i$. The residual mean square value presented along with the obtained values of omega squared represents the average variability within each condition (or cell) under examination.

Probing Deeper: The Influence of Verisimilitude, Model Complexity, Collinearity, and Sample Size after Controlling for Intolerance

In light of the very strong influence of this single design factor, it appeared fruitful to examine the other main effects and interaction effects after controlling for the level of intolerance. Therefore, three additional analyses were conducted. Again, mean $C_i$ was treated as the dependent variable and three separate ANOVAs, one for each level of intolerance, were conducted with the remaining four design factors (i.e., the number of variables in the model, verisimilitude, sample size, and level of collinearity) treated as independent variables. Consistent with the initial analysis, the interactions of these factors were also examined. The results of this set of analyses are presented in Tables 10-12. The resultant values of $\omega^2$ suggest that the number of variables in the model ($\omega^2$ = .26) and level of verisimilitude ($\omega^2$ = .21) were somewhat influential, but only for the lowest level of precision (i.e., non null predictions). These analyses also revealed the lack of influence of any of the other central design factors examined.


Table 9
Estimated DF, SS, and Omega Squared by Design Factors

Effect                    DF        SS        ω²
N of Variables (k)         2      0.07      <.01
Verisimilitude (V)         4      0.05      <.01
k*V                        8      0.05      <.01
Collinearity (C)           1      <.01      <.01
k*C                        2      <.01      <.01
V*C                        4      <.01      <.01
k*V*C                      8      <.01      <.01
N of Observations (N)      2      <.01      <.01
k*N                        4      <.01      <.01
V*N                        8      <.01      <.01
k*V*N                     16      <.01      <.01
C*N                        2      <.01      <.01
k*C*N                      4      <.01      <.01
V*C*N                      8      <.01      <.01
k*V*C*N                   16      <.01      <.01
Intolerance (I)            2     32.40      0.55
k*I                        4      0.02      <.01
V*I                        8      0.01      <.01
k*V*I                     16      <.01      <.01
C*I                        2      <.01      <.01
k*C*I                      4      <.01      <.01
V*C*I                      8      <.01      <.01
k*V*C*I                   16      <.01      <.01
N*I                        4      <.01      <.01
k*N*I                      8      <.01      <.01
V*N*I                     16      <.01      <.01
k*V*N*I                   32      <.01      <.01
C*N*I                      4      <.01      <.01
k*C*N*I                    8      <.01      <.01
V*C*N*I                   16      <.01      <.01
k*V*C*N*I                 32      <.01      <.01
Residual MS               26   2699729      <.01


Table 10
Estimated DF, SS, and Omega Squared, Intolerance = Non Null Prediction

Effect                    DF        SS        ω²
N of Variables (k)         2      <.01      0.26
Verisimilitude (V)         4      <.01      0.21
k*V                        8      <.01      0.01
Collinearity (C)           1      <.01      <.01
k*C                        2      <.01      <.01
V*C                        4      <.01      <.01
k*V*C                      8      <.01      <.01
N of Observations (N)      2      <.01      <.01
k*N                        4      <.01      <.01
V*N                        8      <.01      <.01
k*V*N                     16      <.01      <.01
C*N                        2      <.01      <.01
k*C*N                      4      <.01      <.01
V*C*N                      8      <.01      <.01
k*V*C*N                   16      <.01      <.01
Residual MS             <.01     89909      <.01


Table 11
Estimated DF, SS, and Omega Squared, Intolerance = Directional Prediction

Effect                    DF        SS        ω²
N of Variables (k)         2      0.06      0.01
Verisimilitude (V)         4      0.04      0.01
k*V                        8      <.01      <.01
Collinearity (C)           1      <.01      <.01
k*C                        2      <.01      <.01
V*C                        4      <.01      <.01
k*V*C                      8      <.01      <.01
N of Observations (N)      2      <.01      <.01
k*N                        4      <.01      <.01
V*N                        8      <.01      <.01
k*V*N                     16      <.01      <.01
C*N                        2      <.01      <.01
k*C*N                      4      <.01      <.01
V*C*N                      8      <.01      <.01
k*V*C*N                   16      <.01      <.01
Residual MS                6    899909      <.01


Table 12
Estimated DF, SS, and Omega Squared, Intolerance = Interval Prediction

Effect                    DF        SS        ω²
N of Variables (k)         2      0.03      <.01
Verisimilitude (V)         4      0.02      <.01
k*V                        8      <.01      <.01
Collinearity (C)           1      <.01      <.01
k*C                        2      <.01      <.01
V*C                        4      <.01      <.01
k*V*C                      8      <.01      <.01
N of Observations (N)      2      <.01      <.01
k*N                        4      <.01      <.01
V*N                        8      <.01      <.01
k*V*N                     16      <.01      <.01
C*N                        2      <.01      <.01
k*C*N                      4      <.01      <.01
V*C*N                      8      <.01      <.01
k*V*C*N                   16      <.01      <.01
Residual MS            19.98    899909      <.01

Estimates of Mean Ci

To facilitate the interpretation of the results in this section, and to gain a better understanding of the nature of the models under investigation, the level of model complexity, model misspecification, and the number of estimated paths under examination are displayed in Table 13. In this table and in all of the tables and figures that follow, HV represents conditions with high verisimilitude; MVD represents models with moderate verisimilitude with model misspecification occurring as paths are deleted; and LVD represents models with low


verisimilitude resulting from additional paths being deleted. MVA represents a moderate level of verisimilitude that resulted when one or more paths are added to the model, while LVA represents low verisimilitude occurring when further paths are added. To further aid in the interpretation of the results it may be useful to reconsider the relationship between the raw tolerance interval of the theory and the level of intolerance examined in this study. Recall that the raw tolerance interval of the theory was examined at three levels of precision: a non null condition with $\beta \neq 0$; a directional condition employing one half of the Spielraum, with $\beta < 0$ or $\beta > 0$; and an interval prediction equal to one quarter of the Spielraum, that is, $.05 \le \beta \le .55$.

four paths are estimated. Similarly, the moderate complexity, low verisimilitude auxiliary model (LVA) and the high complexity, high verisimilitude model (HV) both include 14 estimated paths. For illustrative purposes, diagrams representing each level of model complexity and misspecification are provided in Figures 19-30.


[Figure 19. Four Variable Exclusionary Model, Moderate Level of Verisimilitude (MVD). Path diagram, variables 1-4.]

[Figure 20. Four Variable Exclusionary Model, Low Level of Verisimilitude (LVD). Path diagram, variables 1-4.]


[Figure 21. Four Variable Supplementary Model, Moderate Level of Verisimilitude (MVA). Path diagram, variables 1-4.]

[Figure 22. Four Variable Supplementary Model, Low Level of Verisimilitude (LVA). Path diagram, variables 1-4.]


[Figure 23. Six Variable Exclusionary Model, Moderate Level of Verisimilitude (MVD). Path diagram, variables 1-6.]

[Figure 24. Six Variable Exclusionary Model, Low Level of Verisimilitude (LVD). Path diagram, variables 1-6.]


[Figure 25. Six Variable Supplementary Model, Moderate Level of Verisimilitude (MVA). Path diagram, variables 1-6.]

[Figure 26. Six Variable Supplementary Model, Low Level of Verisimilitude (LVA). Path diagram, variables 1-6.]


[Figure 27. Eight Variable Exclusionary Model, Moderate Level of Verisimilitude (MVD). Path diagram, variables 1-8.]

PAGE 110

97 Figure 28 Eight Variable Exclusionary Model, Low Level of Verisimilitude (LVD). 1 2 3 4 5 6 7 8

PAGE 111

98 Figure 29 Eight Variable Supplementary Model, Moderate Level of Verisimilitude (MVA). 1 2 3 4 5 6 7 8

PAGE 112

99 Figure 30 8 Variable Supplementary Model, Low Level of Verisimilitude (LVA). 1 2 3 5 6 7 8 4

Mean Ci by Precision of Prediction and Level of Verisimilitude

The obtained values of mean Ci by level of verisimilitude, collinearity, sample size, and precision of prediction are presented in Tables 14-16. Because trends across this number of conditions may be challenging to visualize, a series of box-and-whisker plots is also provided (see Figures 31-33). As these figures clearly illustrate, there is a very strong relationship between the magnitude of mean Ci and the level of intolerance. These figures also reveal a general lack of variability across the various levels of verisimilitude for both the non-null and interval predictions. For example, for the weakest predictions (i.e., non-null in nature), mean Ci was estimated to range from .07 to .09, and for the most precise predictions, mean Ci ranged from .84 to .92. However, for the directional prediction, mean Ci was observed to range from .60 to .73. For both directional and interval predictions, the variability in mean Ci was more pronounced for the low verisimilitude, exclusionary models (LVD) than for any of the other models examined.

Viewed from a slightly different perspective, the considerable influence of intolerance on mean Ci is also evident if we examine this relationship across the number of estimated paths in the various models under consideration (see Figure 34). In this illustration, we can readily observe that as the number of estimated paths increases there is very little variability within each level of intolerance, yet with increasingly precise predictions, the magnitude of mean Ci rises dramatically.

Table 14
Mean Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity = Low

Collinearity  Sample Size  Verisimilitude  Non-Null  Directional  Interval
Low           100          HV              0.08      0.67         0.89
                           MVD             0.07      0.65         0.87
                           LVD             0.07      0.60         0.84
                           MVA             0.08      0.68         0.89
                           LVA             0.08      0.69         0.89
              200          HV              0.08      0.67         0.89
                           MVD             0.07      0.65         0.87
                           LVD             0.07      0.60         0.84
                           MVA             0.08      0.68         0.89
                           LVA             0.08      0.69         0.90
              500          HV              0.08      0.67         0.89
                           MVD             0.07      0.65         0.88
                           LVD             0.07      0.60         0.84
                           MVA             0.08      0.68         0.90
                           LVA             0.08      0.69         0.90
Moderate      100          HV              0.08      0.67         0.89
                           MVD             0.07      0.65         0.87
                           LVD             0.07      0.60         0.84
                           MVA             0.08      0.68         0.89
                           LVA             0.08      0.68         0.88
              200          HV              0.08      0.67         0.89
                           MVD             0.07      0.65         0.87
                           LVD             0.07      0.60         0.84
                           MVA             0.08      0.68         0.89
                           LVA             0.08      0.68         0.88
              500          HV              0.08      0.67         0.89
                           MVD             0.07      0.65         0.87
                           LVD             0.07      0.60         0.84
                           MVA             0.08      0.68         0.89
                           LVA             0.08      0.68         0.89

Table 15
Mean Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity = Moderate

Collinearity  Sample Size  Verisimilitude  Non-Null  Directional  Interval
Low           100          HV              0.09      0.71         0.92
                           MVD             0.09      0.70         0.91
                           LVD             0.08      0.67         0.89
                           MVA             0.09      0.72         0.92
                           LVA             0.09      0.72         0.91
              200          HV              0.09      0.71         0.92
                           MVD             0.09      0.70         0.91
                           LVD             0.08      0.67         0.89
                           MVA             0.09      0.72         0.92
                           LVA             0.09      0.72         0.91
              500          HV              0.09      0.71         0.92
                           MVD             0.09      0.70         0.91
                           LVD             0.08      0.67         0.89
                           MVA             0.09      0.72         0.92
                           LVA             0.09      0.72         0.92
Moderate      100          HV              0.09      0.71         0.92
                           MVD             0.09      0.70         0.91
                           LVD             0.08      0.67         0.87
                           MVA             0.09      0.72         0.91
                           LVA             0.09      0.72         0.91
              200          HV              0.09      0.71         0.92
                           MVD             0.09      0.70         0.91
                           LVD             0.08      0.67         0.87
                           MVA             0.09      0.72         0.92
                           LVA             0.09      0.72         0.91
              500          HV              0.09      0.71         0.92
                           MVD             0.09      0.70         0.91
                           LVD             0.08      0.67         0.87
                           MVA             0.09      0.72         0.92
                           LVA             0.09      0.72         0.92

Table 16
Mean Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity = High

Collinearity  Sample Size  Verisimilitude  Non-Null  Directional  Interval
Low           100          HV              0.09      0.73         0.92
                           MVD             0.09      0.72         0.92
                           LVD             0.08      0.69         0.90
                           MVA             0.09      0.73         0.92
                           LVA             0.09      0.73         0.92
              200          HV              0.09      0.73         0.92
                           MVD             0.09      0.72         0.92
                           LVD             0.08      0.69         0.90
                           MVA             0.09      0.73         0.92
                           LVA             0.09      0.73         0.92
              500          HV              0.09      0.73         0.92
                           MVD             0.09      0.72         0.92
                           LVD             0.08      0.69         0.90
                           MVA             0.09      0.73         0.92
                           LVA             0.09      0.73         0.92
Moderate      100          HV              0.09      0.73         0.92
                           MVD             0.09      0.72         0.92
                           LVD             0.08      0.69         0.90
                           MVA             0.09      0.73         0.92
                           LVA             0.09      0.73         0.91
              200          HV              0.09      0.73         0.92
                           MVD             0.09      0.72         0.92
                           LVD             0.08      0.69         0.90
                           MVA             0.09      0.73         0.92
                           LVA             0.09      0.73         0.92
              500          HV              0.09      0.73         0.92
                           MVD             0.09      0.72         0.92
                           LVD             0.08      0.69         0.90
                           MVA             0.09      0.73         0.92
                           LVA             0.09      0.73         0.92

Figure 31. Mean Ci by Level of Verisimilitude, Non-Null Prediction. [Box-and-whisker plot; x-axis: Verisimilitude (HV, MVD, LVD, MVA, LVA); y-axis: Mean Ci, 0.0 to 1.0.]

Figure 32. Mean Ci by Level of Verisimilitude, Directional Prediction. [Box-and-whisker plot; x-axis: Verisimilitude (HV, MVD, LVD, MVA, LVA); y-axis: Mean Ci, 0.0 to 1.0.]

Figure 33. Mean Ci by Level of Verisimilitude, Interval Prediction. [Box-and-whisker plot; x-axis: Verisimilitude (HV, MVD, LVD, MVA, LVA); y-axis: Mean Ci, 0.0 to 1.0.]

Figure 34. Mean Ci by Level of Intolerance and Number of Estimated Paths. [Plot; x-axis: Number of Parameters, 0 to 25; y-axis: Mean Ci, 0 to 1; series: Non-Null, Directional, Interval.]

Relationship between the Standard Deviation of Ci and Model Complexity, Collinearity, and Sample Size

An initial examination of the magnitude of the standard deviation of Ci and the central design factors revealed very little variability across conditions. These values are organized by the central design factors and are presented in Tables 17-19. Figure 35 provides a box-and-whisker plot of the distribution of the standard deviation of Ci across all 270 conditions examined. The standard deviation of Ci was estimated to range from <.01 to 0.02. In only two conditions did the estimated standard deviation of Ci attain a magnitude greater than .01, with both of these conditions occurring in the low complexity models under the most severe condition of model misspecification (i.e., low verisimilitude). This striking lack of variability suggests that no practically significant relationship exists between the central design factors and the standard deviation of Ci.

Table 17
Standard Deviation of Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity = Low

Collinearity  Sample Size  Verisimilitude  Non-Null  Directional  Interval
Low           100          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         0.01
                           LVA             <.01      0.01         0.01
              200          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         0.01
                           LVA             <.01      0.01         0.01
              500          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         0.01
Moderate      100          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         0.01
                           LVD             <.01      <.01         0.01
                           MVA             <.01      <.01         0.01
                           LVA             <.01      0.01         0.02
              200          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         0.01
                           MVA             <.01      <.01         0.01
                           LVA             <.01      0.01         0.02
              500          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      0.01         0.01

Table 18
Standard Deviation of Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity = Moderate

Collinearity  Sample Size  Verisimilitude  Non-Null  Directional  Interval
Low           100          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         0.01
              200          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01
              500          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01
Moderate      100          HV              <.01      0.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01
              200          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01
              500          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01

Table 19
Standard Deviation of Ci by Intolerance, Verisimilitude, Collinearity, and Sample Size, Model Complexity = High

Collinearity  Sample Size  Verisimilitude  Non-Null  Directional  Interval
Low           100          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01
              200          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01
              500          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01
Moderate      100          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01
              200          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01
              500          HV              <.01      <.01         <.01
                           MVD             <.01      <.01         <.01
                           LVD             <.01      <.01         <.01
                           MVA             <.01      <.01         <.01
                           LVA             <.01      <.01         <.01

Figure 35. Box and Whisker Plot of Estimated Standard Deviations. [x-axis: Standard Deviation of Ci, 0.00 to 0.05.]

Probing Deeper: An Examination of the Variability in Path Coefficients

To investigate the lack of variability evidenced by mean Ci, the variability of the average expected path coefficients was examined by calculating the standard error. These analyses were conducted for each of the true models, for each level of model complexity and collinearity. The results are displayed as a stem-and-leaf plot in Figure 36. An examination of the distribution of resultant standard errors revealed a moderate degree of variability; however, more than half of the standard errors were estimated to be less than .10. Further examination of these results revealed that the more complex models evidenced more sampling error than the moderate and simple models. Standard errors of this magnitude, across so many of the conditions examined, led many of the estimated path coefficients to fall within the specified tolerance interval. The lack of deviation from the tolerance interval resulted in multivariate closeness estimates that approached 1.0 across most of the conditions examined. Given an invariant intolerance estimate for each condition, the within-cell variability of Ci was entirely dependent on the estimate of closeness and failed to vary appreciably across samples.

Stem  Leaf
.36   00
.34
.32
.30
.28
.26   00
.24   00000
.22
.20   000000
.18   000000
.16   000000
.14   0000000000000000
.12   000000
.10   000000000000000000000000000
.08   000000000000000000
.06   00000000000000000000000000000
.04   000000000000000000000000
.02   000000000000000

Figure 36. Stem and Leaf Plot of Standard Errors of Regression Coefficients.
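
The ceiling effect described above can be illustrated with a minimal Python sketch of the univariate form of Meehl's index, in which corroboration is the product of closeness (1 minus deviation over Spielraum) and intolerance (1 minus interval width over Spielraum). The function name, argument names, and example values are illustrative assumptions. The multivariate extension evaluated in this study aggregates across paths and is not reproduced here, so the values below will not match those reported in the tables.

```python
def corroboration_index(estimate, lower, upper, spielraum=(-1.0, 1.0)):
    """Univariate Ci = closeness * intolerance (assumed form, after Meehl)."""
    s_width = spielraum[1] - spielraum[0]  # S: range of possible values
    i_width = upper - lower                # I: width of the tolerance interval
    # D: distance from the estimate to the nearest interval boundary,
    # zero when the estimate falls inside the tolerance interval.
    deviation = max(lower - estimate, estimate - upper, 0.0)
    closeness = 1.0 - deviation / s_width
    intolerance = 1.0 - i_width / s_width
    return closeness * intolerance


# Hypothetical interval prediction covering one quarter of the Spielraum.
# Any estimate inside the interval has closeness 1, so Ci equals the
# intolerance regardless of where in the interval the estimate lands.
inside_a = corroboration_index(0.10, 0.05, 0.55)
inside_b = corroboration_index(0.50, 0.05, 0.55)
outside = corroboration_index(0.65, 0.05, 0.55)
print(inside_a, inside_b, outside)
```

Under these assumptions, sampling error moves Ci only when it pushes an estimate outside the tolerance interval, which is consistent with the near-zero standard deviations reported in Tables 17-19.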

Relationship between Mean Ci, Precision of Prediction, and Verisimilitude

To examine the relationship between mean Ci, the precision of prediction, and verisimilitude, it was first necessary to collapse the data across the other design factors (i.e., model complexity, sample size, and collinearity) and compute marginal values of mean Ci. The results of this analysis are presented in Figure 37. Examination of this figure once again reveals the profound influence of the precision of prediction and the negligible influence of verisimilitude, or truth-likeness. For each level of verisimilitude we see a dramatic increase in mean Ci as the level of precision increases. For example, with high verisimilitude (HV), the mean Ci is only approximately .09 for the non-null prediction, yet reaches .91 for the interval prediction. However, if we look across the various levels of verisimilitude, the obtained values of mean Ci vary very little. For the non-null conditions, mean Ci was observed to range from .08 to .09, whereas for the directional condition, mean Ci ranged from .65 (LVD) to .71 (for both MVA and LVA). For the most precise predictions, mean Ci ranged from .87 (LVD) to .91 (HV, MVA, and LVA).

Figure 37. Mean Ci by Level of Intolerance and Verisimilitude. [x-axis: Verisimilitude (HV, MVD, LVD, MVA, LVA); y-axis: Mean Ci, 0.00 to 1.00; series: Non-Null, Directional, Interval.]

Relationship between Mean Ci, Precision of Prediction, and Model Complexity

To examine the relationship between mean Ci, precision of prediction, and model complexity, it was again necessary to compute marginal values of mean Ci by collapsing across the other central design factors (i.e., verisimilitude, sample size, and collinearity). The results of this analysis are displayed in Figure 38. Once again the overwhelming influence of precision of prediction is depicted, while model complexity appears to exert but a slight influence on the magnitude of mean Ci.

For example, for the simplest model containing four variables, mean Ci was estimated to be .08 for the weakest prediction, .66 for the directional prediction, and .88 for the interval prediction. However, within each level of intolerance, mean Ci evidenced only a slight fluctuation with increased model complexity. A modest increase in mean Ci was evidenced with the set of directional predictions, with mean Ci ranging from approximately .66 for the simplest model to approximately .72 for the most complex model.

Figure 38. Mean Ci by Level of Intolerance and Model Complexity. [x-axis: Model Complexity (Simple, Moderate, Complex); y-axis: Mean Ci, 0.00 to 1.00; series: Non-Null, Directional, Interval.]

In order to determine whether verisimilitude might be a potential moderating variable, the relationship between mean Ci, precision of prediction, and truth-likeness was examined for each level of model complexity. The relationships among these central design factors are illustrated in Figures 39-41. As evidenced in these figures, level of intolerance continued to exert a considerable influence on mean Ci; however, there was a slight increase in mean Ci with an increase in model complexity. For example, for the LVD model with interval predictions, mean Ci increases from .84 for the low complexity model to .90 for the high complexity model. Still, the more dramatic increases were observed for this model across level of intolerance, as mean Ci increased from .07 for the non-null prediction to .84 for the interval prediction.

Figure 39. Mean Ci by Level of Intolerance and Verisimilitude, Model Complexity = Low. [x-axis: Verisimilitude (HV, MVD, LVD, MVA, LVA); y-axis: Mean Ci, 0.00 to 1.00; series: Non-Null, Directional, Interval.]

Figure 40. Mean Ci by Level of Intolerance and Verisimilitude, Model Complexity = Moderate. [x-axis: Verisimilitude (HV, MVD, LVD, MVA, LVA); y-axis: Mean Ci, 0.00 to 1.00; series: Non-Null, Directional, Interval.]

Figure 41. Mean Ci by Level of Intolerance and Verisimilitude, Model Complexity = High. [x-axis: Verisimilitude (HV, MVD, LVD, MVA, LVA); y-axis: Mean Ci, 0.00 to 1.00; series: Non-Null, Directional, Interval.]

Probing Deeper: An Examination of Bias Evidenced in the Expected Path Coefficients

As the relationship between mean Ci and the precision of prediction was expected to be moderated by the level of verisimilitude, a series of additional analyses was conducted in an attempt to discern why this relationship was not evidenced in the data. Essentially, an examination of this nature can be considered synonymous with assessing the level of bias in the obtained standardized path coefficients, that is, the deviation of the expected sample path coefficients from the population parameters. As closeness is the element of the multivariate corroboration index that captures these deviations in the data, it seemed appropriate to compare the average expected values of multivariate closeness to the estimates obtained from the population. These results were examined by level of model complexity, verisimilitude, collinearity, and level of intolerance and are provided in Table 20. As these results suggest, there is very little deviation from truth in these data, resulting in multivariate closeness estimates of 1.00 for 80% of the conditions examined. Minor deviations from 1.00 most frequently occurred when making interval predictions and were relatively consistent across level of model complexity. With a negligible amount of bias, and closeness approximating 1.0, the resultant component of verisimilitude failed to emerge as a salient factor across most of the conditions examined.
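
The bias check described above can be sketched in Python with NumPy. This is a simplified illustration rather than the study's procedure: it uses a hypothetical three-variable population correlation matrix, a single regression equation instead of a full path model, and far fewer replications than the 10,000 samples per matrix used in the study. The function and variable names are assumptions introduced for the example.

```python
import numpy as np

# Hypothetical population correlation matrix: two predictors, one outcome (last).
pop_corr = np.array([
    [1.0, 0.3, 0.5],
    [0.3, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])


def standardized_coefficients(corr):
    """Standardized coefficients for predicting the last variable from the rest."""
    return np.linalg.solve(corr[:-1, :-1], corr[:-1, -1])


def simulate_path_coefficients(corr, n_obs, n_reps, seed=0):
    """Return population coefficients, estimated bias, and empirical standard errors."""
    rng = np.random.default_rng(seed)
    k = corr.shape[0]
    beta_pop = standardized_coefficients(corr)
    estimates = np.empty((n_reps, k - 1))
    for rep in range(n_reps):
        # Draw one multivariate normal sample and analyze its correlation matrix.
        sample = rng.multivariate_normal(np.zeros(k), corr, size=n_obs)
        sample_corr = np.corrcoef(sample, rowvar=False)
        estimates[rep] = standardized_coefficients(sample_corr)
    bias = estimates.mean(axis=0) - beta_pop
    std_error = estimates.std(axis=0, ddof=1)
    return beta_pop, bias, std_error


beta_pop, bias, std_error = simulate_path_coefficients(pop_corr, n_obs=200, n_reps=500)
print("population:", beta_pop)
print("bias:", bias)
print("standard error:", std_error)
```

In a sketch of this kind the bias term is typically small relative to the standard error, which mirrors the finding that closeness stayed near 1.00 across conditions.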

Table 20
Expected Multivariate Closeness by Verisimilitude, Intolerance, Model Complexity, and Collinearity

                                 Low Collinearity                  Moderate Collinearity
Model Complexity  Verisimilitude Non-Null  Directional  Interval   Non-Null  Directional  Interval
4                 HV             1.00      1.00         1.00       1.00      1.00         1.00
                  MVD            1.00      1.00         1.00       1.00      1.00         1.00
                  LVD            1.00      1.00         1.00       1.00      1.00         1.00
                  MVA            1.00      1.00         0.99       1.00      1.00         0.99
                  LVA            0.99      1.00         0.99       1.00      0.99         0.98
6                 HV             1.00      1.00         1.00       1.00      1.00         1.00
                  MVD            1.00      1.00         1.00       1.00      1.00         1.00
                  LVD            1.00      1.00         1.00       1.00      1.00         0.98
                  MVA            1.00      1.00         1.00       1.00      1.00         1.00
                  LVA            0.99      1.00         0.99       0.99      1.00         0.99
8                 HV             1.00      1.00         1.00       1.00      1.00         1.00
                  MVD            1.00      1.00         1.00       1.00      1.00         1.00
                  LVD            1.00      1.00         1.00       1.00      1.00         1.00
                  MVA            0.99      1.00         0.99       1.00      1.00         0.99
                  LVA            0.99      1.00         0.99       0.99      1.00         0.99

Relationship between Mean Ci, Precision of Prediction, Collinearity, and Sample Size

The relationship between mean Ci, precision of prediction, collinearity, and sample size is illustrated in Figure 42. An examination of this figure reveals the striking lack of variability across the three sample sizes and two levels of collinearity examined in this study. Once again, mean Ci evidenced a dramatic increase as the intolerance level of the theory increased. For each of the non-null predictions, mean Ci was estimated to be approximately .09, evidencing no variability across the various levels of verisimilitude, model misspecification, and sample size, regardless of whether the degree of verisimilitude was a function of adding paths or deleting paths from these models. In the case of directional predictions, the average Ci value was estimated to be approximately .70 for each level of verisimilitude, or truth-likeness, and sample size. Similarly, for the most precise, or interval, prediction, the average Ci did not evidence any substantial degree of variability across level of collinearity or sample size. Of course, rapid acceleration of the average Ci was evident as predictions became increasingly precise or risky. While non-null predictions resulted in an estimated average Ci of only .09, precise predictions were rewarded with an average Ci of approximately .90, regardless of the level of collinearity or size of the sample.

Figure 42. Mean Ci by Level of Intolerance, Collinearity, and Sample Size. [x-axis: Level of Intolerance (Non-Null, Directional, Interval); y-axis: Mean Ci, 0.00 to 1.00; series: Low, N=100; Low, N=200; Low, N=500; Moderate, N=100; Moderate, N=200; Moderate, N=500.]

Summary

An examination of the relationship between the estimated mean Ci and the central design factors revealed the overwhelming influence of the precision of prediction. In all cases, the level of intolerance was observed to dramatically influence the magnitude of the estimated mean Ci, regardless of the design factor under consideration. This conclusion is consistent with the findings of past empirical research in which the behavior of the corroboration index was examined in both univariate and bivariate contexts. Factors that were anticipated to moderate the relationship between mean Ci and the precision of prediction, such as verisimilitude, model complexity, and collinearity, failed to exert any substantive influence. These unexpected results made it necessary to probe deeper into the data in an attempt to better understand them. These additional analyses revealed the negligible influence of sampling error and an inherent lack of statistical bias in the models under investigation.

Chapter Five

Conclusions, Implications, and Recommendations

Organization

The purpose of this chapter is to provide a set of sound conclusions that are firmly grounded in the results of this study, the findings of past empirical research, and the body of literature that coheres around the central theme of theory testing in the social sciences. Readers are reminded about the controversy surrounding theory appraisal in general and reacquainted with the research problem and the primary purpose of the study. A brief synopsis of the method is also provided. The results are then discussed with respect to each research question and hypothesis. Following this recapitulation of the major findings of the study, important implications for practice and theory are advanced. The chapter concludes with recommendations for future research.

There is little argument that the elucidation and empirical testing of theories are important components of research in any field. Yet despite the long history of science, the extent to which theories are supported or contradicted by the results of empirical research remains ill defined. Quite commonly, support or contradiction is based solely on the "reject" or "fail to reject" decisions that result from tests of null hypotheses that are derived from aspects of theory. Decisions and recommendations based on this forced and often artificial dichotomy have been scrutinized in the past. In recent years, such an overly simplified approach to theory testing has been challenged on logical grounds (Meehl, 1997, 1990, 1978; Serlin & Lapsley, 1985). Theories differ in the extent to which they provide
precise predictions about observations. The precision of predictions derived from theories is proportional to the strength of support that may be provided by empirical evidence congruent with the prediction. However, the notion of precision linked to strength of support is surprisingly absent from many discussions regarding the appraisal of theories.

Statement of the Problem

Meehl (1997, 1990a) has presented a logically sound index of corroboration to summarize the extent to which empirical tests of theories provide support or contradiction of those theories. However, the numerical properties of this index have not been investigated beyond some of the most basic predictions about population mean differences, zero-order correlations, and first-order partial correlations (Hogarty & Kromrey, 2002, 2001, 2000).

Purpose

The purpose of this study was to build upon the previous research by advancing to the next logical step: evaluating the utility of the corroboration index and its behavior when appraising theories employing path analytic methods in the context of social science research. Many researchers approach path analysis by beginning with a model in which there is substantial confidence. This confidence may stem from either theoretical or substantive reasoning about the linkages between the variables under investigation. Less attention, however, is typically given to estimating the magnitude of such
linkages. Most areas of psychology do not permit a high degree of precision. According to Blaich (1998), quasi-quantitative predictions of rough magnitudes of effects could help advance the field. Therefore, the primary focus of this investigation was on the precision in the prediction of the magnitude of effects and an examination of factors that moderate the relationship between corroboration and precision.

Method

A Monte Carlo study was conducted to investigate the utility of a multivariate corroboration index in the appraisal of theories employing path analytic methods. A series of simulations was conducted that related theoretical predictions to empirical results. The study was designed in the context of hypothetical theories, the cores of which predict a single outcome from various configurations of exogenous and endogenous variables. The resulting path coefficients were the parameter estimates of primary interest. Five factors were manipulated in these simulations: level of verisimilitude (low, moderate, high); level of intolerance (non-null, directional, and interval predictions); model complexity (low, moderate, and high); level of collinearity (low and moderate); and sample size (100, 200, and 500). The level of verisimilitude was determined by the proportion of estimated paths that were either added to or deleted from a true model. Level of intolerance, or precision of prediction, was determined by the width of the tolerance interval (i.e., 95%, 50%, or 25% of the Spielraum). The levels of collinearity were selected to reflect a condition with a low level of collinearity, that would not tend to indicate a deleterious influence
with respect to the analyses and results, and a moderate level of collinearity (VIF = 1.5 and VIF = 3.0, respectively). The sample sizes selected for this study represent values that range from those that might be considered insufficient (e.g., N = 100 for the high complexity models) to those that might be considered more than adequate (e.g., N = 500 for both the low and moderate complexity models).

For this study, six population correlation matrices were constructed based on a specified number of variables, level of collinearity, and true path model. A total of 10,000 multivariate normal samples were generated from each population correlation matrix. Sample correlation matrices were then constructed and analyzed using a series of regression equations simulating path analysis. Standard deviations were then calculated for each of the 270 conditions. Lastly, the obtained values of Ci resulting from each path analysis were pooled, and the average value of Ci was evaluated in the context of the central design factors.

Relationship between Mean Ci, Verisimilitude, Intolerance, Model Complexity, Collinearity, and Sample Size

For the analysis of the relationship between mean Ci and the central design factors in the study, a factorial ANOVA was conducted, treating the average Ci as a dependent variable. The independent variables in this analysis were the five central design factors. Omega squared was used to estimate the proportion of variance accounted for in the population by each main effect and each interaction effect. Somewhat surprisingly, only a single factor, level of intolerance, emerged to explain more than half of the variance in Ci. The lack
of influence of any of the other factors led to the decision to conduct another set of analyses, this time controlling for the level of intolerance. Two factors that were responsible for explaining a portion of the variance in Ci emerged only for the lowest level of intolerance (i.e., the non-null condition). With the exception of the influence of the number of variables and level of verisimilitude for these non-null predictions, the results suggest that after such a large portion of the variance in Ci was attributed to the level of intolerance, the other central design factors were unable to account for a noticeable amount of the variance.

Relationship between the Standard Deviation of Ci, Model Complexity, Collinearity, and Sample Size

An examination of the standard deviation of Ci revealed a striking lack of variability across all of the conditions examined. These results are relatively consistent with previous empirical findings. Despite the lack of relationship between the standard deviation and model complexity, it was expected that the standard deviation would be influenced by the level of collinearity and sample size. That is, we would expect to see more stability, and hence less variation, in Ci as sample size increased. Further, the level of collinearity, determined by the variance inflation factor (VIF), was expected to influence the variability of the obtained standardized path coefficients. This influence would also be expected to translate into less stable estimates of Ci. An examination of the magnitude of the standard errors of the path coefficients helped to shed some light on this puzzling finding. As expected, the magnitude of the standard errors, which represents the typical difference
between $\hat{\beta}$ and $\beta$, was observed to decrease as sample size increased. Further, the more complex models evidenced more sampling error than the moderate or simple models. However, in many cases, the magnitude of the standard error was not large enough to cause the estimated path coefficients to fall outside of a given tolerance interval. This lack of deviation from the tolerance interval resulted in multivariate closeness estimates of approximately one across most of the conditions examined. Because the value of intolerance is constant for any given condition, the within-cell variability of Ci was primarily dependent on the variability in closeness, hence the finding that Ci did not vary appreciably across samples.

Relationship Between Mean Ci, Precision of Prediction, Model Complexity, and Level of Collinearity

When the relationship between mean Ci, precision of prediction, model complexity, and level of collinearity was explored, once again the overwhelming influence of the precision of prediction was noted. Mean Ci evidenced a dramatic increase in magnitude as the precision of the prediction increased. Within each level of collinearity and sample size, the magnitude of mean Ci remained stable. Although the relationship between mean Ci and precision of prediction was not anticipated to be moderated by sample size, the absence of the influence of collinearity was somewhat surprising.
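
One way to see why moderate collinearity had so little effect is a common large-sample approximation for the standard error of a standardized regression coefficient, sqrt((1 - R²)/(N - k - 1)) × sqrt(VIF). The Python sketch below tabulates it for the study's two VIF levels and three sample sizes and uses a normal approximation to estimate how often an estimate would fall inside a tolerance interval. The R² of .30, the three predictors, the true coefficient of .30, and the interval from .05 to .55 are hypothetical values chosen for illustration, not taken from the study's models.

```python
import math


def approx_se(r_squared, vif, n_obs, n_predictors):
    """Approximate standard error of a standardized regression coefficient."""
    residual_term = (1.0 - r_squared) / (n_obs - n_predictors - 1)
    return math.sqrt(residual_term) * math.sqrt(vif)


def share_inside(beta, se, lower, upper):
    """Normal-approximation probability that an estimate lands in [lower, upper]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - beta) / (se * math.sqrt(2.0))))
    return cdf(upper) - cdf(lower)


for vif in (1.5, 3.0):
    for n_obs in (100, 200, 500):
        se = approx_se(0.30, vif, n_obs, 3)
        inside = share_inside(0.30, se, 0.05, 0.55)
        print(f"VIF={vif} N={n_obs} SE={se:.3f} share inside={inside:.3f}")
```

Under these assumptions, doubling the VIF inflates the standard error by a factor of about 1.4, yet most estimates still fall inside the interval, so closeness and therefore Ci would change very little.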

Relationship Between Mean Ci, Precision of Prediction, and Verisimilitude

The relationship between mean Ci and the precision of prediction was expected to be moderated by the closeness of the data to the theory (i.e., verisimilitude). As this expected relationship was not evidenced in the data, the level of bias in the obtained standardized path coefficients was investigated. These analyses were conducted by level of model complexity, verisimilitude, collinearity, and level of intolerance. Initially, the deviation of the expected sample path coefficients from the population parameters was estimated. As stated earlier, it is the closeness element of the multivariate corroboration index that captures these deviations in the data; therefore, it seemed prudent to investigate the average expected values of multivariate closeness. As the resultant bias was negligible, multivariate closeness estimates approached the upper limit of 1.00 consistently across the conditions examined. These results help to elucidate the finding that verisimilitude failed to play much of a role in these results.

Relationship between Mean Ci and Precision of Prediction

Based upon previous empirical research, the precision of prediction was expected to exert a considerable influence on the magnitude of mean Ci. It was anticipated that this relationship would be substantively stronger than the relationship between mean Ci and verisimilitude, model complexity, collinearity, and sample size. In a typical condition in which a weak or non-null prediction was made, very little corroborative evidence was observed; however, the
advancement of a directional prediction offered considerable improvement. As expected, the most precise or interval predictions yielded the greatest amount of corroboration. Consistent with past empirical findings (Hogarty & Kromrey, 2002, 2001, 2000), the results of this study revealed the profound influence of intolerance, which provided a ceiling for the magnitude of Ci regardless of the other design factors examined in the study.

Implications for Theory and Practice

The introduction of a corroboration index was not intended to supplant the use of significance tests in general. Surely, null hypothesis testing has its place. In many of the situations that confront applied researchers, it is vital to distinguish between findings that are likely due to chance and those that are not. Often, tests of statistical significance are employed as a starting or entry point in an investigation, prior to embarking on further analyses. In this vein, null hypothesis testing is used as a type of screen, providing insight regarding how to proceed with additional analyses. And, in many cases, tests of statistical significance are used simply because potentially viable alternatives are not superior or available.

Still, there is little doubt that abuses of statistical significance testing are abundant. The results of tests of statistical significance provide limited information that is often misused and misinterpreted. In certain disciplines, for example advertising or marketing, the use of statistical testing is misleading given the inherent nature of the sampling methods employed, such as quota,
convenience, or large but perhaps not truly representative mail samples. Another commonly held notion is that studies that do not include asterisks are flawed. Perhaps the most compelling reason to avoid such overreliance on tests of statistical significance is that it precludes us from thinking about solving problems and addressing research questions in a different way. Further, there is certainly a large array of methodological tools that do not rely on experimental designs. The use of mixed methods is but one of the emerging viable alternatives to strict adherence to the null hypothesis way of knowing.

As a complement to tests of statistical significance, there has been a renewed emphasis on requiring the reporting of effect sizes along with the results of hypothesis tests. Further, more attention is now being given not only to point estimates of effects but also to the degree of confidence that we can place in these estimates, and hence to the reporting of confidence bands. We as researchers should always remain mindful of the arsenal of tools at our disposal as we search for answers to important questions and seek to discover the nature of the relationships that exist within the complex social systems that we investigate. The methods that might drive educational leaders in their effort to uncover the antecedents of high turnover among teachers might be the very same methods that business leaders apply to the study of factors related to satisfaction in the workplace.

An important shift in the business of theory appraisal should involve the comparison of alternative theories and models rather than comparisons of outcomes to the null hypothesis. In many disciplines there exists a complex and

overlapping array of social systems that beg for methods and tools that can serve as an enhancement to traditional analytic methods, rather than as alternatives or preferred substitutes.

The sheer logic of appraising a scientific theory is often more complicated than some would believe (Meehl, 1997). In addition to the aforementioned argument regarding the precision of prediction (that is, a precise prediction that is supported by the data warrants more logical evidence of support than does a weak prediction supported by the data), the movement from theory into an empirical test necessitates the incorporation of many logical components besides the theory itself. Meehl (1997) presents these components as elements of an equation:

$$(T \cdot A_x \cdot C_p \cdot A_i \cdot C_n) \vdash (O_1 \supset O_2)$$

where
$T$ = the theory being tested,
$A_x$ = auxiliary theories relied upon during the conduct of the research,
$C_p$ = ceteris paribus (all other things being equal),
$A_i$ = instrumental theories related to the measures and controls employed,
$C_n$ = realized particulars (the extent to which the research was actually conducted as we think it was), and
$O_1 \supset O_2$ = the material conditional "if you observe $O_1$, you will observe $O_2$."

That which is subject to empirical test is not the theory alone, but the amalgam of these elements. Data that appear to contradict a theory may arise because of errors anywhere in this combination of elements. Emphasis should be given to, and attention focused on, the influences of the other factors in this amalgam.
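The point that a failed prediction refutes only the conjunction, not the theory alone, can be checked by brute force. The following is a minimal sketch, assuming that the five conjuncts jointly entail the observational conditional; the component labels follow the text, while the function name `consistent_assignments` is hypothetical and chosen only for illustration.

```python
from itertools import product

# Labels for the five conjuncts named in the text.
COMPONENTS = ("T", "A_x", "C_p", "A_i", "C_n")


def consistent_assignments(observation_holds):
    """List the truth assignments to the conjuncts that remain logically
    possible, assuming the full conjunction entails the observation."""
    possible = []
    for values in product([True, False], repeat=len(COMPONENTS)):
        conjunction_true = all(values)
        # Entailment rules out exactly one case: every conjunct is true
        # and yet the predicted observation fails.
        if conjunction_true and not observation_holds:
            continue
        possible.append(dict(zip(COMPONENTS, values)))
    return possible


# Suppose the predicted observation fails.
refuted = consistent_assignments(observation_holds=False)
# Assignments in which the theory T itself is still true.
theory_survives = [a for a in refuted if a["T"]]

print(len(refuted), "assignments remain possible after a failed prediction")
print(len(theory_survives), "of them keep the theory T true")
```

A failed prediction eliminates only the single assignment in which every component is true. The theory remains logically tenable whenever at least one auxiliary, instrumental, ceteris paribus, or realized-particulars assumption is false, which is why attention to those other components matters.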

Recommendations for Future Research

Path analysis continues to enjoy widespread use in the appraisal of theories in many disciplines. Although an arsenal of fit indices is available to aid researchers in assessing the tenability of an estimated model, these indices lack a critical component that gives consideration to the precision of the prediction under investigation. Meehl (1990a) contends that the way in which a theory accumulates "money in the bank" is by passing several stiff tests, claiming that "the main way a theory gets money in the bank is by predicting facts that, absent the theory, would be antecedently improbable" (p. 115). A theory's merit is a matter of degree, rather than a yes-or-no question, as it is treated in null hypothesis testing (Meehl, 1990a). Theoretical support depends on a variety of factors, including the relative uniqueness of the prediction, how surprising the prediction is, the precision of the prediction, and the degree of correspondence between the prediction and the observed data (Nickerson, 2000).

The conditions examined in this study were chosen based on the types of situations that applied researchers would be expected to encounter in the conduct of a traditional path analysis. The inclusion of three levels of model complexity and three sample sizes seemed to be a reasonable representation of situations that are commonly confronted. The three tolerance intervals, or levels of intolerance, were fairly representative of the strength of predictions that are typically advanced in the literature. That is, we would not be surprised to observe researchers making non-null or directional predictions, even though we would hope to see even more precise predictions. For this reason, a rather

liberal tolerance interval was chosen for the most precise predictions, rather than a narrower interval or point prediction that would not be a reasonable reproduction of reality. The correlation matrices that were created for these analyses were also chosen with care. Collinearity is a major threat in this type of correlational analysis and hence was featured as one of the central design factors in this study. Larger values of collinearity were explored but not included because of the deleterious influence that more redundancy in the data would be expected to exert. The level of verisimilitude, or model misspecification, seemed to be in line with what an applied researcher might be expected to come across in the investigation and/or comparison of a number of competing models. Given the inherent nature of these models, and their seemingly reasonable conditions, it is discomforting that the multivariate corroboration index as currently formalized was not often successful in distinguishing between misspecified models, or between models varying in complexity and collinearity.

There is considerable evidence to suggest that the current formulation of this multivariate index of corroboration needs to be reexamined. The overwhelming influence of the precision of prediction suggests that alternative representations of multivariate intolerance should be considered in order to ensure a more appropriate balance between the two components that combine to measure corroboration. Further, it is troublesome that the index cannot distinguish weak predictions that are correct from precise predictions that are not correct, so that both may earn the same degree of corroboration. To illustrate these discrepant findings, let us consider

both ends of the spectrum, that is, just how false things could possibly get, or the worst-case scenario for Ci, and the best-case scenario for a given condition. Consider a low-complexity supplementary model with the lowest level of verisimilitude and a precise prediction (i.e., 4 variables, 2 supplementary paths, interval prediction). The obtained Ci for this model would be .95, whereas the obtained value of Ci for the comparable model with high verisimilitude would be .99. As this example illustrates, the current formulation of the multivariate index of corroboration does not do a good job of detecting model misspecification. Further, if we consider conditions in which a greater degree of deviation from truth is possible across all three levels of intolerance (that is, obtained path coefficients of 1.0 when truth was estimated to be .30), the resultant indices of corroboration would range from .08 for the non-null prediction to .35 for the directional prediction to .43 for the interval prediction. Of course, deviations this large, while not evidenced in the data for this study, might be expected if this work were to be replicated with a different set of correlation matrices and more extreme conditions.

Although the shortcomings of the index in its current form cannot be disputed, the importance of theoretical intolerance as a determinant of the degree of corroboration was once again brought to light, underscoring the need for the development of precise theories in the social sciences. The results from this study suggest that efforts to develop theories in the social sciences and related disciplines that enjoy greater precision of prediction may concomitantly provide critical tests with greater potential for corroboration.
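Why intolerance acts as a ceiling on corroboration can be seen in a short derivation. This is a sketch under stated assumptions: it uses the univariate product form of Meehl's index, with closeness $Cl = 1 - D/S$ and intolerance $In = 1 - I/S$, where $D$ is the deviation of the observed value from the predicted interval, $I$ is the width of the tolerance interval, and $S$ is the Spielraum. None of these definitions is restated in this section, and the multivariate extension is assumed here to share the same bounded-product structure.

```latex
\begin{align*}
C_i &= Cl \times In, \qquad Cl = 1 - \frac{D}{S}, \qquad In = 1 - \frac{I}{S} \\
0 \le D \le S \;&\Longrightarrow\; 0 \le Cl \le 1 \\
&\Longrightarrow\; C_i = Cl \times In \le In
\end{align*}
```

Under these assumptions no amount of closeness can lift the index above the intolerance of the prediction, and when closeness sits near 1.00 in every condition the index is driven almost entirely by intolerance.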

In this study, the closeness component, or measure of verisimilitude, appeared to behave poorly and hence did not serve to inform the multivariate corroboration index of the true nature of the data across many of the conditions. It would seem prudent to investigate the behavior of this index with a modified index of closeness. A different formulation of this component might include a different conceptualization of the relationship between the intolerance interval and the Spielraum. One modification of this component is to reflect the distance of an obtained estimate with respect to how large the deviation really is, rather than relative to the maximum possible distance. That is, $Cl = 1 - \frac{D}{S - I}$ rather than the original univariate conceptualization of $Cl = 1 - \frac{D}{S}$. This modification should reduce the tendency for closeness to be overstated and result in a more accurate reflection of multivariate corroboration.

Future work on the conceptualization of both the closeness and intolerance components, as well as an adjustment for sample size, is currently under consideration. Interpretations of risk, utility, and the performance of the index under varied conditions will be sought from philosophers of science and applied researchers to aid in the reconceptualization of the index. It is anticipated that this inquiry will lead to a set of recommendations regarding the use of the index, the degree of risk that represents a risky prediction given the context of the research being conducted, and other potential uses of the index. Given the shortcomings of the index and the need for further investigation, applied researchers are cautioned against using this index of corroboration in its

current form. However, an appropriately modified index of corroboration may serve a variety of different functions across a variety of disciplines and contexts. The index might serve as one indicator among a host of properties or indices that are predictors of the success of a theory's long-term fate (Faust & Meehl, 2002). Given the presence or absence of other properties, an index that examines predictive accuracy in relation to risk might be given more or less weight. If we consider a collection of desirable traits that might include parsimony, novelty, risk, qualitative diversity or breadth, and elegance of mathematical beauty, an index of corroboration might be considered a minor player. However, absent some of these more desirable properties, precision and truth-likeness may carry a more formidable amount of weight.

A reformulated multivariate corroboration index may be applied in the planning of empirical studies as well as in the interpretation of research results. Its utility may extend beyond univariate, bivariate, and traditional path analysis as more sophisticated methods such as Hierarchical Linear Modeling (HLM) and Structural Equation Modeling (SEM) enjoy more widespread attention and use. Ideally, in addition to the precision of prediction (i.e., intolerance), the index would be sensitive to factors such as sample size, model misspecification or verisimilitude, and model complexity. Its use should serve to move the arguments surrounding theory testing away from the testing of null hypotheses and toward a consideration of the complexity of the research context, the degree of risk entailed by the theory's predictions, and the extent to which the obtained data (absent the theory) represent a damn strange coincidence.
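The modified closeness component proposed in the recommendations above can be compared with the original in a few lines of code. This is a minimal sketch, not the study's simulation code. It assumes the univariate product form Ci = Cl × In with In = 1 - I/S, and it reads the proposed modification as Cl = 1 - D/(S - I). The function names and the numeric values of S, I, and D are hypothetical and chosen only for illustration.

```python
def intolerance(interval_width, spielraum):
    """Assumed intolerance component: In = 1 - I/S."""
    return 1.0 - interval_width / spielraum


def closeness_original(deviation, spielraum):
    """Original univariate closeness: Cl = 1 - D/S."""
    return 1.0 - deviation / spielraum


def closeness_modified(deviation, spielraum, interval_width):
    """Modified closeness as read here (an assumption): Cl = 1 - D/(S - I)."""
    if not 0.0 <= interval_width < spielraum:
        raise ValueError("interval width must lie in [0, S)")
    return 1.0 - deviation / (spielraum - interval_width)


def corroboration(closeness, intol):
    """Assumed product form of the index: Ci = Cl * In."""
    return closeness * intol


# Hypothetical values: a Spielraum of 2.0 (e.g., coefficients from -1 to 1),
# a tolerance interval of width 0.4, and an estimate 0.3 outside the interval.
S, I, D = 2.0, 0.4, 0.3

cl_orig = closeness_original(D, S)
cl_mod = closeness_modified(D, S, I)
in_value = intolerance(I, S)

print("original closeness:", round(cl_orig, 4))
print("modified closeness:", round(cl_mod, 4))
print("Ci (original):", round(corroboration(cl_orig, in_value), 4))
print("Ci (modified):", round(corroboration(cl_mod, in_value), 4))
```

Because S - I is smaller than S whenever the interval has positive width, the same deviation is penalized more heavily under the modified form, which is the direction of change the text anticipates. The sketch does not clamp closeness at zero, so deviations larger than S - I would yield negative values under this reading.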

References

Alwin, D. F., & Hauser, R. M. (1975). The decomposition of effects in path analysis. American Sociological Review, 40, 37-47.
American Psychological Association. (2001). Publication manual (5th ed.). Washington, DC.
Asher, H. B. (1983). Causal modeling (2nd ed.). Beverly Hills: Sage.
Belsley, D. A. (1984). Reply. The American Statistician, 38, 90-93.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Bentler, P. M. (1983). Some contributions to efficient statistics for structural models: Specification and estimation of moment structures. Psychometrika, 48, 493-571.
Bentler, P. M. (1989). EQS structural equations program manual. Los Angeles: BMDP Statistical Software.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Blaich, C. F. (1998). The null hypothesis significance test procedure: Can't live with it, can't live without it. Behavioral and Brain Sciences, 21, 194-195.

Bollen, K. A. (1986). Sample size and Bentler's and Bonett's nonnormed fit index. Psychometrika, 51, 375-377.
Bollen, K. A. (1989). A new incremental fit index for general structural equation models. Sociological Methods & Research, 17, 303-316.
Boomsma, A. (1982). The robustness of LISREL against small sample size in factor analysis models. In K. G. Joreskog & H. Wold (Eds.), Systems under indirect observation: Causality, structure, prediction (Part 1, pp. 149-173). Amsterdam: North-Holland.
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24, 445-455.
Campbell, D. T. (1990). The Meehlian corroboration-verisimilitude theory of science. Psychological Inquiry, 1, 142-147.
Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378-399.
Chow, S. L. (1990). In defense of Popperian falsification. Psychological Inquiry, 1, 147-149.
Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures. Multivariate Behavioral Research, 18, 147-167.
Dar, R. (1987). Another look at Meehl, Lakatos, and the scientific practices of psychologists. American Psychologist, 42, 145-151.
Duncan, O. D. (1975). Introduction to structural equation models. New York: Academic Press.

Fan, X., Thompson, B., & Wang, L. (1999). Effects of sample size, estimation methods, and model specification on structural equation modeling fit indexes. Structural Equation Modeling, 6, 56-83.
Faust, D., & Meehl, P. E. (2002). Using meta-scientific studies to clarify or resolve questions in the philosophy and history of science. Philosophy of Science, 69, S185-S196.
Finch, S., Cumming, G., & Thomason, N. (2001). Reporting of statistical inference in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement, 61, 181-210.
Finney, J. M. (1972). Indirect effects in path analysis. Sociological Methods and Research, 1, 175-186.
Fraas, J. W., & Newman, I. (1994). A binomial test of model fit. Structural Equation Modeling, 3, 268-273.
Gerbing, D. W., & Anderson, J. C. (1993). Monte Carlo evaluations of goodness of fit indices for structural equation models in educational research. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 40-65). Newbury Park, CA: Sage.
Gholson, B., & Barker, P. (1985). Kuhn, Lakatos, and Laudan: Applications in the history of physics and psychology. American Psychologist, 40, 755-769.
Gigerenzer, G. (1998). Surrogates for theories. Theory and Psychology, 8, 195-204.

Hanson, N. R. (1958). Patterns of discovery: An inquiry into the conceptual foundations of science. Cambridge, England: University Press.
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.). (1997). What if there were no significance tests? Mahwah, NJ: Erlbaum.
Hoelter, J. W. (1983). The analysis of covariance structures: Goodness of fit indices. Sociological Methods and Research, 11, 325-344.
Hogarty, K. Y., & Kromrey, J. D. (2000, April). Risky predictions and damn strange coincidences: An initial consideration of Meehl's Index of Corroboration. Paper presented at the annual meeting of the American Educational Research Association, New Orleans.
Hogarty, K. Y., & Kromrey, J. D. (2001, April). Corroboration and coincidence: The moderating effect of statistical complexity on the relationship between research design factors and Meehl's Ci. Paper presented at the annual meeting of the American Educational Research Association, Seattle.
Hogarty, K. Y., & Kromrey, J. D. (2002, February). What's N got to do with it? A modification of Meehl's Index of Corroboration. Paper presented at the annual meeting of the Eastern Educational Research Association, Sarasota.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indices in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Humphreys, L. G. (1990). View of a supportive empiricist. Psychological Inquiry, 1, 153-155.

Joreskog, K. G., & Sorbom, D. (1981). LISREL V: Analysis of linear structural relationships by the method of maximum likelihood. Chicago: National Educational Resources.
Joreskog, K. G., & Sorbom, D. (1984). LISREL VI user's guide (3rd ed.). Mooresville, IN: Scientific Software.
Kerlinger, F. N. (1964). Foundations of behavioral research. New York: Holt, Rinehart, and Winston.
Kirk, R. E. (Ed.). (1972). Statistical issues. Monterey, CA: Brooks/Cole.
Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press.
Lakatos, I. (1970). Falsification and the methodology of scientific research programs. In I. Lakatos & A. Musgrave (Eds.), Criticism and the growth of knowledge (pp. 91-196). Cambridge, England: Cambridge University Press.
Laudan, L. (1977). Progress and its problems. Berkeley: University of California Press.
Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70, 151-159.
MacCallum, R. C., Browne, M. W., & Preacher, K. J. (2002). Comments on the Meehl-Waller (2002) procedure for appraisal of path analysis models. Psychological Methods, 7, 301-306.
MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114, 185-199.

Markus, K. A. (2002). The converse inequality argument against tests of statistical significance. Psychological Methods, 7, 147-160.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness of fit indices in confirmatory factor analysis: Effects of sample size. Psychological Bulletin, 103, 391-411.
Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data. Belmont, CA: Wadsworth.
McDonald, R. P., & Marsh, H. W. (1989). Choosing a multivariate model: Noncentrality and goodness of fit. Psychological Bulletin, 107, 247-255.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.
Meehl, P. E. (1990a). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1, 108-141.
Meehl, P. E. (1990b). Author's response. Psychological Inquiry, 1, 173-180.
Meehl, P. E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 393-425). Mahwah, NJ: Erlbaum.

Meehl, P. E., & Waller, N. G. (2002). The path analysis controversy: A new statistical approach to strong appraisal of verisimilitude. Psychological Methods, 7, 283-300.
Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy: A reader. Chicago: Aldine.
Mulaik, S. A. (2002). Commentary on Meehl and Waller's (2002) path analysis and verisimilitude. Psychological Methods, 7, 316-322.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241-301.
Olsson, U., Troye, S. V., & Howell, R. D. (1999). Theoretical fit and empirical fit: The performance of maximum likelihood versus generalized least squares estimation in structural equation models. Multivariate Behavioral Research, 34, 31-58.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). Fort Worth: Harcourt Brace College Publishers.
Popper, K. R. (1959). The logic of scientific discovery. New York: Basic.
Reichardt, C. S. (2002). The priority of just-identified recursive models. Psychological Methods, 7, 307-315.
Roberts, S., & Pashler, H. (2002). How persuasive is a good fit? A comment on theory testing. Psychological Review, 107, 358-367.
Salmon, W. C. (1984). Scientific explanation and the causal structure of the world. Princeton, NJ: Princeton University Press.

Serlin, R. C., & Lapsley, D. K. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40, 73-83.
Steiger, J. H., & Lind, J. C. (1980, May). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City, IA.
Tanaka, J. S. (1993). Multifaceted conceptions of fit in structural equation models with latent variables. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
Tanaka, J. S., & Huba, G. J. (1985). A fit index for covariance structural models under arbitrary GLS estimation. British Journal of Mathematical and Statistical Psychology, 42, 233-239.
Thompson, B. (2002, April). What is AERA, anyway? Paper presented at the annual meeting of the American Educational Research Association, Chicago.
Thompson, B., & Daniel, L. G. (1996). Factor analytic evidence for the construct validity of scores: An historical overview and some guidelines. Educational and Psychological Measurement, 56, 213-224.
Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.
Waller, N. G., & Meehl, P. E. (2002). Risky tests, verisimilitude, and path analysis. Psychological Methods, 7, 323-337.
Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.

About the Author

Kristine Y. Hogarty is a native New Yorker who moved to Florida to begin her career at the University of South Florida after serving four years in the United States Marine Corps. She was awarded a Bachelor of Science degree in Business Management in 1991 and a Master of Arts degree in Criminology in 1994. During her tenure at USF she has served in many roles, including undergraduate advisor, teaching assistant, research associate, and research consultant. Kris is currently the Coordinator of Research and Assessment Systems for the College of Education.

Kris's primary research interests include applied statistics and data analysis. For the past eight years, Kris has had the good fortune to collaborate with faculty members and fellow doctoral students across a variety of disciplines on a host of research projects and grants. These opportunities and experiences have frequently resulted in presentations at regional, national, and international conferences. A number of these collaborative works have also been published in peer-reviewed journals such as the Journal of Research on Computing in Education; Behavior Research Methods, Instruments, & Computers; Multiple Linear Regression Viewpoints; Psychometrika; and Educational and Psychological Measurement.

Last but not least, Kris is an ardent animal lover and a huge fan of palm trees, pelicans, sunsets, and sandy beaches.