Title: Validity generalization and transportability [electronic resource] : an investigation of random-effects meta-analytic methods / by Jennifer L. Kisamore.
Author: Kisamore, Jennifer L.
Publication: [Tampa, Fla.] : University of South Florida, 2003.
Identifiers: 001416922; E14SFE0000060; (OCoLC)52809293; AJJ4774; SFE0000060. Call number: BF121.
Notes: Thesis (Ph.D.), University of South Florida, 2003. Includes vita. Includes bibliographical references. Text (Electronic thesis) in PDF format. System requirements: World Wide Web browser and PDF reader. Mode of access: World Wide Web. Title from PDF of title page. Document formatted into pages; contains 134 pages.

ABSTRACT: Validity generalization work over the past 25 years has called into question the veracity of the assumption that validity is situationally specific. Recent theoretical and methodological work has suggested that validity coefficients may be transportable even if true validity is not a constant. Most transportability work is based on the assumption that the distribution of rho ($\rho$) is normal, yet no empirical evidence exists to support this assumption. The present study used a competing model approach in which a new procedure for assessing transportability was compared with two more commonly used methods. Empirical Bayes estimation (Brannick, 2001; Brannick & Hall, 2003) was evaluated alongside both the Schmidt-Hunter multiplicative model (Hunter & Schmidt, 1990) and a corrected Hedges-Vevea (see Hall & Brannick, 2002; Hedges & Vevea, 1998) model.

The purpose of the present study was twofold. The first part of the study compared the accuracy of estimates of the mean, standard deviation, and the lower bound of 90 and 99 percent credibility intervals computed from the three different methods across 32 simulated conditions. The mean, variance, and shape of the $\rho$ distribution varied across the simulated conditions. The second part of the study involved comparing results of analyses of the three methods based on previously published validity coefficients. The second part of the study was used to show whether choice of method for determining whether transportability is warranted matters in practice.

Results of the simulation analyses suggest that the Schmidt-Hunter method is superior to the other methods even when the distribution of true validity parameters violates the assumption of normality. Results of analyses conducted on real data show trends consistent with those evident in the analyses of the simulated data. Conclusions regarding transportability, however, did not change as a function of method used for any of the real data sets. Limitations of the present study as well as recommendations for practice and future research are provided.

Adviser: Brannick, Michael
Keywords: credibility intervals; meta-analysis; validity transport; selection; empirical bayes estimation.
Subject: Dissertations, Academic -- USF -- Psychology -- Doctoral.
Series: USF Electronic Theses and Dissertations.
URL: http://digital.lib.usf.edu/?e14.60

VALIDITY GENERALIZATION AND TRANSPORTABILITY: AN INVESTIGATION OF DISTRIBUTIONAL ASSUMPTIONS OF RANDOM-EFFECTS META-ANALYTIC METHODS

by JENNIFER L. KISAMORE

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Psychology
College of Arts and Sciences
University of South Florida

Major Professor: Michael Brannick, Ph.D.
Walter Borman, Ph.D.
Carnot Nelson, Ph.D.
Douglas Nelson, Ph.D.
Jeffrey Kromrey, Ph.D.
Date of Approval: June 9, 2003

Keywords: meta-analysis, validity transport, selection, empirical bayes estimation, credibility intervals

Copyright 2003, Jennifer L. Kisamore

Dedication

I would like to dedicate this dissertation to several people who have been instrumental in my life and have been key in my perseverance to complete this project. First, I dedicate this dissertation to my parents, Linda and Jerry, who have provided immeasurable love and support to me both while I was working on this project and throughout my entire life. I would like to thank them for supporting life choices I have made, even when those choices have meant my separating from them geographically. Second, I would like to dedicate this dissertation to Richard Bingham, whose love, support, and regular reminders to "just get it done" helped me focus on and achieve this goal. And last, but definitely not least, I would like to dedicate this dissertation to my wonderful and loving cat, Ed, who kept my notes warm when I was working on other projects and who kept my heart and house warm at all times.

In Memory of Ed
1987-2003

Acknowledgements

I would like to acknowledge a number of people who have lent their guidance and support to me throughout my work on this project. First, I would like to acknowledge my major professor, Mike Brannick, for his dedication to this project and to my development as an academician. Second, I would like to acknowledge the members of my committee, Walter Borman, Jeffrey Kromrey, Douglas Nelson, Carnot Nelson, Lou Penner, and Stuart Silverman, and thank them for their insights and suggestions regarding this project. Third, I would like to acknowledge my friends Mike Brazas, Walter Chason, Laurie Diamond, Steve Hall, Don Miles, Melissa Nelson, Vicky Pace, and Marty Sutton, who, in each of their own ways, were involved in helping me through the dissertation process in general, and this project in particular.
Fourth, I would like to acknowledge Judy Bryant, administrator extraordinaire, for her help and support throughout the graduate school process. Last, I would like to acknowledge Florida taxpayers and tourists who have helped to finance my education through support of the Florida educational system.

Table of Contents

List of Tables
List of Figures
List of Formulae
Abstract
Chapter One Introduction
Chapter Two Meta-analysis
    Fixed-effects Meta-analysis
    Random-effects Meta-analysis
Chapter Three Situational Specificity
    Testing for Situational Specificity
    Explaining Variance in Rho
Chapter Four Validity Generalization and Meta-Analytic Models
    Differentiating between Validity Generalization and Transportability
    Validity Generalization
    Transportability
Chapter Five Schmidt-Hunter Methods
    The 75 Percent Rule
    The Lower Credibility Value Method
    Criticisms of Schmidt-Hunter Methods
        Overall Criticisms
        Criticisms of the 75 Percent Rule
        Criticisms of the Lower Credibility Value Method
    Strengths of the Schmidt-Hunter Method
Chapter Six The Distribution of Rho
    Theoretical Considerations
    Empirical Findings
Chapter Seven The Present Study
    Research Questions
Chapter Eight Method
    Overview
    Part One
        Data Generation
        Analysis
    Part Two
        Data Retrieval
        Analysis
    Accuracy Checks
Chapter Nine Results
    Part One
        Empirical Bayes Estimates
        Schmidt-Hunter Multiplicative Model
        Hedges-Vevea Corrected Model
        Other Findings
    Part Two
Chapter Ten Discussion
    Research Questions
        Question 1
        Question 2
        Question 3
    Limitations of the Current Study
    Modeling the Shape of the Rho Distribution
    Conclusions and Recommendations
References
Appendices
    Appendix A: Sample SAS program (Rho Distribution is Moderate Normal with a Mean of .50)
    Appendix B: Simulated Observed Validity Coefficient Distributions based on Kemery et al. (1987)
About the Author

List of Tables

Table 1 Simulated Conditions
Table 2 True Validity Values Used by Callender et al. (1982)
Table 3 Artifact Distributions
Table 4 Rho is Equal to .25 and is Constant ($\sigma^2_\rho = .000$)
Table 5 Rho is Equal to .50 and Variance is Constant ($\sigma^2_\rho = .000$)
Table 6 Rho Distribution is Normal ($\sigma^2_\rho = .012$) with a Mean of .25
Table 7 Rho Distribution is Normal ($\sigma^2_\rho = .012$) with a Mean of .50
Table 8 Rho Distribution is Normal ($\sigma^2_\rho = .012$) with a Mean of .25 and Correlated Artifacts
Table 9 Rho Distribution is Normal ($\sigma^2_\rho = .012$) with a Mean of .50 and Correlated Artifacts
Table 10 Rho Distribution is Wide Normal ($\sigma^2_\rho = .034$) with a Mean of .25
Table 11 Rho Distribution is Wide Normal ($\sigma^2_\rho = .034$) with a Mean of .50
Table 12 Rho Distribution is Positively Skewed ($\sigma^2_\rho = .012$) with a Mean of .25
Table 13 Rho Distribution is Positively Skewed ($\sigma^2_\rho = .012$) with a Mean of .50
Table 14 Rho Distribution is Negatively Skewed ($\sigma^2_\rho = .012$) with a Mean of .25
Table 15 Rho Distribution is Negatively Skewed ($\sigma^2_\rho = .012$) with a Mean of .50
Table 16 Rho Distribution is Flat ($\sigma^2_\rho = .012$) with a Mean of .25
Table 17 Rho Distribution is Flat ($\sigma^2_\rho = .012$) with a Mean of .50
Table 18 Rho Distribution is Uniform ($\sigma^2_\rho = .012$) with a Mean of .25
Table 19 Rho Distribution is Uniform ($\sigma^2_\rho = .012$) with a Mean of .50
Table 20 Lower Bound Values of Normal and Non-Normal Distributions
Table 21 Lower Bound of the 99 Percent Confidence Interval when Mean Rho = .25
Table 22 Lower Bound of the 99 Percent Confidence Interval when Mean Rho = .50
Table 23 Comparison of Model Results for Published Data

List of Figures

Figure 1. Population Validity Distributions Used by Callender et al. (1982)
Figure 2. Population Validity Distributions Used in the Current Study
Figure 3. Lower Bound Estimates for 90 Percent Credibility Intervals when Validity is Constant
Figure 4. Normal Distribution 90 Percent Credibility Interval Lower Bound Estimates
Figure 5. Normal Distribution with Correlated Artifacts 90 Percent Credibility Interval Lower Bound Estimates
Figure 6. Wide Normal Distribution 90 Percent Credibility Interval Lower Bound Estimates
Figure 7. Positively Skewed Distribution 90 Percent Credibility Interval Lower Bound Estimates
Figure 8. Negatively Skewed Distribution 90 Percent Credibility Interval Lower Bound Estimates
Figure 9. Flat Distribution 90 Percent Credibility Interval Lower Bound Estimates
Figure 10. Uniform Distribution 90 Percent Credibility Interval Lower Bound Estimates

List of Formulae

Formula 1 Components of Observed Validity Coefficients in the FE Case
Formula 2 Relationship Between Variance of Observed Validity and Error in FE Case
Formula 3 Variance of Population Validity in FE Case
Formula 4 Components of Observed Validity Coefficients in the RE Case
Formula 5 Effects of Context in the RE Case
Formula 6 Variance in Observed Validity Coefficients as a Function of Context and Error in the RE Case
Formula 7 Variance in Observed Validity Coefficients as a Function of Variation in Rho and Error in the RE Case
Formula 8 Disattenuation Formula
Formula 9 Expected Sampling Error Variance of Disattenuated Correlations
Formula 10 Empirical Bayes Weight of Disattenuated Correlations
Formula 11 Empirical Bayes Estimate of Rho
Chapter One Introduction

Almost a century ago, the use of tests in the selection process began when Hugo Munsterberg used tests to help decide which individuals were most likely to succeed as railway motormen (Ghiselli, 1966). The use of tests for making personnel decisions increased markedly as a result of the United States' involvement in World War I, when the need to assess recruits' abilities led to the development of the Army Alpha and Beta tests. Since that time, organizational use and acceptance of tests for the purpose of making selection decisions has grown considerably, albeit rather sporadically. The use of tests and other predictors of job performance in the selection of organizational personnel is now commonplace.

The use of tests in the process of making selection decisions does not guarantee selection efficiency or effectiveness. Benefits from the use of tests in selection depend not only on whether a test is used, but also on how scores on the test are used in the selection process. In other words, validity is not a property of the test itself, but rather a reflection of the quality of inferences and decisions made based on test scores (Cronbach, 1970, 1984; Standards for Educational and Psychological Testing, 1999). The process of evaluating the degree to which scores on a specific test can be useful in the prediction of future performance for a particular job is called test validation. Validity is a matter of degree, not a binary phenomenon (Landy, 2003; Murphy & Newman, 2003).

The extent to which test scores used in personnel selection are valid predictors of future job success can have considerable financial and legal ramifications for organizations. Validation studies, however, can be quite expensive and time-consuming to conduct. According to Walter Borman, local validation studies can cost as much as $500,000 per job (personal communication, June 11, 2002).
Furthermore, many organizations do not employ adequate numbers of people in a given job to be able to conduct a statistically sound validation study. Alternatives to conventional validation procedures include the use of synthetic validity (see Guion, 1965; Hollenbeck & Whitener, 1988), Bayesian prior distributions (see Brannick, 2001; Pearlman, Schmidt, & Hunter, 1980), transportability (see Pearlman et al., 1980; Kemery, Mossholder, & Roth, 1987), and validity generalization (see Hunter & Schmidt, 1990; Schmidt & Hunter, 1977; Schmidt, Law, Hunter, Rothstein, Pearlman, & McDaniel, 1993). Although each of these techniques has been used and can provide value to the organization, transportability and validity generalization may well be the most robust (Cascio, 1998; Landy, 2003).

The extent to which validity inferences based on test scores in one context (i.e., validity coefficients based on a specific job in a specific company) can be extended or generalized to the same or a substantially similar job at the same or a similar organization can provide substantial benefits to organizations. In addition to the financial benefits afforded, validity generalization may be the key to personnel psychology's establishment of general laws, the primary goal of all sciences (Landy, 2003). As discussed by Landy, early personnel psychologists realized the need to identify general laws regarding test validity in order to elevate the scientific merit of personnel psychology practices to a level comparable to other fields of psychology (e.g., learning, sensation and perception). Initial personnel psychology work that examined the possibility of generalizing validity (see Ghiselli, 1966; Guion, 1965), however, suggested that validity is affected by situational factors to a considerable degree. Various undefined situational variables were believed to prevent the generalization of validity coefficients to any great extent.
Meta-analytic work over the past 25 years (e.g., Schmidt & Hunter, 1977; Schmidt et al., 1993), however, has shown that situational factors are less responsible and methodological factors are more responsible for observed variations in validity coefficients than was previously believed. These findings suggest that validity generalization may be more tenable and situational specificity less so than early personnel psychologists thought. As Murphy and Newman (2003) stated, "Instead of asking whether there is any consistency in validity, personnel psychologists now wrestle with the question of whether there are any meaningful differences in validity across jobs, organizations, settings, etc." (p. 406).

The shift away from thinking in terms of situational specificity toward validity generalization has been met with controversy (see Landy, 2003; Murphy & Newman, 2003; Sackett, Schmitt, Tenopyr, Kehoe, & Zedeck, 1985; Schmidt & Hunter, 2003; Schmidt, Hunter, Pearlman, & Hirsh, 1985). Additionally, some institutions that are key in regulating personnel practices still lag behind in their understanding, acceptance, and application of validity generalization principles (Landy, 2003). Nonetheless, recent work has served to renew interest and optimism regarding the extent to which generalization of validity coefficients is reasonable.

Chapter Two Meta-analysis

The practice of combining research findings in order to better understand trends or relationships between phenomena has been employed for over two centuries (see National Research Council, 1992; Shadish & Haddock, 1994). The term "meta-analysis" was not coined until 1976, however, in an address made by Glass to the American Educational Research Association. Glass (1976) noted the existence of vast discrepancies in the outcomes of studies that purported to measure the same or similar phenomena.
In order to better resolve such discrepancies, Glass developed meta-analytic techniques that serve as the basis of current meta-analytic methods. The use of meta-analysis has grown rapidly in recent years (Field, 2001; Oswald, 1999).

Meta-analysis is a higher-order quantitative analysis that involves the synthesis of outcomes from numerous primary research studies. Meta-analysis allows researchers to resolve discrepancies in primary study outcomes by statistically controlling for differences in characteristics of the primary studies (e.g., sample size, range restriction, levels of a moderator) so that true relationships between variables are clearer. In a meta-analysis, the outcomes of primary studies that investigated the relationship between the same or similar variables are treated as the observations of interest. Thus, meta-analysis uses study outcomes rather than individual scores as the data of interest.

Primary study outcomes are referred to as effect sizes and indicate the degree of association between two variables. A number of different but comparable effect size statistics are available (e.g., standardized mean difference, correlation coefficient, odds ratio). Two examples are provided to illustrate the use of effect sizes in meta-analysis. A recent meta-analysis conducted by Birkeland, Manson, Kisamore, Brannick, and Liu (2003) examined the (standardized) mean difference in personality scores between job applicants and nonapplicants (e.g., job incumbents who took the personality test for research or developmental purposes). In this case, the primary studies were studies that collected personality scores for applicants and nonapplicants. The standardized difference between the average applicant and average nonapplicant score for each study served as the data (effect sizes) included in the meta-analysis.
The meta-analysis showed that, across a variety of contexts, applicants' personality scores were significantly higher than were nonapplicant scores, suggesting that people tend to inflate scores in the application process.

The second example uses the correlation coefficient as the effect size statistic of interest, the effect size pertinent in the study of validity generalization and transportability. Vinchur, Schippmann, Switzer, and Roth (1998) examined validity coefficients separately for several predictors of sales performance, including tests of sales ability. Primary studies were studies that correlated scores on tests of sales ability with objective sales volume. The data used in the meta-analysis consisted of correlation coefficients (corrected for range restriction) from each primary study. Results revealed that, in general, tests of sales ability significantly predict objective sales performance.

Use of meta-analytic techniques confers certain advantages over the exclusive use of more established methods such as literature reviews and primary analyses. Unlike traditional literature reviews that qualitatively evaluate and integrate research conducted in a particular field, meta-analysis provides a quantitative means of synthesizing research outcomes. Although subjective decisions made in the process of conducting the meta-analysis (such as inclusion criteria, choice of model, method, weighting strategy, and artifact corrections) can substantially influence results obtained (Hall & Brannick, 2002; Kisamore & Brannick, 2003; Wanous, Sullivan, & Malinak, 1989; see also Johnson, Mullen, & Salas, 1995; Schmidt & Hunter, 1999), the choices made are generally reported (Lipsey & Wilson, 2001) and supported by previous empirical or theoretical work. Also, although the methodological choices are subjective, the analyses can be easily replicated using the same or alternative methodologies, and results can be verified objectively.
In traditional literature reviews, conclusions also are greatly influenced by decisions made by the authors (Landy, 2003). Compared with meta-analyses, the decisions made and the weighting of included studies in a traditional literature review are more likely to be narrow (Glass, 1976; Lipsey & Wilson, 2001), biased (Glass, 1976), and affected by retrieval errors (Cooper & Rosenthal, 1980), and are less likely to support the research hypothesis (Cooper & Rosenthal, 1980). Another weakness of qualitative reviews mentioned by Landy (2003) is that they tend not to be replicated. Landy explains that conclusions drawn are rarely reassessed and may be overly generalized in application, as was the case for the seminal review completed by Guion and Gottier (1965) on the use of personality testing for predicting job performance.

The use of meta-analysis also presents benefits to researchers over exclusive use of primary research studies. According to Schmidt et al. (1985), meta-analytic results are more likely to replicate than are results obtained from a primary study. Also, meta-analysis can be especially beneficial in explaining discrepancies among outcomes from primary studies. Such discrepancies may be attributable to sampling error, differences in statistical power among the primary studies (Cohen, 1992; Schmidt, Hunter, & Urry, 1976), or the presence of moderators of the observed relations among variables. A moderator variable is a variable that changes the size or direction of the relationship between two other variables. For example, in the Birkeland et al. (2003) meta-analysis, type of job influenced the size of the difference in scores between incumbents and applicants. For sales jobs, applicant scores were lower on Agreeableness than were incumbent scores. For other jobs, there was essentially no difference between applicants and incumbents on Agreeableness.
The result makes sense because sales personnel are thought to be dominant in taking positions. Thus, although individual studies included in the meta-analysis may differ substantially in results, the differences may be attributable to methodological factors (e.g., small sample size, unreliability of measures), to substantive systematic factors (i.e., moderators), or to unexplained random variation. In the case of the Birkeland et al. (2003) meta-analysis, the authors were able not only to show the average inflation in personality scores for applicants compared to nonapplicants, but also to show that the size of the inflation was related to the type of job studied.

Currently, the goal of most meta-analysts is twofold: to estimate the average magnitude of a given effect and to investigate the presence of moderators of the effect. Personnel psychologists using meta-analysis for the purpose of assessing whether validity can be transported also are interested in the variance of the population (infinite-sample-size) validity coefficients. This variance is called the random-effects variance component (denoted by $\sigma^2_\rho$) and represents variance believed to be due to real differences in effect sizes between studies (situations). The random-effects variance component provides information regarding how well the population mean represents the distribution of validity parameters and whether moderators of validity for the given predictor-criterion pairing are likely.

There are two broad meta-analytic models that researchers can use: (a) fixed-effects and (b) random-effects models. Fixed-effects models are based on the assumption that all studies estimate a common association or effect size (i.e., the random-effects variance equals zero). If it were possible to collect infinite sample sizes in each study, all the study effect sizes would be equal to one another.
In other words, in the fixed-effects analysis, sampling error is assumed to be the only source of disagreement in primary study outcomes. Random-effects models assume that if it were possible to collect infinite sample sizes in each study, there would still be differences across studies in the outcomes (Shadish & Haddock, 1994). Random-effects models admit the possibility of a distribution of infinite-sample effect sizes across studies (the random-effects variance is greater than zero).

Choice of fixed- versus random-effects models can have important implications for conclusions drawn (Kisamore & Brannick, 2003; National Research Council, 1992; Overton, 1998). If the random-effects variance component is large, use of a random-effects analysis gives a better appreciation for the uncertainty about the magnitude of the mean true validity, whereas use of a fixed-effects analysis would suggest greater consistency of validity coefficients than is supported by the data. Recent work has suggested that the assumptions made by random-effects models are more tenable in general than are those made by fixed-effects models (Hedges & Vevea, 1998; Hunter & Schmidt, 2000; National Research Council, 1992).

The focus of the current study will be on the use of meta-analysis as it applies to validity generalization and the transportability of validity coefficients. Discussion of effect sizes will be limited to consideration of the coefficients $r$ and $\rho$ because of their relevance to test validation and validity generalization. Additionally, the term "local study" will be used to refer to any primary research study that was conducted for the purposes of validating a particular test or selection instrument.
Fixed-effects Meta-analysis

Fixed-effects meta-analytic models are based on the assumption that observed validity coefficients ($r_i$) derived from the local studies included in the meta-analysis are all estimates of one population parameter:

$r_i = \rho + e_i$    (1)

where $\rho$ is the parameter and $e_i$ represents sampling error. Each $r_i$ is an estimate of the common value of $\rho$. Thus, if we average many values of $r$, we expect the sampling errors to cancel one another and the average to present an accurate estimate of $\rho$. In the fixed-effects case, $\rho$ is believed to have a fixed (constant) but unknown value; the meta-analysis is conducted for the purpose of estimating the true value of $\rho$.

As indicated in Equation 1, in the fixed-effects model, observed differences in validity coefficients for a given predictor-criterion pairing are believed to be solely attributable to sampling error (Brannick, 2001; Hedges & Vevea, 1998; Lipsey & Wilson, 2001; Overton, 1998; Shadish & Haddock, 1994). According to the assumptions of the fixed-effects model, if infinite sample sizes could be used, sampling error would be eliminated, resulting in local validity coefficients that are equal to each other and to the true value of $\rho$. Observed variance in effect sizes is believed to be due to sampling error only; therefore,

$\sigma^2_r = \sigma^2_e$    (2)

meaning that the variance of the true value of $\rho$ is zero, hence:

$\sigma^2_\rho = 0$    (3)

Thus, the fixed-effects case assumes there is one true validity coefficient (parameter) for a given predictor-criterion pairing and that all local validity studies estimate that parameter.

Random-effects Meta-analysis

Random-effects meta-analytic models, on the other hand, do not assume that the true population effect can best be described by a single value. Random-effects models assume that a distribution of effect sizes, rather than a constant value, best describes a given predictor-criterion relationship in the population (Brannick, 2001; Hedges & Vevea, 1998; Lipsey & Wilson, 2001; Overton, 1998; Shadish & Haddock, 1994).
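The fixed-effects reasoning above (observed variance in $r_i$ should be no larger than what sampling error alone would produce) can be checked numerically. The following is a minimal sketch rather than the procedure used in this study: the data are hypothetical, the function names are invented for illustration, and the expected sampling error variance uses the common large-sample approximation $(1 - \bar{r}^2)^2 / (\bar{N} - 1)$, which is an assumption not stated in the text above.

```python
# Hypothetical illustration of the fixed-effects logic; all names and data are invented.

def weighted_mean_r(rs, ns):
    """Sample-size-weighted mean of observed validity coefficients."""
    return sum(n * r for r, n in zip(rs, ns)) / sum(ns)


def weighted_var_r(rs, ns):
    """Sample-size-weighted variance of observed validity coefficients."""
    mean_r = weighted_mean_r(rs, ns)
    return sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / sum(ns)


def expected_sampling_error_var(rs, ns):
    """Approximate sampling error variance of r (assumed formula, see lead-in)."""
    mean_r = weighted_mean_r(rs, ns)
    mean_n = sum(ns) / len(ns)
    return (1 - mean_r ** 2) ** 2 / (mean_n - 1)


if __name__ == "__main__":
    # Five hypothetical local validity studies: observed r and sample size.
    rs = [0.20, 0.30, 0.25, 0.35, 0.15]
    ns = [50, 100, 80, 120, 60]
    print("mean r:", round(weighted_mean_r(rs, ns), 4))
    print("observed variance:", round(weighted_var_r(rs, ns), 5))
    print("expected sampling error variance:",
          round(expected_sampling_error_var(rs, ns), 5))
```

When the observed variance does not exceed the expected sampling error variance, the data are consistent with a single underlying $\rho$; the comparison is descriptive only and is not a significance test.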
In other words, even if sampling error could be eliminated (i.e., all individual studies had samples of infinite size), the effect size values for the primary studies would not be equal. Variance observed among local validity study coefficients is believed to be a function of both variation among true population effect sizes and sampling error. Thus,

$r_i = \bar{\rho} + C_i + e_i$    (4)

where $\bar{\rho}$ is the mean $\rho$ over all contexts, $C_i$ represents the effects of contextual variables, and $e_i$ represents sampling error. Another way of stating Equation 4 is that if we could conduct validation studies with infinite sample sizes (no sampling error), there would be a distribution of $\rho_i$ across contexts. That is,

$\rho_i = \bar{\rho} + C_i$    (5)

meaning that the context effect is represented by the difference between the infinite-sample-size local study value and the grand mean of parameters. There is a distribution of parameters; individual local studies estimate one of the parameters in that distribution. In the random-effects case, true (infinite sample size) validities are said to be heterogeneous. Every predictor-criterion pairing is associated with a population value of $\rho_i$ due to differences in job context or content.

The consequence of Equation 4 (assuming that context and sampling error are uncorrelated) is that in the random-effects case, the variance in observed correlations is due to two different sources, context and sampling error, thus:

$\sigma^2_r = \sigma^2_C + \sigma^2_e$    (6)

Because the effect of context ($C_i$) results in a distribution of $\rho_i$, we write Equation 7 as:

$\sigma^2_r = \sigma^2_\rho + \sigma^2_e$    (7)

Equation 7 says that the variance of observed correlations is the sum of the variance of true (infinite sample) correlations and the variance of sampling error.

Chapter Three Situational Specificity

The notion of situational specificity was discussed by Ghiselli (1966) after noting wide discrepancies in observed validity coefficients derived from the same test-job pairing.
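The variance decomposition in the random-effects case can be illustrated with a small simulation. This is a hedged sketch, not the SAS program referenced in the appendices: it assumes bivariate normal scores, a normal distribution of $\rho_i$, equal sample sizes, no artifacts other than sampling error, and the large-sample approximation $(1 - \bar{r}^2)^2 / (n - 1)$ for sampling error variance. All function names and parameter values are hypothetical.

```python
import random

# Hypothetical simulation of observed r = true rho for the context + sampling error.


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_observed_r(rho_i, n, rng):
    """Observed validity from one local study of n people with true validity rho_i."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho_i * z1 + (1 - rho_i ** 2) ** 0.5 * z2)
    return pearson_r(xs, ys)


def simulate_meta_analysis(mean_rho, sd_rho, k, n, seed):
    """Simulate k local studies and split observed variance into its two parts."""
    rng = random.Random(seed)
    rs = []
    for _ in range(k):
        # Draw this context's true validity; clip to keep it a valid correlation.
        rho_i = min(max(rng.gauss(mean_rho, sd_rho), -0.99), 0.99)
        rs.append(simulate_observed_r(rho_i, n, rng))
    mean_r = sum(rs) / k
    var_r = sum((r - mean_r) ** 2 for r in rs) / k      # observed variance
    var_e = (1 - mean_r ** 2) ** 2 / (n - 1)            # assumed sampling error variance
    var_rho = max(var_r - var_e, 0.0)                   # residual: estimated variance of rho
    return mean_r, var_r, var_e, var_rho


if __name__ == "__main__":
    mean_r, var_r, var_e, var_rho = simulate_meta_analysis(0.25, 0.11, 200, 100, seed=1)
    print("mean r:", round(mean_r, 3))
    print("observed variance:", round(var_r, 4))
    print("sampling error variance:", round(var_e, 4))
    print("estimated variance of rho:", round(var_rho, 4))
```

With a standard deviation of .11 for $\rho_i$ (a variance of roughly .012), the residual variance should land near .012 on average, although any single simulated meta-analysis will deviate from that value by chance.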
According to Ghiselli, sampling error alone could not account for all variance in observed validity coefficients; he remarked,

    For a specific type of test applied to a particular type of job the validity coefficient given in one report may be high and positive and in another it may be even negative. It is not easy to explain this variation. Unquestionably, to some extent it results from the effects of sampling error. Very probably the crude systems for the classification of jobs and of tests are a factor. Certainly much of the variation results from differences in the nature of and the requirements for nominally the same job in different organizations, and in the same organization from one time period to another. (1966, p. 28)

In other words, Ghiselli concluded that variance in observed validities is a function of several factors including, but not limited to, sampling error. His research suggested that differences in job context or content were responsible for variance in observed validity coefficients beyond variance expected due to sampling error. With the development of meta-analytic models, we can better evaluate whether cross-situational differences in observed validity are due to sampling error alone or are also due to contextual factors. Even though researchers have been discussing the concept of situational specificity of validity for decades, the exact nature of situational specificity, in particular what constitutes a situation, remains ill-defined (Algera, Jansen, Roe, & Vijn, 1984; Gutenburg, Arvey, Osburn & Jeanneret, 1983). Murphy (2003) stated that validity is situationally specific if the correlation between test scores and job performance truly depends on the job, organization, or the situation (p. 20). Algera et al.
(1984) said that discussion of situational factors by researchers has fallen into three categories: external determinants of behavior, characteristics of the job and criterion themselves (e.g., job content, performance dimensions, nature of performance measures), and aspects of the validation study design (e.g., time lag between measures, restriction of range, severe unreliability). Algera et al. (1984) explained that there is a discrepancy between the definition of situational factors used by Schmidt, Hunter, and colleagues, who limit the notion of situation to characteristics relevant to the job and criterion only, and the definition of situation adopted by other researchers.

Testing for Situational Specificity

Various methods exist for testing whether situational specificity of validity coefficients is likely. An early method developed by Schmidt, Hunter, and colleagues (see Pearlman et al., 1980; Schmidt et al., 1993; Schmidt, Hunter, & Pearlman, 1981) involves computing the variance of observed validity coefficients and then subtracting variance attributable to correctable statistical artifacts (e.g., variance due to sampling error, criterion unreliability, predictor unreliability, and range restriction) from observed variance. If 25 percent or less of the variance remains (i.e., if 75 percent or more of observed variance can be explained by correctable artifacts), then the hypothesis that validity is situationally specific is rejected. The 75 percent rule for testing situational specificity is seldom used any longer by meta-analysts or personnel psychologists due to inherent methodological weaknesses (see Chapter 5). Another method for evaluating the likelihood of situational specificity is to conduct a homogeneity test on the validity coefficients (see Shadish & Haddock, 1994).
If the homogeneity test is significant, the validity coefficients are considered to be heterogeneous and the hypothesis that validity coefficients are equivalent across studies is rejected (i.e., the possibility of situational specificity cannot be rejected). Various homogeneity tests have been proposed that use statistics based on approximations of the chi-square distribution (where df = number of studies minus one). The method proposed by Hedges is based on observed validity coefficients weighted by the reciprocal of the study variance estimate (Hedges, 1992; Hedges & Vevea, 1998). The method proposed by Hunter, Schmidt, and colleagues uses statistical artifacts in the correction and weighting of observed validity coefficients (Hunter & Schmidt, 1990). Hunter and Schmidt (1990) discourage reliance on homogeneity tests for detecting moderators in general and suggest that such tests be used only to explore the possible existence of unexpected moderators; they advocate use of a t-test to detect significant differences in mean effect sizes grouped by moderator level when a significant moderator is hypothesized to exist (Hunter & Schmidt, 1990). Hedges and Vevea (1998), on the other hand, use the term conditionally random-effects procedures to describe instances when choice of model is based solely on the outcome of the homogeneity test. Hedges and Vevea (1998) strongly advocated basing model choice primarily on the type of inferences (i.e., conditional vs. unconditional) one wants to draw from the analysis. Situational specificity may be the result of random or systematic (i.e., moderator) sources of variation. As indicated in Chapter Two, the random-effects model can be used in cases where the infinite-sample \(\rho_i\) is believed to be best described by a distribution, rather than by a single value.
Thus, the random-effects model is the most appropriate model to use when an individual believes not one, but multiple validity parameters fit a given predictor-criterion relationship inasmuch as the relationship varies randomly as a result of situational factors. When the situational variable is systematic, either a fixed-effects model with moderator analysis or a random-effects model with moderator analysis (i.e., the mixed model) is suggested (Lipsey & Wilson, 2001; see also Overton, 1998). Determination of fixed versus random depends upon whether random variance remains after accounting for the effects of the moderator (Lipsey & Wilson, 2001; Shadish & Haddock, 1994) and the nature of the conclusions one wishes to draw from the analysis (Hedges & Vevea, 1998).

Explaining Variance in Rho

Despite empirical work conducted to support validity generalization and transportability, work by Schmidt and colleagues suggests that the mean value of \(\sigma_\rho\), after accounting for correctable artifacts, is approximately 0.10 (Schmidt et al., 1993). Given that the mean is 0.10, the amount of variance not attributable to correctable statistical artifacts, in some cases, may be substantially higher than 0.10. This implies that, in general, variance in observed validity coefficients may not be attributable to statistical artifacts alone. In other words, contextual factors may explain a small but significant amount of variance in observed validity coefficients. Even though situational specificity of validity was the prevailing belief for some time, the exact nature of a situation remains poorly defined. The concept of situation can be characterized in two ways: as an identifiable source of variance (i.e., a moderator) or as an unknown, possibly unidentifiable source of variance. Situational specificity is said to occur when there is residual variance after variance due to statistical artifacts is subtracted from observed variance.
When this occurs in practice, the cause of the remaining variance is typically unknown. Although this situation is not desirable, it corresponds well to the random-effects model, where residual variance is random and assumed to be normally distributed. The random-effects tests factor unexplained variance into the mean effect size estimate and confidence interval calculations, resulting in more conservative conclusions than would be the case if fixed-effects tests were used (Shadish & Haddock, 1994). Actual identification of moderators that can explain the residual variance is undoubtedly superior to the conclusion that validities vary for unknown reasons. As described by Landy (2003), moderators have typically been blamed for differences in validity, yet few researchers actually seek to confirm the existence of specific moderators. There has been much speculation regarding what factor(s) can account for variance across \(\rho_i\), yet there remains relatively little research on this phenomenon and thus, few moderators have been identified. The identification of moderators of validity, however, is both theoretically and practically important. The value of moderator identification, research that has been conducted to address this issue, and suggestions regarding possible moderators of validities will be discussed later. Identification of moderators is vital in transportability research given that moderators affect the size or direction of the relationship between predictor and criterion. If moderators of the predictor-criterion relationship exist, then subpopulation parameter distributions will exist that will affect the shape of the overall population distribution. The shape of the population distribution has important consequences for the accuracy of conclusions regarding transportability based on current models (see Chapter 6).
If a moderator or moderators exist and can be identified, information about the moderators can be used to estimate or calculate the underlying distribution of \(\rho_i\). Early work conducted by Ghiselli (1966) was successful in identifying factors associated with variance in validity, including but not limited to sources now considered to be statistical artifacts. For example, Ghiselli identified the test administrator as a source of variance in validity, noting that observed validity coefficients for a motor test varied as a result of the specific individual who administered the test. Another moderator of validity was noted by Ferguson (1951), who showed that validity of selection decisions made from a sales aptitude test varied as a function of quality of management. This finding is consistent with results obtained by Brown (1981), who demonstrated that quality of management moderated the validity of scores on a biodata inventory used in the selection of insurance agents. Although inferences about performance based on biodata information were valid for agents working under both high and low quality management conditions, validity of inferences under high quality management was greater than under low quality management. Brown (1981) also conducted a utility analysis to determine the financial impact the moderator had on the organization. The discrepancy in validity based on management quality corresponded to a difference in profit of approximately $3,887 per agent per year. Several researchers have created models or simulated conditions in order to address the impact of situational variance. Although these studies did not seek to identify specific moderators, they did provide suggestions for factors that may moderate validity. Kemery et al. (1987) simulated a condition where situational constraints moderated validity such that workers will be differentially affected by situational constraints.
Similarly, James, Demaree, Mulaik, and Ladd (1992) suggested restrictiveness-of-climate could be a potential moderator of validity when developing a model of validity generalization that did not require meta-analysts to assume that variance due to artifacts and variance due to situational factors are independent. James et al.'s (1992) model integrates situational moderators into validity generalization procedures. Additionally, other possible moderators of validity were suggested in James et al. (1992), including composition of applicant populations, stability of organizational environment, job security, and unionization of workers. The examples presented above have dealt with moderators of validity when dealing with the same job (i.e., within-job moderators). Pearlman et al. (1980), however, argued that job family is the proper level at which to generalize (i.e., between-job moderators). For example, a study by Gutenburg et al. (1983) was able to show that information-processing and decision-making requirements for 111 different jobs moderated the validity of tests comprising the General Aptitude Test Battery (GATB) for predicting job performance. This finding is consistent with the assertion made by Schmidt et al. (1981) that information-processing and problem-solving demands are the most likely moderators of test validities (see p. 177), although results by Gutenburg et al. (1983) suggested that information-processing and decision-making job requirements are likely stronger moderators than Schmidt et al. (1981) would have predicted. A meta-analysis by Schmidt et al. (1981) was conducted to investigate the possibility that task differences in jobs moderated validity for cognitive ability tests. Results led Schmidt et al. (1981) to conclude that task differences do not moderate validity. The definition used by Schmidt et al. to describe a moderator, however, did not agree with common usage.
Additionally, as explained by Landy (2003) and Murphy and Newman (2003), problems inherent in the database used by Schmidt et al. (1981) also led to erroneous conclusions regarding the existence of moderator variables; most personnel psychologists now recognize that differences in job complexity do moderate cognitive ability test validity. This also calls into question Schmidt et al.'s (1981) assertion that other moderators (e.g., technology changes, leadership style, organizational climate, region) are unlikely to account for nontrivial differences in validity. The aforementioned research and suggestions regarding possible explanations for variance in \(\rho_i\) have focused on moderators of validity for specific attributes (e.g., cognitive ability). Work by Gaugler, Rosenthal, Thornton, and Bentson (1987), however, focused on moderators of validity when using assessment center scores. That is, their meta-analysis considered moderators of validity for a specific procedure rather than an attribute (see Landy, 2003). The meta-analysis results suggested several moderators of assessment center validity, including group composition, number of assessment devices used, background of assessors, use of peer evaluations, and overall judgment of quality of study.

Chapter Four
Validity Generalization and Meta-analytic Models

Validity generalization is a special application of meta-analysis used primarily by industrial psychologists in the realm of employee selection. According to Murphy (2003), validity generalization represents a marriage between meta-analysis and psychometrics. Meta-analysis is a valuable tool for quantitatively examining discrepancies in observed validity coefficients. The situational specificity hypothesis prevailed prior to the late 1970s because discrepancies in observed validity coefficients were larger than what was expected due to sampling error alone (Ghiselli, 1966).
Schmidt and Hunter (1977) were able to show using meta-analytic methods that sampling error may completely account for discrepancies in local study validity coefficients. Such work has led to renewed interest in the possibility of generalizing and transporting validity.

Differentiating between Validity Generalization and Transportability

As is the case with situational specificity, the concepts of validity generalization and transportability (also called validity transport) are ill-defined (Brannick, 2001; Murphy & Newman, 2003). Although some psychologists distinguish between the two concepts in terms of predictor and situation similarity (e.g., Landy, 2003), others base the distinction on statistical properties of validity parameters (e.g., Brannick, 2001; Kemery et al., 1987). Landy (2003) considers validity transport to be related to, but narrower than, the concept of validity generalization. Landy describes validity transport as a kind of one-off application of the test used in situation A for situation B (p. 180). Thus, transportability is characterized by using validity evidence obtained about a predictor (e.g., a test) in one situation to infer validity for that same predictor in another, sufficiently similar job or situation (as demonstrated using a job analysis). Validity generalization, on the other hand, is characterized by broader inferences about predictor-criterion relationships based on previous validity evidence. As described by Landy, validity generalization involves making inferences based on previous validity work supporting the use of tests of a particular construct (e.g., cognitive ability) for jobs within a particular job family. Thus, according to Landy, the difference between validity transport and validity generalization appears to be the extent to which one wishes to infer validity information.
In validity transport, evidence of the validity of test score use in the prediction of performance can be inferred only from similar situations using the same test. Validity generalization, on the other hand, allows for much broader inferences, allowing for the use of a specific test in a new situation provided that the test is the same type of test (e.g., cognitive ability test) as has been shown to be valid for jobs that are members of the same job family as the job in question. Another conceptualization of the difference between validity generalization and transportability (referred to as Validity Generalization 2 by Brannick, 2001) is based on statistical properties of validity parameters. In this case, the term transportability has been developed to clarify whether validity inferences are being made about a constant (i.e., the fixed-effects case) or a variable (i.e., the random-effects case) population parameter. The goal of both validity generalization and transportability is the same: to extend inferences made about the validity of a predictor-criterion pairing in one situation to a sufficiently similar situation. Transportability and validity generalization differ, however, in the theoretical assumptions on which they are based. Validity generalization is based on the assumption that there exists one validity parameter for a given predictor-criterion pairing. Transportability, on the other hand, is based on the assumption that there exists a distribution of validity parameters for a given predictor-criterion pairing. Thus, assumptions made about the true nature of \(\rho_i\) determine whether transportability or validity generalization may be applicable in a given situation. Although transportability is sometimes discussed synonymously with or as a special case of validity generalization, a stringent definition of validity generalization will be used in this study to avoid confusing the two concepts.
Notes will be provided when discussing the work of others who do not distinguish between the two concepts. Additionally, differentiation of validity generalization and transportability will be based on parameter considerations (e.g., Brannick, 2001; Kemery et al., 1987) rather than on scope of inferences (e.g., Landy, 2003) for the purposes of this paper.

Validity Generalization. Validity generalization in the purest case corresponds to a situation where the assumptions of the fixed-effects model of meta-analysis are true and \(\rho > 0\). For the purposes of the current paper, I will use the term validity generalization to refer to instances in which \(\rho > 0\) and \(\sigma^2_\rho = 0\). This definition corresponds to the validity generalization 1 definition used by Brannick (2001) and the validity generalization term used by Kemery et al. (1987). In the fixed-effects case, all local validity coefficients are estimating one validity parameter. Thus, validity of a predictor-criterion pairing established in one situation can be inferred as applicable to another situation when the two situations are sufficiently similar (as determined by a job analysis), thus eliminating the need for an additional local validation study. This definition of validity generalization is more restrictive than that used by some researchers (e.g., Pearlman et al., 1980).

Transportability. Validity generalization is not reasonable when \(\sigma^2_\rho > 0\), even if \(\bar{\rho} > 0\). Making inferences about validity in one situation based on validity evidence gathered in another situation, however, still may be warranted when \(\sigma^2_\rho > 0\) (Pearlman et al., 1980). Transportability of validity may be reasonable in such cases provided the variance (due to situational factors) of \(\rho_i\) is modest (e.g., Callender & Osburn, 1981).
In other words, according to Kemery et al. (1987), although situational specificity may exist for a predictor-criterion pairing (meaning the variance of the distribution of \(\rho_i\) is greater than zero), if the average validity is large and the variance in validity is fairly small, validity information about a given predictor-criterion pairing in one situation may be transported to another, similar situation. The assumptions upon which validity generalization is based are that true validity is positive and constant. Transportability, on the other hand, is assumed reasonable if instead \(\rho_i\) is a random variable described by a distribution in which a particular minimum value is above zero. Transportability conceptually corresponds to a case in which validities are generalizable but not consistent (Murphy, 2003, p. 21). Brannick (2001) and Kemery et al. (1987) disagreed about which minimum value must be greater than zero and thus, when transportability is reasonable. Both agreed that transportability is only acceptable when \(\bar{\rho} > 0\) and \(\sigma^2_\rho > 0\). According to Brannick (2001, e.g., Validity Generalization 2), however, transportability of validity coefficients can be assumed reasonable only when the above conditions are met and \(\min(\rho_i) > 0\). Thus, transportability is only warranted if the mean population validity is above zero, the variance of population validities is greater than zero, and the minimum value in the population distribution of validities is also above zero (i.e., all validity parameters are above zero). Kemery et al. (1987), on the other hand, indicated that transportability is reasonable provided that the above two conditions are met and the minimum value of the 90 percent credibility interval is above zero. Thus, Kemery et al.
(1987) asserted that transportability is reasonable when the mean population validity is greater than zero, the variance of population validities is greater than zero, and the upper 90 percent of values in the distribution of population validities are also above zero. This definition is consistent with assertions made by Pearlman et al. (1980) and Hunter and Schmidt (1990). The difference between Brannick's and Kemery et al.'s definitions is noteworthy. Brannick's definition corresponds to a situation that, assuming the distribution of \(\rho_i\) approximates a normal curve, will never exist. As stated in Brannick (2001),

    if we assume that \(\rho_i\) is normally distributed, it will always be the case that the minimum is less than zero. Therefore, if we complete a random-effects meta-analysis (an analysis in which \(\sigma^2_\rho > 0\) and \(\rho_i\) is normally distributed), we must conclude that validity is situationally specific rather than conclude that validity generalizes. (p. 478)

Thus, according to Brannick's definition, if the distribution of population validities conforms to the properties of the normal curve, transportability of validities will never be warranted. Kemery et al.'s definition of transportability, on the other hand, only requires that the minimum value in the 90 percent credibility interval (i.e., the credibility value) be above zero. In such a case, as much as 10 percent of validity coefficients can be negative, resulting in incorrect application and conclusions regarding test scores. Thus, in up to 10 percent of cases where transportability is assumed reasonable according to Kemery et al. (1987), selection professionals will actually base selection decisions on a test with zero validity (i.e., use of the test scores is as valuable as selecting employees at random) or negative validity (i.e., employees predicted by test scores to perform successfully will actually be poor performers while employees predicted to perform poorly would actually perform successfully).
Kemery et al.'s definition may violate the 1987 SIOP Principles, which hold that validity may be transported if there is no significant situational effect on validity (i.e., \(\sigma^2_\rho\) is zero or minimal) or the probability that true validity falls below a specified value is low. An exact definition of low has yet to be provided. Using Kemery et al.'s definition, validity may fall below zero in up to 10 percent of cases in which transportability is assumed reasonable, provided that the distribution of parameters is normal. Computation of the lower credibility value, however, is based on the assumption that the parameter distribution is normal. If this assumption is false, incorrect conclusions regarding transportability may occur in even more than 10 percent of cases (see Chapter 6; Kemery et al., 1987; Kemery, Mossholder, & Dunlap, 1989).

Chapter Five
Schmidt-Hunter Methods

The 75 Percent Rule

Schmidt, Hunter, and colleagues have presented two different approaches to assessing the likelihood that validity information can be extended to new situations based on previous validation studies.[1] Their early work (Schmidt & Hunter, 1977; Pearlman et al., 1980) said that validity can be generalized if all or most of the variance in observed validity coefficients can be attributed to statistical artifacts. Schmidt and Hunter (1977) listed seven sources of artifactual variance that could account for such observed differences: sampling error, criterion unreliability, predictor unreliability, range restriction on the predictor, criterion contamination and deficiency, clerical errors, and imperfect predictor construct validity. Statistical corrections are available only for the first four artifacts (Schmidt & Hunter, 1977).
Thus, they inferred that if at least 75 percent of variance in observed validities can be accounted for by the correctable statistical artifacts, then the variance of \(\rho_i\) is assumed to be zero because the uncorrected artifacts would likely account for the remaining 25 percent of variance (Schmidt & Hunter, 1977; Pearlman et al., 1980). In other words, if at least 75 percent of the variance can be accounted for by correctable artifacts, then the assumption of situational specificity is rejected (i.e., \(\sigma^2_\rho = 0\)) and validity generalization is considered appropriate.

[1] Hunter and Schmidt do not differentiate between validity generalization and transportability.

Pearlman et al. (1980) noted that the 75 percent cutoff is an arbitrary one but also suggested that even when less than 75 percent of variance in observed validities is accounted for by correctable artifacts, situational specificity is unlikely to be present, thus still suggesting validity generalization is warranted. Pearlman et al. (1980) indicated that even if situational specificity exists, it is unlikely to have a notable impact on validity of selection instruments and thus transportability may still be reasonable.

The Lower Credibility Value Method

More recent work by Schmidt and Hunter has focused on the use of a 90 percent credibility interval for determining whether validity generalization (transportability) is reasonable. Credibility intervals and confidence intervals are both used in validity generalization and transportability research but the two serve different purposes. A credibility interval is a special type of (i.e., Bayesian) confidence interval that is based on the artifact-corrected distribution of effect sizes (Whitener, 1990).
As used by Schmidt, Hunter, and colleagues (Pearlman et al., 1980), once observed effect sizes are corrected for artifacts, the standard deviation of the corrected effect size distribution is used to construct a one-tailed, 90 percent credibility interval around the mean of the distribution (i.e., the interval is constructed around the best estimate of mean true validity, using the standard deviation of infinite-sample effect sizes). Construction of the credibility interval is based on the assumption of a normal distribution of population parameters (Kemery et al., 1987). The credibility interval is believed to contain the upper 90 percent of the distribution of infinite-sample effect sizes. That is, a one-tailed 90 percent credibility interval provides the lower bound estimate for the top 90 percent of the population validities. When the 90 percent interval does not contain zero, transportability is considered reasonable because 90 percent of the population validity coefficients are above zero[2] and thus transportation of validity information to relevant situations should be beneficial in at least 90 percent of cases. When validity transport is reasonable, \(\bar{\rho}\), not the lower bound of the credibility interval, is considered to be the best estimate of validity in a new situation (Pearlman et al., 1980). Credibility intervals also are used to detect possible moderators of validity (Whitener, 1990). Given that variance in observed validities is a function of variance in population validities (\(\sigma^2_\rho\)) and variance due to sampling error (\(\sigma^2_e\)), if \(\rho\) is constant, the size of the credibility interval will be small because there will be no variance in \(\rho\) (i.e., no moderators of validity). If the credibility interval is large, moderators are likely because variance will be a result of sampling error variance and variance in \(\rho_i\). Credibility intervals tend to be much wider than are confidence intervals.
This is true because credibility intervals are based on the standard deviation rather than the standard error, and credibility intervals estimate the range of values within which 90 percent of population validities lie. Confidence intervals, on the other hand, estimate the range of values within which the population mean is expected to lie. Although credibility intervals are a type of confidence interval, credibility intervals and confidence intervals each have distinct uses in the realm of validity generalization. Confidence intervals are based on the standard error of the mean validity in the population or subpopulation of interest. Confidence intervals should be used to indicate how much sampling error affects the estimated mean validity for a subpopulation (i.e., validity coefficients for a particular level of a moderator) or the population in general (i.e., how well does the overall mean validity represent the subpopulation mean validities?).[3]

[2] I am speaking of the desirable range of validity coefficients in terms of positive values only. It is possible that validity coefficients for a predictor-criterion pairing (e.g., neuroticism and job performance) could be negative and thus the 90 percent credibility interval should be below, but still not include, zero.

Criticisms of Schmidt-Hunter Methods

Overall criticisms. Several authors have criticized Schmidt and Hunter's validity generalization methods based on overall empirical evidence or theoretical weaknesses. Algera et al. (1984) criticized Schmidt and Hunter's methods on several grounds. First, they asserted that Schmidt and Hunter have ignored the distinction between theoretical validity (validity between constructs) and observed validity (validity between specific measures).
Specifically, Schmidt and Hunter have based conclusions on a compilation of various predictors, criteria, and samples when their model is more appropriate to use in cases where all local validity coefficients are based on the same predictor-criterion pairing. Second, Algera et al. (1984) asserted that the Schmidt-Hunter method does not provide any checks on the classification of validity data. Algera et al. (1984) suggested that this failure could lead to outlandish conclusions regarding the situations to which validity may be generalized. Third, Algera et al. (1984) criticized the Schmidt-Hunter procedure for not providing diagnostic information as to what factors contribute to situational specificity when it is detected. Lastly, Algera et al. (1984) noted that Type I and Type II error rates were not available at that time for the Schmidt-Hunter methods.

[3] See Whitener (1990) for a more thorough examination of the use of credibility and confidence intervals.

Another criticism posed against the methods used by Schmidt and Hunter involves the weighting strategy used. Hunter and Schmidt (1990) advocated weighting local study validity coefficients by sample size. This type of weighting strategy will lead to a suboptimal estimate of the mean when the true between-study variance is greater than zero (Hedges & Vevea, 1998).[4] Hedges and Vevea (1998) advocated the use of a more complicated weighting strategy in which initial validity coefficient weights are derived by taking the reciprocal of the estimated sampling variance of each validity coefficient (i.e., inverse variance weights). The weighted mean validity coefficient is calculated using the initial weights. Then, the weighted mean is used to estimate the random-effects variance component, which is subsequently used to revise the initial weights.
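The two-stage weighting just described can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the exact procedure of any cited source: it assumes correlations are analyzed in the Fisher z metric with sampling variance 1/(n - 3), and it assumes a method-of-moments estimator of the random-effects variance component based on the weighted sum of squared deviations. The function name and the example data are hypothetical.

```python
import math


def random_effects_mean(rs, ns):
    """Two-stage inverse-variance weighting of correlations (Fisher z metric)."""
    zs = [math.atanh(r) for r in rs]          # Fisher z transform (assumption)
    vs = [1.0 / (n - 3) for n in ns]          # sampling variance of z (assumption)

    # Stage 1: initial inverse-variance weights and the weighted mean.
    w = [1.0 / v for v in vs]
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)

    # Use the weighted mean to estimate the random-effects variance component
    # via a method-of-moments estimator (assumed here), truncated at zero.
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    revc = max(0.0, (q - (len(zs) - 1)) / c)

    # Stage 2: add the variance component to each sampling variance and reweight.
    w_star = [1.0 / (v + revc) for v in vs]
    z_random = sum(wi * zi for wi, zi in zip(w_star, zs)) / sum(w_star)

    return {
        "revc": revc,
        "mean_r_fixed": math.tanh(z_fixed),
        "mean_r_random": math.tanh(z_random),
        "weights_initial": w,
        "weights_revised": w_star,
    }


# Hypothetical local validity coefficients and sample sizes.
result = random_effects_mean([0.10, 0.50, 0.30], [50, 100, 400])
print(result["revc"], result["mean_r_fixed"], result["mean_r_random"])
```

When the estimated variance component is positive, the revised weights are more nearly equal across studies than the initial weights, so large studies dominate the mean less than they would under inverse variance or sample size weights alone.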
Modification of the inverse variance weights by the random-effects variance component (REVC) serves to increase the estimated sampling variance for each local validity coefficient when the REVC is positive. The modified weights are used to compute a revised weighted estimate of the mean observed validity (see also Lipsey & Wilson, 2001). Although both weighting strategies produce equivalent results as the number of studies grows large, when the number of local validity coefficients is small (as is typical in many meta-analyses), the modified inverse variance weights should produce more accurate estimates of the mean validity than will the sample size weights (Hedges & Vevea, 1998; Kisamore & Brannick, 2002; Shadish & Haddock, 1994).

⁴ According to Schmidt and Hunter (2003), inverse variance weights produce more accurate estimates of the mean while sample size weights produce more accurate estimates of variance.

Thomas (1990), on the other hand, criticized Schmidt and Hunter's methods because they are not model-based procedures. Estimates made using these methods are limited and have no underlying probability model. Thomas (1990) proposed a new model of validity generalization (transportability), special features of which include the capacity to estimate multiple population values based on local validation data, develop confidence intervals around the estimates, and estimate the variance between the multiple population values.

Criticisms of the 75 Percent Rule. Some criticisms of the methods advocated by Schmidt, Hunter, and colleagues have focused on use of the 75 percent rule. Like Thomas (1990), James et al. (1992) criticized the Schmidt-Hunter method for not being model-based. James et al.'s (1992) criticism differs from that of Thomas (1990), however, in that James et al. focused on the lack of an organizational model for the method. James et al.
(1992) asserted that the residualization method (the 75 percent rule) is lacking in that it is designed to test the moderating effects of situational variables, yet no situational variables are measured or hypothesized to exist. The authors explained that organizational models are necessary for explaining the impact of situational factors on validities. James et al. (1992) also questioned the Schmidt-Hunter procedure's assumption that variance due to situational variables and variance due to artifacts are independent. The authors suggested that "some artifacts are not really artifacts because their occurrence may be traced back to the same situational causes that produce variation in true validities over situations . . . if artifacts and true validities are causally linked to the same or common causes, then the foundation of residualization on which current VG (validity generalization) mathematical modeling rests is invalid" (James et al., 1992). The authors presented an organizational model whereby restrictiveness-of-climate acts as a situational moderator of validity, in that nonrestrictive climates will be associated with higher validity coefficients than will restrictive climates. James et al. (1992) showed that artifacts and situational variables are not necessarily independent even though the Schmidt-Hunter residualization procedure (i.e., the 75 percent rule) assumes they are independent. According to the authors, restriction of range on the criterion (and thus criterion unreliability) is more likely in restrictive than in nonrestrictive organizational climates, creating a situation where situational variables and artifacts are correlated. In such a case, using a residualization method whereby variance due to artifacts is subtracted from observed variance underestimates the importance of the situation because variance due to real situational factors is subtracted out along with variance considered to be error (James et al., 1992; see also James, Demaree, Mulaik, & Mumford, 1988).
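For readers unfamiliar with the residualization logic being criticized, the following sketch shows a simplified version of the 75 percent rule. It is an illustration under stated assumptions only: sampling error is treated as the sole artifact (the bare-bones case), the expected sampling error variance is taken to be (1 - r̄²)² / (N̄ - 1), and the function names are hypothetical. The full rule also counts variance due to other artifacts such as unreliability and range restriction.

```python
def artifact_variance_ratio(rs, ns):
    """Ratio of expected sampling error variance to observed variance of r."""
    total_n = sum(ns)
    # Sample-size-weighted mean and variance of the observed validities.
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    observed_var = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    # Assumed bare-bones sampling error variance, using the average N.
    mean_n = total_n / len(ns)
    error_var = (1.0 - mean_r ** 2) ** 2 / (mean_n - 1.0)
    if observed_var == 0.0:
        return float("inf")  # no observed variance left to explain
    return error_var / observed_var

def passes_75_percent_rule(rs, ns):
    """True when artifacts account for at least 75 percent of observed variance."""
    return artifact_variance_ratio(rs, ns) >= 0.75
```

Because the rule only compares two variances, it contains no term for the number of studies and no situational variables, which is the basis of several of the criticisms reviewed in this section.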
In addition to their overall criticisms of Schmidt-Hunter methods, Algera et al. (1984) also specifically criticized use of the 75 percent rule, asserting that it is a univariate test of situational specificity when a multivariate test is more appropriate and more powerful. The authors noted that since the univariate test may lack power, researchers using the 75 percent rule are likely to conclude erroneously that situational specificity does not exist. In order to address the issue of Type I and Type II error rates for the Schmidt-Hunter procedures, Sackett, Harris, and Orr (1986) and Spector and Levine (1987) conducted separate Monte Carlo simulations. Both sets of authors found that the Schmidt-Hunter method was prone to high Type I error rates, especially when a small number of validity coefficients were included in the analysis. Conversely, both sets of authors also noted unacceptably low power (i.e., failure to detect situational specificity that really exists) in some situations. This result suggests that when using the Schmidt-Hunter 75 percent rule, conclusions about whether situational specificity exists are fairly likely to be erroneous. Callender and Osburn (1981) criticized the 75 percent rule because it is unaffected by the number of studies that go into a meta-analysis. Callender and Osburn (1980) showed that both sample size and number of studies affect the Type I error rate, yet the 75 percent rule fails to weight the impact of the number of studies included. Paese and Switzer also criticized the 75 percent rule used by Schmidt and Hunter. Results of a Monte Carlo study conducted by Paese and Switzer (1988)⁵ suggested that adjustments be made to the 75 percent rule depending on the availability of reliability and selection ratio values. Paese and Switzer examined the effects of reliance on hypothetical distributions when reliability and selection values are not available compared to when these values are available.
They concluded that use of hypothetical distributions with Schmidt-Hunter procedures may result in erroneous conclusions and that available reliability information should instead be used even when only partial information is available. In his dissertation, Oswald (1999) considered the accuracy of conclusions regarding situational specificity and transportability based on the Schmidt-Hunter 75 percent rule. The focus of Oswald's (1999) study, however, was on accuracy of conclusions when subpopulation distributions were not normal. Oswald found that based on the 75 percent rule, meta-analysts would conclude that there was no variance in ρ (i.e., validity is constant) when true variance was small (0.00125) or moderate (0.005) for normal, negatively skewed, positively skewed, bimodal, and trimodal subpopulation distributions.⁶ Only when true variance was actually zero (i.e., the fixed-effects case) were conclusions based on the 75 percent rule correct. In other words, Oswald noted that the 75 percent rule was biased against situational specificity (i.e., the 75 percent rule is prone to Type II errors) in that conclusions were likely to suggest validity generalizes even when there existed substantial variance in ρ. Results of the study also suggested that the stability of the 75 percent rule was very low. Oswald said that when based on Schmidt and Hunter's 75 percent rule, meta-analytic conclusions that reject the possibility of situational specificity have a high chance of being incorrect (p. 100).

⁵ See also Paese's (1990) correction to Paese and Switzer (1988).

Use of Schmidt and Hunter's 75 percent rule has been met with substantial criticism. In light of the weaknesses inherent in the 75 percent rule, few meta-analysts rely on its use any longer. Meta-analysts now rely more heavily on credibility intervals for determining whether transportability is warranted. Criticisms of that method are discussed below.
Criticisms of the lower credibility value method. Oswald and Johnson (1998) also noted problems with error rates for Schmidt-Hunter (Hunter & Schmidt, 1990) meta-analytic procedures, including a negative bias in the estimated variance of ρ when local study data conformed to various shapes, including shapes that meet (i.e., the bivariate normal distribution) and violate distribution assumptions (e.g., fan, football, and sigmoid distributions). This finding suggests that when moderators of validity exist, the rate of Type II error may be unacceptably high. The extent of negative bias was not affected by shape of local study distribution, number of studies included in the meta-analysis, or sample size. Additionally, the use of numerous corrections in the Schmidt-Hunter (Hunter & Schmidt, 1990) procedure decreased the stability of the variance estimate, meaning that estimates of the variance of ρ based on individual meta-analyses may be highly inaccurate, especially when the product of N (sample size) × k (number of studies) is less than 5,000. Oswald suggested that use of credibility intervals in such cases is very likely to lead to erroneous conclusions regarding the presence of moderators. Hall and Brannick (2002) investigated the accuracy of Schmidt and Hunter's (Hunter & Schmidt, 1990) multiplicative model of validity generalization. Results of the analysis suggested that accuracy of the Schmidt-Hunter credibility method deteriorates as the mean and variance of ρ increase. Additionally, when the number of studies included in the meta-analysis was small (k = 10), credibility intervals tended to be too narrow.

⁶ Distributions were approximations only. They were not continuous.

Strengths of the Schmidt-Hunter Method. Despite criticisms, the Schmidt-Hunter method does have a number of strengths.
First, a number of Monte Carlo simulations have shown that use of the Schmidt-Hunter model results in fairly accurate estimates of the mean of ρ (Callender, Osburn, Greener, & Ashworth, 1982; Field, 2001; Hall & Brannick, 2002; Oswald, 1999; Oswald & Johnson, 1998) when the parameter distribution is normal. Second, use of the Schmidt-Hunter model results in more accurate estimates of the mean and variance of ρ than does reliance on the Hedges-Vevea model. Both the mean and the variance of ρ are severely underestimated by the Hedges-Vevea model, which does not use artifact corrections (Hall & Brannick, 2002). Third, even when artifact corrections are used, the Hedges-Vevea model estimates both the mean and the variance of ρ less accurately than does the Schmidt-Hunter model. Fourth, Hall and Brannick (2002) showed that when the distribution of ρ is assumed to be normal, the Hedges-Vevea model (even when corrections are applied) results in more conservative and less accurate conclusions regarding transportability than does the Schmidt-Hunter model.

Chapter Six
The Distribution of Rho

Theoretical Considerations

When considering the random-effects case, the most common assumption about the underlying distribution of ρ is that it is normal (Brannick, 2001; Hall & Brannick, 2002; Hedges & Vevea, 1998; Pearlman et al., 1980; Schmidt et al., 1981). Given that the normal distribution is asymptotic, the tails of the distribution extend infinitely. Such an assumption presents problems for transportability work given that the minimum value of ρ must be negative and that the assumed normal distribution of ρ would conceptually extend beyond the bounds of possible validity coefficients (e.g., validity coefficients are bound between -1 and +1 but the distribution of ρ, if normal, must extend from -∞ to +∞).
Thus, in the random-effects case, if the assumption that ρ is normally distributed is true, even when the mean of ρ is significantly greater than zero, it will always be the case that transporting validity from a given situation will be beneficial in some cases and detrimental (e.g., have negative validity) in others (Brannick, 2001). Meta-analytic procedures are based on the same kind of statistical methods (Lipsey & Wilson, 2001) and assumptions that are common in other statistical applications. Just as in first-order data analysis, we use statistics to combine data (observed validity coefficients) to obtain a mean (the validity coefficient for a particular sample of studies). The mean r value then represents a data point in a sampling distribution of mean validity coefficients, the mean of which is approximately⁷ equal to the mean population validity coefficient. As in primary analyses, the shape of the population distribution does not matter much for drawing inferences about the mean population parameter value, given that the population distribution can be skewed while the sampling distribution increasingly approximates the normal curve as the number of independent effect size estimates increases (central limit theorem). The shape of the population distribution matters little when dealing with confidence intervals around the mean. Confidence intervals are based on the standard error of the mean (Whitener, 1990), which is a property of a sampling distribution derived from a given population. Properties of sampling distributions specify that a given sampling distribution will be more normal than the population distribution upon which it is based if the population distribution is nonnormal. Thus, inferences drawn based upon the sampling distribution merit assumption of a normal distribution. The central limit theorem states that the sampling distribution quickly approaches normality as the number of studies (k) in the meta-analysis increases.
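The difference between the two kinds of interval can be made concrete with a short sketch. It assumes normal-theory formulas: the confidence interval is built from the standard error of the mean validity, while the lower credibility value is built from the estimated standard deviation of ρ. The function and argument names are hypothetical, and both inputs are taken as already estimated by whichever meta-analytic method is in use.

```python
from statistics import NormalDist

def confidence_interval(mean_rho, se_mean, level=0.95):
    """Two-tailed normal-theory interval for the MEAN validity."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean_rho - z * se_mean, mean_rho + z * se_mean

def lower_credibility_value(mean_rho, sd_rho, level=0.90):
    """One-tailed value above which `level` of population validities are
    expected to lie, assuming rho is normally distributed."""
    z = NormalDist().inv_cdf(level)
    return mean_rho - z * sd_rho
```

Only the second function depends on the shape of the distribution of ρ: its normal quantile is applied to the population of validities itself rather than to a sampling distribution of means.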
As explained in Whitener (1990), credibility intervals are based on the population standard deviation (of infinite sample effect sizes). Credibility intervals are used to determine whether transportability of validity can be assumed. Calculation of credibility intervals is based on the assumption of a normal distribution. According to Mulaik (as cited in Thomas, 1997), creation of credibility intervals is only possible under the assumption that errors and range restriction effects are distributed normally, even though this condition is unlikely in reality. Thus, knowledge of the true shape of the population distribution of a given predictor-criterion relationship is essential to determine whether conclusions about transportability are valid, given that transportability conclusions are based on the lower credibility value, which is affected by both the mean and variance estimates.

⁷ The r statistic is only an approximation of ρ for several reasons. Even when no attenuation in observed validity occurs, the coefficient r is a biased estimator of ρ (except when ρ = 0) because the sampling distribution of r becomes more skewed as ρ becomes more extreme, since ρ is bound between -1 and +1 (see also Schmidt et al., 1981). When attenuation of r occurs due to statistical artifacts, r is a negatively biased estimator of ρ.

Empirical Findings

Transportability work tends to be based on the assumption that the underlying distribution of validity coefficients is normal. Both the Schmidt-Hunter (Hunter & Schmidt, 1990) and the Hedges and colleagues (Hedges & Vevea, 1998) models are random-effects models based on the assumption of a normal distribution. According to Kemery et al. (1989), however, there is no compelling substantive theoretical reason for the assumption of a unimodal distribution (p. 31). The shape of the underlying distribution of ρ has received some attention over the past two decades, but little has been done to actually model the distribution. Callender et al.
(1982) conducted a Monte Carlo study to examine the impact that sample size, mean population validity, and shape of the underlying distribution have on the conclusions drawn from a transportability study. They used five different distribution shape manipulations: flat, constant, normal, positively skewed, and negatively skewed. They also considered multiple models of transportability, including Schmidt and Hunter's (Hunter & Schmidt, 1990) noninteractive and bare-bones methods. For the noninteractive method, Callender et al. (1982) found that the shape of the underlying distribution did substantially affect the conclusions that were drawn from the transportability study when artifacts were uncorrelated. Skew greatly affected the accuracy of the lower credibility estimate, leading to over- or underestimation of the value depending on whether the skew was negative or positive, respectively. In other words, if the population distribution were negatively skewed, conclusions about the extent to which transportability is assumed reasonable would be exaggerated, while positive skew would be associated with transportability conclusions that would be too strict. Flat distributions also were associated with overestimation of the credibility value, leading to overly confident conclusions about the transportability of validity coefficients. On the other hand, when the population validity was constant (i.e., the fixed-effects case), the lower credibility estimate was actually too low, again leading to transportability conclusions that were too strict. Problems with overestimation of the credibility value were also noted for the normal distribution case. They also noted that the Schmidt-Hunter noninteractive procedure consistently overestimated the 90 percent credibility value, leading to conclusions about transportability that were too lenient.
The bare-bones method, on the other hand, tended to underestimate the credibility value, leading to conservative conclusions about transportability. Results of the study led Callender et al. (1982) to conclude, "It could be argued that there is no need to be concerned with this issue unless one has first reached the conclusion that the true validities were not constant" (p. 867). Callender et al.'s (1982) conclusion is based on their finding that when the population validity is a constant parameter, the credibility value will be slightly underestimated (meaning that conclusions about transportability will be conservative), yet when it is not a constant, the credibility value will be biased, but in an unknown direction. Even with their refined model, however, Schmidt et al. (1993) were unable to objectively conclude that true validities for cognitive ability tests as predictors of performance were constant; the estimated mean value of the standard deviation of ρ was 0.10. Assumptions of random-effects models appear more tenable than those of fixed-effects models; however, in any case in which the random-effects model is assumed to be appropriate, conclusions regarding transportability are questionable because the exact shape of the distribution is unknown, yet the shape of the distribution may have a substantial impact on the accuracy of conclusions drawn. Work by Kemery et al. (1987) and Kemery et al. (1989) also focused on the shape of the underlying distribution of ρ. Both studies used Monte Carlo simulations to examine the effectiveness of the Schmidt-Hunter additive model of validity generalization when a nonnormal distribution was assumed. The approach was different from that of Callender et al. (1982). In both studies, the underlying distribution of ρ was assumed to be bimodal. In Kemery et al.
(1987), this assumption was used to illustrate a case where a particular test is a valid predictor of performance (i.e., ρ₁ = 0.60) in some organizations but completely invalid (e.g., ρ₂ = 0, due to situational constraints) in other organizations. In other words, situational constraints served to moderate validity for the predictor-criterion pairing. Kemery et al. (1989) extended these conditions by considering a series of bimodal distributions where ρ₁ was always equal to zero but ρ₂ varied between 0.1 and 0.9. Results of both studies support and extend those of Callender et al. (1982), again suggesting that conclusions about the transportability of validity coefficients may be too lenient depending on the shape of the true distribution of ρ. Kemery et al. (1987) and Kemery et al. (1989) found that a substantial percentage (e.g., over 30 percent in Kemery et al., 1987) of population validity coefficients would have to be zero before transportability could not be inferred using the Schmidt-Hunter approach. Furthermore, Oswald (1999) considered the effect of distribution shape on the accuracy of transportability conclusions based on methods presented by Hunter and Schmidt (1990). Unlike the studies described above, Oswald manipulated the shape of subpopulations rather than the overall population of validity parameters. Oswald used six different subpopulation distribution shapes: constant, positively skewed, negatively skewed, bimodal, trimodal, and normal. Subpopulation distributions were not continuous. Oswald noted that across all conditions studied (shape, true values of ρ, variance of sample size, and number of studies per meta-analysis were manipulated), estimated variance in true validity was negatively biased such that true variance in ρ was consistently underestimated. The degree of underestimation lessened as true variance in ρ and the number of studies in the meta-analysis increased.
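The bimodal case examined by Kemery et al. can be illustrated with a small worked sketch. It is not a reproduction of their simulations: it uses the exact population mean and standard deviation of a two-point distribution (validity of zero in a proportion p_zero of settings and 0.60 elsewhere) and then applies the normal-theory 90 percent lower credibility value. The function name and default values are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def normal_theory_lower_value(p_zero, rho_valid=0.60, level=0.90):
    """Normal-theory lower credibility value for a two-point distribution
    in which a proportion `p_zero` of settings has validity zero."""
    mean_rho = (1.0 - p_zero) * rho_valid
    sd_rho = rho_valid * math.sqrt(p_zero * (1.0 - p_zero))
    z = NormalDist().inv_cdf(level)
    return mean_rho - z * sd_rho

for p_zero in (0.10, 0.20, 0.30, 0.40):
    print(p_zero, round(normal_theory_lower_value(p_zero), 3))
```

Under these assumptions the computed value is still above zero when 30 percent of settings have zero validity and drops below zero by 40 percent, so a normal-theory credibility value can suggest transportability even when far more than 10 percent of true validities are zero.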
Oswald did not find any effects of subpopulation shape on bias of the estimate of true population variance. Estimated credibility intervals, however, were smaller than intervals based on true validity parameters.⁸ When variance was small (0.00125), the credibility interval range was estimated to be zero, suggesting that ρ was constant when it was not. Oswald indicated, "Because one generally conducts only one meta-analysis (or very few meta-analyses) in a particular area precision in meta-analytic statistics, and the conclusions that are based on them, is crucial" (p. 96). Thus, knowledge of the shape of subpopulation and population parameter distributions is critical to draw accurate conclusions regarding transportability.

⁸ Oswald (1999) used a two-tailed 90 percent credibility interval. His 90 percent credibility values correspond to one-tailed 95 percent credibility values.

Chapter Seven
The Present Study

Previous work done to examine the transportability of validity coefficients is based on the assumption that the underlying distribution of ρ is normal. Little work has been conducted to study the underlying parameter distribution, although several authors have examined or discussed what the consequences of violating the assumption might be (e.g., Algera et al., 1984; Callender et al., 1982; Kemery et al., 1989). Accurate estimates of the lower bound of the credibility interval are crucial for applications of meta-analysis to test validation. If the underlying distribution is not normal, then either of two approaches might be used to compute the lower bound estimate. First, one can model the underlying distribution of effect sizes and use that model (e.g., chi-square, uniform, etc.) as the basis for calculating the lower bound. Second, one can use a method that makes no assumption about the distribution of effect sizes (that is, a nonparametric method).
In the absence of a theoretical justification for a particular model, it is difficult to choose a model for a meta-analysis (the normal distribution is the default option). Therefore, a method that does not depend upon any assumption about the underlying distribution of effect sizes was chosen. The method is known as empirical Bayes estimation. In this method, local studies are revised to make them closer to the overall mean of the meta-analysis; the larger the expected sampling variance for a local study, the greater the revision. The distribution of empirical Bayes estimates should more closely approximate the population distribution of effect sizes than does the observed distribution. The lower bound estimate is then calculated directly from the distribution of empirical Bayes estimates. The present study used a competing model approach in which a new procedure for assessing transportability was compared with two more commonly used methods. Empirical Bayes (EB) estimation (Brannick, 2001; Brannick & Hall, 2003) was evaluated alongside both the Schmidt-Hunter (SH) multiplicative model (Hunter & Schmidt, 1990) and a corrected Hedges-Vevea (HV) model (see Hall & Brannick, 2002; Hedges & Vevea, 1998). The purpose of the present study was twofold. The first part of the study compared the accuracy of estimates of the mean, standard deviation, and the lower bound of 90 and 99 percent credibility intervals computed from the three different methods across 32 simulated conditions. Sample sizes (N), number of study coefficients (k), and shape and variance of the population distribution were based on work by Callender et al. (1982) and extended to include additional conditions not considered by Callender et al. The simulations allow for evaluation of the effects of population distribution shape on accuracy of conclusions regarding transportability of validity based on the three methods. Basics of the Schmidt-Hunter procedure were discussed in Chapter Five.
The Hedges-Vevea procedure is another random-effects meta-analytic method that is based on the assumption of normally distributed population validities. Although work has been conducted to address consequences of violation of this assumption for the Schmidt-Hunter model (e.g., Callender et al., 1982; Kemery et al., 1987; Kemery et al., 1989; Oswald, 1999), consequences of violation of the normal assumption have not previously been examined for the Hedges-Vevea model. A variant of the corrected Hedges-Vevea model used by Hall and Brannick (2002) was utilized in the present study.⁹ Unlike Hall and Brannick (2002), who used assumed artifact distributions as the basis for attenuation corrections, the current study used study-level data as the basis for corrections used in the Hedges-Vevea method. The third method tested was empirical Bayes estimation, which involves adjusting local observed validity coefficients based on the distribution of effect sizes contained in a meta-analysis. In empirical Bayes applications, once the overall mean validity and variance of the true validity coefficients have been estimated in a meta-analysis, adjustments can be made to each individual observed validity coefficient. The adjustment has the effect of pulling observed validity coefficients with a large sampling error variance closer toward the overall mean, while local validity coefficients with a small sampling error variance are adjusted little. In other words, local validity coefficients that are imprecise estimates are adjusted the most and in the direction of the mean, while more precisely estimated local validity coefficients receive only minor adjustments. The resulting lower bound should be more accurate, on average, than those derived from more conventional methods (e.g., Schmidt-Hunter) regardless of the shape of the underlying population distribution.
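The adjustment described above can be sketched as follows. This is a minimal illustration rather than the study's SAS/IML implementation: it assumes the common shrinkage form in which each coefficient is pulled toward the mean by the factor var_rho / (var_rho + sampling variance), and it assumes the lower bound is read off as an empirical percentile of the adjusted values. All function and argument names are hypothetical, and the mean and variance of ρ are taken as already estimated.

```python
def empirical_bayes_estimates(rs, sampling_vars, mean_rho, var_rho):
    """Shrink each observed validity toward the meta-analytic mean."""
    estimates = []
    for r, v in zip(rs, sampling_vars):
        total = var_rho + v
        # Large sampling variance gives a small factor, so more shrinkage.
        shrink = var_rho / total if total > 0 else 0.0
        estimates.append(mean_rho + shrink * (r - mean_rho))
    return estimates

def empirical_lower_bound(estimates, proportion_below=0.10):
    """Percentile of the adjusted values, by linear interpolation
    (an assumed way of reading the bound from the distribution)."""
    ordered = sorted(estimates)
    position = proportion_below * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])
```

Because the bound is taken from the adjusted values themselves, no normal quantile enters the calculation, which is what allows the approach to avoid a distributional assumption about ρ.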
Empirical Bayes estimates were computed using the Schmidt-Hunter technique because the Schmidt-Hunter procedure does not require validities to be converted to z values for analysis.

⁹ Work by Hall and Brannick (2002) has shown that the absence of corrections in the HV model results in substantial negative bias in estimates of the mean and variance of ρ. They demonstrated that artifact corrections should be used with the HV model to produce more accurate estimates of the mean and variance of ρ and thus more accurate conclusions regarding transportability.

The second part of the study involved analyzing estimated means and lower bounds of the 90 percent credibility intervals from observed validity coefficients obtained from several published test validation reviews or meta-analyses (Brown, 1981; McDaniel & Schmidt, 1985; Shannon, 1989; Tett, Jackson, & Rothstein, 1991).¹⁰ This part of the study was conducted to investigate whether use of the different methods of assessing transportability results in different conclusions in practice.

Research Questions

The present study is designed to address the following questions:

1. How accurate are estimates of the mean and variance of ρ for the three methods when the underlying distribution has different shapes? Although this question has been addressed in other analyses for the Schmidt-Hunter method, this study appears to be the first to assess the impact of a nonnormal underlying distribution on results obtained using the HV and empirical Bayes methods. Results for Schmidt-Hunter and Hedges-Vevea model accuracy were compared to results obtained by Callender et al. (1982) and Hall and Brannick (2002), which showed convergence of results.

2. What errors/biases regarding transportability are most likely for various shapes of the distribution for each of the three methods? This question was addressed by Callender et al. (1982) with regard to Schmidt-Hunter model accuracy for various parameter distribution shapes.
The current study, however, includes more conditions (e.g., uniform distribution conditions) than were used by Callender et al. The current study also uses a different version of the Schmidt-Hunter method (multiplicative rather than noninteractive) and utilizes continuous underlying parameter distributions. Additionally, the current study tests the accuracy of the Hedges-Vevea model and the empirical Bayes estimation method for drawing conclusions about transportability.

3. Is the empirical Bayes procedure more accurate than the Schmidt-Hunter and Hedges-Vevea models for determining the lower bound of the credibility interval and thus for reaching accurate conclusions regarding transportability? Hall and Brannick (2002) assessed differences in accuracy for both Schmidt-Hunter and Hedges-Vevea procedures when the parameter distribution is normal. Hall and Brannick (2002) found that the SH model is more accurate than is the Hedges-Vevea model when the parameter distribution is normal. The current study addressed whether the Schmidt-Hunter model is also more accurate when the parameter distribution is nonnormal, as well as whether empirical Bayes estimation can be used to provide a more accurate method for assessing transportability than either the Schmidt-Hunter or Hedges-Vevea methods.

¹⁰ Usable data could not be extracted from Ghiselli (1966) or Levine, Spector, Menon, Naraynan, & Cannon-Bowers (1996) for the purposes of the present study, and these sources had to be excluded from analyses.

Chapter Eight
Method

Overview

The present study is a two-part study. In Part One, Monte Carlo methodology was used to compare the accuracy of conclusions regarding transportability made using Schmidt-Hunter (Hunter & Schmidt, 1990), Hedges-Vevea (Hedges & Vevea, 1998), and empirical Bayes (Brannick & Hall, 2003) procedures when a variety of parameter distribution shapes were presumed to exist. Schmidt and Hunter's multiplicative model was used for all Schmidt-Hunter analyses unless noted otherwise.¹¹
Additionally, values derived from Schmidt-Hunter procedures were used in the empirical Bayes estimation procedures. All Hedges-Vevea analyses utilized a variant of the corrected Hedges-Vevea model used by Hall and Brannick (2002) in which study-based information was used for corrections, unless noted otherwise. For Part One of the study, data were generated based on conditions outlined in Callender et al. (1982). The simulated data were analyzed using each of the three meta-analytic methods. Estimates of the population mean, random-effects variance component, and lower 90 and 99 percent credibility values were produced for each method. To assess stability and accuracy of the estimates, mean, standard deviation, and root mean squared error values were calculated over iterations for each condition. Values were back-transformed into r, as necessary, for comparison purposes. For Part Two of the study, validity coefficients were obtained from archival sources. Schmidt-Hunter, Hedges-Vevea, and empirical Bayes methods were used to calculate the mean of ρ, the variance of ρ, and the lower bound of one-tailed 90 and 99 percent credibility intervals. Conclusions regarding transportability for each of the three methods were assessed.

¹¹ Schmidt and Hunter (2003) indicated that their interactive model is superior to their multiplicative model. The multiplicative model, however, was used so comparisons could be made between the present study and the study conducted by Hall and Brannick (2002).

Part One

Data Generation. Data were generated to create range-restricted and attenuated bivariate distributions using SAS/IML (SAS Institute, 1990) based on mean, variance, and shape specifications used by Callender et al. (1982). Data generation in the current study, however, differed from procedures used by Callender et al. (1982). Callender et al.
did not use continuous distributions but rather specified a population of 50 validity coefficients from which the computer sampled without replacement 50 times (see Table 2 and Figure 1). Artifact parameters were sampled jointly to ensure proper artifact correlations. For the current study, random number generating equations were written to create continuous distributions with approximately the same mean, variance, and shape as the distributions used by Callender et al. (see Figure 2). As appropriate, minimum and maximum values for distributions were also considered in light of the values used by Callender et al. (1982) (see footnote 12). Known mean, random-effects variance component, and lower bound 90 percent credibility interval values were computed by sampling from the distributions 1 million times. The values thus computed (i.e., known values) were inserted into a Monte Carlo program that generated sample values based on the population validity, range restriction, and criterion reliability parameters. The current study followed Callender et al. in that artifact parameters were selected jointly to ensure appropriate levels of artifact correlation.

[Footnote 12: For example, when the distribution of $\rho_i$ was positively skewed with a mean of .50, the simulation was programmed to eliminate validities lower than .39 (see Table 2).]

Meta-analytic simulations were conducted for various combinations of the following conditions:

1. Mean population validity coefficient: .25 and .50.
2. Shape of population distribution: constant, normal, flat, uniform, positively skewed, and negatively skewed. Distributions were continuous.
3. Variance for parameter distributions ($\sigma^2_\rho$) was set at .012, a moderate value, except for the constant and wide normal distribution conditions. Variance for the constant conditions was zero, while the variance for the wide normal distribution conditions was set at .034.
4. Artifact correlation was allowed in four normal distribution conditions (see column 4 in Table 1).
The correlation between the restriction ratio for the predictor (U) and population criterion reliability in those four cases was .30, on average.

Each transportability meta-analysis included either 25 or 150 simulated validity studies (k). Sample size (N) was treated as a random variable. N for each local study was chosen at random from a normal distribution with a mean of 125 and a standard deviation of 25 (cf. Hall & Brannick, 2002). In addition to Callender et al.'s (1982) 26 conditions, uniform distribution conditions were included in the present study. There were 32 conditions in total (see Table 1), as opposed to Callender et al.'s 26, due to the inclusion of the four uniform distribution conditions and two additional correlated artifact conditions.

Simulated attenuation parameters were obtained by randomly choosing criterion reliability and range restriction values, with replacement, from Table 3 (see footnote 13). Range restriction and criterion reliability values were sampled so that the correlation between the artifacts conformed approximately to 0 or .30, as appropriate to each condition. In practice, for any given iteration in which artifacts were to be uncorrelated, the artifact correlation could range from -.05 to .05. When artifacts were to be correlated at .30, any given iteration could have artifacts correlated between .25 and .35.

One thousand iterations were used for each of the 32 conditions. This allowed for the generation of sampling distributions for the estimates of $\bar{\rho}$ and $\sigma^2_\rho$ as well as for the 90 and 99 percent credibility value estimates. The sampling distributions allow for the assessment of the stability and accuracy of the estimates. Mean, standard deviation, and root mean squared error values were calculated over the 1,000 iterations for each condition.
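The per-condition summary described above can be sketched as follows. This is a minimal illustration, not the SAS/IML program used in the study: the function name `summarize_estimates`, the example numbers, and the use of the sample (n - 1) standard deviation are assumptions made for the sketch.

```python
import math
import statistics


def summarize_estimates(estimates, known_value):
    """Summarize one condition's estimates across Monte Carlo iterations.

    Returns (mean, standard deviation, RMSE), where RMSE is taken against
    the known population value rather than against the mean of the estimates.
    """
    mean_est = statistics.fmean(estimates)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd_est = statistics.stdev(estimates)
    squared_errors = [(e - known_value) ** 2 for e in estimates]
    rmse = math.sqrt(statistics.fmean(squared_errors))
    return mean_est, sd_est, rmse


# Hypothetical lower-bound estimates from three iterations; known value .36.
mean_est, sd_est, rmse = summarize_estimates([0.30, 0.34, 0.38], 0.36)
```

Because RMSE is computed against the known value, it reflects both bias (distance of the mean estimate from the known value) and instability (spread of the estimates), which is why the standard deviation and RMSE are reported separately.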
Root mean squared error values were computed by comparing estimated mean, random-effects variance component, and lower 90 and 99 percent credibility interval values to the known population values.

Population artifact values were obtained from Callender et al. (1982). Given that each of the three methods employed in the present study used study-level corrections, sample-based criterion reliability and range restriction values also had to be generated from sampling distributions based on the relevant population values.

[Footnote 13: Values were chosen with replacement because k values for the current study differed from those of Callender et al. (1982).]

Several restrictions were included in the program. First, given that correlation coefficients are bounded between -1.0 and +1.0, the computer program imposed bounds on correlation coefficients both as they were generated and after they were corrected for artifacts. Any correlations greater than .99 or less than -.99 were set to .99 or -.99, respectively. Second, although negative random-effects variance component (REVC) values are computationally possible, they are theoretically illogical. Thus, REVC values that fell below zero were set equal to zero. Third, in order to mimic realistic data conditions, a lower limit of N = 10 was set for study sample size. Last, local criterion reliability values that were sampled below .10 were set to .10.

The accuracy of the Monte Carlo program was checked by comparing the output values (empirical estimates) to the results expected on the basis of the input parameters. This check was conducted to determine whether the program was handling attenuation due to artifacts correctly. The check suggested that the program was both choosing artifact values that were appropriately correlated and attenuating local population validity parameters correctly.

Analysis.
The Schmidt-Hunter (Hunter & Schmidt, 1990) multiplicative method for validity generalization was used to generate empirical estimates of the mean ($\bar{\rho}$) and variance ($\sigma^2_\rho$) as well as 90 and 99 percent credibility intervals and their lower bound values. Study-level artifact corrections were used. As suggested by Hunter and Schmidt (1990), corrections for range restriction were done prior to corrections for attenuation due to criterion reliability. Distributions of the empirical estimates were compared to the true validity values to assess accuracy. Comparisons were made between values obtained in this analysis and results obtained by Callender et al. (1982) to ensure the accuracy of calculations.

The Hedges-Vevea method was also used. The Hedges-Vevea method, like the Schmidt-Hunter method, is a random-effects method of meta-analysis. It differs from the Schmidt-Hunter method in that r values are transformed to z values prior to analysis, inverse variance weights (rather than sample size weights) adjusted by the random-effects variance component are used, and artifact corrections are typically not used. For the purposes of this analysis, resulting z values were back-transformed into r values so that the lower bounds of the 90 and 99 percent credibility intervals obtained with the Hedges-Vevea method could be compared to those from the Schmidt-Hunter method. Additionally, consistent with the approach taken by Hall and Brannick (2002), corrections for attenuation were used with the Hedges-Vevea method. Unlike Hall and Brannick (2002), however, corrections were based on study-level data. Consistent with corrections made in the Schmidt-Hunter method, corrections for range restriction were conducted prior to criterion reliability corrections.

The empirical Bayes procedure was also used to generate mean ($\bar{\rho}$) and variance ($\sigma^2_\rho$) estimates as well as lower bound 90 and 99 percent credibility values.
These values were compared with the true validity values. Unlike for the Schmidt-Hunter and Hedges-Vevea methods, calculation of the tenth and first percentile values of the empirical Bayes estimates did not rely on known properties of the normal curve. For example, the 10th percentile value (i.e., the lower bound of the 90 percent credibility interval) of the empirical Bayes adjusted coefficients was calculated by ranking the values and then choosing the value that fell at the 10th percentile of the ranked distribution.

Conversion of r values to z would present problems in empirical Bayes estimation. Sampling variance estimates are used as weights in empirical Bayes estimation procedures. The sampling variance for disattenuated local correlations in z is unknown. The sampling variance for disattenuated local correlations in r is known, however (e.g., Brannick & Hall, 2003). Thus the Schmidt-Hunter analysis provided a theoretically defensible basis for computing empirical Bayes estimates; the Hedges-Vevea technique did not.

Specifically, empirical Bayes estimates were computed along the following lines. As part of the meta-analysis, each local study was disattenuated for range restriction and reliability, thus:

$$\hat{\rho}_i = \frac{m_i r_i}{\sqrt{r_{xx} r_{yy} - r_i^2 + m_i^2 r_i^2}} \qquad (8)$$

where $m_i$ is $1/u_i$, the reciprocal of range restriction ($u$ is the ratio of the restricted standard deviation to the unrestricted standard deviation), $r_i$ is the local correlation, and $r_{xx}$ and $r_{yy}$ are the reliabilities of the predictor and criterion, respectively. In practice, $r_{xx}$ was considered to be 1.0. The expected sampling variance of the disattenuated correlation is:

$$V_{e_i} = \frac{m_i^2 \, r_{xx} r_{yy} \, (r_{xx} r_{yy} - r_i^2)^2}{N_i W_i^3} \qquad (9)$$

where

$$W_i = r_{xx} r_{yy} - r_i^2 + m_i^2 r_i^2 \qquad (10)$$

and all other terms were defined previously. From the Schmidt-Hunter meta-analysis, estimates of the mean and variance of the underlying distribution of parameters, denoted $\bar{\rho}$ and $\sigma^2_\rho$, were already available.
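The correction and weighting steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the program used in the study. It assumes the sampling variance in Equation 9 takes the form $m^2 r_{xx} r_{yy} (r_{xx} r_{yy} - r^2)^2 / (N W^3)$, which reduces to the familiar $(1 - r^2)^2 / N$ when there is no range restriction and both reliabilities are 1.0. It also assumes that the empirical Bayes estimate is the precision-weighted average of the local corrected correlation and the global mean, that an REVC of zero implies complete shrinkage to the mean, and that the percentile is taken by nearest rank. All function names are hypothetical.

```python
import math


def disattenuate(r, u, ryy, rxx=1.0):
    """Correct an observed r for range restriction and unreliability.

    Returns (rho_hat, w), where w is the denominator term of Equation 10.
    """
    m = 1.0 / u  # reciprocal of the range restriction ratio
    w = rxx * ryy - r ** 2 + (m ** 2) * (r ** 2)
    return m * r / math.sqrt(w), w


def sampling_variance(r, u, ryy, n, rxx=1.0):
    """Expected sampling variance of the corrected r (assumed form of Eq. 9)."""
    m = 1.0 / u
    _, w = disattenuate(r, u, ryy, rxx)
    return (m ** 2) * rxx * ryy * (rxx * ryy - r ** 2) ** 2 / (n * w ** 3)


def empirical_bayes(rho_hat, v_e, mean_rho, var_rho):
    """Precision-weighted average of a local estimate and the global mean."""
    if var_rho <= 0:
        # Assumption: with a zero REVC the local value shrinks fully to the mean.
        return mean_rho
    w_local, w_global = 1.0 / v_e, 1.0 / var_rho
    return (w_local * rho_hat + w_global * mean_rho) / (w_local + w_global)


def lower_percentile(values, pct):
    """Nearest-rank percentile of the ranked values (e.g., pct=10)."""
    ranked = sorted(values)
    rank = max(1, math.ceil(pct * len(ranked) / 100))
    return ranked[rank - 1]


# Hypothetical local study: r = .20, u = .50, criterion reliability = .64.
rho_hat, w = disattenuate(0.20, 0.50, 0.64)
v_e = sampling_variance(0.20, 0.50, 0.64, n=125)
```

Studies with small sampling variance (typically large N) keep estimates close to their own corrected correlation, whereas noisy studies are pulled toward the global mean; the lower credibility value is then read directly from the ranked adjusted coefficients rather than from the normal curve.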
Subsequently, the empirical Bayes estimate was computed using the equation

$$\hat{\rho}_{EB_i} = \frac{\frac{1}{V_{e_i}} \hat{\rho}_i + \frac{1}{\sigma^2_\rho} \bar{\rho}}{\frac{1}{V_{e_i}} + \frac{1}{\sigma^2_\rho}} \qquad (11)$$

In words, the empirical Bayes estimate is a weighted average of the local study and the overall mean from the meta-analysis. The weights are the reciprocal of the expected sampling variance for the local study and the reciprocal of the estimated population variance for the global mean.

Part Two

The first part of the study was used to show which of the three techniques was best able to provide accurate estimates regarding transportability. The second part, using real data, was used to show whether the choice of technique for estimating the lower credibility values is likely to matter in practice.

Data Retrieval. Observed validity coefficients were obtained from published test validation meta-analyses and reviews that contain a large number of validity coefficients (Brown, 1981; McDaniel & Schmidt, 1985; Shannon, 1989; Tett et al., 1991). Validity coefficients obtained from Brown (1981), Shannon (1989), and Tett et al. (1991) are based on personality and biodata measures, while those collected from McDaniel and Schmidt (1985) are based on the validity of training and experience ratings.

Analysis. Schmidt-Hunter, Hedges-Vevea, and empirical Bayes methods were used to generate mean ($\bar{\rho}$) and variance ($\sigma^2_\rho$) estimates as well as 90 percent credibility intervals and lower 90 percent credibility values. This part of the study differs from Part One in that the data used to generate these estimates are real rather than simulated. Whenever possible, study-level data were used for corrections in each of the methods. Only Brown (1981), however, included information regarding both criterion reliability and range restriction, thus allowing for full corrections to be conducted. Both Shannon (1989) and Tett et al.
(1991) included some information about criterion reliability; thus, analyses using both uncorrected and partially corrected coefficients (using the mean of the criterion reliability values provided) were conducted for each data set. McDaniel and Schmidt (1985) did not include any artifact information, so the analysis was based only on uncorrected coefficients.

Accuracy Checks. Four of the archival sources proposed for use in the present study were used by Hall and Brannick (2002) to compare the Schmidt-Hunter and Hedges-Vevea models. For Brown (1981), comparisons could be made with the mean and variance estimates calculated by Hall and Brannick (2002) using study-level corrections (see footnote 14). For Shannon (1989), Tett et al. (1991), and McDaniel and Schmidt (1985), mean and variance estimates for analyses based on the uncorrected coefficients could be compared with results obtained by Hall and Brannick (2002). Corrected results from the present study for those three data sets were not comparable for the purposes of checking accuracy, however, because Hall and Brannick (2002) used assumed artifact distributions for corrections whereas the present study used only study-based information for corrections. Hall and Brannick (2002) also used two-tailed 95 percent credibility intervals; thus, lower bound estimates were not comparable for the purpose of checking the accuracy of calculations.

[Footnote 14: Hall and Brannick (2002) reported standard deviation, not variance.]

Relevant mean and variance estimates were essentially equivalent between the present study and Hall and Brannick (2002), lending support to the accuracy of calculation in the present study. Estimates of the mean and variance of $\rho_i$ were equivalent between Hall and Brannick and the present study for the uncorrected Tett et al. (1991) data set.
For the Brown (1981) data set, results differ slightly for the Schmidt-Hunter and Hedges-Vevea models using sample-based corrections, given that Hall and Brannick corrected the overall mean using mean artifact values whereas the present study corrected coefficients individually using study-based data prior to deriving an overall mean and variance estimate (see footnote 15). Mean and variance estimates for the Schmidt-Hunter and Hedges-Vevea uncorrected analyses based on Shannon's (1989) data set differ slightly due to a miskeyed entry in the Hall and Brannick (2002) data file. The discrepancies in estimates of the mean and REVC are in the direction expected from the miskeyed entry. Results for the uncorrected Schmidt-Hunter and Hedges-Vevea analyses based on the McDaniel and Schmidt (1985) data set are also essentially equivalent. Minor disagreements in estimates of the mean and REVC between Hall and Brannick (2002) and the present study are due to minor differences in the data culled from McDaniel and Schmidt (1985) (see footnote 16).

[Footnote 15: Results are also somewhat different from those reported by Brown (1981) due to the use of different Schmidt-Hunter methods. Both Hall and Brannick (2002) and the present study used Hunter and Schmidt's (1990) multiplicative model, while Brown (1981) used an older method developed by Schmidt and Hunter (1977).]

[Footnote 16: The Hall and Brannick (2002) data file included several coefficients that were not readily accessible in McDaniel and Schmidt (1985).]

Chapter Nine
Results

Part One

Results from the Monte Carlo simulation and the three types of analyses are presented in Tables 4 through 19 and Figures 3 through 10. Results are grouped by mean, variance, and population distribution shape. In each figure, the known lower bound value is depicted by a horizontal line while estimates of the lower bound are denoted by points.
Vertical lines extend one standard deviation in each direction from the lower bound estimates, and root mean squared error (RMSE) values are provided in parentheses.

Overall, results support the continued use of the Schmidt-Hunter multiplicative model in the random-effects case even when the distribution of $\rho_i$ is not normal. As expected, estimates for all three methods tend to become more accurate as the number of studies (k) included in the meta-analysis increases. This is evident from comparing the respective RMSE values when k = 25 and k = 150. All three methods resulted in estimates of the population mean that were slightly positively biased. This result is likely a function of the artifact corrections employed.

Empirical Bayes Estimates. Use of the empirical Bayes method does not normally produce superior estimates of the mean, REVC, or lower bound of the 90 percent credibility interval compared to the Schmidt-Hunter method under the conditions studied. The empirical Bayes method produces more consistent (lower SD) and more accurate (lower RMSE) estimates of the REVC than does the Schmidt-Hunter method when k is small and the population REVC is small or zero. This effect reverses, however, as k increases or the population REVC increases. Empirical Bayes estimates of the lower 90 percent credibility value tend to be positively biased. As k increases, the positive bias increases when $\bar{\rho}$ is small (.25) and decreases when $\bar{\rho}$ is moderate (.50).

Schmidt-Hunter Multiplicative Model. The Schmidt-Hunter method produced more accurate estimates than did the corrected Hedges-Vevea and empirical Bayes methods in almost every case. Its estimates of REVC and lower bound values tended to be more accurate than those of the other methods, as evidenced by lower RMSE values. However, Schmidt-Hunter estimates of the lower bound of the 90 percent credibility interval are biased, depending on the shape of the $\rho_i$ distribution.
When the distribution of $\rho_i$ is flat with a low $\bar{\rho}$, or is normal, negatively skewed, or uniform, the Schmidt-Hunter lower bound estimates tend to be positively biased, resulting in conclusions regarding transportability that are too lenient. When $\rho_i$ is a constant, estimates tend to be negatively biased, as is the case when the distribution of $\rho_i$ is flat and $\bar{\rho}$ is moderate. When the distribution is positively skewed, lower bound estimates tend to become more negatively biased as k increases. These results are consistent with those found by Callender et al. (1982) using Schmidt and Hunter's noninteractive model.

Hedges-Vevea Corrected Model. The corrected Hedges-Vevea model was the weakest of the three models tested. Hedges-Vevea estimates of the lower bound tended to be slightly to extremely negatively biased, resulting in overly conservative conclusions regarding transportability. The Hedges-Vevea lower bound estimates were also quite variable, as indicated by fairly large SD values for the lower bound estimates. As shown in Tables 14 through 17, Hedges-Vevea estimates of the lower bound were more accurate than Schmidt-Hunter estimates only when the distribution of $\rho_i$ was flat and $\bar{\rho}$ was small (.25) or when the $\rho_i$ distribution was negatively skewed.

Other Findings. A surprising finding that has not been mentioned in the literature regarding the shape of the distribution of $\rho_i$ is that the lower bound of a one-tailed 90 percent credibility interval differs little whether the distribution is normal, flat, or uniform (assuming equal means and variances). As shown in Table 20, there is less than a .02 difference in lower bounds between the normal distribution and the flat and uniform distributions (when $\sigma^2_\rho = .012$). Thus, the variance of the distribution of $\rho_i$ seems to affect the true lower bound of the 90 percent credibility interval more than its shape does (see footnote 17). When comparing the lower bound of the normal distribution to the lower bounds of the skewed distributions, the difference is only about .03.
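The similarity of the normal and uniform lower bounds can be checked analytically. The sketch below is an illustration only: it covers just the normal and uniform shapes (the "flat" and skewed shapes depend on specifications not reproduced here), and the function names are hypothetical. It uses a mean of .50 and a variance of .012, matching the moderate-variance conditions described in the method.

```python
import math
from statistics import NormalDist


def normal_lower_bound(mean, variance, tail_prob):
    """Quantile of a normal distribution at the given lower-tail probability."""
    return mean + NormalDist().inv_cdf(tail_prob) * math.sqrt(variance)


def uniform_lower_bound(mean, variance, tail_prob):
    """Quantile of a uniform distribution with the same mean and variance.

    A uniform distribution on [mean - h, mean + h] has variance h**2 / 3,
    so the half-width is h = sqrt(3 * variance).
    """
    half_width = math.sqrt(3.0 * variance)
    return (mean - half_width) + tail_prob * (2.0 * half_width)


# One-tailed 90 percent credibility interval: 10th percentile of each shape.
lb_normal = normal_lower_bound(0.50, 0.012, 0.10)
lb_uniform = uniform_lower_bound(0.50, 0.012, 0.10)
difference = abs(lb_normal - lb_uniform)  # well under .02 for these inputs
```

For these inputs the two 10th percentiles sit about 1.28 and 1.39 standard deviations below the mean, respectively, so with a standard deviation near .11 the gap is only about .01.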
Larger differences in lower bound values are evident when one looks at the lower bound of the 99 percent credibility interval. Because it was possible that larger differences in the underlying distributions would yield more informative comparisons among the techniques, further analyses were conducted comparing the three methods' lower bound estimates for the 99 percent credibility interval. Consistent with results based on 90 percent credibility intervals, Schmidt-Hunter estimates of the lower bounds of 99 percent credibility intervals are generally more accurate than estimates made using the empirical Bayes or Hedges-Vevea methods.

[Footnote 17: This statement is limited to the distribution shapes examined in this study. Distributions of rho ($\rho_i$) more pathological in shape than the ones examined here may have very different lower bounds.]

Means, standard deviations, and RMSE values across the 1,000 iterations for the 99 percent credibility intervals are presented in Tables 21 and 22. As indicated in the tables, Schmidt-Hunter lower bound estimates tend to be somewhat biased, positively or negatively, depending on the shape of the distribution, and in somewhat different ways than the lower bound estimates for the 90 percent credibility interval. When $\rho_i$ was constant or the distribution of $\rho_i$ was positively skewed, flat, or uniform, Schmidt-Hunter lower bound estimates were negatively biased, leading to conservative conclusions regarding transportability. When the distribution was normal or negatively skewed, lower bound estimates were positively biased (only slightly so in the normal case), leading to transportability conclusions that were too lenient. Hedges-Vevea estimates of the lower bound were still negatively biased, resulting in exceptionally conservative conclusions regarding transportability. Empirical Bayes lower bound estimates remained superior to Schmidt-Hunter estimates when $\rho_i$ is a constant.
Additionally, empirical Bayes estimates were somewhat superior to Schmidt-Hunter estimates in all cases in which the distribution of $\rho_i$ was uniform and in some cases in which the distribution of $\rho_i$ was flat.

Part Two

Results for analyses using published coefficients are presented in Table 23. As expected based on results from Part One, lower bounds of the 90 percent credibility intervals derived from the empirical Bayes procedures were consistently higher than estimates resulting from the other two methods, suggesting that transportability conclusions reached from empirical Bayes analyses are too lenient in general. Lower bound values derived from Schmidt-Hunter and Hedges-Vevea analyses do not agree, although Schmidt-Hunter lower bound values were not consistently higher or lower than the respective Hedges-Vevea values. The difference in lower bounds between the Schmidt-Hunter and Hedges-Vevea analyses ranged from .01 to .06. Across all three methods, differences in lower bounds ranged from less than .01 to over .11. Although differences in lower bounds were noted for the three methods, actual conclusions regarding transportability were congruent across methods for all four of the included studies. Results of all methods suggested transportability was warranted based on data from Brown (1981), Shannon (1989), and Tett et al. (1991) but not based on coefficients in McDaniel and Schmidt (1985).

When comparing mean values for the three methods, estimates were fairly consistent; differences ranged from almost 0 to .05. Both the empirical Bayes and Hedges-Vevea methods produced higher estimates than did the Schmidt-Hunter method. All three methods suggested that the average validity in all four studies was positive, but the three disagreed about the magnitude of the mean effects. In terms of REVC estimates, the Hedges-Vevea method produced higher estimates than did the empirical Bayes and Schmidt-Hunter methods.
Chapter Ten
Discussion

Previous work (Callender et al., 1982; Kemery et al., 1987; Kemery et al., 1989) has focused on biases in estimates of the lower bound of the credibility interval when the shape of the distribution of true population validity coefficients is nonnormal. The present study sought to investigate whether a new method, empirical Bayes estimation, is superior to the Schmidt-Hunter method in estimating the lower bound. Additionally, the current study investigated potential biases of a corrected version of the Hedges-Vevea method, another widely used random-effects meta-analytic method.

Research Questions

Question 1. How accurate are estimates of $\bar{\rho}$ and $\sigma^2_\rho$ for the three methods when the underlying distribution has different shapes? Overall, results of this study showed that, as expected, the accuracy of mean, REVC, and lower bound estimates derived from each of the methods tends to improve as k increases. Of the three methods compared, however, the multiplicative method developed by Hunter and Schmidt is generally superior to both the corrected Hedges-Vevea and empirical Bayes methods in terms of estimating the global mean and REVC, regardless of the shape of the distribution of population validity coefficients. The Hedges-Vevea and empirical Bayes methods both also produce fairly accurate estimates of $\sigma^2_\rho$, although their estimates tend not to be as accurate as those derived from the Schmidt-Hunter method. All the methods produced estimates of the mean that were too large on average. The Hedges-Vevea method produced estimates of the mean that were noticeably larger than those of the other two methods. The Schmidt-Hunter estimates of the REVC appear more dependent on k, the number of studies, than do those of the other two methods.

None of the three procedures makes any assumptions about the shape of the underlying distribution for computing the global mean and REVC.
Both the Schmidt-Hunter and Hedges-Vevea methods assume that the underlying distribution is normal in order to compute lower bound estimates. The empirical Bayes method makes no such assumption. The results showed that the Schmidt-Hunter method generally produces the most accurate lower bound estimates, although the accuracy of the estimates does vary notably depending on properties of the distribution of $\rho_i$. The corrected Hedges-Vevea model is the poorest of the three methods for estimating the lower credibility values. The lower bound estimates derived from the Hedges-Vevea procedure are slightly to extremely negatively biased, depending on properties of the distribution of $\rho_i$.

Question 2. What errors/biases regarding transportability are most likely for various shapes of the $\rho_i$ distribution for each of the three methods? The type of bias evident for each of the methods regarding transportability conclusions was affected by the shape of the $\rho_i$ distribution and by whether the 90 or 99 percent lower credibility value was used to assess transportability. All three methods produced negatively biased 90 and 99 percent lower bounds when $\rho_i$ was constant, resulting in transportability conclusions that were too strict. When $\rho_i$ was normally distributed, however, both the Schmidt-Hunter and empirical Bayes methods produced positively biased (lenient) conclusions whereas the Hedges-Vevea method produced negatively biased conclusions for the 90 and 99 percent credibility values.

When the distribution of $\rho_i$ was positively skewed, results were more mixed. Estimates from the Hedges-Vevea method were negatively biased, as were most of the estimates from the Schmidt-Hunter method. The empirical Bayes procedure, on the other hand, produced positively biased 90 percent credibility values. Empirical Bayes 99 percent credibility values were positively biased if $\bar{\rho}$ was moderate (.50) and negatively biased when $\bar{\rho}$ was small (.25).
When the distribution of $\rho_i$ was negatively skewed, 90 and 99 percent credibility estimates derived from the Schmidt-Hunter and empirical Bayes methods were positively biased, resulting in overly lenient conclusions regarding validity transport. Hedges-Vevea estimates tended to be positively biased for 90 percent credibility intervals and negatively biased for 99 percent credibility intervals. Hedges-Vevea estimates were also negatively biased for both flat and uniform distributions. Empirical Bayes 90 and 99 percent lower bound estimates were positively biased for flat and uniform distributions. Schmidt-Hunter 90 percent lower bound estimates were also positively biased, but the 99 percent lower bound estimates were negatively biased.

Question 3. Is the empirical Bayes procedure more accurate than the Schmidt-Hunter and Hedges-Vevea models for determining the lower bound of the credibility interval, and thus for reaching accurate conclusions regarding transportability? Results of the current study suggest that the empirical Bayes procedure is not more accurate than the Schmidt-Hunter procedure for assessing transportability in most cases. When $\rho_i$ is constant, the empirical Bayes procedure is more accurate for estimating lower bounds of both 90 and 99 percent credibility intervals than is the Schmidt-Hunter procedure. The empirical Bayes procedure also produces more accurate 99 percent lower credibility values when $\rho_i$ is positively skewed or uniformly distributed.

Limitations of the Current Study

There are several limitations of the current study. First, data simulated in Part One of the study were generated based on the assumption that data in local studies conform to a bivariate normal distribution. Previous research (Lancaster, 1957; Micceri, 1989), however, suggests that this assumption may not be valid. Future research should be conducted to determine how violation of the bivariate normal assumption affects results for each of the conditions and methods examined.
Second, the precision of the RMSE values may be inadequate given the use of 1,000 iterations for each condition (Robey & Barcikowski, 1992). The amount of error in the RMSE values in the present study is unknown. In other words, while differences in RMSE values were noted, the magnitude of the difference in RMSE values needed to represent a meaningful difference in results is unknown.

Third, the conclusion that the Schmidt-Hunter method is generally superior to the other two methods examined, regardless of the shape of the distribution of infinite-sample validity coefficients, is limited to the shapes investigated in the present study. Population validity distributions more pathological than the ones examined here (e.g., the bimodal distribution described by Kemery et al., 1987, in which 30 percent of coefficients were 0 and 70 percent of coefficients were .60) will likely have true lower bound values that are very different from those of normal distributions with comparable means and REVCs. Use of the Schmidt-Hunter method in these cases will produce incorrect conclusions regarding transportability (Kemery et al., 1987; Kemery et al., 1989), potentially costing rather than saving organizations money through the use of selection instruments whose scores are invalid. Because of this, being able to model the distribution of population validities is essential to ensure accurate conclusions regarding transportability.

Modeling the Shape of the Rho Distribution

The current study, like other Monte Carlo simulation studies regarding transportability, addresses questions regarding the consequences of violating the assumption that $\rho_i$ is normally distributed. The current study, however, is not able to address what the shape of the distribution of $\rho_i$ actually is. A program such as NLMIX (see footnote 18) may be beneficial in the effort to model the distribution of $\rho_i$ for a given predictor-criterion relationship.
NLMIX is a program developed by Davidian and Gallant (1991) that uses maximum likelihood methods to jointly estimate the fixed parameters of the nonlinear mixed effects model and the density of the random effects (p. ii). The NLMIX program assumes only that the density of the random effects is smooth, that is, that the distribution of $\rho_i$ is continuous; the program does not force the resulting model to fit a line, the normal curve, or some other predetermined shape.

[Footnote 18: NLMIX is available via anonymous FTP at ftp.econ.duke.edu. To obtain the code and guide as well as the NPSOL program, go to the pub/arg/nlmix directory.]

While the NLMIX program could be helpful for this task, modeling the shape of the underlying distribution is difficult given that multiple sources of noise (e.g., sampling error) mask the true magnitude of the relationship between a predictor and criterion. The effect of the noise on the observed validities is that a distribution of observed r values may show little resemblance to the distribution from which it was sampled (see footnote 19).

Although the empirical Bayes estimates provided a method for estimating the lower bound that does not depend on assumed parameters, it is clear that in the present conditions the resulting estimates were less accurate than results based on the wrong model. That is, the Schmidt-Hunter estimate based on the assumption of the normal distribution actually worked better than the empirical Bayes estimates, even when the underlying distribution was not normal. Future research may be conducted that employs corrections to empirical Bayes methodology, such as those presented by Louis (1984). The use of Louis's (1984) corrected (ensemble) estimates may result in transportability conclusions that are as accurate as or more accurate than those derived using Schmidt-Hunter methods.
Even when using the Schmidt-Hunter method to assess transportability, results may be biased, and in an unknown direction, if the shape of the distribution of ρ is unknown. If the lower bound value is positive but very close to zero, transportability may appear reasonable but may actually not be defensible in a significant proportion of cases. A more conservative strategy is to not use zero as the cutoff for assessing whether transportability is reasonable. Another method would be to use Tchebycheff's inequality (Hays, 1994) in transportability calculations. According to Tchebycheff's inequality, even in the most pathological distribution, the 10th percentile value will fall no more than 2.3 standard deviations from the mean. Thus, to be conservative in transportability conclusions, one could use 2.3 (rather than 1.28) when calculating the lower bound using the Schmidt-Hunter method.

¹⁹ In Appendix B, simulated r values are shown for meta-analyses with different numbers of included studies (k). Population values were based on Kemery et al.'s (1987) distribution in which 30 percent of population validities were 0 and 70 percent were .60. Observed r values reflect attenuation due to range restriction and criterion reliability as well as sampling error. While the population distribution is distinctly bimodal, with values only at 0 and .60, the distributions of r when k=25 and k=150 appear fairly normal. Only when k is 250 do spikes suggesting the bimodal nature of the population distribution begin to appear. Meta-analyses rarely, however, include 250 or more studies.

An alternative approach to modeling the underlying distribution of ρ is to make explicit use of moderators (e.g., James et al., 1992). If we can estimate the relations between values of ρ and values of moderators, and if we can estimate the frequency distribution of the moderators, then we can estimate the underlying distribution of ρ.
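The lower-bound arithmetic discussed above can be sketched as follows. The snippet is illustrative Python, not the SAS/IML used for the simulations, and the function names are hypothetical. The two-point distribution is the Kemery et al. (1987) example, treated here as if a single binary moderator split situations into ρ = 0 (30 percent) and ρ = .60 (70 percent); that moderator reading is an assumption made only for the illustration.

```python
import math

def lower_bound(mean_rho, revc, multiplier=1.28):
    """Lower bound: mean of rho minus multiplier times the SD of rho."""
    return mean_rho - multiplier * math.sqrt(revc)

def mixture_moments(values, weights):
    """Mean and variance of a discrete rho distribution implied by a moderator."""
    mean_rho = sum(w * v for v, w in zip(values, weights))
    revc = sum(w * (v - mean_rho) ** 2 for v, w in zip(values, weights))
    return mean_rho, revc

def mixture_percentile(values, weights, p):
    """Smallest rho value whose cumulative weight reaches p."""
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= p:
            return value
    return max(values)

rho_values = [0.0, 0.60]   # validity in each moderator category
weights = [0.30, 0.70]     # relative frequency of each category

mean_rho, revc = mixture_moments(rho_values, weights)
normal_bound = lower_bound(mean_rho, revc)              # usual 1.28 multiplier
conservative_bound = lower_bound(mean_rho, revc, 2.3)   # conservative multiplier
true_10th = mixture_percentile(rho_values, weights, 0.10)

print(round(mean_rho, 4), round(revc, 4))                           # 0.42 0.0756
print(round(normal_bound, 4), round(conservative_bound, 4), true_10th)  # 0.0681 -0.2124 0.0
```

With these numbers the usual 1.28 multiplier gives a bound of about .07, which is above zero even though 30 percent of situations have no validity, whereas the 2.3 multiplier gives about -.21 and so would not support transportability.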
Of course, if we know what the moderators are and what their joint effects are, then we can compute directly those situations in which validity is expected to be too low to support testing.

Conclusions and Recommendations

The Monte Carlo comparison of three random-effects techniques showed that the Schmidt-Hunter method typically produced the most accurate estimates of the mean, the random-effects variance component (REVC), and the lower bound of the credibility interval. Further, the Schmidt-Hunter estimates of the lower bound were often fairly accurate even when the underlying distribution was not normal. This was due in part to the surprisingly similar lower bound values across different underlying distributions for the one-tailed 90 percent interval. One possibility is that the underlying distributions are not pathological enough to produce poor estimates. Additional research should be conducted to determine whether more pathological distributions could alter the conclusions reached in this study.

The reanalysis of published meta-analyses showed results in line with those of the Monte Carlo portion of the study. Specifically, the empirical Bayes estimate of the lower bound was higher (more lenient or liberal) in each of the four datasets than the lower bound estimates from the other two techniques. Such a result is likely unless there are large-sample (large N) effect sizes that are distant from the global mean. Such effect sizes will remain outliers in the empirical Bayes estimates and thus could provide lower bound estimates that are relatively low.

Although the Schmidt-Hunter method appears to be the best method available at the moment for estimating the global mean, REVC, and lower bound, this does not mean that the Schmidt-Hunter estimates are always near their true population values, nor that inferences based on the estimates are necessarily correct.
Continued research aimed at a better understanding of the nature of the underlying distribution of ρ in test validation appears warranted.

REFERENCES

Algera, J. A., Jansen, P. G., Roe, R. A., & Vijn, P. (1984). Validity generalization: Some critical remarks on the Schmidt-Hunter procedure. Journal of Occupational Psychology, 57, 197-210.
American Educational Research Association. (1999). Standards for educational and psychological testing. Washington, DC: Author.
Birkeland, S., Manson, T., Kisamore, J., Brannick, M., & Liu, Y. (2003). A meta-analytic investigation of job applicant faking on personality measures. Manuscript in preparation, University of South Florida.
Brannick, M. T. (2001). Implications of empirical Bayes meta-analysis for test validation. Journal of Applied Psychology, 86, 468-480.
Brannick, M. T., & Hall, S. M. (2003). Validity generalization from a Bayesian perspective. In K. R. Murphy (Ed.), Validity generalization: A critical review. Mahwah, NJ: Erlbaum.
Brown, S. H. (1981). Validity generalization and situational moderation in the life insurance industry. Journal of Applied Psychology, 66, 664-670.
Callender, J. C., & Osburn, H. G. (1980). Development and test of a new model for validity generalization. Journal of Applied Psychology, 65, 543-558.
Callender, J. C., & Osburn, H. G. (1981). Testing the constancy of validity with computer-generated sampling distributions of the multiplicative model variance estimate: Results for petroleum industry validation research. Journal of Applied Psychology, 66, 274-281.
Callender, J. C., Osburn, H. G., Greener, J. M., & Ashworth, S. (1982). Multiplicative validity generalization model: Accuracy of estimates as a function of sample size and mean, variance, and shape of distribution of true validities. Journal of Applied Psychology, 67, 859-867.
Cascio, W. F. (1998). Applied psychology in human resource management (5th ed.). Upper Saddle River, NJ: Prentice-Hall.
Cohen, J. (1992). A power primer.
Psychological Bulletin, 112(1), 155-159.
Cooper, H. M., & Rosenthal, R. (1980). Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, 87(3), 442-449.
Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York, NY: Harper & Row.
Cronbach, L. J. (1984). Essentials of psychological testing (4th ed.). New York, NY: Harper & Row.
Davidian, M., & Gallant, R. (1991). NLMIX: A program for maximum likelihood estimation of the nonlinear mixed effects model with a smooth random effects density [Computer software and manual]. Retrieved from ftp.econ.duke.edu (pub/arg/nlmix directory).
Ferguson, L. W. (1951). Management quality and its effect on selection test validity. Personnel Psychology, 4, 141-150.
Field, A. P. (2001). Meta-analysis of correlation coefficients: A Monte Carlo comparison of fixed- and random-effects methods. Psychological Methods, 6, 161-180.
Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., III, & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72(3), 493-511.
Ghiselli, E. E. (1966). The validity of occupational aptitude tests. New York: Wiley.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3-8.
Guion, R. M. (1965). Personnel testing. New York: McGraw-Hill.
Guion, R. M., & Gottier, R. F. (1965). Validity of personality measures in personnel selection. Personnel Psychology, 18, 135-164.
Gutenberg, R. L., Arvey, R. D., Osburn, H. G., & Jeanneret, P. R. (1983). Moderating effects of decision-making/information-processing job dimensions on test validities. Journal of Applied Psychology, 68, 602-608.
Hall, S. M., & Brannick, M. T. (2002). Comparison of two random-effects methods of meta-analysis. Journal of Applied Psychology, 87, 377-389.
Hays, W. L. (1994). Statistics (5th ed.). Fort Worth: Harcourt College Publishers.
Hedges, L. V. (1992). Meta-analysis.
Journal of Educational Statistics, 17, 279-296.
Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3, 486-504.
Hollenbeck, J. R., & Whitener, E. M. (1988). Criterion-related validation for small sample contexts: An integrated approach to synthetic validity. Journal of Applied Psychology, 73, 536-544.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Hunter, J. E., & Schmidt, F. L. (2000). Fixed-effects vs. random-effects meta-analysis models: Implications for cumulative research knowledge. International Journal of Selection and Assessment, 8, 275-292.
James, L. R., Demaree, R. G., Mulaik, S. A., & Ladd, R. T. (1992). Validity generalization in the context of situational models. Journal of Applied Psychology, 77, 3-14.
James, L. R., Demaree, R. G., Mulaik, S. A., & Mumford, M. D. (1988). Validity generalization: Rejoinder to Schmidt, Hunter, and Raju (1988). Journal of Applied Psychology, 73, 673-678.
Johnson, B. T., Mullen, B., & Salas, E. (1995). Comparison of three major meta-analytic approaches. Journal of Applied Psychology, 80, 94-106.
Kemery, E. R., Mossholder, K. W., & Dunlap, W. P. (1989). Meta-analysis and moderator variables: A cautionary note on transportability. Journal of Applied Psychology, 74, 168-170.
Kemery, E. R., Mossholder, K. W., & Roth, L. (1987). The power of the Schmidt and Hunter additive model of validity generalization. Journal of Applied Psychology, 72, 30-37.
Kisamore, J. L., & Brannick, M. T. (2003). Pygmalion in organizations: An illustration of the consequences of the meta-analyst's choices. University of South Florida. Manuscript under review.
Landy, F. J. (2003). Validity generalization: Then and now. In K. R. Murphy (Ed.), Validity generalization: A critical review. Mahwah, NJ: Erlbaum.
Lancaster, H. O. (1957).
Some properties of the bivariate normal distribution considered in the form of a contingency table. Biometrika, 44, 289-292.
Levine, E. L., Spector, P. E., Menon, S., Narayanan, L., & Cannon-Bowers, J. (1996). Validity generalization for cognitive, psychomotor, and perceptual tests of craft jobs in the utility industry. Human Performance, 9, 1-22.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. London: Sage.
Louis, T. A. (1984). Estimating a population of parameter values using Bayes and empirical Bayes methods. Journal of the American Statistical Association, 79, 393-398.
McDaniel, M. A., & Schmidt, F. L. (1985). A meta-analysis of the validity of training and experience ratings in personnel selection. Washington, DC: U.S. Office of Personnel Management, Office of Staffing Policy, Examining Policy Analysis Division.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156-166.
Murphy, K. R. (2003). The logic of validity generalization. In K. R. Murphy (Ed.), Validity generalization: A critical review. Mahwah, NJ: Erlbaum.
Murphy, K. R., & Newman, D. A. (2003). The past, present and future of validity generalization. In K. R. Murphy (Ed.), Validity generalization: A critical review. Mahwah, NJ: Erlbaum.
National Research Council. (1992). Combining information: Statistical issues and opportunities for research. Washington, DC: National Academy Press.
Oswald, F. L. (1999). On deriving validity generalization and situational specificity from meta-analysis: A conceptual review and some empirical findings (Doctoral dissertation, University of Minnesota, 1999). Dissertation Abstracts International, 60, 399.
Oswald, F. L., & Johnson, J. W. (1998). On the robustness, bias, and stability of statistics from meta-analysis of correlation coefficients: Some initial Monte Carlo findings. Journal of Applied Psychology, 83, 164-178.
Overton, R. C. (1998).
A comparison of fixed-effects and mixed (random-effects) models for meta-analysis tests of moderator variable effects. Psychological Methods, 3, 354-379.
Paese, P. W. (1990). Correction to Paese and Switzer. Journal of Applied Psychology, 75, 234.
Paese, P. W., & Switzer, F. S., III. (1988). Validity generalization and hypothetical reliability distributions: A test of the Schmidt-Hunter procedure. Journal of Applied Psychology, 73, 267-274.
Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests used to predict job proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373-406.
Robey, R. R., & Barcikowski, R. S. (1992). Type I error and the number of iterations in Monte Carlo studies of robustness. British Journal of Mathematical and Statistical Psychology, 45, 283-288.
Sackett, P. R., Harris, M. M., & Orr, J. M. (1986). On seeking moderator variables in the meta-analysis of correlational data: A Monte Carlo investigation of statistical power and resistance to Type I error. Journal of Applied Psychology, 71, 302-310.
Sackett, P. R., Schmitt, N., Tenopyr, M. L., Kehoe, J., & Zedeck, S. (1985). Commentary on forty questions about validity generalization and meta-analysis. Personnel Psychology, 38, 697-798.
SAS Institute. (1990). SAS/IML software: Usage and reference (Version 6). Cary, NC: Author.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529-540.
Schmidt, F. L., & Hunter, J. E. (1999). Comparison of three meta-analysis methods revisited: An analysis of Johnson, Mullen, and Salas (1995). Journal of Applied Psychology, 84, 144-148.
Schmidt, F. L., & Hunter, J. E. (2003). History, development, evolution, and impact of validity generalization and meta-analysis methods, 1975-2001. In K. R. Murphy (Ed.), Validity generalization: A critical review. Mahwah, NJ: Erlbaum.
Schmidt, F. L., Hunter, J.
E., & Pearlman, K. (1981). Task differences as moderators of aptitude test validity in selection: A red herring. Journal of Applied Psychology, 66, 166-185.
Schmidt, F. L., Hunter, J. E., Pearlman, K., & Hirsh, H. R. (1985). Forty questions about validity generalization and meta-analysis. Personnel Psychology, 38, 697-798.
Schmidt, F. L., Hunter, J. E., & Urry, V. W. (1976). Statistical power in criterion-related validity studies. Journal of Applied Psychology, 61, 473-485.
Schmidt, F. L., Law, K., Hunter, J. E., Rothstein, H. R., Pearlman, K., & McDaniel, M. (1993). Refinements in validity generalization methods: Implications for the situational specificity hypothesis. Journal of Applied Psychology, 78, 3-12.
Shadish, W. R., & Haddock, C. K. (1994). Combining estimates of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 261-281). New York: Russell Sage Foundation.
Shannon, P. R. (1989). An examination of the generalization of validity coefficients for personality and biographical inventories. Dissertation Abstracts International, 50, 775-776.
Spector, P. E., & Levine, E. L. (1987). Meta-analysis for integrating study outcomes: A Monte Carlo study of its susceptibility to Type I and Type II errors. Journal of Applied Psychology, 72, 3-9.
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-741.
Thomas, A. L. (1997). Accounting for correlated artifacts and true validity in validity generalization procedures: An extension of model 1 for assessing validity generalization (Doctoral dissertation, Georgia Institute of Technology, 1997). Dissertation Abstracts International, 59, 1401.
Thomas, H. (1990). A likelihood based model for validity generalization. Journal of Applied Psychology, 75, 13-20.
Vinchur, A. J., Schippmann, J. S., Switzer, F. S., III, & Roth, P. L. (1998).
A meta-analytic review of predictors of job performance for salespeople. Journal of Applied Psychology, 83(4), 586-597.
Wanous, J. P., Sullivan, S. E., & Malinak, J. (1989). The role of judgment calls in meta-analysis. Journal of Applied Psychology, 74(2), 259-264.
Whitener, E. M. (1990). Confusion of confidence intervals and credibility intervals in meta-analysis. Journal of Applied Psychology, 75, 315-321.

Table 1
Simulated Conditions

Distribution of ρ     σρ²      Artifacts       Mean ρ = 0.25    Mean ρ = 0.50
Constant              0.00     Uncorrelated    k=25, k=150      k=25, k=150
Normal-moderate       0.012    Uncorrelated    k=25, k=150      k=25, k=150
Normal-moderate       0.012    Correlated      k=25, k=150      k=25, k=150
Normal-wide           0.034    Uncorrelated    k=25, k=150      k=25, k=150
Positively Skewed     0.012    Uncorrelated    k=25, k=150      k=25, k=150
Negatively Skewed     0.012    Uncorrelated    k=25, k=150      k=25, k=150
Flat                  0.012    Uncorrelated    k=25, k=150      k=25, k=150
Uniform               0.012    Uncorrelated    k=25, k=150      k=25, k=150

Note: N is a random variable.

Table 2
True Validity Values Used by Callender et al. (1982)

Entries are validity value (frequency).

Normal-Wide: 0.90 (1), 0.85 (1), 0.80 (2), 0.75 (2), 0.70 (3), 0.65 (4), 0.60 (4), 0.55 (5), 0.50 (6), 0.45 (5), 0.40 (4), 0.35 (4), 0.30 (3), 0.25 (2), 0.20 (2), 0.15 (1), 0.10 (1)

Normal-Moderate: 0.74 (1), 0.71 (1), 0.68 (2), 0.65 (2), 0.62 (3), 0.59 (4), 0.56 (4), 0.53 (5), 0.50 (6), 0.47 (5), 0.44 (4), 0.41 (4), 0.38 (3), 0.35 (2), 0.32 (2), 0.29 (1), 0.26 (1)

Flat: 0.70 (1), 0.69 (1), 0.68 (1), 0.67 (1), 0.66 (1), 0.65 (1), 0.64 (1), 0.63 (1), 0.62 (1), 0.61 (1), 0.60 (1), 0.59 (1), 0.58 (1), 0.57 (1), 0.56 (1), 0.55 (2), 0.54 (2), 0.53 (2), 0.52 (2), 0.51 (2), 0.50 (2), 0.49 (2), 0.48 (2), 0.47 (2), 0.46 (2), 0.45 (2), 0.44 (1), 0.43 (1), 0.42 (1), 0.41 (1), 0.40 (1), 0.39 (1), 0.38 (1), 0.37 (1), 0.36 (1), 0.35 (1), 0.34 (1), 0.33 (1), 0.32 (1), 0.31 (1), 0.30 (1)

Positively Skewed: 0.82 (1), 0.78 (1), 0.74 (1), 0.70 (2), 0.66 (2), 0.62 (2), 0.58 (3), 0.54 (3), 0.50 (4), 0.47 (5), 0.44 (6), 0.42 (10), 0.40 (7), 0.39 (3)

Negatively Skewed: 0.61 (3), 0.60 (7), 0.58 (10), 0.56 (6), 0.53 (5), 0.50 (4), 0.46 (3), 0.42 (3), 0.38 (2), 0.34 (2), 0.30 (2), 0.26 (1), 0.22 (1), 0.18 (1)

Uniform: .304 (1), .312 (1), .320 (1), .328 (1), .336 (1), .344 (1), .352 (1), .360 (1), .368 (1), .376 (1), .384 (1), .392 (1), .400 (1), .408 (1), .416 (1), .424 (1), .432 (1), .440 (1), .448 (1), .456 (1), .464 (1), .472 (1), .480 (1), .488 (1), .496 (1), .504 (1), .512 (1), .520 (1), .528 (1), .536 (1), .544 (1), .552 (1), .560 (1), .568 (1), .576 (1), .584 (1), .592 (1), .600 (1), .608 (1), .616 (1), .624 (1), .632 (1), .640 (1), .648 (1), .656 (1), .664 (1), .672 (1), .680 (1), .688 (1), .696 (1)

Note: Values are for mean ρ = 0.50; mean ρ = 0.25 values can be obtained by subtracting 0.25 from values above.

Table 3
Artifact Distributions

Criterion Unreliability (ρyy)    Frequency
0.90                             1
0.85                             2
0.80                             3
0.75                             4
0.70                             5
0.65                             6
0.60                             8
0.55                             6
0.50                             5
0.45                             4
0.40                             3
0.35                             2
0.30                             1

Range Restriction (U)            Frequency
1.000                            3
0.701                            5
0.649                            8
0.603                            9
0.559                            9
0.515                            8
0.468                            5
0.411                            3

Note: Artifact distributions are the same as those presented in Callender et al. (1982).

Table 4
Rho is Equal to .25 and is Constant (σρ² = .000)

                            Known    SH       HV       EB
k=25
  Mean          Mean        .2500    .2526    .2596    .2547
                SD                   .0341    .0360    .0345
                RMSE                 .0342    .0372    .0348
  REVC          Mean        .0000    .0028             .0011
                SD                   .0048             .0025
                RMSE                 .0055             .0027
  Lower Bound   Mean        .2500    .2124    .1888    .2321
                SD                   .0642    .0772    .0472
                RMSE                 .0744    .0985    .0504
k=150
  Mean          Mean        .2500    .2507    .2583    .2521
                SD                   .0141    .0146    .0141
                RMSE                 .0141    .0168    .0142
  REVC          Mean        .0000    .0015             .0003
                SD                   .0022             .0007
                RMSE                 .0027             .0008
  Lower Bound   Mean        .2500    .2181    .1831    .2409
                SD                   .0407    .0506    .0207
                RMSE                 .0517    .0839    .0226

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 5
Rho is Equal to .50 and Variance is Constant (σρ² = .000)

                            Known    SH       HV       EB
k=25
  Mean          Mean        .5000    .5020    .5212    .5047
                SD                   .0341    .0382    .0341
                RMSE                 .0341    .0437    .0344
  REVC          Mean        .0000    .0028             .0011
                SD                   .0048             .0022
                RMSE                 .0055             .0025
  Lower Bound   Mean        .5000    .4620    .3789    .4840
                SD                   .0646    .0972    .0449
                RMSE                 .0749    .1552    .0476
k=150
  Mean          Mean        .5000    .5016    .5191    .5035
                SD                   .0127    .0143    .0128
                RMSE                 .0128    .0239    .0133
  REVC          Mean        .0000    .0013             .0003
                SD                   .0019             .0006
                RMSE                 .0022             .0007
  Lower Bound   Mean        .5000    .4723    .3699    .4932
                SD                   .0375    .0426    .0179
                RMSE                 .0466    .1369    .0191

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.
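When reading the Mean, SD, and RMSE rows in these tables, it can help to recall that RMSE combines bias and sampling variability. Assuming the tabled SD is the standard deviation of the estimates across replications (an assumption; the divisor used is not stated here), RMSE is approximately the square root of bias squared plus SD squared. A short Python sketch, with a hypothetical helper name, using the Table 4 SH lower bound values for k=25:

```python
import math

def rmse_from_bias_and_sd(known, mean_estimate, sd_estimate):
    """Approximate RMSE from the bias of the mean estimate and the SD of estimates."""
    bias = mean_estimate - known
    return math.sqrt(bias ** 2 + sd_estimate ** 2)

# Table 4, SH lower bound, k=25: Known = .2500, Mean = .2124, SD = .0642
rmse = rmse_from_bias_and_sd(known=0.2500, mean_estimate=0.2124, sd_estimate=0.0642)
print(round(rmse, 4))  # 0.0744
```

The result agrees with the tabled RMSE of .0744, so a large RMSE paired with a small SD signals bias rather than instability.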
Table 6
Rho Distribution is Normal (σρ² = .012) with a Mean of .25

                            Known    SH       HV       EB
k=25
  Mean          Mean        .2500    .2546    .2637    .2598
                SD                   .0429    .0441    .0432
                RMSE                 .0431    .0462    .0442
  REVC          Mean        .0120    .0115             .0060
                SD                   .0107             .0074
                RMSE                 .0108             .0095
  Lower Bound   Mean        .1096    .1396    .1083    .1820
                SD                   .0872    .0929    .0777
                RMSE                 .0922    .0929    .1062
k=150
  Mean          Mean        .2500    .2523    .2640    .2598
                SD                   .0179    .0188    .0183
                RMSE                 .0181    .0234    .0207
  REVC          Mean        .0120    .0127             .0057
                SD                   .0052             .0034
                RMSE                 .0052             .0071
  Lower Bound   Mean        .1096    .1116    .0759    .1810
                SD                   .0370    .0407    .0326
                RMSE                 .0374    .0529    .0784

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 7
Rho Distribution is Normal (σρ² = .012) with a Mean of .50

                            Known    SH       HV       EB
k=25
  Mean          Mean        .5000    .5062    .5343    .5122
                SD                   .0417    .0465    .0402
                RMSE                 .0421    .0578    .0420
  REVC          Mean        .0120    .0117             .0058
                SD                   .0102             .0062
                RMSE                 .0102             .0088
  Lower Bound   Mean        .3596    .3869    .2913    .4369
                SD                   .0832    .1027    .0706
                RMSE                 .0875    .1233    .1047
k=150
  Mean          Mean        .5000    .5058    .5324    .5137
                SD                   .0166    .0188    .0160
                RMSE                 .0176    .0375    .0210
  REVC          Mean        .0120    .0122             .0056
                SD                   .0042             .0025
                RMSE                 .0042             .0069
  Lower Bound   Mean        .3596    .3667    .2794    .4346
                SD                   .0317    .0415    .0281
                RMSE                 .0325    .0903    .0801

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 8
Rho Distribution is Normal (σρ² = .012) with a Mean of .25 and Correlated Artifacts

                            Known    SH       HV       EB
k=25
  Mean          Mean        .2500    .2548    .2654    .2603
                SD                   .0449    .0467    .0451
                RMSE                 .0451    .0491    .0462
  REVC          Mean        .0120    .0121             .0063
                SD                   .0107             .0072
                RMSE                 .0107             .0092
  Lower Bound   Mean        .1096    .1355    .0986    .1793
                SD                   .0888    .0942    .0795
                RMSE                 .0925    .0948    .1057
k=150
  Mean          Mean        .2500    .2524    .2653    .2601
                SD                   .0186    .0195    .0188
                RMSE                 .0188    .0247    .0213
  REVC          Mean        .0120    .0127             .0058
                SD                   .0051             .0035
                RMSE                 .0052             .0072
  Lower Bound   Mean        .1096    .1117    .0724    .1820
                SD                   .0383    .0432    .0342
                RMSE                 .0384    .0570    .0801

Note: All values are reported in terms of r.
HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 9
Rho Distribution is Normal (σρ² = .012) with a Mean of .50 and Correlated Artifacts

                            Known    SH       HV       EB
k=25
  Mean          Mean        .5000    .5057    .5329    .5116
                SD                   .0395    .0439    .0380
                RMSE                 .0399    .0549    .0397
  REVC          Mean        .0120    .0115             .0058
                SD                   .0094             .0058
                RMSE                 .0094             .0085
  Lower Bound   Mean        .3596    .3861    .2950    .4353
                SD                   .0802    .0974    .0670
                RMSE                 .0844    .1169    .1010
k=150
  Mean          Mean        .5000    .5065    .5322    .5143
                SD                   .0165    .0184    .0158
                RMSE                 .0177    .0371    .0213
  REVC          Mean        .0120    .0124             .0057
                SD                   .0041             .0024
                RMSE                 .0041             .0067
  Lower Bound   Mean        .3596    .3661    .2848    .4338
                SD                   .0299    .0387    .0269
                RMSE                 .0306    .0842    .0789

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 10
Rho Distribution is Wide Normal (σρ² = .034) with a Mean of .25

                            Known    SH       HV       EB
k=25
  Mean          Mean        .2500    .2563    .2762    .2645
                SD                   .0589    .0629    .0578
                RMSE                 .0593    .0681    .0596
  REVC          Mean        .0340    .0314             .0213
                SD                   .0184             .0156
                RMSE                 .0186             .0201
  Lower Bound   Mean        .0137    .0422   -.0147    .0874
                SD                   .0990    .1111    .1098
                RMSE                 .1030    .1146    .1322
k=150
  Mean          Mean        .2500    .2566    .2739    .2647
                SD                   .0237    .0253    .0236
                RMSE                 .0246    .0348    .0277
  REVC          Mean        .0340    .0352             .0230
                SD                   .0077             .0066
                RMSE                 .0078             .0128
  Lower Bound   Mean        .0137    .0176   -.0357    .0868
                SD                   .0367    .0445    .0426
                RMSE                 .0369    .0665    .0846

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.
Table 11
Rho Distribution is Wide Normal (σρ² = .034) with a Mean of .50

                            Known    SH       HV       EB
k=25
  Mean          Mean        .5000    .5145    .5581    .5159
                SD                   .0560    .0646    .0516
                RMSE                 .0578    .0868    .0539
  REVC          Mean        .0340    .0325             .0195
                SD                   .0170             .0123
                RMSE                 .0170             .0190
  Lower Bound   Mean        .2637    .2924    .1445    .3438
                SD                   .0868    .1337    .0945
                RMSE                 .0914    .1790    .1238
k=150
  Mean          Mean        .5000    .5151    .5595    .5153
                SD                   .0215    .0246    .0195
                RMSE                 .0262    .0644    .0247
  REVC          Mean        .0340    .0355             .0205
                SD                   .0067             .0046
                RMSE                 .0069             .0143
  Lower Bound   Mean        .2637    .2748    .1175    .3440
                SD                   .0329    .0615    .0356
                RMSE                 .0347    .1586    .0879

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 12
Rho Distribution is Positively Skewed (σρ² = .012) with a Mean of .25

                            Known    SH       HV       EB
k=25
  Mean          Mean        .2555    .2589    .2701    .2643
                SD                   .0451    .0480    .0460
                RMSE                 .0452    .0502    .0468
  REVC          Mean        .0116    .0114             .0061
                SD                   .0113             .0079
                RMSE                 .0113             .0096
  Lower Bound   Mean        .1407    .1459    .1034    .1898
                SD                   .0861    .0976    .0761
                RMSE                 .0862    .1045    .0905
k=150
  Mean          Mean        .2555    .2592    .2713    .2669
                SD                   .0182    .0191    .0186
                RMSE                 .0185    .0248    .0218
  REVC          Mean        .0116    .0125             .0056
                SD                   .0050             .0033
                RMSE                 .0051             .0068
  Lower Bound   Mean        .1407    .1196    .0799    .1896
                SD                   .0357    .0389    .0310
                RMSE                 .0414    .0722    .0578

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 13
Rho Distribution is Positively Skewed (σρ² = .012) with a Mean of .50

                            Known    SH       HV       EB
k=25
  Mean          Mean        .5055    .5092    .5442    .5155
                SD                   .0412    .0506    .0403
                RMSE                 .0414    .0637    .0415
  REVC          Mean        .0116    .0123             .0063
                SD                   .0102             .0062
                RMSE                 .0102             .0082
  Lower Bound   Mean        .3908    .3863    .2627    .4385
                SD                   .0777    .1022    .0643
                RMSE                 .0777    .1639    .0800
k=150
  Mean          Mean        .5055    .5137    .5415    .5219
                SD                   .0169    .0192    .0161
                RMSE                 .0188    .0408    .0230
  REVC          Mean        .0116    .0123             .0058
                SD                   .0043             .0025
                RMSE                 .0043             .0063
  Lower Bound   Mean        .3908    .3741    .2751    .4436
                SD                   .0292    .0415    .0253
                RMSE                 .0336    .1229    .0585

Note: All values are reported in terms of r.
HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 14
Rho Distribution is Negatively Skewed (σρ² = .012) with a Mean of .25

                            Known    SH       HV       EB
k=25
  Mean          Mean        .2486    .2533    .2632    .2584
                SD                   .0448    .0458    .0446
                RMSE                 .0450    .0480    .0456
  REVC          Mean        .0118    .0112             .0057
                SD                   .0104             .0072
                RMSE                 .0104             .0094
  Lower Bound   Mean        .0805    .1390    .1057    .1812
                SD                   .0925    .0968    .0830
                RMSE                 .1094    .1000    .1304
k=150
  Mean          Mean        .2486    .2516    .2616    .2582
                SD                   .0179    .0187    .0181
                RMSE                 .0182    .0227    .0205
  REVC          Mean        .0118    .0120             .0052
                SD                   .0050             .0031
                RMSE                 .0050             .0074
  Lower Bound   Mean        .0805    .1151    .0883    .1815
                SD                   .0396    .0405    .0356
                RMSE                 .0526    .0412    .1070

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 15
Rho Distribution is Negatively Skewed (σρ² = .012) with a Mean of .50

                            Known    SH       HV       EB
k=25
  Mean          Mean        .4987    .5062    .5254    .5118
                SD                   .0380    .0405    .0369
                RMSE                 .0387    .0485    .0391
  REVC          Mean        .0118    .0106             .0051
                SD                   .0088             .0054
                RMSE                 .0089             .0086
  Lower Bound   Mean        .3306    .3905    .3321    .4350
                SD                   .0814    .0842    .0727
                RMSE                 .1010    .0842    .1272
k=150
  Mean          Mean        .4987    .5038    .5283    .5115
                SD                   .0167    .0185    .0162
                RMSE                 .0175    .0349    .0207
  REVC          Mean        .0118    .0117             .0051
                SD                   .0045             .0025
                RMSE                 .0045             .0072
  Lower Bound   Mean        .3306    .3681    .2952    .4352
                SD                   .0358    .0400    .0316
                RMSE                 .0518    .0534    .1092

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.
Table 16
Rho Distribution is Flat (σρ² = .012) with a Mean of .25

                            Known    SH       HV       EB
k=25
  Mean          Mean        .2502    .2546    .2652    .2601
                SD                   .0422    .0441    .0425
                RMSE                 .0424    .0465    .0437
  REVC          Mean        .0111    .0112             .0059
                SD                   .0102             .0072
                RMSE                 .0102             .0089
  Lower Bound   Mean        .0978    .1399    .1040    .1837
                SD                   .0866    .0925    .0775
                RMSE                 .0963    .0926    .1157
k=150
  Mean          Mean        .2502    .2525    .2637    .2596
                SD                   .0178    .0187    .0181
                RMSE                 .0179    .0230    .0204
  REVC          Mean        .0111    .0115             .0050
                SD                   .0047             .0030
                RMSE                 .0048             .0069
  Lower Bound   Mean        .0978    .1185    .0865    .1859
                SD                   .0360    .0375    .0317
                RMSE                 .0416    .0391    .0937

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 17
Rho Distribution is Flat (σρ² = .012) with a Mean of .50

                            Known    SH       HV       EB
k=25
  Mean          Mean        .5005    .5052    .5323    .5111
                SD                   .0387    .0451    .0378
                RMSE                 .0390    .0552    .0392
  REVC          Mean        .0112    .0105             .0051
                SD                   .0090             .0054
                RMSE                 .0090             .0081
  Lower Bound   Mean        .3478    .3922    .2988    .4404
                SD                   .0784    .0970    .0645
                RMSE                 .0901    .1086    .1128
k=150
  Mean          Mean        .5005    .5065    .5284    .5143
                SD                   .0153    .0165    .0147
                RMSE                 .0164    .0325    .0202
  REVC          Mean        .0112    .0115             .0052
                SD                   .0040             .0024
                RMSE                 .0040             .0064
  Lower Bound   Mean        .3478    .3716    .3010    .4382
                SD                   .0315    .0383    .0275
                RMSE                 .0395    .0604    .0945

Note: All values are reported in terms of r. HV REVC values could not be back-transformed from z into r and thus were omitted.

Table 18
Rho Distribution is Uniform (σρ² = .012) with a Mean of .25

                            Known    SH       HV       EB
k=25
  Mean          Mean        .2500    .2529    .2676    .2588
                SD                   .0463    .0507    .0468
                RMSE                 .0464    .0536    .0476
  REVC          Mean        .0128    .0135             .0075
                SD                   .0124             .0089
                RMSE                 .0124             .0103
  Lower Bound   Mean        .0933    .1289    .0745    .1728
                SD                   .0961    .1090    .0851
                RMSE                 .1024    .1105    .1164
k=150
  Mean          Mean        .2500    .2532    .2649    .2607
                SD                   .0186    .0198    .0191
                RMSE                 .0189    .0248    .0219
  REVC          Mean        .0128    .0135             .0062
                SD                   .0057             .0037
                RMSE                 .0057             .0076
  Lower Bound   Mean        .0933    .1079    .0727    .1776
                SD                   .0390    .0413    .0344
                RMSE                 .0416    .0414    .0911

Note: All values are reported in terms of r.
HV REVC values could not be backtransformed from z into r and thus were omitted. 95 PAGE 107 Table 19 Rho Distribution is Uniform () with a Mean of .50 012.2 Known SH HV EB k=25 Mean .5001 .5073 .5363 .5132 SD .0411 .0468 .0401 Mean RMSE .0418 .0592 .0422 Mean .0128 .0124 .0063 SD .0099 .0063 REVC RMSE .0100 .0091 Mean .3433 .3821 .2836 .4321 SD .0805 .0975 .0686 Lower Bound RMSE .0894 .1142 .1123 k=150 Mean .5001 .5053 .5314 .5135 SD .0160 .0178 .0156 Mean RMSE .0169 .0360 .0206 Mean .0128 .0130 .0073 SD .0042 .0293 REVC RMSE .0042 .0298 Mean .3433 .3610 .2813 .4298 SD .0305 .0397 .0280 Lower Bound RMSE .0353 .0735 .0909 Note: All values are reported in terms of r. HV REVC values could not be backtransformed from z into r and thus were omitted. 96 PAGE 108 Table 20 Lower Bound Values of Normal and NonNormal Distributions Constant Normal Positively Skewed Negatively Skewed Flat Uniform Percentile 012.02 10 th .5000 .3593 .3907 .3311 .3477 .3431 5 th .5000 .3197 .3827 .2726 .3215 .3235 1 st .5000 .2445 .3765 .2034 .3003 .3079 034.02 10 th .5000 .2636 .2948 .2145 N/A .2391 5 th .5000 .1968 .2796 .1325 N/A .2071 1 st .5000 .0721 .2679 .0459 N/A .1815 97 PAGE 109 Table 21 Lower Bound of the 99 Percent Confidence Interval when Mean Rho=.25 Constant 00.02 Normalmoderate 012.02 Normalmoderate 012.02 Normalwide 034.02 Positively Skewed 012.02 Negatively Skewed 012.02 Flat 012.02 Uniform 012.02 Artifacts Uncorrelated Uncorrelated Correlated Uncorrelated Uncorrelated Uncorrelated Uncorrelated Uncorrelated Known .2500 .0048 .0048 .1790 .1265 .0468 .0503 .0579 Mean .1844 .0042 .0010 .1710 .0072 .0030 .0074 .0082 SD .0739 .0599 .0624 .0571 .0642 .0639 .0668 .0547 SH RMSE .0988 .0599 .0625 .0576 .1355 .0810 .0793 .0859 Mean .0972 .0787 .0851 .2709 .0800 .0630 .0748 .0733 SD .0939 .6730 .0703 .0731 .0749 .0697 .0724 .0609 HV RMSE .1793 .0999 .0988 .1175 .2196 .0715 .1446 .1447 Mean .2237 .0761 .0789 .1255 .1006 .0753 .0913 .0781 SD .0406 .0767 .0800 .0973 .0711 .0802 .0817 .0692 
EB RMSE .0484 .1115 .1159 .1110 .0756 .1461 .0914 .0720 98 PAGE 110 Table 22 Lower Bound of the 99 Percent Confidence Interval when Mean Rho=.50 Constant 00.02 Normalmoderate 012.02 Normalmoderate 012.02 Normalwide 034.02 Positively Skewed 012.02 Negatively Skewed 012.02 Flat 012.02 Uniform 012.02 Artifacts Uncorrelated Uncorrelated Correlated Uncorrelated Uncorrelated Uncorrelated Uncorrelated Uncorrelated Known .5000 .2452 .2452 .0710 .3765 .2032 .3002 .3079 Mean .4483 .2509 .2522 .0796 .2574 .2549 .2593 .2434 SD .0634 .0498 .0512 .0503 .0477 .0529 .0493 .0469 SH RMSE .0818 .0501 .0516 .0510 .1283 .0740 .0641 .0798 Mean .2347 .0384 .0432 .2832 .0047 .0909 .0669 .0496 SD .0804 .0826 .0796 .1077 .0874 .0733 .0781 .0770 HV RMSE .2772 .2226 .2171 .3702 .3819 .1342 .2460 .2695 Mean .4846 .3572 .3556 .1634 .3790 .3497 .3650 .3553 SD .0271 .0058 .0586 .0877 .0514 .0620 .0538 .0514 EB RMSE .0311 .1249 .1251 .1274 .0514 .1590 .0842 .0699 99 PAGE 111 Table 23 Comparison of Model Results for Published Data SH HV EB Mean REVC LB Mean REVC LB Mean REVC LB Brown (1981) Corrected .2642 .0141 .1118 .2885 .0294 .0772 .2787 .0166 .1805 Uncorrected .2355 .0161 .0727 .2777 .0219 .0953 .2652 .0161 .1346 Shannon (1989) Partially Corrected .2672 .0208 .0825 .3193 .0307 .1058 .3025 .0214 .1520 Uncorrected .1780 .0033 .1040 .1743 .0078 .0627 .1804 .0028 .1488 Tett et al. 
(1991) Partially Corrected .1992 .0042 .1165 1994 .0139 .0511 .2030 .0042 .1665 McDaniel & Schmidt (1985) Uncorrected .0874 .0142 .0652 .1105 .0167 .0548 .1052 .0125 .0024 100 PAGE 112 101 Note: Graphs depict distributions in which =0.50 Normal DistributionModerateValidity Parameter1.0.9.8.7.6.5.4.3.2.10.0Frequency876543210 Normal DistributionWideValidity Parameter1.0.8.6.4.20.0Frequency876543210 Positively Skewed DistributionModerateValidity Parameter1.0.9.8.7.6.5.4.3.2.10.0Frequency121086420 Negatively Skewed DistributionModerateValidity Parameter1.0.9.8.7.6.5.4.3.2.10.0Frequency121086420 Flat DistributionModerateValidity Parameter1.0.9.8.7.6.5.4.3.2.10.0Frequency8.07.06.05.04.03.02.01.00.0 Figure 1. Population Validity Distributions Used by Callender et al. (1982) PAGE 113 Note: Graphs depict distributions in which =0.50 Normal DistributionModerateValidity Parameter1.21.0.8.6.4.20.0.2Frequency Normal DistributionWideValidity Parameter1.51.0.50.0.5Frequency Positively Skewed DistributionModerateValidity Parameter1.0.9.8.7.6.5.4.3.2.10.0Frequency Negatively Skewed DistributionModerateValidity Parameter1.0.9.8.7.6.5.4.3.2.10.0Frequency Flat DistributionModerateValidity Parameter1.0.9.8.7.6.5.4.3.2.10.0Frequency Uniform DistributionModerateValidity Parameter1.0.9.8.7.6.5.4.3.2.10.0Frequency Figure 2. Population Validity Distributions Used in the Current Study 102 PAGE 114 rho=.25(.0226)(.0839)(.0517)(.0504)(.0744)(.0985)00.20.40.60.81SHHVEBSHHVEB k=25 k=150 rho=.50(.1369)(.0466)(.0476)(.1552)(.0749)(.0191)00.20.40.60.81SHHVEBSHHVEB k=25 k=150 Figure 3. Lower Bound Estimates for 90 Percent Credibility Intervals when Validity is Constant 103 PAGE 115 rho=.25(.0922)(.0929)(.1062)(.0374)(.0529)(.0784)00.20.40.60.81SHHVEBSHHVEB k=25 k=150 rho=.50(.0875)(.1233)(.1047)(.0325)(.0903)(.0801)00.20.40.60.81SHHVEBSHHVEB k=25 k=150 Figure 4. 
Normal Distribution 90 Percent Credibility Interval Lower Bound Estimates

[Bar chart panels for rho=.25 and rho=.50; bars for SH, HV, and EB at k=25 and k=150; vertical axis 0 to 1. Parenthesized values:]
  rho=.25: (.0925) (.0948) (.1057) (.0384) (.0570) (.0801)
  rho=.50: (.0844) (.1169) (.1010) (.0306) (.0842) (.0789)
Figure 5. Normal Distribution with Correlated Artifacts 90 Percent Credibility Interval Lower Bound Estimates

[Bar chart panels for rho=.25 (vertical axis -0.2 to 0.8) and rho=.50 (vertical axis 0 to 1); bars for SH, HV, and EB at k=25 and k=150. Parenthesized values:]
  rho=.25: (.0030) (.1146) (.1322) (.0369) (.0665) (.0846)
  rho=.50: (.0914) (.1790) (.1238) (.0347) (.1586) (.0879)
Figure 6. Wide Normal Distribution 90 Percent Credibility Interval Lower Bound Estimates

[Bar chart panels for rho=.25 and rho=.50; bars for SH, HV, and EB at k=25 and k=150; vertical axis 0 to 1. Parenthesized values:]
  rho=.25: (.0862) (.1045) (.0905) (.0414) (.0722) (.0578)
  rho=.50: (.1639) (.0777) (.0800) (.0336) (.1229) (.0585)
Figure 7. Positively Skewed Distribution 90 Percent Credibility Interval Lower Bound Estimates

[Bar chart panels for rho=.25 and rho=.50; bars for SH, HV, and EB at k=25 and k=150; vertical axis 0 to 1. Parenthesized values:]
  rho=.25: (.1094) (.1000) (.1304) (.0526) (.0412) (.1070)
  rho=.50: (.1010) (.0842) (.1272) (.0518) (.0534) (.1092)
Figure 8. Negatively Skewed Distribution 90 Percent Credibility Interval Lower Bound Estimates

[Bar chart panels for rho=.25 and rho=.50; bars for SH, HV, and EB at k=25 and k=150; vertical axis 0 to 1. Parenthesized values:]
  rho=.25: (.0963) (.0926) (.1157) (.0416) (.0391) (.0937)
  rho=.50: (.0901) (.1086) (.1128) (.0395) (.0604) (.0945)
Figure 9. Flat Distribution 90 Percent Credibility Interval Lower Bound Estimates

[Bar chart panels for rho=.25 and rho=.50; bars for SH, HV, and EB at k=25 and k=150; vertical axis 0 to 1. Parenthesized values:]
  rho=.25: (.1024) (.1105) (.1164) (.0416) (.0414) (.0911)
  rho=.50: (.0894) (.1142) (.1123) (.0353) (.0735) (.0909)
Figure 10.
Uniform Distribution 90 Percent Credibility Interval Lower Bound Estimates

Appendices

Appendix A
Sample SAS program (Rho Distribution is Moderate Normal with a Mean of .50)

data d1;
proc iml;
options nodate pagesize=60 linesize=80;
***************************************************************
  SETUP SECTION
***************************************************************;
***************************************************************;
*This program generates a sampling distribution of correlation ;
*coefficients. N is a random variable with a mean ;
*of 125 and SD of 25;
***************************************************************;
r = {1 .0, .0 1};

*******KNOWN DISTRIBUTION**********;
Prho = .50;              *#1 SET KNOWN RHO;
Varrho = .012;           *#2 SET VARIANCE OF RHO;
PSDrho = sqrt(Varrho);   *CALCULATE SD FOR RHO;

***********************************;
*SELECT CRITERION RELIABILITY(ryy)*
** FROM CALLENDER ET AL (1982)   **
***********************************;
ryy = {.90 .85 .85 .80 .80 .80 .75 .75 .75 .75
       .70 .70 .70 .70 .70 .65 .65 .65 .65 .65
       .65 .60 .60 .60 .60 .60 .60 .60 .60 .55
       .55 .55 .55 .55 .55 .50 .50 .50 .50 .50
       .45 .45 .45 .45 .40 .40 .40 .35 .35 .30};

*****************RANGE RESTRICTION****************;
** SELECT RANGE RESTRICTION VALUE FROM          **
** DISTRIBUTION IN CALLENDER ET AL (1982)       **
**************************************************;
popu = {1.0  1.0  1.0  .701 .701 .701 .701 .701 .649 .649
        .649 .649 .649 .649 .649 .649 .603 .603 .603 .603
        .603 .603 .603 .603 .603 .559 .559 .559 .559 .559
        .559 .559 .559 .559 .515 .515 .515 .515 .515 .515
        .515 .515 .468 .468 .468 .468 .468 .411 .411 .411};
*print popu;

********NUMBER OF STUDIES (K) PER META-ANALYSIS**********;
k = 150;   *#3 Pick number of studies (k) in each meta-analysis;
*********************************************************;
*********************************************************
  Pick values of ryy and U for set of k studies
*********************************************************;
do z = 1 to 1;
  qxt1: randu = j(k,2,9);
  do set = 1 to k;
    s1: r1 = round(uniform(0)*49)+1;
    if r1 > 50 then goto s1;
    if r1 < 1 then goto s1;
    s2: r2 = round(uniform(0)*49)+1;
    if r2 > 50 then goto s2;
    if r2 < 1 then goto s2;
    randu[set,1] = ryy[1,r1];
    randu[set,2] = popu[1,r2];
  end;
  corruryy = corr(randu);
  cor2 = corruryy[2,1];
  if (abs(cor2) > .05) then goto qxt1;   *#4 eliminate artifact correlation;
  *if cor2 < .25 then goto qxt1;         *set correlation between artifacts;
  *if cor2 > .35 then goto qxt1;
  *(Pick correlated/uncorrelated artifacts as relevant for condition);
  *print randu cor2;                     *Check on artifact correlation;
end;

*********************************************************;
**** CALCULATE KNOWN CREDIBILITY INTERVAL PROPERTIES ***;
*********************************************************;
LowerCV = prho - (1.28155*psdrho);   *Only true if pop is normal;
                                     *lower bound of one-tail 90% cred. int.;

*********************************************************
  DATA GENERATION
*********************************************************;
*********************************************************;
*** out1 and out10 are placeholders for study outcomes;
*********************************************************;
out1 = j(k,7,9);
i = 1000;                *Number of iterations of meta-analyses;
out10 = out1;
outest = j(i,12,9);
outad = j(100);

*********************************************;
*****NUMBER OF META-ANALYTIC ITERATIONS******;
*repeat the meta-analysis i times (set above);
*********************************************;
do main = 1 to i;        *Start Number of iterations (I);

  ********************************************;
  *Create the studies for one meta-analysis   ;
  ********************************************;
  do h = 1 to k;         *Start Number of studies (K);

    ******************SAMPLE SIZE****************;
    *Set the sample size to be random (normally
    *distributed) variable with a mean of 125
    *and SD of 25.
    **********************************************;
    take: N = round(normal(10)*25+125);   *Select N for each study;
    if N < 10 then goto take;             *Eliminate samples where N < 10;
    sample = j(N,2,0);

    ***********SAMPLING RHO FROM A DISTRIBUTION****;
    *Sample rho from a distribution;
    *Note the distribution was set up above;
    ***********************************************;
    r[2,1] = ((normal(10)*PSDrho)+Prho);
    *#5 EQUATION DEFINES SHAPE OF UNDERLYING DISTRIBUTION OF RHO;
    *currently normal WITH A MEAN OF .5 AND SD OF 0.10954 (var=.012);
    if r[2,1] > 1 then r[2,1] = 1;     *place bounds on simulated r so not greater than 1.0;
    if r[2,1] < -1 then r[2,1] = -1;   *place bounds on simulated r so not lower than -1.0;
    r[1,2] = r[2,1];

    ***********************************************;
    *****SIMULATE CRITERION UNRELIABILITY**********;
    *Find local population reliability.
    *Estimate local sample reliability from local population reliability.
    ********************************************;
    rhoryy = {1 0, 0 1};
    rhoryy[2,1] = randu[h,1];
    rhoryy[1,2] = randu[h,1];
    localryy = normal(j(N,2,10))*half(rhoryy);
    localryy = corr(localryy);
    if localryy[2,1] < .1 then localryy[2,1] = .1;
    *multiply simulated r by square root of criterion;
    *make correlation matrix symmetrical again;
    r[2,1] = r[2,1]#sqrt(rhoryy[2,1]);
    r[1,2] = r[2,1];

    ***********************************************;
    *******    Simulate range restriction  ********;
    ***********************************************;
    rr = -10;   *Convert range restriction from pop u to z;
    if randu[h,2] = 1 then rr = -10;
    if randu[h,2] = .701 then rr = -.52;
    if randu[h,2] = .649 then rr = -.25;
    if randu[h,2] = .603 then rr = .00;
    if randu[h,2] = .559 then rr = .25;
    if randu[h,2] = .515 then rr = .52;
    if randu[h,2] = .468 then rr = .84;
    if randu[h,2] = .411 then rr = 1.28;
    *******************************************************
      use the value of z coded as rr to pick applicants
      whose scores on the predictor fall above the cutoff.
    *******************************************************;
    sams = j(1,2,9);
    do s = 1 to N;   *DO loop #3; *Start Sample size (N);
      dip: app = normal(j(1,2,10));
      app = app*half(r);
      if app[1,1] > rr then sample[s,] = app;
      if app[1,1] <= rr then goto dip;
    end;             *END #3; *End sample size (N);
    localm = sample[+,1]/nrow(sample);
    localss = sample[,1]-j(nrow(sample),1,localm);
    localss = localss#localss;
    localu = sqrt(localss[+,]/(nrow(sample)-1));

    *************************************************
    *** COMPUTE SAMPLE CORRELATION FROM RHO VALUE ***
    *** (RHO VALUE ATTENUATED BY SAMPLING ERROR,  ***
    *** RR, AND CRITERION UNRELIABILITY)          ***
    *************************************************;
    corr1 = corr(sample);
    if corr1[2,1] > .99 then goto take;
    out1[h,1] = r[2,1];          *(population rho);
    out1[h,2] = randu[h,1];      *population ryy;
    out1[h,3] = randu[h,2];      *population u;
    out1[h,4] = N;               *local N;
    out1[h,5] = corr1[2,1];      *local r;
    out1[h,6] = localryy[2,1];   *local ryy estimate;
    out1[h,7] = localu;          *local u estimate;
  end;                           *End Number of Studies (K);
  print out1;

  ***********************************************************************
    Hunter-Schmidt Individually Corrected Correlations Meta-Analysis
  **********************************************************************;
  rs = out1[,5];        *Find the raw correlations;
  rc = rs;              *Placeholder for corrected corr.;
  littleu = out1[,7];   *Find little u, SDrest/SDunrest;
  bigu = 1/littleu;     *Calculate big u, U;
  k = nrow(out1);       *Find k, the number of studies;
  N = out1[,4];         *Find the study sample sizes;
  *Correct individual studies for range restriction (rr) and criterion reliability (ryy);
  do l = 1 to k;
    rc[l,] = (bigu[l,]#rs[l,])/sqrt((bigu[l,]#bigu[l,]-1)#(rs[l,]#rs[l,])+1);   *rr;
    rc[l,] = rc[l,]/sqrt(out1[l,6]);                                            *ryy;
  end;
  a1 = rs/rc;           *Calculate attenuation Factor (A);
  do t = 1 to k;
    if rc[k,1] > .99 then rc[k,1] = .99;
    if rc[k,1] < -.99 then rc[k,1] = -.99;
  end;
  w = a1#a1#N;          *Calculate study weights;
  shrbar = w`*rc/w[+,]; *Calculate weighted corrected mean r;
  ss = rc-j(k,1,shrbar);
  *Calculate sum of squares observed;
  ss = ss#ss;
  vobs = w`*ss/w[+,];      *Calc weighted observed variance;
  rbar2 = N`*rs/N[+,];     *Calc uncorrected (bare bones) mean;
  vsimple = j(k,1,9);      *Placeholder for simple variance of error;
  vrefine = vsimple;       *Placeholder for refined variance of error;
  alpha = j(k,1,9);
  do l = 1 to k;           *Calculate simple & refined error variance;
    vsimple[l,1] = ((1-rbar2#rbar2)#(1-rbar2#rbar2)/(N[l,1]-1))/(a1[l,1]#a1[l,1]);
    alpha[l,1] = 1/((bigu[l,1]#bigu[l,1]-1)#(rs[l,1]#rs[l,1])+1);
    vrefine[l,1] = vsimple[l,1]#(alpha[l,1]#alpha[l,1]);
  end;
  ve = w`*vrefine/w[+,];   *Calc weighted average of refined error var.;
  shrevc = vobs-ve;        *Calc random effects variance component;
  if shrevc < 0 then shrevc = 0;
  shlbound = shrbar - 1.28155*sqrt(shrevc);
  sdrho = sqrt(shrevc);
  *print littleu n rs bigu rc A1 w vsimple alpha vrefine ve vobs ss;

  ***********************************************************************
  ********** HEDGES AND VEVEA WITH STUDY LEVEL CORRECTIONS **************
  **********************************************************************;
  rhvu = rs;               *Placeholder for corrected correlations;
  whv = n-3;
  SEhv = 1/sqrt(N-3);
  localryy = out1[,6];
  df = k - 1;
  rhv = j(k,1,9);
  rhv1 = j(k,1,9);         *Placeholder for U corrected r values;
  whv1 = j(k,1,9);         *Placeholder for U corrected weights;
  SEhv1 = j(k,1,9);        *Placeholder for U corrected SE;
  z1 = j(k,1,9);
  rhvc = j(k,1,9);         *Placeholder for U and ryy corrected r values;
  whvc = j(k,1,9);         *Placeholder for U and ryy corrected weights;
  SEhvc = j(k,1,9);        *Placeholder for U and ryy corrected SE;
  zhvc = j(k,1,9);
  zhvwre = j(k,1,9);
  q1 = j(k,1,9);
  q2 = j(k,1,9);
  weightz = j(k,1,9);
  squarew = j(k,1,9);
  rewhvc = j(k,1,9);
  rewmean = j(k,1,9);
  weightrez = j(k,1,9);
  Vplus = j(k,1,9);
  wre = j(k,1,9);
  sumweightrez = j(k,1,9);
  sumwre = j(k,1,9);

  *Correct individual studies for range restriction (rr) and criterion reliability (ryy);
  do m = 1 to k;
    rhv[m,1] = rhvu[m,1]-((1/(2#(N[m,1]-3)))#(rhvu[m,1])#(1-(rhvu[m,1]#rhvu[m,1])));
    *HOTELLING'S TRANSFORMATION;
  end;
  do m = 1 to k;
    rhv1[m,1] = (bigu[m,1]#rhv[m,1])/sqrt((bigu[m,1]#bigu[m,1]-1)#(rhv[m,1]#rhv[m,1])+1);   *rr;
    whv1[m,1] = ((rhv[m,1]#rhv[m,1])/(rhv1[m,1]#rhv1[m,1]))#whv[m,1];   *correct weights for range restriction;
    SEhv1[m,1] = (rhv1[m,1]/rhv[m,1])#SEhv[m,1];                        *correct SE for Range Rest;
  end;
  do m = 1 to k;
    rhvc[m,1] = rhv1[m,1]/sqrt(localryy[m,1]);
    if rhvc[m,1] > .99 then rhvc[m,1] = .99;
    if rhvc[m,1] < -.99 then rhvc[m,1] = -.99;
    whvc[m,1] = whv1[m,1]#localryy[m,1];
    SEhvc[m,1] = SEhv1[m,1]/sqrt(localryy[m,1]);   *should equal sqrt(1/whvc);
  end;
  do m = 1 to k;
    z1[m,1] = (1+rhvc[m,1])/(1-rhvc[m,1]);
  end;
  do m = 1 to k;
    zhvc[m,1] = .5#log(z1[m,1]);
    *transform r values that have been corrected for unreliability and range restriction to z scores;
    weightz[m,1] = zhvc[m,1]#whvc[m,1];
  end;
  meanzhvc = weightz[+]/whvc[+];
  do m = 1 to k;
    q1[m,1] = (zhvc[m,1]-meanzhvc)#(zhvc[m,1]-meanzhvc);
    q2[m,1] = whvc[m,1]#q1[m,1];
    adjq = q2[+];
    squarew[m,1] = whvc[m,1]#whvc[m,1];
  end;
  sumweight = whvc[+];
  wratio = squarew[+]/sumweight;
  hvREVC = (adjQ-df)/(sumweight-wratio);
  if hvREVC < 0 then hvREVC = 0;   *REVC adj for RR and ryy;
  do m = 1 to k;
    Vplus[m,1] = (1/whvc[m,1])+hvREVC;   *calculating revised variance to get weights;
    Wre[m,1] = (1/Vplus[m,1]);           *RE weights;
    weightrez[m,1] = wre[m,1]#zhvc[m,1];
  end;
  sumweightrez = weightrez[+];
  sumwre = wre[+];
  meanrez = weightrez[+]/Wre[+];   *mean Random-effects z;
  sumrezw = weightrez[+];
  SErez = 1/sqrt(sumrezw);         *standard error of the mean;
  SDHV = sqrt(hvrevc);
  HVlowerbz = meanrez - 1.28155*(sqrt(hvREVC));
  meanrer = tanh(meanrez);         *mean of HV transformed back to r;
  hvlowerbr = tanh(hvlowerbz);     *LB of HV transformed back to r;
  *print N rhvu localryy bigu rhv rhv1 z1;

  ***********************************************************************
  ********** EMPIRICAL BAYES ESTIMATES FOR INDIVIDUAL STUDIES      ******
  ********** BASED ON SH METHOD                                    ******
  **********************************************************************;
  *Find local disattenuated correlations and disattenuated sampling variances;
  **********************************************************************;
  sumn = n[+];
  ebcorrho = j(k,1,9);
  ebsampervar = ebcorrho;
  localw = ebcorrho;
  do ap = 1 to k;
    ebcorrho[ap,1] = (bigu[ap,1]#rs[ap,1])/sqrt(localryy[ap,1]-(rs[ap,1]#rs[ap,1])
                     +(bigu[ap,1]#bigu[ap,1])#(rs[ap,1]#rs[ap,1]));   *Disattenuated corr.;
    ebw = localryy[ap,1]-(rs[ap,1]#rs[ap,1])+((bigu[ap,1]#bigu[ap,1])#(rs[ap,1]#rs[ap,1]));   *Big W hat;
    ebsampervar[ap,1] = ((bigu[ap,1]#bigu[ap,1]#localryy[ap,1])#(localryy[ap,1]-(rs[ap,1]#rs[ap,1]))
                        #(1-(rs[ap,1]#rs[ap,1])))/(N[ap,1]#ebw#ebw#ebw);
    localw[ap,1] = 1/ebsampervar[ap,1];
    if localw[ap,1] < 0 then localw[ap,1] = 0;
  end;
  wes = localw#ebcorrho;   *weighted disattenuated correlations;
  *The likelihood;
  if shrevc > 0 then do;
    invp = 1/shrevc;       *weight for prior is inverse of REVC;
  end;
  if shrevc = 0 then do;
    invp = (sumn-1)/((1-shrbar#shrbar)#(1-shrbar#shrbar));
  end;
  prior = invp#(shrbar#j(k,1,1));   *column vector of weighted prior, that is;
                                    *overall mean (mesre) weighted by REVC;
  ebayes = wes+prior;                       *Add weighted prior and local;
  ebayes = ebayes#1/(localw+j(k,1,invp));   *Divide by sum of weights;
  mbay = ebayes[+,]/k;
  varbay = ebayes-j(k,1,mbay);
  varbay = varbay#varbay;
  varbay = varbay[+,]/(k-1);
  SDbay = sqrt(varbay);
  ranked = rank(ebayes);

  *********** Calculate EB tenth percentile value based on K ********;
  if k=25 then tenth = ((ebayes[loc(ranked=2)])+(ebayes[loc(ranked=3)]))/2;
  if k=150 then tenth = ebayes[loc(ranked=15)];
  lowbay = tenth;
  ****************************************************************;
  *outh1=ebayes//obsr//out1[,4];
  *outh2=j(k,1,1)//j(k,1,2)//j(k,1,3);
  *outhold= outh1||outh2;
  *create ebay from outhold;
  *append from outhold;
  *print 'Known' prho Varrho lowercv;
  *print 'SH' shrbar shrevc shlbound;
  *print 'HV' meanrer hvrevc HVLowerbr;
  *print 'EmBayes' mbay varbay lowbay;

  ****************************************************************
  *********** OUTPUT VALUES **************************************
  ****************************************************************;
  outest[main,1] = prho;        *KNOWN MEAN RHO;
  outest[main,2] = varrho;      *KNOWN VARIANCE OF RHO;
  outest[main,3] = lowercv;     *KNOWN 90% LOWER CV;
  outest[main,4] = shrbar;      *SH EST CORRECTED MEAN RHO;
  outest[main,5] = shrevc;      *SH EST VARIANCE OF RHO;
  outest[main,6] = shlbound;    *SH EST 90% LOWER CV;
  outest[main,7] = meanrer;     *HV EST CORRECTED MEAN RHO;
  outest[main,8] = hvrevc;      *HV EST VARIANCE of rho (in z!!!!!);
  outest[main,9] = HVLowerbr;   *HV EST 90% LOWER CV;
  outest[main,10] = mbay;       *EMPIRICAL BAYES EST MEAN RHO;
  outest[main,11] = varbay;     *EMPIRICAL BAYES EST VARIANCE;
  outest[main,12] = lowbay;     *EMPIRICAL BAYES 90% LOWER CV;
end;

sherr1 = (outest[,4]-outest[,1])#(outest[,4]-outest[,1]);
shrmsem = sqrt(sherr1[+]/i);
hverr1 = (outest[,7]-outest[,1])#(outest[,7]-outest[,1]);
hvrmsem = sqrt(hverr1[+]/i);
eberr1 = (outest[,10]-outest[,1])#(outest[,10]-outest[,1]);
ebrmsem = sqrt(eberr1[+]/i);
sherr2 = (outest[,5]-outest[,2])#(outest[,5]-outest[,2]);
shrmsev = sqrt(sherr2[+]/i);
eberr2 = (outest[,11]-outest[,2])#(outest[,11]-outest[,2]);
ebrmsev = sqrt(eberr2[+]/i);
sherr3 = (outest[,6]-outest[,3])#(outest[,6]-outest[,3]);
shrmsel = sqrt(sherr3[+]/i);
hverr3 = (outest[,9]-outest[,3])#(outest[,9]-outest[,3]);
hvrmsel = sqrt(hverr3[+]/i);
eberr3 = (outest[,12]-outest[,3])#(outest[,12]-outest[,3]);
ebrmsel = sqrt(eberr3[+]/i);

Print 'Simulation Values';
Print 'Shape of Rho Dist.' 'normal';   *#6 Shape of the rho dist.;
Print 'Rho=' prho;
Print 'Var Rho=' varrho;
Print 'Number of Studies=' k;
Print 'Number of Iterations=' i;
Print 'Artifact Correlation=' cor2;
Print ' ';
Print 'Root Mean Square Error values';
Print 'mean' shrmsem hvrmsem ebrmsem;
Print 'variance' shrmsev 'NO CALC' ebrmsev;
Print '90% cv lb' shrmsel hvrmsel ebrmsel;
create outest from outest;
append from outest;
QUIT;

data new;
set outest;
*file 'c:/sas/diss/test';
*put tenth;
*proc print;
proc means;
run;

Appendix B
Simulated Observed Validity Coefficient Distributions Based on Kemery et al.
(1987)

Figure B1
k=25

Stem Leaf                               #
   5 0                                  1
   4
   3 2345566899                        10
   2 049                                3
   1 34799                              5
   0 599                                3
  -0 91                                 2
  -1 3                                  1
Multiply Stem.Leaf by 10**-1

Figure B2
k=150

Stem Leaf                               #
   6 3                                  1
   5 6                                  1
   5 23                                 2
   4 57788                              5
   4 0012222333444                     13
   3 56666667778888899                 17
   3 00001112222222333333444           23
   2 555567777889                      12
   2 0111122223333444444               19
   1 5677889                            7
   1 00011133444                       11
   0 555667788                          9
   0 111222333344                      12
  -0 332000                             6
  -0 886665                             6
  -1 42                                 2
  -1 76                                 2
  -2 10                                 2
Multiply Stem.Leaf by 10**-1

Appendix (Continued)

Figure B3
k=250

Stem Leaf                               #
   5 66                                 2
   5 0                                  1
   4 6788889                            7
   4 000000111112233344                18
   3 5666778888889999                  16
   3 000000111222222222333333444444    30
   2 5556666666777777778888999999      28
   2 0000111111222223333333344444444   31
   1 56666666778888888999              20
   1 00011122333                       11
   0 5555666666777777889               19
   0 011111222333444444444             21
  -0 4444443332222111100               19
  -0 999988887766655                   15
  -1 32100                              5
  -1 997776                             6
  -2 0                                  1
Multiply Stem.Leaf by 10**-1

Figure B4
k=500

Histogram                                        #
 0.575+*                                         2
      .***                                       6
      .******                                   12
      .************                             24
      .****************************             55
      .*****************************            57
      .**************************************   75
      .****************************             55
 0.175+**************************               51
      .*************                            25
      .**************                           28
      .*******************                      38
      .****************                         31
      .************                             24
      .******                                   11
      .**                                        4
-0.225+*                                         2
* may represent up to 2 counts

About the Author

Jennifer Kisamore received a Bachelor's degree in Psychology from the University of South Florida in 1994. During her undergraduate work, she completed an undergraduate thesis designed to enhance communication within a large organization. In 1995, she entered a combined Master's/Ph.D. program in Industrial/Organizational Psychology at the University of South Florida. During her graduate work, Jennifer conducted research concerning computer-based instruction as well as methods of meta-analysis.
Jennifer also completed several internships during which she conducted validation studies, evaluated multimedia training programs, and helped develop a new managerial performance appraisal system. Additionally, Jennifer served as an instructor for several courses, including Research Methods for Psychology. She earned both departmental and university awards for her teaching. Jennifer earned her Master's degree in 1999 and completed her Ph.D. in 2003. Jennifer has made numerous conference presentations and has coauthored several articles currently under review. Jennifer has accepted an Assistant Professor of Industrial/Organizational Psychology position with the University of Oklahoma.