USF Libraries
USF Digital Collections

Measurement equivalence of the center for epidemiological studies depression scale in racially/ethnically diverse older ...

MISSING IMAGE

Material Information

Title:
Measurement equivalence of the center for epidemiological studies depression scale in racially/ethnically diverse older adults
Physical Description:
Book
Creator:
Kim, Giyeon
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:

Subjects

Subjects / Keywords:
Depressive symptoms
Measurement equivalence
Health disparities
Differential item functioning
CES-D
Dissertations, Academic -- Aging Studies -- Doctoral -- USF   ( lcsh )
Genre:
non-fiction   ( marcgt )

Notes

Summary:
ABSTRACT: This dissertation study was designed to examine measurement equivalence of the Center for Epidemiological Studies Depression (CES-D) Scale across White, African American, and Mexican American elders. Specific aims were to identify race/ethnicity-, sociodemographic-, and acculturation and instrument language-related measurement bias in the CES-D. Three studies were conducted in this dissertation to accomplish these aims. Two existing national datasets were used: the New Haven Established Populations for Epidemiologic Studies of the Elderly (EPESE) for the White and African American samples and the Hispanic Established Populations for Epidemiologic Studies of the Elderly (H-EPESE) for the Mexican-American sample. Differential item functioning (DIF) analyses were conducted using both confirmatory factor analysis (CFA) and item response theory (IRT) methods. Study 1 focused on the role of race/ethnicity on the measurement bias in the CES-D.Results from Study 1 showed a lack of measurement equivalence of the CES-D among Mexican Americans in the comparison with both Whites and Blacks. Race/ethnicity-specific items were also identified in Study 1: two interpersonal relation items in Blacks and four positive affect items in Mexican Americans. Study 2 focused on identifying sociodemographic-related measurement bias in responses to the CES-D among diverse racial/ethnic groups. Results from Study 2 showed that gender and educational attainment affected item bias in the CES-D. The interaction between gender and educational level and race/ethnicity was also found in Study 2: Mexican American women and lower educated Blacks had a greater predisposition to endorse the 'crying' item. Focusing on Mexican American elders, Study 3 examined how level of acculturation and language influence responses to the CES-D. In Study 3, acculturation and instrument language-biased items were identified in Mexican American elders.Study 3 also suggested that acculturation-bias was entirely explained by whether the CES-D was administered in the English or the Spanish versions. Possible reasons for item bias on the CES-D are discussed in the context of sociocultural differences in each substudy. Findings from this dissertation provide a broader understanding of sociocultural group differences in depressive symptom measures among racially/ethnically diverse older adults and yield research and practice implications for the use of standard screening tools for depression.
Thesis:
Dissertation (Ph.D.)--University of South Florida, 2007.
Bibliography:
Includes bibliographical references.
System Details:
Mode of access: World Wide Web.
System Details:
System requirements: World Wide Web browser and PDF reader.
Statement of Responsibility:
by Giyeon Kim.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 158 pages.
General Note:
Includes vita.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001988971
oclc - 307662148
usfldc doi - E14-SFE0002205
usfldc handle - e14.2205
System ID:
SFS0026523:00001


This item is only available as the following downloads:


Full Text

PAGE 1

Measurement Equivalence of the Center fo r Epidemiological Depression Scale in Racially/Ethnically Diverse Older Adults by Giyeon Kim A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy School of Aging Studies College of Arts and Sciences University of South Florida Co-Major Professor: David A. Chiriboga, Ph.D. Co-Major Professor: Yuri Jang, Ph.D. Victor Molinari, Ph.D. Larry Polivka, Ph.D. Brent J. Small, Ph.D. Date of Approval: September 10, 2007 Keywords: Depressive Sympto ms, Measurement Equivalence, Health Disparities, Differential Item Functioning, CES-D Copyright 2007, Giyeon Kim

PAGE 2

i TABLE OF CONTENTS LIST OF TABLES ..............................................................................................................v LIST OF FIGURES ...........................................................................................................vi ABSTRACT ......................................................................................................................vii CHAPTER ONE: INTRODUCTION .................................................................................1 Introduction and Statement of the Research Problem .............................................1 Purpose of the Study ...............................................................................................2 Significance of the Study ........................................................................................5 Background .............................................................................................................6 Two Approaches for Cross-Cultural Depression Research ........................7 Emic Approach to Cross-Cultural Depression Research ................7 Etic Approach to CrossCultural Depression Research ..................9 Conceptual Model for Cross-Cu ltural Depression Research ....................13 Measurement Equivalence in Cross-Cultural Depression Research .........17 Establishing Conceptual Equivalence ...........................................18 Establishing Metric Equivalence ..................................................19 Establishing Structural Equivalence .............................................20 Cross-Cultural Research with the CES-D Scale .......................................22 Description of the CES-D Scale ...................................................22 DSM Criteria and the CES-D .......................................................24 Cross-Cultural Applicability of the CES-D ..................................26 Measurement Equivalence of the CES-D Scale ........................................31 Analytic Strategies for Measurement Equivalence Research ...................35

PAGE 3

ii Issues Needed for Future Research .......................................................................37 Study Design .........................................................................................................41 CHAPTER TWO: STUDY 1 RACE/ETHNICITY-RELATED MEASUREMENT BIAS ............................................45 Introduction ...........................................................................................................45 Methods .................................................................................................................51 Sample .......................................................................................................51 Measures ...................................................................................................52 CES-D Scale .................................................................................52 Statistical Analysis ....................................................................................52 CFA DIF detection ........................................................................54 IRT DIF detection .........................................................................56 Results ...................................................................................................................57 Descriptive Information of Sample ...........................................................57 Descriptive Item Statistics ........................................................................59 DIF Results ...............................................................................................59 Whites versus Blacks ....................................................................64 Whites versus Mexican Americans ...............................................64 Mexican Americans versus Blacks ...............................................65 CFA versus IRT ............................................................................65 Discussion .............................................................................................................66 CHAPTER THREE: STUDY 2 SOCIODEMOGRAPHIC-RELATED MEASUREMENT BIAS ....................................74 Introduction ...........................................................................................................74 Methods .................................................................................................................78 Sample .......................................................................................................78 Measures ...................................................................................................79

PAGE 4

iii CES-D scale ..................................................................................79 Statistical Analysis ....................................................................................80 DIF Detection ................................................................................81 CFA DIF detection. ...........................................................81 IRT DIF detection. ............................................................81 Results ...................................................................................................................83 Sample Characteristics ..............................................................................83 Dimensionality of the CES-D ...................................................................85 DIF Results ...............................................................................................85 Age-DIF ........................................................................................86 Gender-DIF ...................................................................................88 Educational Attainment-DIF .........................................................90 Discussion .............................................................................................................93 CHAPTER FOUR: STUDY 3 ACCULTURATIONAND LANGUAGE-R ELATED MEASUREMENT BIAS .......100 Introduction .........................................................................................................100 Methods ...............................................................................................................104 Sample .....................................................................................................104 Measures .................................................................................................104 CES-D scale ................................................................................104 Acculturation ...............................................................................105 Instrument Language ..................................................................106 Acculturation Language ..........................................................106 Statistical Analysis ..................................................................................106 Dimensionality test .....................................................................106 DIF analyses ................................................................................107 CFA DIF Detection. ........................................................108 IRT DIF Detection. .........................................................108 Results .................................................................................................................109

PAGE 5

iv Sample Characteristics ............................................................................109 DIF Results .............................................................................................111 Acculturation-bias .......................................................................111 Language-bias .............................................................................114 Acculturation Language-bias ...................................................114 Discussion ...........................................................................................................116 CHAPTER FIVE: DISCUSSION ...................................................................................123 Summary of the Study ........................................................................................123 Overview of Findings .........................................................................................123 Implications .........................................................................................................125 Methodological Implications ..................................................................125 Detecting DIF ..............................................................................125 Explaining DIF ............................................................................125 Dealing with DIF ........................................................................126 Practice Implications ...............................................................................128 Limitations and Future Research ........................................................................131 Final Thoughts ....................................................................................................134 REFERENCES ...............................................................................................................136 ABOUT THE AUTHOR .......................................................................................End Page

PAGE 6

v LIST OF TABLES Table 1. Study 1 Item Parameter Estimates for Whites, Bl acks, and Mexican Americans..........................................................................................................55 Table 2. Study 1 Descriptive Ch aracteristics of the Sample..........................................58 Table 3. Study 1 Mean and Standa rd Deviation (SD) of the CES-D in Whites, Blacks, and Mexican Americans........................................................................60 Table 4. Study 1 Item Responses of the CES-D Scale in Whites, Blacks, and Mexican Americans...........................................................................................61 Table 5. Study 1 DIF Results from CFA and IRT Methods..........................................62 Table 6. Study 2 Descriptive Ch aracteristics of the Sample..........................................84 Table 7. Study 2 Age DIF Results from CFA and IRT Methods..................................87 Table 8. Study 2 Gender DIF Results from CFA and IRT Methods.............................89 Table 9. Study 2 Educational Attainment DIF Results from CFA and IRT Methods..............................................................................................................91 Table 10. Study 3 Descriptive Information of the High and Low Acculturated Mexican Americans.........................................................................................110 Table 11. Study 3 DIF Results from Confirmatory Factor Analysis (df =341, df = 2)...........................................................................................................112 Table 12. Study 3 DIF Results from Item Response Theory (df =157, df = 4)...........................................................................................................113

PAGE 7

vi LIST OF FIGURES Figure 1. The Typology of the Relationship of Culture to Depression Adopted from the Sternberg (2004) Culture and Intelligence Model...............................14 Figure 2. Outline for the Dissertation...............................................................................44 Figure 3. Item information function for Item 17 (I had crying spells) showing Gender-DIF in the total sample and Me xican American race/ethnicity............90 Figure 4. Item information function for Item 17 (I had crying spells) showing Education-DIF in the total samp le and Black race/ethnicity.............................92

PAGE 8

vii MEASUREMENT EQUIVALENCE OF THE CENTER FOR EPIDEMIOLOGICAL STUDIES DEPRESSION SCALE IN RA CIALLY/ETHNICALLY DIVERSE OLDER ADULTS Giyeon Kim ABSTRACT This dissertation study wa s designed to examine measurement equivalence of the Center for Epidemiological Studies Depr ession (CES-D) Scale across White, African American, and Mexican American elders. Speci fic aims were to identify race/ethnicity-, sociodemographic-, and acculturation and inst rument language-related measurement bias in the CES-D. Three studies were conducted in this dissertation to accomplish these aims. Two existing national datasets were used: the New Haven Established Populations for Epidemiologic Studies of the Elderly (EPE SE) for the White and African American samples and the Hispanic Established Popul ations for Epidemiologic Studies of the Elderly (HEPESE) for the Mexican-American sample. Differential item functioning (DIF) analyses were conducted using both c onfirmatory factor an alysis (CFA) and item response theory (IRT) methods. Study 1 fo cused on the role of race/ethnicity on the measurement bias in the CES-D. Results from Study 1 showed a lack of measurement equivalence of the CES-D among Mexican Americans in the comparison with both

PAGE 9

viii Whites and Blacks. Race/ethnicity-specific items were also iden tified in Study 1: two interpersonal relation items in Blacks and four positive affect items in Mexican Americans. Study 2 focused on identifying sociodemographic-related measurement bias in responses to the CES-D among diverse r acial/ethnic groups. Results from Study 2 showed that gender and educational attainment affected item bias in the CES-D. The interaction between gender and educational level and race/ethnicity was also found in Study 2: Mexican American women and lower educated Blacks had a greater predisposition to endorse the crying it em. Focusing on Mexican American elders, Study 3 examined how level of acculturation and language influence responses to the CES-D. In Study 3, acculturation and instru ment language-biased items were identified in Mexican American elders. Study 3 also s uggested that acculturation-bias was entirely explained by whether the CES-D was administered in the English or th e Spanish versions. Possible reasons for item bias on the CES-D are discussed in the cont ext of sociocultural differences in each substudy. Findings fr om this dissertation provide a broader understanding of sociocultural group differenc es in depressive symptom measures among racially/ethnically diverse olde r adults and yield research and practice implications for the use of standard screening tools for depression.

PAGE 10

1 CHAPTER ONE: INTRODUCTION Introduction and Statement of the Research Problem With the increased attention paid to racial and ethnic disparitie s in mental health, a growing body of literature s uggests racial/ethnic disparitie s in depression among older adults (Perreira, Deeb-S ossa, Harris, & Bollen, 2005; Robison, Curry, Gruman, Covington, Gaztambide, & Blank, 2003). A numbe r of studies have reported higher rates of depressive symptoms and disorders in so me racial/ethnic minority groups than in White Americans (e.g., Dunlop, Song, Lyons, Manheim, & Chang, 2003; Minsky, Vega, Miskimen, Gara, & Escobar, 2003). However, reported prevalence ra tes of depression vary dramatically across dive rse racial/ethnic groups, ra nging from 1.5% to 25.4% (e.g., Mui, Burnette, & Chen, 2002; Saez-Santiago & Bernal, 2003; Weissman, Bland, Canino, Faravelli, Greenwald, Hwa et al ., 1996). Given that culture defined broadly, including racial/ethnic, age, gender, socioecono mic, geographical, and language variations shapes values, attitudes, beliefs, and behaviors by a group of people (Sternberg, 2004), as well as influences disease manifestation and diagnostic labeling (Kleinman, 2004), cultural differences among racially/ethnically diverse elderly groups may affect the way in which people experience and express their depressive symptoms, wh ich in turn may contribute to variations in the preval ence rates of depression. A number of comparative studies on depressive symptoms with diverse racial/ethnic groups and even with subgroups of the major racial/ethnic groups have sought to identify similarities and differences and explain differences across these groups

PAGE 11

2 (Gregorich, 2006). While these efforts are pr aiseworthy, questions can be raised about the cross-group validity of the depression measures used. Group mean comparisons of depressive symptoms assume measurement i nvariance in which the same construct is being measured and the measurement used for comparison functions in the same way across all groups (Bravo, 2003; va n de Vijver, 2001). Violati ons of this assumption can occur if individuals from particular culture s or subgroups are more or less likely to endorse specific items as a consequence of group membership, such as race/ethnicity, age, gender, educational experience, and langua ge of administration (Jones, 2006). Researchers, however, have not paid enough attention to the cultu ral equivalence of depressive symptom measur es, taking the underlying me thodological assumptions for granted. Failure to establish measurement equivalence on depression may lead to inaccurate estimates of prevalence and misl eading group comparisons (Hui & Triandis, 1985; Vandenberg & Lance, 2000). For this reason, comparative research on depressive symptoms is needed to distinguish differe nces arising from a lack of measurement equivalence, which is known as differential item functioning (DIF), from true differences in standing on the same latent trait. Purpose of the Study In light of the abovementioned lack of research on the cultural equivalence of depressive symptom measures, the main pur pose of this dissertation study was to examine measurement equivalence of one of th e most widely used depression assessment tools, the Center for Epidemiological Studi es Depression Scale (CES-D), across diverse racial/ethnic groups and subgroups of the major racial/ethnic groups. Focusing on comparisons between three ra cial/ethnic elderly groups in cluding Whites, Blacks, and

PAGE 12

3 Mexican Americans, specific aims of this di ssertation research were to identify culturespecific items in the CES-D scale functioni ng differently across diverse racial/ethnic groups and subgroups of age, gender, educat ional attainment, acculturation, and language within each racial/ethnic elderly group as we ll as to identify a core set of depressive symptom items in the CES-D scale across raci ally/ethnically divers e elderly groups and subgroups within the same racial/ethnic group. The specific research questions for this dissertation study were the following: Research Question 1: Do the measurement properties of the CES-D scale vary by race/ethnicity among older adults?; Research Question 2: Do the measurement properties of the CES-D scale vary by sociode mographic characteristics (i.e., age, gender, and educational attainment) within ea ch racial/ethnic elderly group?; Research Question 3: Do the measurement properties of the CES-D scale vary by the level of acculturation and instrument language in older Mexican-A mericans? Each research question was examined in a separate study (Study 1, 2, and 3). In order to investigate th ree research questions, this dissertation study used two existing national datasets from the New Haven Established Populations for Epidemiologic Studies of the Elderly (EPE SE) for older Whites and Black samples and the Hispanic Established Populations for Epidemiologic Studies of the Elderly (H EPESE) for older Mexican American sample The following data analyses were conducted: (1) ANOVA and t-test; (2) dimensi onality test using exploratory factor analysis (EFA) and confirmatory factor an alysis (CFA); and (3) differential item functioning (DIF) analyses using both confirmatory factor analysis (CFA) and item response theory (IRT) methods.

PAGE 13

4 Research Question 1 was examined in Study 1 entitled Measurement Equivalence of the Center for Epidemiological Studies Depression Scale Among White, Black, and Mexican American Respondents in the New Haven and the Hispanic EPESE. The purpose of Study 1 was to identify race/ethnicity-related measurement bias in responses to the CES-D items in three raci al/ethnic elderly groups. Study 1 hypothesized that the measurement properties of the CE S-D scale would vary by race/ethnicity among older adults. Research Question 2 was examined in Study 2 entitled SociodemographicRelated Measurement Bias in the Center fo r Epidemiological Studies Depression Scale: A Study of Racially/Ethnically Diverse Older Adults. Study 2 focused on identifying sociodemographic-related measurement bias in response to the CES-D in three racial/ethnic elderly samples. Study 2 hypothesized that th e measurement properties of the CES-D scale would vary by age, gender, and educational attainment within each racial/ethnic elderly group. Focusing on older Mexican Americans, Research Question 3 was examined in Study 3 entitled Measurement Bias in the Center for Epidemiological Studies Depression Scale in Older Mexican Amer icans: Effects of Acculturation and Language. The purpose of Study 3 was to identify how le vel of acculturation and language influence responses to the CES-D in Mexican Ameri can elders. Study 3 hypothesized that the measurement properties of the CES-D woul d vary by the level of acculturation and instrument language in older Mexican Americans.

PAGE 14

5 Significance of the Study The goal of the overall dissertation is to evaluate the measurement equivalence of the CES-D across three racial/e thnic groups (Whites, Blacks, and Mexican Americans). This dissertation study represents the first multi-racial/ethnic and multi-cultural evaluation of the full version of the CESD using older adults from two nationally representative datasets. Prev ious analyses have been limite d to one or two racial/ethnic groups to evaluate the measurement equiva lence of the CES-D. The effects of sociodemographic and cultural factors on sy mptoms and expressions of depressive symptoms can be explored and compared in this dissertation. More importantly, this dissertation study can identify sociodemogra phicand culture-specific depressive symptom items that function differentially ac ross cultures and sub-cultural groups within the same racial/ethnic group. This dissertation study can also identify a core set of depressive symptom items that function equivalently across cultures and subgroups of age, gender, educational attainment, and accult uration. This dissertation can eventually address limitations of current screening for depression in older racial/ethnic groups and improve the detection of depression among di verse groups of older adults, by providing the modified versions of the CES-D in each group. Identification of optimal depression screens may provide a basis for developi ng culturally appropri ate and effective depression assessment tools in different racial/ethnic groups within the U.S. and crossnationally. Findings from this dissertation study can pr ovide directions for future research and practice in the field. This study may be useful for screening, assessment, and treatment for depression in ra cially or ethnically diverse groups. When established

PAGE 15

6 measures are used to screen and assess for de pression in different r acial/ethnic groups, we should recognize the risk that individuals with different cultural backgrounds may tend to be misclassified, leading to underor over-dia gnosis for depression. This may be related to underor over-reporting, as well as underor over-treatment for depression. This dissertation study may help inform the field w ith respect to culturally appropriate screens for depression in racially/ethnically diverse groups and sub-cultura l groups within the same racial/ethnic group. This dissertation can be extended to other racial/ethnic groups, as well as other diverse cultural subgroups w ith important status characteristics. This may be particularly important when conseque nces of inaccurate measures could lead to misguided public policies. Fina lly, this line of research may be extended to other mental health measures to enable accurate comparisons among diverse racial/ethnic groups for future research. Background In this background section, this dissertation begins with three conceptual issues that underlie concerns about measurement equivalence: (1) two approaches for crosscultural depression research including emic and etic approaches to cross-cultural depression research; (2) a conceptual framew ork for cross-cultural depression research, along with implications for different c onceptualizations of depression and recommendations for future research; and (3 ) measurement equivalence issues in crosscultural depression research including conceptu al, metric, and struct ural equivalence. With special attention to one of the most widely used depression assessment tools, the Center for Epidemiologic Studies-Depression Scale (CES-D) scale, the cross-cultural applicability of the CES-D scale is disc ussed next. Focusing more on measurement

PAGE 16

7 equivalence research, factor s affecting measurement equivalence of the CES-D and analytic strategies for measurement equivalence are reviewed. Based on the extensive background for this dissertation, issues needed for future research are also presented in the following section. Two Approaches for Cross-Cu ltural Depression Research A central issue in cross-cultural comp arative studies involves two different perspectives, which together have been called the emic-etic paradigm (Brislin, Lonner, & Thorndike, 1973; Canino, Le wis-Fernandez, & Bravo, 1997; Pike, 1954; Triandis & Brislin, 1984). The emic approach reflects the inside perspective of the ethnographer, whereas the etic approach rflects the outside perspective of the comparativist researcher (Morris, Leung, Ames, & Lickel, 1999; Ng & Earley, 2006). The emic approach uses variables and observations that are culturally sp ecific to a particular group, at a certain period in time, to develop an instrument (Rait & Burns, 1998) This does not allow for comparative research as it l ooks at variables in terms of language and culture and the instrument may not be relevant to other groups The etic approach, on the other hand, is basically comparative, and is di rected at eliciting standardiz ed categories of phenomena out of local specificities (Canino et al., 1997). Emic Approach to Cross-Cultural Depression Research The emic approach attempts to describe the internal logic of a cu lture, its singularity, considering this a necessary step prior to any valid cross-cultural analysis (Chvez & Canino, 2005). Therefore, it does not allow cros s-cultural comparisons using identical case definitions and standardized diagnostic interviews as case-finding instruments (Cheng, 2001). However, given that the emic appro ach focuses on examining a construct from

PAGE 17

8 within a specific culture and understanding that construct as the pe ople from within that culture understand it (Schaffer & Riordan, 2003), the emic approach is particularly useful in understanding the relatively uni que features of the manifestation of depressive symptoms and diagnosis in a given ethnic group. For example, Patel and colleagues (2001) ex amined medical concepts of depression in Zimbabwe, and found that one culture-specific terminology, kufungisissa (thinking too much), has been shown to be closely linke d to depression among Zi mbabwean depressed patients. Similarly, some researchers with an emic approach have highlighted that each culture has its own somatic meta phors to describe depression: nervios (brain ache or uncontrollable) in Mexican Americans (Jenkins, 1988); shenjing shuairuo (neurasthenia) in Chinese (Parker, Gladstone, & Chee, 2001); Sadri dayeq alayya (My chest feels tight) and Jesmi metkasser (broken body) in Dubai (Sulaiman, B hugra, & De Silva, 2001). Along the same lines, the emic approach has also infl uenced DSM-IV, which now includes an index of culturally defined syndromes, as well as st atements describing how culture influences the prevalence, symptomatology, course, and c linical outcome of sp ecific psychiatric disorders (Canino et al., 1997). Emic studies of depression have succeeded in developing some indigenous depression instruments (Tanaka-Matsumi, 2001). For example, with an additional inclusion of five Hopi illness categories (translated as worry, sickness, unhappiness, drunken-like craziness, and disappointment), Manson, Shore, and Bloom (1985) developed the American Indian Depression Scale (AIDS) among Hopi Indians. Other examples of locally developed screening questionnaire s include the Primary Care Psychiatric Questionnaire (PPQ; Stinivasan & Sures h, 1990) in India, the Shona Symptom

PAGE 18

9 Questionnaire (Patel, Simunyu, Gwanzura, Lewis, & Mann, 1997) in Zimbabwe, and the Chinese Health Questionnaire (CHQ; Cheng & Williams, 1986). Also, there are a few examples of indigenous structured intervie ws, including the Indian Psychiatric Survey Schedule (Shamasundar, Krishna Murthy, Pr akash, Prabhakar, & Subbakrishna, 1986). Although these indigenous depression measur es share much with the questionnaires developed in Western culture and there is a high degree of agreement in care classification, the emic studies on depression have been criticized for neglecting the problem of observation bias (Canino et al., 1997). One of the critical issues in cross-cultural depression research with an emic approach is the lack of me thodological homogeneity across studies focusing on different cultures. This can result in the inability to disentangle methodological from substantive factors when variability in cross-cu ltural comparisons is observed (Bravo, 2003). For example, as mentioned earlier, although Patel and colleagues (2001) suggested that culture-specific somatic symptoms are strongly associated with depression among Zimbabweans, cr iticisms may remain with re spect to genera lizability of the indigenous measurement and epidemio logical testing of causal hypotheses on depression. Etic Approach to Cross-Cultural Depression Research The main assumption with the etic cross-cu ltural research on de pression is that the etiology of depression is univers al and key constructs of depr ession exist equally across all cultures. Epidemiologists and cross-cultural re searchers often use the etic approach for the cross-cultural comparative study of depression (e.g., Breslau, AguilarGaxiola, Kendler, Su, Williams, & Kessler, 2005; Cole, Kawachi, Maller, & Berkman, 2000; Gonzalez, Haan, & Hinton, 2001; Iwata, Turner, & Lloyd, 2002; Kessler & Ustun, 2004; Nguyen, Kitner-

PAGE 19

10 Triolo, Evans, & Zonderman, 2004), emphasi zing the search for equivalence across cultures and using similar methods, constructs, and measures across se ttings in order to increase the generalizability of the findings (Schaffer & Riordan, 2003; van de Vijver, 2001). A prototypical etic stud y of depression is the intern ational project of the World Health Organization (1983) on th e diagnosis and classification of depression in Switzerland, Canada, Japan, and Iran. The goal of this study wa s to test feasibility of using standardized instruments of depressive diso rders. Using the Schedule fo r Standardized Assessment of Depressive Disorders (WHO/S ADD) by psychiatrists, 573 pa tients were diagnosed with depression in this study. On the basis of 39 symptoms of depression, WHO (1983) found that more than 76% of the depressed patient s reported core depres sive symptoms that included sadness, joylessness, anxiety, tension, lack of ener gy, loss of inte rest, loss of ability to concentrate, and ideas of insufficiency (p. 61). Suicidal ideation was also present in 59% of patients. The WHO project also discovered crosscultural variation in the expression of depression, such as somatic complaints and obsessions which were not part of the original 39 symptoms of depression measured by the W HO/SADD. Variations existed both within and across cultures in this project. More recently, WHO has also conducted several cross-national studies on depression. The cross-national depression study of Simon and his colleagues (2002) is an example. On the basis of the World Hea lth Organizations Psychological Problems in General Health Care (PPGHC) study, they exam ined prevalence rates of depression in 14 countries. The PPGHC study used the Com posite International Diagnostic Interview (CIDI) for psychiatric symptoms and dia gnoses. The authors found evidence that

PAGE 20

11 prevalence rates of current major depression varied across cultures. They also reported that depression was universally associ ated with disability, but th is association varied across cultures. They concluded that use of identical measures and diagnostic criteria might identify different levels of depression severity in differe nt countries or cultures. A number of studies with an etic appro ach have reported that even within the United States, prevalence estimates varied acr oss racial/ethnic groups (e.g., Breslau et al., 2005; Dunlop et al., 2003; Gonzalez et al., 2001 ; Swenson, Baxter, Shetterly, Scarbro, & Hamman, 2000). For example, using a nationally representative sample from the National Comorbidity Survey Replica tion (NCS-R), Breslau and colleagues (2005) examined racial/ethnic variations in DSM-IV disorders. The Com posite International Diagnostic Interview (CIDI) was used for the diagnostic as sessment. Comparing non-Hispanic Whites, non-Hispanic Blacks, and Hispanics, the authors found significantly lower lifetime prevalence and risk of depression in both minority groups. Suggesting the presence of common protective factors across disorder s for both minority groups, the authors concluded cultural differences might lead racial/ethnic minor ity groups to respond differently to the same survey questions rega rding their psychiatric history despite similar levels of morbidity. Reflecting on the importance of cultur al influences on reporting depressive symptoms and the cultural appropriateness of measures, etic researchers have examined psychometric properties of depression measur es (e.g., Crockett, Randall, Shen, Russel, & Driscoll, 2005; Foley, Reed, Mutran, & DeVellis 2002; Nguyen et al., 2004). For example, using exploratory factor anal ysis, Foley and colleagues (2002 ) did not confirm Radloffs original four-factor model in older African Americans. They found no distinction between

PAGE 21

12 somatic complaints and depressed affect, and they identified one new factor, social wellbeing, that has not been repor ted in the general population. Th is suggests the presence of unique measurement properties of the CES-D in African Americans, as well as needs for additional research to asse ss the validity of CES-D acr oss diverse cultural groups. Etic studies of depression have developed a number of depression instruments. Assessment of depressive symptoms and disord er has been conducted with self-report (e.g., Center for Epidemiologic Studies Depressi on, Beck Depression Scale, Zung Depression Scale, Geriatric Depression Scale) and interv iewer or clinician rating scales (e.g., Hamilton Rating Scale for Depression, Structured Clin ical Interview for DSM-IV, WHO Composite International Diagnostic Inte rview). Although these self-r eport and interviewer rating instruments are based on symptom criteria that is geared to Western culture patients, these instruments have been used as standards for depressive symptoms across ethnic and cultural groups. As a result, the etic appr oach has been criticized for emphasizing reliability at the expense of va lidity (Canino et al., 1997). It may impose the appearance of cross-cultural homogeneity at th e expense of a constricted c onceptualization embedded in the instrumentation (Bravo, 2003). This limita tion has been called the cultural fallacy, meaning the approach ignores cross-cultural differences in the nature or validity of depressive disorder (Bravo, 2003). For example, Medina-Mora and colleagues (2005) examined psychiatric disorders in Mexico usi ng the CIDI instrument. However, given that culture-bound syndromes such as nervios exist in Mexico and ma y not be captured by the CIDI, the instrument based only on DSM-IV may have potential problems due to cultural fallacy.

PAGE 22

13 Conceptual Model for Cross-Cu ltural Depression Research A significant limitation of cross-cultural rese arch conducted in the field of depression is that it lacks a clear theoretical framework. In order to advance cross-cultural comparative research on depressive symptoms Sternbergs (2004) culture and intelligence analytic model presents as one of the more appropriate an alytic frameworks, although it was originally developed with a focus on cross-cultural inte lligence research. Sternberg (2004) proposed a fourfold typology to capture the possible relatio nship of culture to intelligence. The four cells of the typology differ in two key respects: (1) whether or not there are cross-cultural differences in the nature of the mental pro cesses and representations involved in adaptation that constitute intelligence and (2) whether ther e are differences in the instruments needed to measure intelligence (beyond simple translation or adaptation), as a result of cultural differences in the content required for adaptation (Sternberg, 2004, p.326). The typology can be applied to cross-cult ural comparative research on depression by reframing the previous discussion of e tic and emic paradigm and of depression instruments used in the previous cross-cultural depression research. Taking all the abovementioned conceptual considerations into account, we propose four types of the relationship of culture to depression, that are shown in Figure 1. The four types are summarized as follows: (1) Type I, the nature of depression is the same across cultures and the instruments used to measure it are the same ; (2) Type II, the nature of depression is different across cultures, but the instruments used to measure it are the same across cultures; (3) Type III, the nature of depr ession is the same across cultures, but the instruments are not the same; (4) Type IV, the nature of depression is different across cultures and the instruments used to measure it are different.

PAGE 23

14 Dimensions of Depression Relation Same Different Same Type I -Etic approach -Type I-based Depression Research Research on core depressive symptoms Research on depression prevalence rates Research on metric & structural equivalence of depression measures WHO projects based on DSM-based measures Type II -Etic approach -Type II-based Depression Research Research on different factor structure of instrument Research on different expression on depression Research on metric & structural non-equivalence of depression measures Research on somatization and depression Instruments of Depression Different Type III -Emic approach -Type III-based Depression Research Research on locally developed screening tools Research on symptom expression within one culture Type IV -Emic approach -Methods: focus groups, in-depth interview -Type IV-based Depression Research Research on culture-bound syndromes Research on somatization and depression Research on metric & structural equivalence of depression measures Figure 1. The Typology of the Relationship of Culture to Depression Adopted from the Sternberg (2004) Culture and Intelligence Model

PAGE 24

15 In Type I, both the instruments and the ensuing dimensions of depression are the same across cultures. Studies that found core depressive symptoms across cultures using DSM-based measures are good examples, such as a prototypically etic study of depression (WHO, 1983). I ndeed, a number of WHO st udies of cross-national prevalence rates using DSM criter ia (e.g., Breslau et al., 2005) ar e also based on this type. This line of research assumes that core depressive symptoms exist cross-culturally and cross-nationally and only levels of depressi ve symptoms are differe nt across cultures. The argument based on Type I is that the natu re of depression is precisely the same across cultures and that this nature can be m easured identically wit hout regard to culture using appropriate transl ations when necessary. In Type II, the measures used to assess depression are the same across cultures, but the outcomes obtained from using those measures are structurally different as a function of culture. This type is close to a number of etic depression studies (e.g., Foley et al., 2002; Jang, Kim, & Chiriboga, 2005; Mi ller, Markides, & Blank, 1997) identifying that the same depression measures given in different cultures sugge sted that people from different cultures express their depressive sy mptoms in different ways. For example, as mentioned earlier, Miller and his colleague s (1997) argued strongly for a two-factor model of the CES-D among elderly Mexican Am ericans instead of the classic four-factor model that was derived from samples of the general American population. They believed qualitative differences in factorial struct ure exist between the two cultural groups. Type III posits that there is no difference in the nature of depression but that a difference in the instruments used to measur e it is necessary. As mentioned already in the emic approach, studies that develope d indigenous depression measures (e.g., the

PAGE 25

16 Chinese Health Questionnaire, the American Indian Depression Scale, the Shona Symptom Questionnaire in Zimbabwe) are exampl es for this type II I (e.g., Manson et al., 1985; Patel & Mann, 1997). This approach basically assumes that instruments for depression within a given culture must be em ic derived from within the context of the culture rather than from outside it, and argue s that when the same instruments are used across cultures, meanings for depression to be assigned to the scores would be different from one culture to another. In Type IV, both the nature of depression and the instruments are different as a function of the culture being investigat ed. A number of culture-bound syndromes included in DSM-IV are examples: some of these culture-specific symptoms have been reported to be associated with depressive symptoms, such as brain fag (West Africa), dhat (Indian), nervios (Latinos), and shenjing shairu (Chinese). This type argues that depression can be understood and measured onl y as an emic construct within a given culture and nothing about depression is necessarily common across cultures. It is my opinion that the two major topics of the present dissertation, culture and depression, can be more fully understood, measured, and analyzed by applying the proposed types of the relationship of culture to depression. Because research using the previously-mentioned emic-etic paradigm has not paid sufficient attention to differences in the instruments needed to measure depr ession across cultures, the proposed typology may expand the traditional emic-etic approach by adding the key aspect of depression instruments used in cross-cultural research to the proposed typology. The proposed analytic framework has the potential fo r providing a broader understanding on racial/ethnic differences in depressive symp toms, as well as on the role of culture on

PAGE 26

17 depressive symptoms. Additionally, the propos ed analytic typology may be useful in the development of culturally sensitive and info rmed screening and assessment of depression within a given culture. Finally, the proposed analytic typology may serve as a basis for future studies designed to extend the topic to other mental health research areas. Measurement Equivalence in Cross-Cultural Depression Research One major issue related to assessing the cr oss-cultural comparability of depressive symptoms has been the equi valence of measures (Bravo, 2003; Crockett et al., 2005; Liang, 2002; van de Vijver, 2001). Although it may be less of a concern when more open-ended questions are administered by an interviewer, the equivalence issue is particularly serious when self-report sc reening measures are involved (Liang, 2002). Valid and reliable questionnaire items in one language often lose meaning and context after translation. Even with accurate translation, the problem of different nuances unique to different cultures may not be resolved (Bravo, 2003). As mentioned already, even when using the DSM criteria for depression and a standardized depression instrument, cultural factors still affect the way individua ls express their depressive symptoms. Failure to substantiate the equivalence in depression instruments is potentially serious because it may lead to inaccurate preval ence rate and misleading group comparisons (Vandenberg & Lance, 2000). Therefore, in order to make meaningful comparisons across diverse population groups, researchers mu st establish equivalence of depression measures in addition to the traditional reliab ility and validity requirements of instruments. Johnson (1998) reviewed articles on cross-cultural equivalence and reported that more than 50 specific terms of equivalen ce (e.g., conceptual equivalence, criterion equivalence, semantic equivalence, metric equivalence) have been discussed or

PAGE 27

18 mentioned in the available literature on cross-cu ltural research. Acco rding to this review article, one of the most common equivalence ad dressed in the cross-cultural research was conceptual equivalence, followed by metric equivalence and stru ctural equivalence (Johnson, 1998). Similarly, a number of researchers point out that cross-cultural measurement equivalence requires at leas t the abovementioned three interrelated conditions (Markides, Liang, & Jackson, 1990; van de Vijver, 2001). More importantly, these types of equivalence c onstitute a hierarchy in that both conceptual and metric equivalence are required for structural equivalence and metric equivalence assumes conceptual equivalence. Establishing Conceptual Equivalence Conceptual equivalence is the most basi c type of equivalence and implies that research materials or observed behaviors have the same meaning in two or more cultures (Liang, 2002). Hui and Triandis (1985) identify conceptual equivalence as a necessary condition for making cross-cultural comparis ons. Thus, unless two or more cultural groups share the same basic concept of depressi on, there is little pu rpose in determining whether measures of that construct are equally valid across groups (Crockett et al., 2005). An example of a possible research question for evaluating conceptual equivalence might be: Do minorities and Whites think of depression in the same way? Conceptual equivalence can be evaluated by using several methods, such as backtranslation, focus groups, random probes, and in -depth interviews. A major requisite for conceptual equivalence may be fidelity of translation. Ramirez and colleagues (2005) suggest that qualitative methods may be best suited to asse ss conceptual equivalence of existing depression measures as well as to uncover indigenous idio ms of depression and

PAGE 28

19 culture-bound syndromes. Qualitative methods, for example, can explore the relevance and appropriateness of depression concepts as well as the way pe ople from different racial/ethnic backgrounds give m eaning to a particular domain. Establishing Metric Equivalence Assuming conceptual equivalence, metr ic equivalence assures that a given measurement specification can be applied to different cu ltures (Liang, 2002). Metric equivalence occurs when the factor loadings of items in the depression instruments are invariant across the two or more cultural groups (Crockett et al., 2005). A number of studies evaluating the metric equivalence of depression instruments have used exploratory factor analysis (EFA; e.g., Callahan & Wolins ky, 1994; Foley et al., 2002), confirmatory factor analysis (CFA; e.g., Cr ockett et al., 2005; Nguyen et al., 2004), and item response theory (IRT: e.g., Cole et al., 2000; Iwata & Buka, 2002). A number of studies have suggested that CFA may be th e most versatile approach to evaluating metric and structural e quivalence simultaneously (e.g., MacCallum & Austin, 2000; Raju, Laffitte, & Byrne, 2002). The assessment of measurement equivalence across cultural groups usually i nvolves the use of multiple-group CFA (van de Vijver & Leung, 2000). The advantages of CF A with regard to cross-cultural research are that it 1) allows for detailed comparisons of factor models acro ss cultural groups, 2) allows for a comparison of latent means, and 3) is not unduly in fluenced by relatively small sample sizes. As an example of the confirmatory a pproach, on the basis of two surveys of African Americans and one survey of Cauc asians, Nguyen and colleagues (2004) tested the metric equivalence of the CES-D scale. Using confirmatory f actor analysis, they

PAGE 29

20 demonstrated the equivalent number of factor s and pattern of factor loadings across all three groups. Then, they tested the metric equivalence of the CES-D to see if the magnitudes of the factor loadings were equal across three racial/ethnic groups. Significant loading differences were found between African American and Caucasian groups, while significant loading similariti es were found between the two surveys of African Americans. The author s pointed out that a higher numb er of loading similarities between two African American groups were expected given the same ethnicity and cultural backgrounds. Loading differences be tween African Americans and Caucasians were found in a number of both somatic complaints and depressive affect items, such as I felt everything I did was an effort, My sl eep was restless, I had crying spells, and I felt fearful. The study on elderly Mexican Americans by Miller and his co lleagues (1997) is another example. They argued strongly fo r a two-factor model of the CES-D among elderly Mexican Americans instead of the original four-factor model (Radloff, 1977) which was derived from samples of the general American population. This study presents a case for qualitative differences in factorial structure between two cultural groups. Their findings raise a serious question as to whethe r the CES-D can be used for analyzing differences in depr essive symptoms between el derly Mexican Americans and the general American population. Establishing Struct ural Equivalence Assuming both conceptual and metric equi valence, structural equivalence refers to similarities in the causal mechanism between a construct of depression and its consequences across different racial/ethnic groups (Liang, 20 02). Considering that the

PAGE 30

21 three types of equivalence constitute a hierarchy, depression instruments may be conceptually and metrically equivalent but not structurally equiva lent across different racial/ethnic groups. Structur al equation modeling (SEM; e .g., Crockett et al., 2005) and other techniques such as path analysis have been used to evaluate stru ctural equivalence. A number of researchers point out that SEM is probably the most versatile approach to evaluating metric and structural equivale nce simultaneously (Byr ne & Watkins, 2003; Liang, 2002; MacCallum & Austin, 2000). The study on Latino and Anglo adolescents of Crockett and colleagues (2005) is an example. The authors confirmed metric equivalence of a self-esteem measure across one Anglo and three Latino groups of adolescents, but not metric equivalence of the CES-D scale. They then tested structural equivalence of the CES-D between the four youth groups using multiple-group SEM. They identified similar relations between the CES-D and self-esteem across all four different groups (Anglo, Mexican, Cuban, and Puerto Rican) and concluded that resu lts from the multiple-group SEM supported structural equivalence. Ho wever, it should be noted that this study is not a perfect example for structural equivale nce because of the lack of me tric equivalence of the CESD scale across the Anglo and three Latino groups in this study. However, it is relatively rare to find studies on the stru ctural equivalence of depre ssion measures across different cultural groups. Although a sizable number of studies ha ve been conducted with regard to the cross-cultural comparison of depression, when two or more cultural groups are compared, descriptive and analytical t echniques have been applied of ten without addressing issues concerning conceptual, metric, and struct ural equivalence (Liang, 2002). Given the

PAGE 31

22 above review, assessing cross-cultural comparability of many widely used depression instruments seems to deserve the highest pr iority. Cross-cultural comparisons across racial/ethnic groups may not be justified w ithout resolving these measurement concerns. Cross-Cultural Research with the CES-D Scale One of the major issues in cross-cult ural depression research is how well depression instruments developed on samp les of European Americans can assess depressive symptoms across ra cially or ethnical ly diverse groups. Because optimal depression screens and optimal cut-scores have not been identified for racially or ethnically diverse older adults, it may be impor tant to review evidence for and against the utility of existing screening t ools for depression across divers e cultural groups, as well as to identify optimal cut-scores for each existi ng depression screening to ol. Given the fact that a substantial number of cross-cultural and cross-national st udies on depressive symptoms have used the Center for Epid emiologic Studies-Depression Scale (CES-D) and the main purpose of this proposed dissert ation study is to eval uate the measurement properties of the CES-D across diverse racial /ethnic groups, it may be meaningful to evaluate the cross-cultural applicability of the most widely used self-report depression instrument, the CES-D. The purpose of th is section is (1) to provide a broad understanding on the CES-D scale and (2) to id entify the cross-cultural applicability of the CES-D scale. Description of the CES-D Scale The CES-D was first developed in 19 77, as a 20-item instrument with a dimensional response format (Radloff, 1977). The purpose of the CES-D is to determine the frequency and severity of current depr essive symptoms in community samples.

PAGE 32

23 Respondents are asked how often they have experienced each symptoms of depression during the past week on a fourpoint scale from rarely or none of the time to most or all of the time. Each item is scored 0-3, a nd total scores range fr om 0 to 60, with higher scores indicating more fre quent depressive symptomatology. A suggested standard cutoff score for probable depression on the fulllength questionnaire is equal to or higher than 16 (Andresen, Carter, Malmgren, & Patr ick, 1994). CES-D scores of 16 to 26 are considered indicative of mild depression and sc ores of 27 or more are indicative of major depression (Zich, Attkisson, & Greenfield, 1990 ). Zich and colleagues (1990) found the stringent cutoff score of 27 was more useful for screening medical patients for depression than the standard cutoff score of 16. Concern about respondent burden during surveys of older adults has led researchers to develop shortened versions of the original CES-D (Suthers, Gatz, & Fiske, 2004). To modify the original instrument, researchers have redu ced the number of instrument items, as well as reduced the respons e format from four c hoices to either three or two choices. These shorter versions of the CES-D include the 11-item version administered in the Iowa Established Popul ations for Epidemiological Studies of the Elderly (EPESE), the 10-item version administ ered in the Boston EPESE, and an 8-item version administered in the Health and Retirement Study (HRS)/ Assets and Health Dynamics among the Oldest Old (AHEAD) survey of older adults (Suthers et al., 2004; Turvey, Wallace, & Herzog, 1999). Factor analyses of the CES-D have been conducted since its initi al development. In a recent meta-analysis of th e factor structures of CES-D, Shafer (2006) identified that four specific factors have been generally suppor ted by the majority of studies. These four

PAGE 33

24 factors are generally described as Positive A ffect, Depressed or Negative Affect, Somatic Symptoms, and Interpersonal Problems. Howe ver, there are no formal subscales for the CES-D. More importantly, a number of studies have not replicated four specific factors of the CES-D across diverse racial/ethnic groups (e.g., Crockett et al., 2005; Miller et al., 1997). DSM Criteria and the CES-D Numerous studies have examined the dia gnostic accuracy of screening tests for depression. Several diagnostic instruments have been used to define the presence or absence of depression. The more frequen tly-used instruments include the following: Structured Clinical Interview for DSM-IV (SCID); Diagnostic Inte rview Schedule (DIS); CIDI; and Research Diagnostic Cr iteria (RDC). A sizable nu mber of studies using selfreport depression screening instruments have examined sensitivity and specificity for major depressive disorder, defined by a vari ety of criterion standards, many of which are based on DSM criteria. According to the DSM-IV, the diagnosis of a major depressive episode requires the presence of five significant symptoms, which must include either a predominantly depressed mood and/or loss of interest or pl easure for at least a 2-week period of time (American Psychiatric Association, 1994; Kaplan & Sadock, 2003). Other symptoms may include: significant change in weight or appetite; insomnia or hypersomnia; psychomotor retardation or agitation; fatigue or loss of energy; feelings of worthlessness or excessive guilt; diminished concentration, lo ss of clarity of thought or indecisiveness; and recurrent thoughts of death (American Psychiatric Association, 1994; Kaplan & Sadock, 2003).

PAGE 34

25 Previous research has compared CES-D scores to clinical depression based on DSM criteria (e.g., Haringsma, Engels, B eekman, & Spinhoven, 2004; Watson, Lewis, Kistler, Amick, & Boustani, 2004; Janssen, Beekman, Comijs, Deeg, & Heeren, 2006). For example, Beekman and colleagues (1997) compared the CES-D to major depression diagnoses determined by the Diagnostic In terview Schedule (DIS), which evaluates major depression based on the DSM-III. A cutoff score of 16 on the 20-item CES-D was found to have a sensitivity of 93.2%, specificity of 56.2%, and positive predictive value of 13.2% for major depression during the past year. Compared to false positives, true positives were more likely to be female and to have elevated anxiety, but did not differ on physical illness or cogn itive performance. As another example, focusing on what geront ologists call the old-old adults (i.e., age 75 to 84), Watson and colleagues (2004) compared the Geriatric Depression Scale (GDS) and the CES-D to the SCID-IV depression diagnoses. They reported that the recommended cutoff points of 12 on the GDS and 16 on the CES-D performed poorly in detecting both major and minor depression. Using a traditional cutoff score of 16, for example, the CES-D showed a sensitivity of only 60% and a specificity of 89% for detecting major depression and a sensitivity of only 50% and a specificity of 86% for detecting minor depression. Given the findings the authors suggested the need for more sensitive methods of screening in the healthy older adults. Some studies have compared CES-D short forms to DSM-based instruments (e.g., Irwin, Artin, & Oxman, 1999; Suthers et al ., 2004; Turvey et al., 1999). Irwin and colleagues (1999) compared their screening results from the CES-D against to the SCID as a criterion standard in a community-bas ed sample of adults with known physical

PAGE 35

26 illness. They found that scores of greater th an or equal to 4 had 99% sensitivity and 84% specificity for major depression. As anothe r example, on the basis of the AHEAD data Turvey and colleagues (1999) compared a yes/no format 8-item version of the CES-D to a DSM-based measure, the CIDI-SF. Results indicated that the clearest source of discordance between the two measures wa s found for those who were positive for depression on the 8-item CES-D but not currently depressed on the CIDI-SF. The authors reported that the discordance was explained by respondents who reported depressive symptoms on the 8-item CES-D, but did not meet the CIDI-SF criteria for symptom frequency or dura tion, and were therefore cl assified as noncases. Cross-Cultural Applicability of the CES-D The CES-D has been used in a sizable num ber of cross-cultura l and cross-national studies on depressive sympto ms. A number of previous studies have compared prevalence rates and means of the CES-D across racial/ethnic groups and found evidence of differences in both prevalence rates and means across those groups (e.g., Foley et al., 2002; Krause & Liang, 1992; Mackinnon, Mc Callum, Andrews, & Anderson, 1998). Comparing group means of the CES-D across four racial/ethnic groups (Japanese, Taiwanese, African Americans and Whites in the U. S.), Krause and Liang (1992) found that Japanese elders showed the lowest mean scores on overall depressive symptoms, followed by Taiwanese, Whites, and African Americans. More recently, Inoba and colleagues (2005) found that Japanese tend to ha ve lower mean scores of the CES-D than Whites. In addition to the re ported differences in the mean levels on the CES-D across cultural groups, the probable caseness rates of the CES-D also varied dramatically across diverse racial/ethnic groups, ra nging from 3.5% to more than 30%. For example, Blazer

PAGE 36

27 and colleagues (1998) showed that 9.5% fo r African Americans and 8.8% for Whites fall under the category of probable depression. Us ing the cut-off scores of the CES-D, reported prevalence rates across and within r acial/ethnic groups are the following: 3.5% for Germans (Papassotiropoulos & Heun, 1999); 13.2% for Hispanics and 9.2% for Whites (Swenson et al., 2000); 14% of African Americans (Foley et al., 2002); 19.8% for African Americans (Baker, Velli, Freidman, & Wiley,1995 ); 25.3% for Koreans (Cho, Nam, & Suh, 1998); 25.4% for Mexican Americans (Gonzalez et al., 2001); and more than 30% for Korean Americans (Jang et al., 2005). Given the evidence for different means and rates of probable depression across racial/ethnic groups, a major issue with regard to cross-cultural applicability of the CESD instrument is the extent to which such racial/ethnic group comp arisons reflect true differences in the depressive symptoms or conversely, how much is due to measurement variance in the construct of inte rest. Some studies have led to results suggesting that the CES-D should undergo revision based on the ta rget population. For example, Liang and colleagues (1989) observed lack of consensus on the factor structure of the CES-D and developed a 12-item version of the CES-D in Mexican Americans. In a study of Black and White Americans, Callahan and Wolinsky (1994) found significantly different factor structures for the four race/gender groups, and recommended dropping five items to maximize comparability across race by gender groups. Chapleski and colleagues (1997) also found that a 12-item version of the CES-D was a superior fit to American Indians than the original 20-item s cales. Although the CES-D appeared robust in the face of these minor changes, McCallum and colleague s (1995) suggested these minor changes to

PAGE 37

28 the original CES-D scale may create some potential risks, such as reduced reliability and lack of comparability in norms for screening. Several studies have found that the CESD has acceptable internal consistency as well as Radloffs (1977) four-factor soluti on of depressive symptoms in different racial/ethnic groups (e.g., Blazer et al., 1998; Krause & Liang, 1993; Roberts, 1980). Two studies of Roberts and colleagues (Roberts, 1980; R oberts, Vernon, & Rhoades, 1989), for example, showed acceptable reliabili ty of the CES-D in Mexican-Americans, African Americans, and Anglo Americans. Using confirmato ry factor analysis, they (1989) found that a four-factor structure was supported in Anglo-American and Mexican American psychiatric patients. On the basis of the data from the Duke site of the EPESE, Blazer and colleagues (1998) also confirmed an original four-factor solution of the CESD in both African Americans and Whites. Additionally, Mui and her colleagues (2002) reviewed cross-cultural studi es on the CES-D and pointed ou t its usefulness for assessing depressive symptoms across cu ltures in older adults. Although several studies have replicated th e original four-factor solution, there is growing evidence for the existence of uni que measurement properties of the CES-D across racial/ethnic groups. Factor analyses of the CES-D have yielded a wide range of factors across diverse populat ions, ranging from two to six factors (e.g., Crockett et al., 2005; Liang et al., 1989; Miller et al., 1997; Posner, Stew art, Marn, & Prez-Stable, 2001). For example, on the basis of the Hisp anic EPESE data, Miller and his colleagues (1997) argued strongly for a two-factor model of the CES-D among elderly Mexican Americans instead of a four-factor model whic h was derived from samples of the general American population. Chaplesk i and colleagues (1997) also found the factor solution of

PAGE 38

29 the CES-D was inconsistent with the original four-factor solution when applied to American Indians. More recently, research on adolescents by Crockett and colleagues (2005) found evidence for differences in the CES-D factor structure even among Latino subgroups (Mexican Americans, Cuban American s, and Puerto Rican Americans). This study showed that the original four-factor so lution fit very well for Whites and Mexican Americans while the four-factor solution did not fit adequately for Cuban Americans and Puerto Rican Americans. Given the identi fied different factor structure for Cuban Americans and Puerto Rican Americans, the authors argued that C uban and Puerto Rican Americans have somewhat different con cepts of depression than Whites. Several recent studies have addressed di fferent response patterns to items of the CES-D across racial/ethnic groups (e.g., Cole et al., 2000; Iwata & Buka, 2002; Jang et al., 2005; Kim, Jang, Chiriboga, 2006). Cole and colleagues (2000) examined race/ethnicityspecific response tendencies on the CES-D utilizing differential item functioning (DIF) analysis. They found that Af rican American elders were more likely to endorse items related to interpersonal problems, compared to White elders. Focusing on racially or ethnically diverse young adults Iwata and colleague s (Iwata & Buka, 2002; Iwata et al., 2002) also used DIF analyses to as sess variations in the manifestation. They showed that items related to somatic symptoms and positive affect function differently across racial/ethnic groups in both studi es. Both African Americans and Native Americans tended to endorse somatic symptoms over depressive symptoms, and immigrant Hispanics and Japanese appeared to inhibit the expression of positive affect. More recently, with special attention to th e role of levels of acculturation on the CES-D, Jang and colleagues (2005) found that Ko rean American elders who are lower or

PAGE 39

30 higher on acculturation showed significant diffe rences in response patterns and factor structures of the CES-D. Additionally, DIF analysis suggested that those with lower levels of acculturation were more likely to inhibit responses to positive items of the CESD. Expanding upon the findings from this study, Kim and colleagues (2006) examined variations in response to the CES-D by age a nd the levels of accultu ration, with Koreans in Korea and Korean Americans. Utilizing multiple group confirmatory factor analyses (CFA), metric differences in the CES-D were identified across four cultural groups (low acculturated Korean Americans; high acculturated Korean Americans; Korean middleaged adults; Korean older a dults), and similar parameter estimates were found between low acculturated Korean Americans and Korean older adults. This line of research suggests the substantial cultural influences in expressing depressive symptoms. Several conclusions can be drawn with regard to the appl icability of each instrument and methodological concerns, alt hough variations in samples and analytic techniques used in the previous research, as well as mixed results from the literature make it hard to compare findings. First, cr oss-cultural and cross-national studies on the CES-D confirm its general usefulness for assessing depression in diverse groups of older adults. Second, reported findings from the prev ious research suggest that cultural factors may impact the reportin g of depressive symptoms. Third, changes in various items and factors may improve the metric and structural equivalence of the CES-D when the CESD instrument is applied to culturally dive rse groups. Fourth, more research on the comparative performance of various screeni ng instruments with di verse racial/ethnic groups may be needed. Fifth, while vari ous versions of the CES-D instrument demonstrate excellent reliabilit y, it may need to be furthe r calibrated against clinical

PAGE 40

31 assessments of depression in diverse racial/ethnic groups. Finally, st ructural equivalence across cultural groups also needs to be tested and established, by relating the CES-D to other correlates of depr essive symptoms. Measurement Equivalence of the CES-D Scale As mentioned above, although cross-cultur al and cross-national studies on the CES-D scale confirm its general usefulness for assessing depression in diverse groups, previous studies have suggested group differences in the CES-D items across diverse racial/ethnic groups (e.g., Cole et al., 2000; Perreira et al., 2005; Roberts, Rhoades, & Vernon, 1990). Reported differences in the CES-D items were often found across subgroups of race/ethnicity (e.g., Cole et al ., 2000; Iwata & Buka, 2002), age (e.g., Gatz & Hurwicz, 1990; Hays, Landerman, George, Flint, Koening, Land et al., 1998; Kessler, Foster, Webster, & House, 1992), gender (e .g., Posner et al., 2001; Stommel, Given, Given, Kalaian, Schulz, & McCorkle, 1993), acc ulturation (e.g., Chiriboga et al., 2007; Jang et al., 2005; Kim et al., 2006), and inst rument language (e.g., Roberts et al., 1990), and other important variables. These gr oup differences in the CES-D items can be observed because (1) the CES-D measure accu rately assesses the unde rlying attribute for different groups when they actually differ on that attribute (true di fference) or (2) the CES-D scale inaccurately assesses the underlying attribute for one or more groups when the groups do or do not truly differ on the at tribute (artifactual difference). However, previous research has not fully identified whether such obs erved group differences in the CES-D items represent true differences, are due to differential item functioning (DIF), or are a combination of the two.

PAGE 41

32 One major area found lacking the literature is the assessment of differential item functioning (DIF) for the CES-D across subgroups. As menti oned above briefly, an item functions differentially if people from diffe rent subgroups with equal depression scores on the CES-D do not have the same probability of item endorsement. DIF occurs when the probability of responding to an item de pends on both an individuals depression and other construct-irrelevant factors, such as ra ce/ethnicity, age, or gender (Teresi, 2002). DIF can be a threat to validity since measur es containing DIF items may be invalid for between-group comparisons because their scores are indicative of a ttributes other than that which the test is intended to meas ure (Perkins, Stump, Monahan, & McHorney, 2006). Including DIF items in the CES-D s cale can exaggerate or attenuate true differences between subgroups of race/ethni city, age, gender, or other important demographic variables. The effect of race/ethnicity on the meas urement properties of the CES-D has not been fully identified yet in the previous rese arch and still remains questionable. Most previous studies of measurement bias in the CES-D scale have focused on subscale instead of individual item analysis to trac e differential response s across racial/ethnic groups (Blazer et al., 1998; Callahan & Wolinsky, 1994; Nguyen et al., 2004; Perreira et al., 2005; Roberts et al., 1989). For ex ample, Callahan and Wolinsky (1994) found differences in the CES-D factor structur e by racial/ethnic grou p, although the study was limited due to missing data. More importantl y, even some previous investigations of race/ethnicity bias in the CE S-D scale did not include older populations for their analyses (e.g., Iwata & Buka, 2002; Iwat a et al., 2002). Only a single study has used an appropriate method to assess item bias in the CES-D scale by race/ethnicity and tested the

PAGE 42

33 effect of race/ethnicity on the CES-D items among older adults. Cole and colleagues (2000) compared two racial/ethnic groups and found evidence for racial/ethnic item bias in the CES-D scale. The authors suggested that Blacks were more likely to endorse higher levels of the two inte rpersonal problem items (people are unfriendly and people disliked me). To date, none of studies fully considered and tested DIF items in the CESD scale across three or more racial/ethnic elderly groups. More investigation will be needed regarding race/ethnicity-specific DIF items in the CES-D scale, and future research on item bias in the CES-D needs to include Hispanic population. Additionally, for future research, it may be meaningful to explain why diverse racial/ethnic groups respond differentially to the items of th e CES-D and what factors across diverse racial/ethnic groups can contri bute to different response pa tterns to the CES-D items. The effect of demographic character istics (e.g., age, gender, educational attainment) on the measurement properties of the CES-D in diverse racial/ethnic groups has not been identified yet and still remain s uncertain. Previous studies found different measurement properties of the CES-D scal e across subgroups of age (e.g., Gatz & Hurwicz, 1990) and gender (e.g., Berkman, Be rkman, Kasl, Freeman, Leo, Ostfeld et al., 1986; Callahan & Wolinsky, 1994). However, most studies on the measurement properties of the CES-D were limited to test th e factor structure instead of testing item bias in the CES-D scale. Moreover, alt hough some of previous studies have used appropriate methods to test item bias in the CES-D across demographic subgroups (e.g., Cole et al., 2000; Stommel et al., 1993), these studies did not consider the effect of demographic variables on the CEC-D items with in each racial/ethnic group. For example, although Cole and colleagues (2000) conducted item-level biases in the CES-D across

PAGE 43

34 subgroups of age, gender, and race/ethnicity, th e authors did not anal yze ageand genderrelated measurement bias in the CES-D with in each racial/ethni c group. Because each racial/ethnic group has its unique demographi c profiles which may unde rlie racial/ethnic group differences (Alwin & Wray, 2005) and th ese cultural and social differences across diverse racial/ethnic populations may result in different responses to depressive symptom items (Nguyen et al., 2004), it may be meaningf ul to examine demographic-related item bias in the CES-D scale within each racial/e thnic group. Special lack ing in the previous research was the effect of educational attain ment: none of the previous research focused on the effect of educational attainment on the measurement properties of the CES-D among older adults. Recent research suggests that acculturation and instrument language significantly affect different response pa tterns to items of the CES-D (Chiriboga, Jang, Banks, & Kim, 2007; Jang et al., 2005). Given that acculturation has been referred to simply as the degree to which people change when faced w ith the challenge of living in a cultural context differing from their own (Berry & Kim, 1988; Trimble, 2003), people who are more acculturated may be more likely to adop t the ways of thinking and expressing that characterize the host culture. In contrast those who are less acculturated may be less accepting the new ways of thinking and expressing themselves, and instead hold onto the culture and behaviors that refl ect their culture of origin. The differences in thinking, feeling, and expressing themselves may in fluence the ways in which depressive symptoms are organized and expressed, as well as have implications for the measurement nonequivalence of screening tools for depres sive symptoms. In addition, literature suggests that language significan tly affects the measurement properties of the CES-D in

PAGE 44

35 racial/ethnic populations. Fee lings reported in a native language may be expressed with more emotion than those expressed in a second language (Cuellar & Roberts, 1984; Roberts et al., 1990). Given that previous studies showed higher correlations between English proficiency and acculturation, it may be interesting to find out the effect of both acculturation and instrument language on the measurement properties of the CES-D in diverse racial/ethnic groups. Findings from this line of research may suggest the substantial cultural differences even within the same racial/et hnic group. Analytic Strategies for Meas urement Equivalence Research Researchers have used several analytic strategies and sta tistical methods to evaluate racial/ethnic group di fferences in depressive symptoms itself. For example, studies showing differences in prevalence rate s of depression across racial/ethnic groups have used chi-square test, providing sensitivity and specifi city (e.g., Baker et al., 1995 ; Beals, Manson, Whitesell, Mitchell, DKovins Simpson et al., 2005; Beekman et al., 1997; Madianos, Gournas, & Stefanis, 1992; Medina-Mora et al., 2005). Some researchers have used receiver operating ch aracteristic (ROC) analysis to provide different cut-off points across racial/e thnic groups (e.g., Papassotiropoulos & Heun, 1999; Somervell, Beals, Kinzie, Boehnlein, Leung, & Manson, 1993). There have been some studies using ANOVA or T-test to comp are means of depressive symptoms across cultural groups (e.g., Krause & Li ang, 1992; Lee & Farran, 2004). A number of studies evalua ting factor structure of de pression instruments have used exploratory factor analysis (EFA; e.g., Callahan & Wolinsky, 1994; Foley et al., 2002; Van Tran, 1997). EFA is useful in a pr eliminary stage in identify the number of factors and the underlying pattern of factor loadings within a given cultural group.

PAGE 45

36 Although EFA has been used to assess metr ic equivalence, sp ecifically, factorial invariance, EFA is insufficient for assessing comparability for several reasons. Liang (2002) suggested the following three reasons. First, in EFA, si gnificance tests of the differences in factorial structure cannot be applied. Second, comparisons of factor configurations are accomplis hed by using Pearson correlation matrices. Such comparisons are confounded by variance differences between samples because correlation coefficients are standardized in terms of sample variances. Third, in EFA, little a priori specificatio n is involved in deriving the factor structure. A number of other researchers have also noted the limitations of EFA (e.g., Teresi, 2002; van de Vijver & Leung, 2000). Measurement equivalence is typically assessed through the use of statistical analyses comparing the propertie s of an instrument in two or more groups. In order to distinguish a lack of measurement equivalen ce (i.e., differential item functioning; DIF), a problem with the instrument, from impact, tr ue differences in the trait distributions, researchers have used two popular methods capable of detecting DIF; one is confirmatory factor analysis (CFA) and the other is item re sponse theory (IRT). A number of studies have suggested that CFA may be the most ve rsatile approach to evaluating metric and structural equivalence simultaneously (e .g., MacCallum & Austin, 2000; Raju et al., 2002). The assessment of measurement equi valence across cultural groups usually involves the use of multiple-group CFA (van de Vijver & Leung, 2000). The advantages of CFA with regard to cro ss-cultural research are that it 1) allows for detailed comparisons of factor models across cultural groups, 2) allows for a comparison of latent means, and 3) is not unduly influenced by relatively small sample sizes.

PAGE 46

37 There are advantages ascribed to IRT, wh ich has also been reported as a powerful method for examining measurement equivalence at the item or scale level (Bingenheimer, Raudenbush, Leventhal, & Brooks-Gunn, 2005). Assuming that all items reflect the same underlying construct, IRT allows for th e comparison of scale scores even when individuals or groups of indivi duals answer different items. In addition, IRT parameters are subpopulation invariant. They do not depe nd on the distribution of trait scores in examinee groups, in contrast to more conventional analyses where item means depend on sample particulars (e.g., the same item can have a low mean in one group and a high mean in another) (Hulin, Drasgow, & Parsons 1983; Meade & Lautenschlager, 2004; van de Vijver & Leung, 2000). There have been a number of studies recently that have compared CFA and IRT DIF detection procedures (e.g., Meade & Lauten schlager, 2004; Raju et al., 2002). These previous studies compared traditional CFA-ba sed procedures with IRT-based procedures that differed with regard to hypot hesis testing strategies and ta rgets of analysis. Recently, however, Stark, Chernyshenko, and Drasgow (2006) proposed and tested a common strategy for detecting DIF with CFA and IR T based on the likelihood ratio (LR) test. Their method, which involved comparing statis tically correct free-ba seline models with a series of constrained models that examined item loadings/discrimination parameters and intercepts/location parameters simultaneously, showed higher power an d low Type I error rates across simulation conditions for both CFA and IRT. Issues Needed for Future Research Given the preceding review of conceptual issues in cross-cultural depression research and cross-cultural research with the CES-D scale, some conceptual and

PAGE 47

38 methodological concerns and problems can be id entified and these issues may be in needs for future research in the field of depression. First, with regard to dataset issues, the majority of studies with the CES-D scale have used data collected from one racial/ethnic or cultural group to test m easurement equivalence of the CES-D. To assess cultural group differences in the CES-D scale, as we ll as to make a mean ingful comparison of depression with the CES-D scale across diverse racial/ethnic groups, collecting comparable data may deserve the highest prior ity. In addition, future research should pay attention to find nationally representative datasets with inclusions of the original version of the CES-D scale and racially/e thnically diverse elderly groups. Second, when two or more racial/ethni c groups are compared, descriptive and analytical techniques have been applied often without addressing issues concerning conceptual, metric, and structural equivalen ce. Given that cross-cultural measurement equivalence requires at least three interrela ted conditions which are conceptual, metric, and structural equivalence (Markides et al., 1990; van de Vijver, 2001), equivalence of the instruments must be established to ma ke a meaningful comparison across diverse racial/ethnic groups. Currently, much more re mains to be learned about the cross-cultural comparability of the CES-D scale. Third, although previous research raised the possibility of the nonequivalence of the CES-D across different racial/ethnic groups cross-cultural depression studies have not paid enough attention to how the CESD measure can be improved to screen for depression. The consequences of using none quivalent instruments may be potentially serious for several reasons. For example, if the CES-D commonly used to screen for cases of depression is valid and accurate for one cultural group but less for another

PAGE 48

39 cultural group, applying th e standard cut-scores of the CES-D may lead to misclassification in the second gr oup, resulting in false positives false negatives, or both. Thus, researchers should pay more attent ion to making the CES-D more reliable and culturally valid, as well as to establishi ng measurement equivalence of the CES-D. Fourth, and perhaps most important, the e ffects of race/ethnicity and demographic characteristics on the measurement properties of the CES-D have not been identified and still remain questionable. Given that some of racial/ethnic differences in mental health may be explained by differences in the age, gender and socio-economic composition of each racial/ethnic group (U.S Department of Health and Human Services, 2001), differences in the demographic characteristic s (age, gender, and educational attainment) of each racial/ethnic group ma y also explain different responses to the CES-D scale, which may in turn contribute to detecting DIF items that function differently across subgroups of each racial/ethnic group. By id entifying demographic-related item bias in the CES-D within each racial/ethnic group, future research can explain why diverse racial/ethnic groups respond differentially to the certain items of the CES-D scale. Currently, much more remains to be learne d about demographic-specific depressive symptom items that function differentially ac ross cultures and sub-cultural groups within each racial/ethnic group, as well as a core set of depressive symptom items that function equivalently across cultures a nd subgroups of age, gender, and educational attainment. Fifth, acculturation-related item bias in th e measurement properties of the CES-D has not been extensively identified am ong diverse racial/ethnic groups. Because literature suggests that level of acculturati on is an important variable to explain differences in thinking and expr essing depressive symptoms within racial/ethnic minority

PAGE 49

40 group (Chiriboga, 2004; Chiriboga et al., 2007; Myers & Rodriguez, 2003) and even explained racial/ethnic differences in depres sion by differences in age, gender, and SES of each group may not be meaningful without controlling for acculturation levels within racial/ethnic groups (Myers & Rodriguez, 2003), it may be expected to find the unique role of acculturation on item biases in the CES-D, as well as other depressive symptom inventories. In addition, gi ven that acculturation levels and language proficiency are highly interrelated and instrument language it self significantly affects measurement bias in racial/ethnic populations, it will be meani ngful to examine the combination role of acculturation and instrument language. Findings from this line of research may suggest the substantial cultural differences even with in the same racial/et hnic group. Lastly, although at present, CFA and IR T are two popular methods to detect a lack of measurement equivalence, none of published studies involving item level comparisons in depressive symptom inst ruments compared CFA and IRT results. Considering that only some simulation st udies are now beginning to compare CFA and IRT DIF detection procedures and appear in the research literature (e.g., Gonzalez-Roma, Hernandez, & Gomez-Benito, 2006; Stark et al., 2006), testin g DIF analyses with both CFA and IRT may be clearly needed for fu ture study in cross-cultural depression research. Moreover, it may be very mean ingful to use a common strategy for DIF detection across both CFA and IRT and compare the results. The combined use of the two statistical approaches may increase the a ccuracy for testing metric equivalence of depressive symptom inventorie s as well as other mental health instruments and may be the first step towards applying the integrative method of detecting DIF across diverse groups.

PAGE 50

41 Study Design This dissertation was designed to examin e cultural differences in the measurement properties of the CES-D among racially/ethni cally diverse older adults within the framework of integrating Sternbe rgs Type I (the nature of depression is the same across cultures and the instruments used to measure it are the same) and Type II (the nature of depression is different across cultures, but the instruments used to measure it are the same across cultures) regarding the relationship of culture to depression. As mentioned earlier, three major hypotheses for the dissertation we re (1) the measurement properties of the CES-D would vary by race/ethnicity among olde r adults, (2) the measurement properties of the CES-D scale would vary by age, gender, and educational attainment group within each racial/ethnic elderly group, and (3) the measurement properties of the CES-D would vary by the level of acculturation and instrume nt language in older Mexican Americans. Three steps for the disserta tion were designed to test the abovementioned three hypotheses in a separate study and the outline for the three studi es was shown in Figure 2. Study 1 was designed to examine Research Question 1. The purpose of this study was to test whether the CES-D items functi on differentially across three racial/ethnic group, older Whites, Blacks, and Mexican Ameri cans. Study 1 was expected to identify race/ethnic-specific items in the CES-D that function differentially across three racial/ethnic groups, as well as a core set of depressive symptom items in the CES-D that function similarly across three racial/et hnic groups. Findings from Study 1 were expected to address limitations of the CESD as a tool for making cross-racial/ethnic comparisons.

PAGE 51

42 Study 2 was designed to examine Research Question 2. This study aimed to assess the role of sociodemogr aphic characteristics (i.e., ag e, gender, and educational attainment) on the measurement properties of the CES-D within each racial/ethnic elderly group (Whites, Blacks, and Hispanics). Study 2 hypothesizes that the measurement properties of the CES-D scale will vary by ag e, gender, and educational attainment group within each racial/ethnic elde rly group. For the purpose of this Step 2, differentiation was made with respect to age (younger than 75 vs. 75 or older), gender (male vs. female), and educational attainment (Less than 8 th grade vs. 8 th grade or more) variables for each racial/ethnic group. Results from Study 2 wa s expected to identify age-, gender-, and educational attainment-specific items in th e CES-D within each racial/ethnic elderly group, as well as a core set of items in the CES-D functioning equivalently across subgroups of age, gender, and educational at tainment within each racial/ethnic elderly group. Findings from Study 2 may provide a broader understanding of cultural group differences in the CES-D among racially/ethnically diverse older adults and be a basis for developing culturally equivalent depression measures across di verse racial/ethnic groups and subgroups within each racial/ethnic group. Study 3 was designed to examine Research Question 3. The purpose of Study 3 was to identify the measurement properties of the CES-D across acculturation levels and instrument language among older Mexican American s. In this step, this dissertation was expected to identify acculturation and instru ment language-specific items in the CES-D that function differentially in older Mexican Americans, as well as a core set of CES-D items that function equivalently across s ubgroups within the Mexican American group. With special attention to the role of acculturation and instrument language on the

PAGE 52

43 measurement properties of the CES-D, differentiation was made with respect to the level of acculturation (low vs. high acculturated groups ) and instrument language (Spanish vs. English) for older Mexican Americans. Fi ndings from Study 3 may provide evidence of measurement nonequivalence of the CES-D scal e even within a specific subgroup of the overall Hispanic population.

PAGE 53

44 New Haven EPESE (N = 2,340) Hispanic EPESE (N = 2,623) Study 1: Research Question 1: Do the measurement properties of the CES-D vary by race/ethnicity? Whites (N = 1,876) Blacks (N = 464) Mexican Americans (N = 2,623) 1. Age :1) <75 1. Age :1) <75 1. Age :1) <75 2) 75 or older 2) 75 or older 2) 75 or older 2. Gender :1) Male 2. Gender :1) Male 2. Gender :1) Male 2) Female 2) Female 2) Female 3. Education :1) <8 th grade 3. Education :1) <8 th grade 3. Education :1) <8 th grade Study 2: Research Question 2: Do the measurement properties of the CES-D vary by sociodemographic characteristics within each racial/ethnic group? 2) 8 th or more 2) 8 th or more 2) 8 th or more 1. Acculturation :1) Low Acculturation 2) High Acculturation :1) Spanish 2. Instrument Language 2) English :1) Low Acculturation Spanish English 2) High Acculturation Spanish Study 3: Research Question 3: Do the measurement properties of the CES-D vary by acculturation and instrument language in older Mexican Americans? 3.Acculturation x Instrument Language English Figure 2. Outline for the Dissertation

PAGE 54

45 CHAPTER TWO: STUDY 1 RACE/ETHNICITY-RELATED MEASUREMENT BIAS Introduction A growing body of literature documents raci al/ethnic disparities in depressive symptoms (e.g., Coyne & Marcus, 2006; Dunlop, Song, Lyons, Manheim, & Chang, 2003). A number of cross-cultural and cr oss-national studies on these depressive symptoms have found evidence that preval ence rates of probable depression vary dramatically across diverse racial/ethnic groups, with the rates ranging from 1.5% to 32.0% (e.g., Blazer, Landerman, Hays, Simons ick, & Saunders, 1998; Gonzalez, Haan, & Hinton, 2001; Dunlop et al., 2003; Foley, R eed, Mutran, & DeVellis, 2002; Mui, Burnette, & Chen, 2002). For example, G onzalez and colleagues (2001) reported that 25.4% of Mexican American elders showed ev idence of probable depression, while Foley and colleagues (2002) showed 14.0% of African American elders fall under the category for probable depression. Because of cultura l differences across diverse racial/ethnic groups, in fact, it has become a virtual truism in cross-cultural research that racially/ethnically di verse groups manifest different prevalence rates of probable depression and different group m eans on standard inventories. However, the question of whether differences across diverse groups have any practical implications for how the currently existing depression instruments s hould be used remains in controversy. The relationship of culture to depression has been explained by two different perspectives, the emic and the etic (Can ino, Lewis-Fernandez, & Bravo, 1997; Pike,

PAGE 55

46 1954; Triandis & Brislin, 1984). The emic appro ach to cross-cultural depression research assumes that every culture has its own unique ways of manifesting and expressing symptoms and suggests using variables and obser vations that are tailore d to the particular cultural group when developing a depression instrument (Rait & Burns, 1998). This approach argues that when the same instrument s are used across cultures, meanings of the depression scores may differ from one culture to another. Followi ng this approach has led to the development of several indigenous depression instruments, such as American Indian Depression Scale (AIDS; Manson et al., 1985) and the Shona Symptom Questionnaire (Patel, Simunyu, Gwanzura, Lewis, & Mann, 1997) in Zimbabwe. In contrast, etic researchers assume that the nature of depression is essentially the same across cultures and that, using appropriate translations when necessary, this nature can be measured identically without regard to culture (e.g., Worl d Health Organization, 1983). Etic studies of depression have led to the development of a number of depression instruments including self-re port scales (e.g., Beck Depression Scale, Center for Epidemiologic Studies Depression Scale, Geri atric Depression Scale) and interviewer or clinician rating scales (e.g., Structured Clinical Inte rview for DSM-IV, WHO Composite International Diagnostic Intervie w). This line of research as sumes that core depressive symptoms exist cross-culturally and cross-nationa lly and that the key i ssue is variation in levels of depressive symptoms across cultural groups. Although emic and etic approaches have their own views on the relationship of culture to depressive symptoms, where these pe rspectives basically agree is that culture shapes values, attitudes, belie fs, and behaviors (Sternberg, 20 04) and may affect ways in which people experience and express their symptoms of depression (Kleinman, 2004).

PAGE 56

47 However, it is unfortunate that neither approach has been able to provide clear guidelines as to how to analyze cultural influences on existing depressive symptom inventories. One unresolved issue in assessing depr essive symptoms of different cultural groups has been the equivalence of measures (e.g., Crockett, Randall, Shen, Russel, & Driscoll, 2005; Liang, 2002). Measurement equivalence is of par ticular importance in cross-cultural depression research because if depressive symptom measures have differential meanings or validity across dive rse cultural groups, group comparisons would be misleading and the prevalence rates of de pression would be inaccurate (Crockett et al., 2005). When self-report depression measures are used, researchers should pay particular attention to the issue of measurement e quivalence since it may be unclear whether different depressive symptom scores across cultural groups are caused by actual differences in depression or problems with the depression instrument used (i.e., differential item functioning; DIF). If people from different cultures with equal depression scores are more or less likely to endorse specific depressive symptom items as a consequence of cultural gr ouping, those specific items endorsed more or less by a group of people will function differently across groups or show DIF (Teresi, 2002). Measures containing DIF items may be invalid for betw een-group comparisons because their scores are indicative of attributes other than that wh ich the test is intended to measure (Perkins, Stump, Monahan, & McHorney, 2006). Therefor e, a high priority in assessing crossracial/ethnic comparability of many widely used depression instruments should be to distinguish a lack of measurement equivalence, which is known as DIF, from impact, true group differences in standing on the same latent trait, as well as to find appropriate methodology that can detect the measurement bias accurately.

PAGE 57

48 Given the abovementioned importance of de tecting DIF in cross-cultural studies, researchers have generally used one of two methods cap able of detecting DIF; confirmatory factor analysis (CFA) or item response th eory (IRT). Most studies examining DIF in depression instruments have only used one of these methods. The methods, however, may not yield the same resu lts, since they differ in terms of their mathematical models (e.g., linear model vs. non linear models), target of analysis (e.g., testing invariance of loading parameter first a nd intercepts later vs. testing invariance of discrimination and location parameters simu ltaneously), and strategies for hypothesis testing (testing free baseline model vs. testing constrained baseline model) (Raju, Laffitte, & Byrne, 2002; Stark, Chernyshenko, & Drasgo w, 2006). The potenti al discrepancy in results raises serious questions as to how to interpret results. At present studies involving item level comparisons in depressive symp tom instruments have not included both CFA and IRT results, and only now are some si mulation studies beginning to compare these two DIF detection procedures (e.g., Meade & Lautenschlager, 2004; Raju et al, 2002; Stark et al., 2006). It is th erefore meaningful to empl oy both CFA and IRT methods. One of the objectives of the present study was to use a common strategy across both CFA and IRT as a means of detecting DIF in the widely-used Center for Epidemiological Studies Depression Scale (CES-D). Since its development on samples of Eu ropean Americans, the CES-D has been used in a substantial number of cross-cultural studies on depressive symptoms. Several studies on the CES-D have found acceptable inte rnal consistency (Mui et al., 2002) and have confirmed its general efficacy in dete cting depression across diverse racial/ethnic groups of older adults, including African Americans (e.g., Blazer et al., 1998) and

PAGE 58

49 Mexican Americans (e.g., Gonzalez et al., 2001; Swenson, Baxter, Sh etterly, Scarbro, & Hamman, 2000). Despite its ge neral usefulness in cross-cultural application, there has been growing evidence that the CES-D has unique measurement properties across racial/ethnic groups (e.g., Ca llahan & Wolinsky, 1994; Crockett et al., 2005; Foley et al., 2002; Miller, Markides, & Blacks, 1997; N guyen, Kitner-Triolo, Evans, & Zonderman, 2004; Perreira, Deeb-Sossa, Ha rris, & Bollen, 2005; Roberts, Vernon, & Rhoades, 1989). For example, Miller and colleagues (1997) ar gued strongly for a two-factor model of the CES-D among elderly Mexican Americans instead of the classic four-factor model that was derived from samples of the general American population. Foley and colleagues (2002) also found evidence for differences in the CES-D factor st ructures among older African Americans: they f ound no distinction between depressive affect and somatic symptom factors. As mentioned above, howev er, most previous studies investigating unique measurement properties of the CES-D scale in diverse cultural groups have focused on subscale analyses of the CES-D. Given that cultural differences across diverse racial/ethnic groups may be rela ted to unique patterns of response (e.g., McHorney & Fleishman, 2006), these subscale analyses may not be enough to understand and determine unique psychometric propertie s of the CES-D (Teresi, 2002). The potential of the CES-D items to function di fferentially across multi racial/ethnic elderly groups thus becomes a priority in cross-cultural research. A few studies have used DIF methods to investig ate the CES-D item bias by race/ethnicity and found evidence of differen tial function (e.g., Cole Kawachi, Maller, & Berkman, 2000; Iwata & Buka, 2002; Iwata, Turner, & Lloyd, 2002). In one DIF analysis conducted among White, Japane se, Native American and Argentinean

PAGE 59

50 undergraduates, Iwata and Buka (2002) found evidence that Whites are predisposed to endorse positive CES-D items and that Japane se and Argentineans are more likely to inhibit endorsement of the same positive item s. Generalizations from this study were limited due to the use of a small and non-representative sample. Testing DIF among representative samples of younger adults, Iwat a and colleagues (2002) suggested that two of four positive items (I felt hopeful about fu ture and I enjoyed life) showed DIF and were over-endorsed by African Americans, and Hispanics showed tendencies to inhibit the expression of positive affect. In the field of gerontology, only two studies (Cole et al., 2000; Yang & Jones, in press) investigated race/ethnic item differences on the CES-D. Using the same New Haven EPESE data set em ployed in the present investigation, both studies found evidence that Blacks were more likely than Whites to endorse two interpersonal relation items (people are unfrien dly and people disliked me). To date, no study has fully considered and tested DIF items in the CES-D across three or more racial/ethnic elderly groups. Moreover, none have included older Mexican Americans, the largest subgroup of Hispan ics in the United States, fo r identifying DIF items on the CES-D. For this reason, more investigation is clearly needed rega rding race/e thnicityspecific DIF items in the CES-D scale among diverse groups of older adults including Mexican Americans. Taking the abovementioned issues togeth er into account, the purpose of this present study was to examine the cultural e quivalence of the CES-D items across three racial/ethnic elderly groups including Whites, Blacks, and Mexican Americans. Specifically, the present study focused on identi fying race/ethnicity-related DIF items in the CES-D scale that function differentially, as well as a core set of CES-D items that

PAGE 60

51 function equivalently across racially/ethnically diverse elderly groups. In order to take a conservative approach to identifying DIF, only items identified by both the CFA and IRT methods as IDF were treated as such. Methods Sample The present study used two national da tasets. The New Haven Established Populations for Epidemiologic Studies of the Elderly (EPESE) provided the White and Black samples and the Hispanic Established Po pulations for Epidemiologic Studies of the Elderly (H-EPESE) provided the Mexican Amer ican sample. These two datasets were selected because of (1) their in clusion of older adults aged 65 or older; (2) their use of the original 20-item CES-D with f our-point rating scale; and (3 ) their inclusion of diverse racial/ethnic groups, especially Blacks in New Haven EPESE and Mexican Americans in the H-EPESE. The New Haven EPESE is a longitudinal study of community-dwelling participants aged 65 or older collected in one of four geogr aphic locations (East Boston, New Haven, Iowa, and North Carolina) and includes Whites (N = 2,283) and Blacks (N = 529) at baseline (1982). The H-EPESE is al so a longitudinal study of Mexican American participants (N = 3,050) aged 65 or older that collected from Texas, New Mexico, Colorado, Arizona, and California and was modeled after the design of the EPESE studies in order to compare with other populations in 1993-4. Using the first waves of the New Ha ven EPESE and H-EPESE, subjects (2,283 Whites, 464 Blacks, and 2,623 Mexican Americans) were included in the analyses if they responded to all 20 CES-D items. Using list wise deletion method, 407 subjects of Whites (17.8 % of the total Whites), 65 subjects of Blacks (12.3 % of the total Blacks), and 427

PAGE 61

52 subjects of Mexican Americans (14.0 % of th e total Mexican Americans) were excluded in the analyses due to missing data. Rem oved participants in each group had similar characteristics in terms of gender distributi on, but were more likely to be older for all three racial/ethnic groups. Measures CES-D Scale Both the New Haven EPESE and H-EPESE us ed the original 20-item version of the CES-D (Radloff, 1977). Respondents in both datasets were asked to report how often each symptom was experienced during the past week, and their symptoms were rated on a 4-point likert scale, with categories presente d in the following orde r: rarely or none of the time (coded as 0), some or a little of the time (coded as 1), much of the time (coded as 2), and most or all of the time (coded as 3). The four positive items were reverse-coded and scale scores were co mputed by summing across twenty items to produce total scores ranging from 0 (no depr essive symptoms) to 60 (severe depressive symptoms). Scores of 16 or higher are typically viewed as evidence of probable depression (Andresen, Carter, Malmgren, & Pa trick, 1994). Internal consistency of the CES-D was satisfactory in the present sample: = .86 for Whites, = .84 for Blacks, and = .88 for Mexican Americans. Statistical Analysis Sample characteristics were first compar ed for three racial /ethnic groups. The ANOVA and the chi-square test were used to test for group differences, with p < .05 indicating a statistically significant difference. For descriptive purposes, the prevalence of probable depression based on the CES-D (total score 16) was compared using a chi-

PAGE 62

53 square test. Individual item mean and standard deviation (SD) for the CES-D scale were also compared, with p < .05 indi cating a statistic ally significant difference. Information on item responses of the 20 CES-D items was provided for Whites, Blacks, and Mexican Americans. Because the DIF detection methods used in this investigation assumed that a single dominant factor underlies item respons es, the unidimensionality of the CES-D was investigated using a princi pal component analysis (PCA) via SPSS and a confirmatory analysis via LISREL 8.8 (Jreskog & Srbom 2006). The PCA results indicated that while the analysis produced 4 factors for Whites (1 st factor = 5.86, % variance = 29.31; 2 nd factor = 1.32, % variance = 6.61; 3 rd factor = 1.20, % variance = 5.99; 4 th factor = 1.05, % variance = 5.26), 5 factors for Blacks (1 st factor = 5.49, % variance = 27.45; 2 nd factor = 1.39, % variance = 6.98; 3 rd factor = 1.17, % variance = 5.88; 4 th factor = 1.08, % variance = 5.43; 5 th factor = 1.02, % variance = 5.07) and 3 factors for Mexican Americans (1 st factor = 6.84, % variance = 34.20; 2 nd factor = 2.27, % variance = 11.35; 3 rd factor = 1.14, % variance = 5.70), the ratio of first to second eigenvalue was 4.41, 3.95, and 3.01, respectively, which suggests that the data are essentially unidimensional (Lord, 1980; Stout, 1990). This result was supported by one-factor confirmatory factor analyses, where goodness-of-fit indices for th ree racial/ethnic gr oups all exceeded .90, indicating generally good fits of the one-factor model to the data. In these analyses Whites showed slightly better goodness-of -fit indices than Blacks and Mexican Americans (for Whites, CFI = .95, NFI = .94, NNFI = .95; for Blacks, CFI = .93, NFI = .90, NNFI = .93; and for Mexican Americans, CFI = .91, NFI = .90, NNFI = .90). These results confirmed the overall unidimensionality of the CES-D.

PAGE 63

After verifying that the CES-D data overall were sufficiently unidimensional, the application of IRT and CFA DIF detection using the likelihood ratio tests proceeded. In both methods, this study followed the general approach to hypothesis testing described by Stark et al. (2006), since the latter showed high power and low Type I error rates across a wide variety of simulation conditions. In essence, the authors suggested testing for DIF by a common strategy that can be implemented in both CFA and IRT, which is called free baseline with Bonferroni correction. From CFA, this strategy involves use of a fully free baseline model (with the exception of a single referent item), which is statistically appropriate as the basis for subsequent nested model comparisons where one item at a time is constrained to be equal across groups. From IRT, this strategy incorporates the ideas of simultaneous comparisons of item parameters (discrimination-loadings and locations-intercepts) and strict p-values for flagging DIF items. Detail analytic procedures for CFA and IRT are described below. 542 CFA DIF analyses involving item loadings and intercepts were conducted using an analogous strategy with LISREL 8.8 (Jreskog & Srbom, 2006). Information on item loadings and intercepts for three groups (Whites, Blacks, and Mexican Americans) is shown in Table 1. Using the free baseline model, where only the parameters of the referent are constrained across groups, baseline and constrained models were run in succession and the chi-square difference statistics for the nested model comparisons were evaluated using a Bonferroni corrected critical p-value. When the observed chi-square difference was greater than the corresponding critical chi-square value (Bonferroni corrected, = 11.88 with 2 degrees of freedom), the item was flagged DIF. CFA DIF detection

PAGE 64

= loca tion paramters 55 Table 1. Study 1 Item Parameter Estimates for Whites, Blacks, and Mexican Americans Note. (tau) = threshold (= intercept); a = discrimieter; b (lam bda) = loading (=slope); nation param 1, b 2, andb 3 e Whites (N = 1,876) Blacks (N = 464) Mexican Americans (N = 2,623) CFA IRT CFA IRT CFA IRT CES-D Item a b 1 b 2 b 3 a b 1 b 2 b 3 a b 1 b 2 b 3 1 1.00 0.44 1.39 0.85 2.14 2.52 1.00 0.36 1.52 1.11 2.28 2.36 1.00 0.43 0.82 1.77 2.55 0.82 2 0.78 0.35 1.13 1.45 2.59 2.92 0.91 0.43 1.15 1.16 2.29 2.72 0.84 0.35 1.05 2.05 2.85 1.05 3 1.21 0.33 2.30 0.97 1.79 2.21 0.93 0.28 2.17 1.10 2.17 2.29 1.16 0.37 0.81 1.58 2.15 0.81 4 0.57 0.29 0.89 2.23 2.73 3.79 0.56 0.26 1.02 2.13 2.54 3.83 0.41 0.87 0.92 1.95 2.88 0.92 5 0.80 0.40 1.16 1.03 2.53 3.09 0.71 0.34 1.30 1.05 2.82 3.16 1.00 0.43 0.73 1.72 2.57 0.73 6 1.51 0.50 2.71 0.46 1.55 1.90 1.28 0.44 2.64 0.56 1.78 1.99 1.37 0.54 0.41 1.35 2.06 0.41 7 1.30 0.51 1.74 0.72 1.67 1.98 1.03 0.55 1.30 0.85 1.94 2.16 1.27 0.61 0.44 1.37 2.04 0.44 8 1.19 0.93 0.97 0.48 0.84 1.80 0.62 0.99 0.52 0.59 1.21 2.95 0.58 0.94 0.03 1.43 2.56 0.03 9 0.65 0.20 1.47 1.72 2.67 3.01 0.86 0.20 1.87 1.57 2.30 2.59 0.74 0.26 1.28 2.08 2.72 1.28 10 0.72 0.27 1.48 1.31 2.66 2.95 0.94 0.29 1.58 1.23 2.40 2.70 0.80 0.32 0.98 2.05 2.86 0.98 11 1.16 0.59 1.25 0.72 1.81 2.11 0.94 0.50 1.18 0.78 2.31 2.60 1.05 0.55 0.69 1.59 2.39 0.69 12 1.49 0.62 1.70 0.65 1.05 2.12 1.24 0.52 1.53 0.79 1.36 2.55 0.85 0.85 0.18 1.37 2.18 0.18 13 0.76 0.35 1.10 1.43 2.54 3.01 0.75 0.36 1.26 1.33 2.38 2.76 0.90 0.39 0.95 1.84 2.69 0.95 14 1.39 0.50 1.95 0.62 1.62 1.93 1.24 0.48 1.97 0.63 1.72 2.03 1.26 0.47 0.66 1.52 2.05 0.66 15 0.43 0.20 1.03 2.18 3.48 3.94 0.54 0.37 0.89 1.58 3.13 3.49 0.56 0.25 1.66 2.46 2.99 1.66 16 1.32 0.46 1.74 1.00 1.34 2.20 0.90 0.34 1.43 1.28 1.92 3.08 0.72 0.91 0.14 1.33 2.17 0.14 17 0.74 0.21 1.79 1.45 2.42 2.88 0.54 0.14 1.57 1.88 3.01 3.16 1.21 0.42 0.75 1.51 2.20 0.75 18 1.27 0.44 2.40 0.53 1.77 2.26 1.28 0.43 2.48 0.56 1.90 2.13 1.40 0.54 0.46 1.32 2.07 0.46 19 0.30 0.11 1.14 2.57 3.96 4.34 0.68 0.25 1.35 1.51 2.81 3.06 0.65 0.18 1.41 2.20 2.76 1.41 20 0.93 0.34 1.61 1.05 2.21 2.61 0.83 0.34 1.52 1.06 2.32 2.91 1.13 0.37 0.90 1.55 2.20 0.90

PAGE 65

56 IRT DIF detection Because the CES-D scale was polytomous, Samejimas (1969) Graded Response model was chosen in this present stud y, using the MULTILOG computer program (Thissen, 1991). For this model, each four-category item has one discrimination parameter ( a ) and three location parameters ( b 1 b 2 and b 3 ). The discrimination parameter reflects the extent to which an item di fferentiates between levels of underlying depression, and items with higher a are generally preferred because they are more informative in a psychometric sense. The location parameters refe r to the point on the underlying depression scale in which the probability is 50% for endorsing the first category relative to th e last 3 categories (b 1 0 vs. 1, 2, 3), the first 2 categories relative to the last 2 categories ( b 2 0, 1 vs. 2, 3), and the first 3 categories relative to the fourth category ( b 3 0, 1, 2 vs. 3), respectively. To assess model-data fit, the MODFIT subroutine was used. To determine good model-data fit, this study used adjusted chisquares to degrees of freedom ratios for item singles, doubles, and triples, which may provi de more advanced chi-square methods. Adjusted chi-squares to degrees of freedom ratios for item singles, doubles, and triples all showed less than 3, indicating good model-data fit (mean adjusted chi-squares/df for singles = 0.001; mean adjusted chi-squres/df for doubles = 2.5 08; and mean adjusted chisquares/df for triples = 2.801) (Drasgow, Levine, Tsien, Williams, & Mead, 1995). The concurrent calibration method was subs equently used to put the reference and focal group parameters on a common metric with Item 1 as an anchor item. In this step, Whites (in two cases of White-Black and White-Mexican American comparisons) and Mexican Americans (only in the case of Mexican American-Black comparisons) were

PAGE 66

designated as the reference group, whose latent mean was set to zero. The Mexican Americans (only in the case of White-Mexican American comparisons) and Blacks (in two cases of White-Black and Mexican American-Black comparisons) were designated as the focal group; its latent mean was free to vary. Item parameter estimates are presented in Table 1. As described for the CFA DIF method, the free-baseline model strategy was also used for each CES-D item, and differences in relative goodness of fit were examined with respect to critical chi-square statistics. Each chi-square difference was compared to Bonferroni corrected p-values (corrected, 2 = 16.31 with 4 degrees of freedom), and items exhibiting DIF were flagged. Results Descriptive Information of Sample As shown in Table 2, the 1,876 Whites, 464 Blacks, and 2,623 Mexican Americans were significantly different in terms of their age and gender distribution. In terms of age distribution, 46.7% of the Whites and 32.3% of both the Blacks and Mexican Americans were 75 or older. Whites included more individuals who were aged 75 or older. More than half were female for all three groups (57% for Whites, 63% for Blacks, and 58% for Mexican Americans). At the same time, there were significant differences in gender distribution ( 2 = 6.98, p < .05). Black participants were more likely to be female than Whites and Mexican Americans. A study variable, the CES-D, showed significantly different mean scores across three racial/ethnic groups, showing higher scores of Mexican Americans than that of the Whites or Blacks (F = 33.82, p < .001). Moreover, using the standard cutoff of 16 on the CES-D for evidence of probable 57

PAGE 67

depression, Mexican Americans consistently exhibited a greater likelihood for depression than levels reported for the Whites or Blacks. Specifically, 16% of the Whites, 14.4% of the Blacks, and 23.1% of the Mexican Americans fell into the probable depression category ( 2 = 44.17, p < .01). Table 2. Study 1 Descriptive Characteristics of the Sample New Haven EPESE (N = 2,340) H-EPESE (N = 2,623) Whites (N=1,876) Blacks (N=464) Mexican Americans (N=2,623) F or 2 Age (75) 45.7% 32.3% 32.3% 89.58*** Female 56.6% 63.4% 58.1% 6.98* CES-D 8.03/8.39 7.88/7.82 10.06/9.31 33.82*** Probable depression ( 16) 16.0% 14.4% 23.1% 44.17*** Note. New Haven EPESE = New Haven Established Populations for Epidemiologic Studies of the Elderly; H-EPESE = Hispanic Established Populations for Epidemiologic Studies of the Elderly; SD = Standard Deviation; CES-D = Center for Epidemiological Studies-Depression p < .05. *** p < .001. 58

PAGE 68

59 Descriptive Item Statistics The twenty CES-D items were evaluate d using classical test theory (CTT) statistics, which suggest that the first step for the DIF analyses is to check for mean differences. As shown in Table 3, mean scor es on each item were compared across the three racial/ethnic groups us ing the ANOVA test. Significa nt mean differences were identified for twelve of twenty items (Items 3, 4, 6, 7, 9, 10, 12, 14, 15, 16, 17, 18, and 19). In each case, Mexican Americans consistently appeared to have higher mean score than other two groups, with an exception of two items (Items 15 and 19) showing higher means among Blacks. Table 4 also summari zes item responses to each response option of the 20 CES-D items. DIF Results Results of CFA and IRT DIF analyses are summarized in Table 5. Because the CFA and IRT methods did not always identify the same DIF items across any two groups, I decided to focus on DIF items detected by both methods, as a more conservative strategy that would reduce possibl e Type I error rates. This strategy, using two or more sets of statistical results a nd explaining consistent DIF items across different procedures, has been suggested by a number of res earchers to detect DIF (Hambleton, 2006; Hambleton & Rogers, 1989). Although reporti ng commonly identified DIF items was the main focus, however, attention was also given to uncommon DIF items across two methods as well as DIF-free items.

PAGE 69

Table 3. Study 1 Mean and Standard Deviation (SD) of the CES-D in Whites, Blacks, and Mexican American Whites (N = 1,876) Blacks (N = 464) Mexican Americans (N = 2,623) CES-D Items Mean (SD) Mean (SD) Mean (SD) F 1. I was bothered by things that usually dont bother me .44 (.81) .36 (.79) .43 (.78) 1.65 2. I did not feel like eating; my appetite was poor .35 (.78) .43 (.85) .35 (.71) 2.57 a, c 3. I felt that I could not shake off the blues even with help from my family or friends .33 (.73) .28 (.66) .37 (.74) 4.37* b, c 4. I felt that I was jut as good as other people + .29 (.77) .26 (.71) .87 (1.21) 203.74*** b, c 5. I had trouble keeping my mind on what I was doing .40 (.76) .34 (.67) .43 (.75) 2.81 c 6. I felt depressed .50 (.81) .44 (.76) .54 (.81) 3.58* c 7. I felt everything was an effort .51 (.90) .55 (.96) .61 (.92) 7.645*** b 8. I felt hopeful about future + .93 (1.22) .99 (1.23) .94 (1.16) .46 9. I thought my life had been a failure .20 (.60) .20 (.61) .26 (.64) 5.318** b, c 10. I felt fearful .27 (.63) .29 (.68) .32 (.66) 4.34* b 11. My sleep was restless .59 (.98) .50 (.86) .55 (.90) 1.90 12. I was happy + .62 (.99) .52 (.89) .85 (1.07) 39.46*** b, c 13. I talked less than usual .35 (.79) .36 (.79) .39 (.75) 1.39 14. I felt lonely .50 (.87) .48 (.84) .47 (.83) .72 15. People were unfriendly .20 (.59) .37 (.79) .25 (.68) 13.55*** a, b, c 16. I enjoyed life + .46 (.91) .34 (.75) .91 (1.12) 139.04*** a, b, c 17. I had crying spells .21 (.58) .14 (.50) 42 (.78) 66.49*** b, c 18. I felt sad .44 (.72) .43 (.73) .54 (.83) 10.49*** b, c 19. I felt people disliked me .11 (.43) .25 (.64) .18 (.53) 18.97*** a, b, c 20. I could not get going .34 (.71) .34 (.69) .35 (.73) 1.26 Note. + Reverse-coded item. a A significant mean difference between Whites and Blacks was obtained at the .05 level. b A significant mean difference between Whites and Hispanics was obtained at the .05 level. c A significant mean difference between Blacks and Hispanics was obtained at the .05 level. p < .05. ** p < .01 *** p < .001. 60

PAGE 70

Table 4. Study 1 Item Responses of the CES-D Scale in Whites, Blacks, and Mexican Americans Response (%) Rarely/none of the time Some of the time Much of the time Most/all of the time CES-D Items W B M W B M W B M W B M 1. Bothered by things 70.7 76.7 71.7 20.5 16.4 17.8 3.0 0.6 6.8 5.8 6.3 3.7 2. Poor appetite a, c 79.1 74.4 76.4 13.1 15.3 15.4 2.1 3.4 5.3 5.8 6.9 2.9 3. Could not shake blues b, c 78.4 80.4 74.8 14.2 15.1 16.4 3.6 0.9 5.6 3.9 3.7 3.3 4. As good as other people + b, c 85.3 86.4 60.5 4.5 3.9 11.3 5.8 6.7 8.7 4.4 3.0 19.6 5. Trouble concentrating c 72.1 73.9 70.1 19.9 21.3 19.9 3.4 1.5 7.1 4.5 3.2 2.9 6. Felt depressed c 64.7 67.2 62.3 25.8 25.9 25.0 4.1 2.2 8.7 5.4 4.7 8.7 7. Everything an effort b 69.3 68.5 61.7 18.8 18.5 22.0 3.8 2.6 9.3 8.1 10.3 9.3 8. Hopeful about future + 59.3 56.7 50.0 6.9 7.1 21.6 15.3 17.0 13.2 18.5 19.2 15.2 9. Life a failure b, c 87.0 87.9 82.8 8.7 7.5 11.0 1.5 1.5 3.9 2.8 3.0 3.9 10. Felt fearful b 80.7 79.7 76.3 15.0 14.9 17.0 1.3 1.7 4.6 3.0 3.7 4.6 11. Restless sleep 66.4 67.0 66.8 19.2 23.3 17.8 3.6 2.4 9.0 10.8 7.3 6.5 12. Happy + b, c 68.0 70.5 52.8 9.4 11.6 22.4 15.7 13.1 11.5 7.0 4.7 13.2 13. Talked less 78.6 78.2 74.2 12.9 13.4 15.7 3.0 2.8 6.9 5.5 5.6 3.2 14. Felt lonely 68.2 67.9 69.5 20.9 22.2 19.2 3.6 3.4 6.1 7.3 6.5 5.1 15. People unfriendly a, b, c 87.2 76.9 84.7 8.8 15.3 8.6 1.4 1.9 3.2 2.6 5.8 3.5 16. Enjoyed life + a, b, c 76.4 79.5 51.7 6.9 9.5 20.9 10.6 8.2 11.7 6.0 2.8 15.7 17. Crying spells b, c 85.2 89.9 72.5 10.6 7.8 16.7 2.1 0.4 7.2 2.1 1.9 3.6 18. Felt sad b, c 66.7 67.2 63.9 26.1 26.7 22.5 3.8 1.9 9.5 3.4 4.1 4.2 19. People disliked me a, b, c 92.2 82.8 87.0 6.0 12.9 9.3 0.6 1.1 2.4 1.2 3.2 1.4 20. Could not get going 76.8 75.6 76.5 16.8 17.9 13.5 2.6 3.4 6.5 3.9 3.0 3.5 Note. W = Whites (N=1,876); B = Blacks (N=464); H = Mexican Americans (N=2,623). + Reverse-coded item. a A significant mean difference between Whites and Blacks was obtained at the .05 level. b A significant mean difference between Whites and Mexican Americans was obtained at the .05 level. c A significant mean difference between Blacks and Mexican Americans was obtained at the .05 level. 61

PAGE 71

62 Table 5. Study 1 DIF Results from CFA and IRT Methods Whites vs. Blacks Whites vs. Mexican Americans Mexican Americans vs. Blacks CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) Models Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Baseline Model (Referent: Item1) 2033.85 27259.1 6344.63 58833.0 5495.77 44396.8 Comparison Models Constrained Item 2 (10.75) (15.6) (0.93) (80.3) DIF (9.87) (33.3) DIF Constrained Item 3 (8.39) (13.4) (18.24) DIF (18.4) DIF (19.51) DIF (25.8) DIF Constrained Item 4 + (8.83) (3.5) (893.04) DIF (346.2) DIF (307.16) DIF (124.0) DIF Constrained Item 5 (5.33) (7.8) (12.95) DIF (81.4) DIF (14.62) DIF (30.8) DIF Constrained Item 6 (6.82) (7.2) (17.57) DIF (39.7) DIF (13.57) DIF (29.6) DIF Constrained Item 7 (5.65) (17.2) DIF (52.88) DIF (60.1) DIF (4.52) (52.8) DIF Constrained Item 8 + (10.75) (17.8) DIF (41.34) DIF (212.2) DIF (1.68) (72.4) DIF Constrained Item 9 (4.89) (5.0) (32.60) DIF (33.2) DIF (7.75) (6.7) Constrained Item 10 (4.52) (7.2) (31.33) DIF (52.8) DIF (0.77) (15.3) Constrained Item 11 (9.62) (10.1) (6.03) (80.8) DIF (1.99) (37.3) DIF Constrained Item 12 + (14.83) DIF (5.8) (214.43) DIF (225.4) DIF (177.44) DIF (53.9) DIF Constrained Item 13 (0.42) (4.3) (12.66) DIF (80.1) DIF (2.05) (25.4) DIF Constrained Item 14 (1.20) (6.2) (6.98) (29.9) DIF (1.77) (18.2) DIF Constrained Item 15 (29.43) DIF (41.5) DIF (34.53) DIF (24.3) DIF (15.65) DIF (43.0) DIF Constrained Item 16 + (33.65) DIF (10.7) (619.80) DIF (341.2) DIF (267.81) DIF (97.3) DIF Constrained Item 17 (21.96) DIF (5.0) (383.23) DIF (105.7) DIF (294.00) DIF (49.1) DIF Constrained Item 18 (0.16) (11.6) (77.97) DIF (50.6) DIF (20.87) DIF (38.2) DIF Constrained Item 19 (47.72) DIF (48.5) DIF (146.12) DIF (72.6) DIF (10.83) (38.5) DIF Constrained Item 20 (1.39) (7.4) (17.20) DIF (74.3) DIF (6.31) (29.4) DIF Total # DIF Items 5 4 16 19 9 17 Note. Items in bold and underlined are common DIF items across CFA and IRT methods in each cross-racial/ethnic comparison. 2 were > 16.31. 2 were > 11.88. b In IRT, DIF flagged if + Reverse-coded item. a In CFA, DIF flagged if

PAGE 72

63 In both CFA and IRT methods, this study used Item 1 as a referent and had nineteen model comparisons in which each of the CES-D items was constrained to be equal across groups in each model. Am ong all three group comparisons (e.g., Whites vs. Blacks; Whites vs., Mexican Americans; and Mexican Americans vs. Blacks), WhiteMexican American group comparisons exhibite d the greatest number of DIF items (16 common DIF items) and White-Black group comparisons flagged the fewest number of DIF items (2 common DIF items). Only one item, Item 15 (people were unfriendly), consistently exhibited DIF across all three group comparisons, which suggest Item 15 functioned differently in each of the three groups. Regarding the responses to this interpersonal relation item (Item 15), Black s were favored over Whites and Mexican Americans, while the latter were favored ove r Whites. These results clearly suggested that Blacks were more likely to endorse It em 15 (people were unfriendly) than Whites and Mexican Americans. Across all three group comparisons, sixteen of twenty items flagged DIF at least once. In other words, 80% of the twenty CES-D items functioned differently across at least two of the three groups. Among theses sixteen DIF items, fourteen items (Items 2, 3, 4, 5, 6, 7, 9, 10, 12, 13, 16, 17, 18, and 20) favor ed Mexican Americans, indicating Mexican Americans had a greater likelihood to endorse these items than Whites or Blacks. Two items related to interpersonal problems (the previously mentioned Item 15, people were unfriendly and Item 19, pe ople disliked me) favored Blacks, suggesting Blacks were more likely to endorse them th an Whites and Mexican Americans. Only four items (Items 1, bothered by things; 2, poor appetite; 11, res tless sleep; and 14, felt lonely) showed no evidence of DIF, suggesting they were common depressive

PAGE 73

64 symptom items that functioned equivalent ly across Whites, Blacks, and Mexican Americans. Three of these four items (Item s 1, 2, and 11) are associated with somatic symptoms, while the fourth (Item 14) is us ually found in Radloffs depressive affect factor. Whites versus Blacks In the comparison of Whites and Blacks, CFA identified five DIF items (Items 12, 15, 16, 17, and 19) and IRT flagged four DIF items (Items 7, 8, 15 and 19). Two DIF items (Items 15, people were unfriendly and 19, people disliked me) were identified in both CFA and IRT methods. The same findings have been previously observed in two DIF studies (Cole et al., 2000; Yang & Jones, in press) a nd the present study supported their findings using the same samples draw n from the New Haven EPESE but different DIF methods (Cole et al. used the Mantel-H aenszel (MH) adjustment; Yang & Jones used the multiple indicators, multiple causes (MIMIC ) model). As was the case in these two studies, in the present analyses, Blacks were more likel y than Whites to endorse the two interpersonal items. Whites versus Mexican Americans In the White-Mexican American comparison, CFA flagged sixteen items flagged as DIF and IRT identified nineteen DIF items All sixteen of the CFA-flagged items were also identified by IRT: Items 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15, 16, 17, 18, 19 and 20. In other words, 80% of the CES-D items we re indicated to function differently across Whites and Mexican Americans. Notably, all four positive affect items (Items 4, felt good as good as others; 8, felt hopeful about the future; 12, felt happy; 16, enjoyed life) exhibited DIF in both approaches. A ll DIF items except four positive items favored

PAGE 74

65 Mexican Americans, which means Mexican Americans were more likely to endorse these items compared to Whites. Responses to th e four positive affect items, here actually representing low positive affect items since these four items were reverse-scored, favored Mexican Americans, indicating Mexican Am ericans were less likely to endorse all positive feeling items. Mexican Americans versus Blacks In the comparison of Mexican Americans and Blacks, CFA identified nine DIF items and IRT identified seventeen. There were nine common DIF items (Items 3, 4, 5, 6, 12, 15, 16, 17, and 18), suggesting that nearly half of the CES-D items functioned differently for Mexican Americans and Blacks. Three of four positiv e affect items (Items 4, felt good as good as others ; 12, felt happy; 16, enj oyed life) showed DIF. Compared to Blacks, Mexican Americans were more likely to endorse all DIF items (including three low positive feeling items) except one interpersonal problem item (Item 15, people were unfriendly). Blacks were more likely to endorse Item 15 (people were unfriendly) than were Mexican Americans. Three positive affect items favored Mexican Americans over Blacks, indicating Mexican Am ericans were more likely to endorse low positive feeling items. This finding parallels those for the comparison of Mexican Americans with Whites, and suggests that Mexican Americans may be less likely to report their positive feelings than Blacks. CFA versus IRT In the comparison of DIF items identified by CFA and IRT methods, common DIF items were found for two of seven items in the White-Black comparison, sixteen of nineteen items in the White-Mexican Ameri can comparison, and nine of seventeen items

PAGE 75

66 in the Mexican American-Black compar ison. Most of the uncommon DIF items identified by either CFA or IRT were items detected by the IRT method, suggesting that the IRT DIF method was more likely to dete ct DIF over CFA in the present study. Discussion The present study investigated the cultur al equivalence of the CES-D across three racial/ethnic groups of older adults, incl uding Whites and Blacks from the New Haven EPESE and Mexican Americans from the Hispanic EPESE. It is worth pointing out that the present study may be the first to examine item bias in the full version of the CES-D across racially/ethnically dive rse elderly populations, and es pecially to include older Mexican Americans in the comparisons. The goal of this study was to identify items in the CES-D that function differentially across diverse racial/ethnic groups. The approach followed included two different analytic strategi es, the CFA and the IRT, to identify DIF, and hence was able to examine items detected by one or the other, or both, strategies. DIF analyses in Study 1 indicated that across all three ra cial/ethnic groups, sixteen of the twenty CES-D items displaye d statistically signifi cant DIF in both CFA and IRT methods. In other words, 80% of the CES-D items functioned differently across Whites, Blacks, and Mexican Americans. That left four items (Items 1, bothered by things; 2, poor appetite; 11, re stless sleep; and 14, felt lonely ) of the twenty CES-D items as being identified to function similarly acro ss all three racial/et hnic groups of older adults. The bottom line is that all three gr oups clearly did not report their symptoms of depression on the CES-D equivalently, a fi nding that emphasizes the need for further study of measurement equivalence in at least this depression screening instrument.

PAGE 76

67 In the comparison of Whites and Blacks, results supported previous findings from two published DIF studies on the CES-D items among older adults that used the same dataset employed in the present analysis (C ole et al., 2000; Yang & Jones, in press). Compared to Whites, Blacks consistently over-endorsed two interpersonal relation items (people were unfriendly and people disliked me), which may reflect perceptions of racial discrimination by Blacks. It has been well documented that Blacks generally experience more disadvantaged social conditions than Whites, and are more likely to report racial discrimination (e.g., Ren, Amick, & Williams, 1999; Williams, 2005). Disproportionate responses to two interp ersonal problem items that may confound depressive symptoms with perceived racial prejudice ha ve been observed in other published studies with Blacks (e.g., Blazer et al., 1998), although they were not testing DIF. These results suggest that researchers w ho are interested in investigating depressive symptoms among Blacks should pay careful a ttention to the CES-D items involving interpersonal relations. No previous work has addressed item bias in the CES-D among older Mexican Americans. The most striking finding in the present DIF analyses was the general lack of measurement equivalence of the CES-D scal e in the comparison of Mexican Americans to Whites and Blacks. Sixteen of the tw enty CES-D items were shown to function differently in the comparison of Mexican Americans and Whites. Remarkably, Mexican Americans were predisposed to endorse all of the sixteen DIF items (h ere, including four low positive affect items) compared with thei r counterparts. The comparison with Blacks were nearly as dramatic, with nearly half of the items on the CES-D manifesting item

PAGE 77

68 bias and with the results indicating higher levels of endorsement by the Mexican Americans. A greater tendency to endorse depressi ve symptoms among Mexican Americans can be partially explained by research i ndicating that Mexican Americans are less hesitant to admit their symptoms of ps ychological distress (Haberman, 1970). This response style may lead to less underreporting symptoms of depression among Mexican Americans (McHorney & Fleishman, 2006), which would help to explain their relatively high depression scores. Overa ll, these results suggest that using the standard cut-off scores of 16 or higher (Andresen et al., 1994) with Mexican American elders may lead to misclassification, resulting in el evated false positive rates. Future work is warranted on the appropriate cut-off scores of the CES-D in Mexican American population. One intriguing finding was that Mexican Americans appeared to be much less likely to endorse positive affect items than the other two groups. This finding parallels results reported in Iwata and colleagues (2002) two studies of young adults. Their studies found that Hispanics were more like ly to report the absence of positive feelings than were Whites. In other words, the tw o studies showed that Hispanic young adults were less likely to endorse positive affect items compared to Whites, suggesting that Hispanics tend to inhibit at least the reporting if not the actual experienceof positive feelings such as feeling good about oneself feeling happy, feeling hopeful about the future and enjoying life. In addition, th ese positive item bias in the CES-D favoring Whites and Blacks may be also explained by re latively little hesitation to express positive feelings among Whites and Blacks. There is, in fact, some literature suggesting that positive feelings are prominent in mainstream American culture (Ying, 1989), and also

PAGE 78

69 that life in mainstream American culture may generate more positive feelings in daily life (Iwata & Buka, 2002). Consistent with previous studies showing measurement equivalence of all positive affect items in the CES-D among older Whites and Blacks (e.g., Cole et al., 2000; Yang & Jones, in pr ess), in the present study, Whites and Blacks showed a similar response tendency in expres sing their positive fee lings, indicating that they may share values, attitudes, and beliefs regarding the expression of positive feelings. There is, of course, also the clinically im portant possibility that the similarities and differences may reflect similariti es and differences in the actual experience of emotions. A related finding to the abovementioned response tendency to positive feeling items is that Mexican Americans showed gr eater endorsement of the extreme category on positive affect items, which directly affected item bias on positive feeling items. Compared with Whites and Blacks, Mexican Am ericans had a greater tendency to select rarely/none of the time, one of the end points, when they were asked to report their positive feelings, such as feeling good about th emselves (Item 4), feeling happy (Item 12), and enjoying life (Item 16). This dispr oportionate extreme response style has been previously reported in the literature (e.g., Clarke, 2000; Marin, Gamba, & Marin, 1992; Hui & Triandis, 1989), with a suggestion that Hispanics may prefer extreme responses because of a cultural value that associat es extreme responses with sincerity and conviction. It is noteworthy th at a greater tendency to sel ect the extreme response style was found only for positive affect items, wh ich suggest responses to positive feeling items among Mexican Americans may be associ ated with their inhibited endorsement of positive feelings as well as extreme response st yle. More importantly, this response bias

PAGE 79

70 resulted in a higher total mean score of the CES-D and a significantly larger percentage of clinically depre ssed individuals. There was also evidence that Hispanic s reported somatic symptoms differently than did the White groups. Th e literature has suggested that Hispanics are more likely to somatize their psychological distress such as depressive symptoms (e.g., Angel & Guarnaccia, 1989; Fabrega, 1990). The presen t study found that four of seven somatic symptom items (Items 5, trouble concentration; 7, everythin g is an effort; 13, talked less; and 20, could not get going) exhibited DIF in the Mexican American-White comparison and one somatic item (Item 5, t rouble concentration) in the Mexican American-Black comparison. In all cases there was a greater endorsement among Mexican Americans. These findings partiall y supported the interac tion between ethnicity (favoring Mexican Americans) and somatic symptom DIF items in the CES-D. Interestingly enough, however, this study found th ree of the four common items (i.e., no DIF items) found to function equivalently acro ss all three groups were somatic symptom items. These no-DIF items were bothered by things, poor appetite, and restless sleep, although bothered by thi ngs was used as a referent. Given the fact that this study focused on only older adults and that older adults may tend to somatize their depressive symptoms (e.g., Norris, Arnau, M eagher, & Bramson, 2005), it seems that the interaction between DIF in somatic symptom items and race/ethnicity may be confounded with older age in this sample. It should be emphasized that this wa s the first study using two popular DIF methods, both CFA and IRT methods, to det ect common DIF with the CES-D. The use of the joint approaches has been suggeste d by a number of researchers as a way of

PAGE 80

71 ensuring a comprehensive test of meas urement equivalence (e.g., Hambleton, 2006; Hidalgo-Montesinos & GomezBenito, 2003; Wang and Russell, 2005). In fact, using multiple methods has been recommended for cross-cultural researchers to identify cultural differences more accurately (e.g.,Sch affer, & Riordan, 2003). In addition to following their suggestion, this study used a unified strategy of DIF detection with CFA and IRT suggested by Stark et al. (2006), in order to provide higher power and lower Type I error rates, which is of particul ar concern in DIF detection method. Although CFA and IRT found some disc repant DIF items, the majo rity of DIF items were identified by both methods. Results suggest that the use of both methods may be helpful for detecting item bias. For example, when th is study applied this strategy to detect and interpret DIF, results from this study in the comparison of Whites and Blacks showed perfect agreement with two prev ious studies (Cole et al., 200 0; Yang & Jones, in press) that used the same sample as employed here. This suggests the combined use of the two statistical approaches increases the accuracy for testing measurement equivalence of depressive symptom inventorie s. However, results also raise questions about how to interpret DIF items that are only detect ed by one method. Careful attention and interpretation should be made to these unc ommon DIF items across different methods for future research. From a methodological point of view, St udy 1 reinforces the need for careful evaluation of measurement equivalence across dive rse groups. Of partic ular interest were the apparent response inhibition for pos itive affect items evident among Mexican American elders and the two interpersonal problem items where Blacks appeared to have a greater predisposition for endorsement. When instruments are used to screen and assess

PAGE 81

72 for depression in diverse racial/ethnic popula tions, researchers and practitioners should be aware of the risk that individuals from different cultural backgrounds may tend to be misclassified, such as false positives, false negatives, or both, leadi ng directly to underor over-diagnosis for depression. Use of inaccurate measures could also lead to misguided public policies. Although the findings of Study 1 hold imp lications to resear ch, practice, and public policies, limitations should be noted. On e factor that was not controlled in the study was the potential influence of historic al time and cohort differences between the samples from the New Haven EPESE and th e H-EPESE. The New Haven EPESE was collected in 1981-1982, whereas the H-EPESE was collected in 1993-4. The over ten year differences between those two sample s may have led to differential response patterns. In addition, this study included a relatively small sample of Blacks. Both limitations underscore the importance of appropriate nationally representative datasets that can provide enough information to capture racial/ethnic dispar ities in health. In summary, Study 1 high lights the importance of c onsidering symptoms of depression that may be experienced and expre ssed differently by diverse cultural groups. Mexican American elders, in particular, we re found to differ substantially from White elders in their predispos ition to respond. Black elders, in general, were much more likely to respond in patterns similar to those of the White elders. The reasons underlying the differences, as well as the similarities, are at present unknown. Clearly, more work remains to be done, especially with regard to understanding potential sources of DIF, such as sociodemographic characteristics. Ul timately, this avenue of research may lead

PAGE 82

73 to the development of a screening tool that is as free of item bias as possible across diverse racial/ethnic groups.

PAGE 83

74 CHAPTER THREE: STUDY 2 SOCIODEMOGRAPHIC-RELATED MEASUREMENT BIAS Introduction Responding to the national commitment to reduce racial/ethni c disparities in health and health care (e.g., U.S. Department of Health and Human Services, 2005), this study addressed one important but little-studied area in health di sparities research: measurement equivalence of mental health sc reening tools. While a number of studies have addressed the question of measurement equivalence, most have simply compared factor structures across diverse racial/ethnic groups with no consideration of potential item bias. Measurement equivalence is of pa rticular importance in health disparities research because if items on a measure have differential meanings or validity across diverse groups, group comparisons may be mi sleading and the prevalence estimates may be inaccurate. Especially when self-repor t screening tools are applied in diverse racial/ethnic groups, particular attention should be paid to whether item response levels are systematically inflated or deflated by fact ors, such as cultural values and gender role that are unrelated to the a targeted construc t such as depressive symptoms (Stewart & Npoles-Springer, 2003). In cases where th ese potentially biasing factors operate differentially across diverse racial/ethnic groups, apparent group differences or similarities assessed by the self -report instrument could be the result of response bias what has been called differe ntial item functioning (DIF) rather than true group differences.

PAGE 84

75 Two analytic approaches have been us ed for testing measurement equivalence, confirmatory factor analysis (CFA) and item response theory (IRT). Most studies examining DIF have only relied on results from one of these methods. The two methods, however, often produce different results, since they differ in underlying approaches (Raju, Laffitte, & Byrne, 2002; Stark, Chernyshenko, & Drasgow, 2006; Teresi, 2006). CFA is based on a linear model, tests invariance of lo ading parameter first and intercepts later, and uses a free baseline model for hypothesis testing. In contrast, IRT is based on a nonlinear model, tests inva riance of discrimination and location parameters simultaneously, and uses a constrained baseli ne model strategy for hypothesis testing. The implications of these differences, for item level comparisons, are presently unknown; it is only recently that simulation studies ar e being run to compare the two approaches to DIF detection (e.g., Meade & Lautenschlager, 20 04; Raju et al., 2002; Stark et al., 2006). Until such studies are completed some ps ychometricians are recommending applying multiple DIF detection approaches for more accurate DIF results and for more definitive information concerning which items are showing DIF and which ones are not (e.g., Hambleton, 2006; Wang & Russell, 2005). According to Hambleton (2006), it may be useful to focus on the items that reveal c onsistent DIF across diffe rent methods. Taking the abovementioned guidelines into account, one of the objectives of the present study was to apply these two DIF methods as a means of identifying items that are consistently classified as DIF for more accurate results and stronger conclusions. A related, and also understudied area in measurement equivalence is the influence of sociodemographic charac teristics (Stewart & Npol es-Springer, 2003). Unlike sociodemographic-related measurement bias, race/ethnicity DIF on self-report screening

PAGE 85

76 tools has been addressed in previous research (e.g., Kim, Chiriboga, & Jang, 2007). Recent literature on U.S. racial/ethni c populations documents the substantial sociodemographic diversity among America s racial/ethnic group (Williams, 2005). A number of studies have sugge sted that sociodemographic differences may play a causal role in racial/ethnic disparities in heal th (e.g., Alwin & Wray, 2005; Mirowsky & Ross, 2003; Williams, 2005). However, little is known about how such characteristics may also influence the reporting of health symptoms and the completion of instruments. Sociodemographic characteristics may reflect fundamental differences in the experience of symptoms that give rise to the lack of measurement equivalence (McHorney & Fleishman, 2006). These sociodemographic characteristics in fact are fundamental to shaping the different experiences and liv ed realities of people (e.g., Wray, Alwin, & McCammon, 2005). In turn these differences in e xperiences may influence peoples values, perceptions, and views and could, among other thi ngs, systematically inflated or deflated item response levels. More importantly, ev en shared sociodemographic conditions may not confer similarity across diverse racial/e thnic groups, which suggest s the possibility of interactions between race/ethnicity and sociodemographic characteristics regarding measurement equivalence. For these reasons, this Study 2 was interested in not only the overall effect of sociodemographic strata on the measurement equi valence but also the differential effect of sociodemographic f actors on the measurement equivalence in diverse racial/ethnic groups. The instrument I selected to evaluate measurement equivalence was the Center for Epidemiological Studies Depression Scale (C ES-D; Radloff, 1977). Since its initial

PAGE 86

77 development in 1977, the 20-item CES-D has b een used in a substantial number of studies. Despite its wide use in diverse populations and confirmed general usefulness (Mui, Burnette, & Chen, 2002), there has been little attempt at the systematic assessment of the CES-D items that function differentially across diverse groups. A few CES-D studies have examined the effect of sociodemographic characteristics (e.g., age, gende r, or educational attainment ) on measurement equivalence (e.g., Cole, Kawachi, Maller, & Berkman, 2000; Yang & Jones, in press). Cole and colleagues (2000) studied item-level biases in the CES-D across groups varying in age, gender, and race/ethnicity (Whites vs. Blacks). They found one gender biased item (crying) and two race /ethnicity biased ite ms (people were unfriendly and people disliked me). Using a different DIF detection method (the multiple indicators, multiple causes [MIMIC] model), Yang and Jones (in pre ss) also successfully replicated findings from Cole and colleagues (2000). In both stud ies, two interpersonal relation items had a higher predisposition for Blacks and the c rying item had higher endorsement among women. In their studies, age and gender item biases within each ra cial/ethnic group were not analyzed in ways that might have capture d more of the fundamental differences that give rise to item bias. Neith er study investigated the effect of educational attainment on the measurement equivalence of the CES-D among older adults. Perhaps more importantly, none of the previous work has fully considered the sociodemographic-related measurem ent bias in the CES-D within as opposed to acrossdifferent racial/ethnic groups. Each racial/ethnic group has its unique sociodemographic profiles which may underlie within group differences (Alwin & Wray, 2005), and which may also in themselves resu lt in differing predispositions to respond

PAGE 87

78 depressive symptom items (Nguyen, Kitner-Triolo, Evans, & Zonderman, 2004). Therefore, it may be meaningful to examine sociodemographic-related item bias in the CES-D scale within each racial/ethnic group. By identifying sociodemographic-related item bias in the CES-D within each racial/eth nic group, future research may be in a better position to explain why diverse racial/ethnic groups respond di fferentially to the certain items of the CES-D scale. The purpose of this Study 2 was to examine the sociodemographic-related item bias of the CES-D in the total sample as we ll as three racial/ethnic elderly groups: Whites, Blacks, and Mexican Americans. Due to the relatively large sample sizes, this investigation had a unique opportunity to in vestigate possible interaction effects of sociodemographic variables and race/ethnicity (Whites, Blacks, and Mexican Americans). The use of two DIF methods to examine item bias in the CES-D also represents a potential contribution to the fi eld. With respect to findings from this dual use approach, this study took a conservative approach to identifying DIF by recognizing DIF only when the same item was identified by both the CFA and IRT methods. Methods Sample Two national datasets were used for this study. The New Haven Established Populations for Epidemiologic Studies of the Elderly (EPESE) provided the White and Black samples and the Hispanic Established Po pulations for Epidemiologic Studies of the Elderly (H-EPESE) provided the Mexican Amer ican sample. These two datasets were selected because of (1) their in clusion of older adults aged 65 or older; (2) their use of the original 20-item CES-D with a four-point rating scale; and (3) their inclusion of Whites

PAGE 88

79 and Blacks in New Haven EPESE and Mexican Americans in the H-EPESE. The New Haven EPESE is a longitudinal study of co mmunity-dwelling participants aged 65 or older, and is one of four similar studies (the other sites collected data in East Boston, Iowa, and North Carolina). The H-EPESE, also a longitudinal study, included Mexican American participants aged 65 or older a nd living in Texas, New Mexico, Colorado, Arizona, and California. The H-EPESE wa s modeled after the design of the EPESE studies in order to compare with other populations in 1993-4. Using the first waves of the New Ha ven EPESE and H-EPESE, subjects were excluded if they had any missing data on 20 CESD items. This resulted in the listwise deletion of 407 Whites (17.8 % of the total Whites), 65 Blacks (12.3 % of the total Blacks), and 427 Mexican Americans (14.0 % of the total Mexican Americans). Further 41 subjects (32 Whites and 9 Blacks) were excluded due to missing data on their educational attainment. No missing data was found with regard to age and gender. The remaining sample comprised 1,844 Whites, 455 Blacks, and 2,623 Mexican Americans. Removed participants in each group had sim ilar characteristics in terms of gender distribution and educational attainment, but we re more likely to be older for all three racial/ethnic groups. Measures CES-D scale Study 2 also used the CES-D as a target instrument. Both the New Haven EPESE and H-EPESE used the original 20-item version of the CES-D (Radloff, 1977). Respondents in both datasets were asked to report how often each symptom was experienced during the past w eek, and their symptoms were rated on a 4-point likert

PAGE 89

80 scale, with categories presented in the followi ng order: rarely or none of the time (coded as 0), some or a little of the time (coded as 1), much of the time (coded as 2), and most or all of the time (coded as 3). The four positive items were reverse-coded and scale scores were computed by summing acr oss twenty items to produce total scores ranging from 0 (no depressive symptoms) to 60 (severe depressive symptoms). Scores of 16 or higher are typically viewed as eviden ce of probable depressi on (Andresen, Carter, Malmgren, & Patrick, 1994). In ternal consistency of the CE S-D was satisfactory in the present sample: = .87 for the total sample, = .86 for Whites, = .84 for Blacks, and = .88 for Mexican Americans. Statistical Analysis Study 2 used the same methodology used in Study 1. Because the DIF detection methods used in this investigation assume that a single dominant factor underlies CES-D item responses (Stark et al., 2006), the unidimensionality as sumption was first evaluated using a principal component analysis (PCA) via SPSS and a confirmatory factor analysis (CFA) via LISREL 8.8 (Jreskog & Srbom, 2006). This was done within each sociodemographic characteristic group (e.g., younger vs. older, men vs. women, more or less educated) within each of th e three racial/ethnic groups as well as for the total sample. Twenty four runs for each PCA and CFA were performed. After verifying the underlying unidimensiona lity of the CES-D, the application of IRT and CFA DIF detection using the likeli hood ratio tests proceeded. In both methods, this study followed the general approach to hypothesis testing descri bed by Stark et al. (2006), since the latter approach showed hi gh power and low Type I error rates across a wide variety of simulation conditions. In essence, the authors suggest testing for DIF by

PAGE 90

a common strategy that can be implemented in both CFA and IRT. The strategy is called free baseline with Bonferroni correction. A fully free baseline model (with the exception of a single referent item) was used as the basis for subsequent nineteen nested model comparisons where one item at a time was constrained to be equal across groups. Item parameters were compared simultaneously (discriminationloadings and locationsintercepts), using Bonferroni corrected p-values for flagging DIF items. Detailed analytic procedures for CFA and IRT are described below. DIF Detection CFA DIF detection. CFA DIF analyses involving item loadings and intercepts were conducted using an analogous strategy with LISREL 8.8. Using the free baseline model, where only the parameters of the referent are constrained across groups, baseline and constrained models were run in succession. The chi-square difference statistics for the nested model comparisons were evaluated using a Bonferroni corrected critical p-value. When the observed chi-square difference was greater than the corresponding critical chi-square value (Bonferroni corrected, 2 = 11.88 with 2 degrees of freedom), the item was flagged DIF. IRT DIF detection. Because the CES-D scale is polytomous, the Graded Response Model (GRM; Samejima, 1969) was estimated using the MULTILOG computer program (Thissen, 1991). For the GRM, each four-category item has one discrimination parameter (a) and three location parameters (b 1 b 2 and b 3 ). The discrimination parameter reflects the extent to which an item differentiates between levels of underlying depression; items with higher a are generally preferred because they are more informative in a psychometric sense. The location parameters refer to the point 81

PAGE 91

on the underlying depression scale in which the probability is 50% for endorsing the first category relative to the last 3 categories (b 1 0 vs. 1, 2, 3), the first 2 categories relative to the last 2 categories (b 2 0, 1 vs. 2, 3), and the first 3 categories relative to the fourth category (b 3 0, 1, 2 vs. 3), respectively. To determine whether the GRM used for parameter estimation adequately fits the data, chi-square fit statistics were assessed using the MODFIT program. Adjusted chi-squares to degrees of freedom ratios for each item were all less than 3, indicating a good model-data fit (Drasgow, Levine, Tsien, Williams, & Mead, 1995). The concurrent calibration method was subsequently used to put the reference and focal group parameters on a common metric with Item 1 as an anchor item. In this step, groups consisting of those who were younger (aged 65 to 74), female, and more educated (8 th grade or more) were designated as the reference group, whose latent mean was set to zero. Groups with older age (75 or older), male gender, and lower education (less than 8 th grade) were designated as the focal group; focal group latent mean were free to vary. As described for the CFA DIF method, the free-baseline model strategy was also used for each CES-D item, and differences in relative goodness of fit were examined with respect to critical chi-square statistics. Each chi-square difference was compared to Bonferroni corrected p-values (corrected, 2 = 16.31 with 4 degrees of freedom), and items exhibiting DIF were flagged. 82

PAGE 92

Results Sample Characteristics As shown in Table 6, Whites, Blacks, and Mexican Americans were significantly different in terms of their age, gender, and educational attainment. The White sample included more individuals who were aged 75 or older: 46.7% for Whites and 32.3% for both the Blacks and Mexican Americans. In all three groups well over half were female (57% for Whites, 63% for Blacks, and 58% for Mexican Americans). Black participants were more likely to be female than Whites and Mexican Americans. In terms of educational attainment, there were more Mexican Americans with less than an eight grade level of education (77%) than was the case for Whites, among whom the majority had more than an 8 th grade level of education (40.5% for 8 th -11 th grade, 20.1% for 12 th grade, and 15.8% for more than 12 th grade). A study variable, the CES-D, also demonstrated significantly different mean scores across three racial/ethnic groups, showing higher scores in Mexican American sample than in the White or Black samples (F = 33.82, p < .001). Moreover, using the standard cutoff of 16 on the CES-D for evidence of probable depression, Mexican Americans consistently exhibited a greater likelihood for probable depression than was reported for the Whites or Blacks. Specifically, 16% of the Whites, 14.4% of the Blacks, and 23.1% of the Mexican Americans fell into the probable depression category ( 2 = 44.17, p < .01). 83

PAGE 93

Table 6. Study 2 Descriptive Characteristics of the Sample New Haven EPESE (N = 2,299) H-EPESE (N = 2,623) Variables Whites (N = 1,844) Blacks (N = 455) Mexican Americans (N = 2,623) F or 2 Age (75) 45.7% 32.3% 32.3% 89.58*** Female 56.6% 63.4% 58.1% 6.98* Educational attainment 1279.26*** Less than 8 th grade 23.6% 50.5% 77.0% 8 th -11 th grade 40.5% 31.9% 13.3% 12 th grade 20.1% 13.2% 6.7% More than 12 th grade 15.8% 4.4% 3.1% CES-D Scale Mean (SD) 8.03 (8.39) 7.88 (7.82) 10.06 (9.31) 33.82*** Probable depression ( 16) 16.0% 14.4% 23.1% 44.17*** Reliability () .86 .84 .88 Note. New Haven EPESE = New Haven Established Populations for Epidemiologic Studies of the Elderly; H-EPESE = Hispanic Established Populations for Epidemiologic Studies of the Elderly; SD = Standard Deviation; CES-D = Center for Epidemiological Studies-Depression p < .05. *** p < .001. 84

PAGE 94

85 Dimensionality of the CES-D Dimensionality was examined in each sociodemographic group by three racial/ethnic groups and by the total samp le, making for a total of twenty four dimensionality tests. The PCA results indica ted that while the several analyses produced three to six factors, the ratio of first to second eigenvalue ra nged from 3.01 (low educated Mexican Americans: 1 st factor = 6.90; 2 nd factor = 2.29; 3 rd factor = 1.14) to 4.63 (high educated Whites: 1 st factor = 5.89; 2 nd factor = 1.27; 3 rd factor = 1.24; 4 th factor = 1.12), respectively, which suggests that the data are essentia lly unidimensional (Lord, 1980; Stout, 1990). This result was supported by one -factor confirmatory factor analyses, where goodness-of-fit indices exceeded .90 (e xcept for two .89 fit indices for low educated Mexican Americans). Among twenty four CFA analyses, high educated Whites showed the best goodness-of-f it indices (CFI = .95, NFI = .94, NNFI = .95), while low educated Mexican Americans showed the lowest goodness-of-fit indices (CFI = .90, NFI = .89, NNFI = .89). These results confirmed th e overall unidimensionality of the CES-D for the present analyses. DIF Results Results of age, gender, and educational attainment DIF analyses are summarized in Table 7, Table 8 and Table 9, respectivel y. In both the CFA and IRT methods, this study used Item 1 as a referent and had nineteen model comparisons in which each of the CES-D items was constrained to be equal across groups in each model. Following suggestions by a number of psychometricia ns (Hambleton, 2006; Hambleton & Rogers, 1989), this study reported DIF items that show ed up consistently ac ross the two methods as a more conservative strategy that would reduce possible Type I error rates. Although

PAGE 95

86 jointly identified DIF items were the main focus, attention was also given to DIF items detected by only one method as well as DIF-free items. As an overview, six of the CES-D ite ms were shown to have statistically significant sociodemographic-related DIF in both CFA and IRT methods. Looking at another way, 70% of the CES-D items (i.e., four teen items) were relatively free of item bias associated with sociodemographic characteristics. Different effects of sociodemographic characteristics on the CE S-D item bias were found in the three racial/ethnic groups. Using the joint CFA/IR T identification criter ion, no evidence of item bias by sociodemographic characteri stics was observed among Whites. Among Blacks, one item was observed to be bias ed by educational attainment (Item #17, crying). Among Mexican Americans, the same item (Item #17, crying) was biased by gender. Item #17 was the only item confounded with race/ethnicity and sociodemographic characteristics (here, educational attainment and gender). Age-DIF (Younger than 75 vs. 75 or older) Table 7 summarizes age DIF results from CFA and IRT. In the comparison of those 65 to 74 years and those 75 and older, no consistent DIF item was detected by both CFA and IRT methods, suggesting no eviden ce of age-related item bias among older adults. The same results were found in two DIF studies of Black and White elders that used the same New Haven EPESE sample as us ed in the present study (Cole et al., 2000; Yang & Jones, in press). However, ther e were six DIF items detected by only one method: CFA found three age-DIF items (It ems #7, 10, 16) in the total sample; IRT found two age-DIF items (Items #6, 12) am ong Whites, one (Item #17) among Blacks,

PAGE 96

87 Table 7. Study 2 Age DIF Results from CFA and IRT Methods Note. Items i n bold and undined are common DIF ites across CFA and IRT mehods in each cross-ral/ethnic compson. 2 erl m t aci ari + Reverse-coded item. a In CFA, DIF flagged if were > 11.88. b In IRT, DIF flagged if 2 were > 16.31. Age DIF ( younger than 75 vs. 75 or older) Total Whites Blacks Mexican Americans CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) Models Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Baseline Model (Referent: Item1) 6452.68 65125.5 1619.35 23247.5 849.32 6885.6 5253.01 41133.0 Comparison Models Item 2, appetite (0.65) (3.1) (0) (7.3) (0.20) (0.7) (1.99) (2.8) Item 3, blues (6.50) (2.9) (0.31) (7.3) (1.62) (2.4) (6.24) (3.3) + Item 4, good (7.28) (14.2) (1.72) (8.7) (10.03) (9.0) (9.02) (12.2) Item 5, mind (1.22) (12.1) (0.02) (13.9) (3.29) (0.4) (4.89) (1.9) Item 6, depressed (3.46) (1.8) (0.31) (16.4) DIF (8.59) (2.9) (4.05) (5.6) Item 7, effort (22.40) DIF (5.8) (1.40) (9.2) (7.59) (2.7) (1.86) (3.7) + Item 8, hopeful (6.85) (1.0) (0.37) (8.1) (7.66) (0.9) (6.46) (0.7) Item 9, failure (3.60) (1.2) (0.33) (11.3) (7.91) (6.8) (11.07) (6.0) Item 10, fearful (20.95) DIF (4.6) (10.97) (3.6) (0.62) (5.6) (10.83) (0.7) Item 11, sleep (1.21) (5.7) (1.71) (2.6) (2.04) (0.3) (0.07) (11.7) + Item 12, happy (8.99) (1.8) (0.20) (21.9) DIF (6.33) (2.2) (8.42) (14.7) Item 13, talked less (0.32) (1.1) (0.58) (5.3) (1.52) (4.4) (0.61) (11.1) Item 14, lonely (11.02) (2.4) (3.70) (6.4) (6.01) (5.4) (5.44) (5.9) Item 15, unfriendly (1.86) (3.3) (2.05) (3.2) (3.79) (6.3) (0.32) (12.6) + Item 16, enjoyed (13.44) DIF (1.9) (0.53) (4.5) (0.17) (3.8) (9.75) (17.4) DIF Item 17, crying (0.70) (1.1) (0.12) (5.7) (10.24) (30.5) DIF (3.83) (5.3) Item 18, sad (7.67) (8.5) (0.56) (10.0) (6.47) (1.0) (7.74) (2.6) Item 19, disliked (0) (2.7) (1.58) (2.3) (2.11) (3.1) (1.65) (8.2) Item 20, get going (5.62) (9.6) (0.91) (1.8) (3.12) (5.9) (6.81) (1.3) Total # DIF Items 3 0 0 2 0 1 0 1

PAGE 97

88 and one (Item #16) among Mexican Americans. 5), men were more likely than women to endorse the crying item. Gender-DIF (Male vs. Female) As shown in Table 8, two gender biased items (Items #15, people were unfriendly and #17, I had crying spells) were identified by both CFA and IRT. Compared to women, men showed a greater tendency to endorse the unfriendly item at all levels of depression. Wome n had a greater propensity th an men to endorse crying item at lower to mid-higher depressions scores (theta = -3 to +2.4). However, at more severe levels of depre ssive symptoms (theta +2.5), men were more likely than women to endorse the crying item. Gender was confounded with Mexican Amer ican race/ethnicity on the crying item. As shown in Figure 3, compared to Mexican American men, women were more likely to report the crying item at highe r levels of depressive symptoms (theta +.7). However, no gender differences on the crying item were observed at lower levels of depressive symptoms (theta < +.7) as shown in Figure 3. This finding indicates that with crying spells, greater gender DIF occurred at higher levels of depression. When compared to the total female sample, Mexican American women were more likely to endorse the crying item at higher le vels of depressive symptoms (theta +1.2), whereas at lower levels of depressive symptoms (theta < 1.2), they were less likely to endorse the crying item. Mexican American men showed a response pattern similar to that of men in the total sample, although their scores on the crying item were higher than that of the total male sample across all levels of depression.

PAGE 98

89 Table 8. Study 2 Gender DIF Results from CFA and IRT Methods Note. Items i n bold and undlined are common DIF ites across CA and IRT mehods in each cross-ral/ethnic compson. 2 er m F t aci ari + Reverse-coded item. a In CFA, DIF flagged if were > 11.88. b In IRT, DIF flagged if 2 were > 16.31. Gender DIF ( Male vs. Female) Total Whites Blacks Mexican Americans CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) Models Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Baseline Model (Referent: Item1) 6406.22 65164.4 1646.11 23246.3 808.31 6941.8 5313.52 41230.2 Comparison Models Item 2, appetite (1.66) (7.4) (2.26) (1.0) (0.58) (1.5) (9.71) (9.9) Item 3, blues (0.25) (1.9) (3.09) (0) (0.01) (1.4) (0.22) (6.7) + Item 4, good (0.62) (5.0) (2.87) (2.6) (5.17) (3.3) (2.77) (10.3) Item 5, mind (2.20) (2.1) (2.86) (3.3) (4.29) (5.1) (0.36) (3.8) Item 6, depressed (0.16) (3.1) (3.96) (11.0) (0.27) (1.7) (0.22) (1.4) Item 7, effort (0.02)) (11.9) (1.74) (5.2) (0.53) (12.6) (0.95) (8.9) + Item 8, hopeful (4.58) (7.6) (3.18) (5.6) (0.04) (0.8) (3.87) (10.8) Item 9, failure (1.41) (7.1) (4.49) (6.1) (0.05) (1.2) (0.05) (8.1) Item 10, fearful (4.33) (2.8) (2.16) (3.9) (0.27) (2.6) (3.40) (2.1) Item 11, sleep (0.33) (6.1) (4.58) (1.8) (3.47) (9.7) (0.56) (2.5) + Item 12, happy (1.12) (1.8) (8.32) (2.0) (4.49) (2.9) (1.11) (2.2) Item 13, talked less (1.69) (6.1) (1.81) (6.3) (0.27) (0.9) (0.57) (2.3) Item 14, lonely (0.11) (4.9) (7.24) (5.6) (0.89) (3.7) (5.66) (2.9) Item 15, unfriendly (27.16) DIF (17.0) DIF (10.69) (8.1) (1.05) (4.4) (9.68) (16.8) DIF + Item 16, enjoyed (1.85) (10.5) (6.61) (2.1) (0.76) (4.5) (4.83) (15.3) Item 17, crying (27.77) DIF (32.0) DIF (9.32) (10.3) (4.49) (7.2) (16.34) DIF (22.7) DIF Item 18, sad (4.71) (3.7) (9.50) (0.8) (0.07) (2.9) (6.72) (4.6) Item 19, disliked (0.77) (6.7) (3.68) (4.5) (0.27) (2.6) (0.04) (8.8) Item 20, get going (2.56) (0.3) (1.06) (2.0) (0.24) (6.7) (1.99) (3.9) Total # DIF Items 2 2 0 0 0 0 1 2

PAGE 99

Educational attainment DIF results are summarized in Table 9. In the comparison of those with less than an 8th grade education and 8th grade or more, five DIF items (Items #8, hopeful, #12, happy, #16, enjoyed, #17, crying, and #19, people disliked me) were identified in both CFA and IRT methods. Overall, those with lower educational attainment had greater propensities to endorse all five DIF items (recall that Items # 8, 12, and 16 had been reverse-coded, so greater propensity to endorse these three items represents greater propensity to report low positive affect) compared to those with higher educational attainment. However, different response patterns between the two groups were not observed at lower levels of depressive symptoms for all five DIF items. Three low positive affect items (Items #8, hopeful, #12, happy, and #16, enjoyed,) Educational Attainment-DIF (Less than 8th grade vs. 8th grade or more) Figure 3. Item information function for Item 17 (I had crying spells) showing Gender-DIF in the total sample and Mexican American race/ethnicity. Gender-DIF: Item #17, 'crying spells'1.01.52.00.01.02.03.0 symptoms (Theta) Information 0.00.5-3.0-2.0-1.0DepressiveItem Total Female Total Male MA Female MA Male 90

PAGE 100

91 Table 9. Study 2 Educational Attainment DIF Results from CFA and IRT Methods Note. Items i n bold and undined are common DIF ites across CFA and IRT methods in each cross-ral/ethnic compson. 2 erl m aci ari + Reverse-coded item. a In CFA, DIF flagged if were > 11.88. b In IRT, DIF flagged if 2 were > 16.31. Educational Attainment DIF (less than 8 th grade vs. 8 th grade or more) Total Whites Blacks Mexican Americans CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) CFA a ( df = 2) IRT b ( df = 4) Models Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Chi-Square (Difference) Baseline Model (Referent: Item1) 6359.75 64037.9 1623.15 22222.1 810.44 6829.1 5276.64 40527.9 Comparison Models Item 2, appetite (1.65) (26.0) DIF (0.96) (2.3) (0.65) (1.0) (0) (7.0) Item 3, blues (5.72) (16.6) DIF (2.84) (7.6) (2.80) (3.7) (2.85) (7.1) + Item 4, good (4.24) (211.9) DIF (2.03) (18.6) DIF (6.66) (7.9) (3.54) (23.3) DIF Item 5, mind (6.91) (25.6) DIF (0.07) (17.0) DIF (5.03) (6.3) (0.58) (3.4) Item 6, depressed (0.14) (0.8) (0.03) (13.0) (0.37) (4.5) (9.85) (5.7) Item 7, effort (0.49) (41.0) DIF (0.20) (13.0) (1.48) (5.9) (2.87) (5.8) + Item 8, hopeful (24.56) DIF (98.3) DIF (0.20) (4.7) (7.46) (4.9) (5.41) (20.3) DIF Item 9, failure (6.76) (27.8) DIF (0.74) (6.6) (5.02) (5.6) (2.97) (3.1) Item 10, fearful (1.60) (9.4) (4.06) (2.9) (3.90) (3.4) (1.85) (4.7) Item 11, sleep (0.19) (19.4) DIF (0.02) (1.7) (4.16) (1.4) (0.57) (5.2) + Item 12, happy (21.26) DIF (94.1) DIF (0.01) (2.8) (4.36) (1.6) (2.95) (25.3) DIF Item 13, talked less (3.01) (14.0) (1.02) (6.4) (11.68) (11.4) (1.00) (10.9) Item 14, lonely (0) (6.6) (2.10) (3.7) (4.00) (2.5) (9.77) (3.6) Item 15, unfriendly (19.03) DIF (14.2) (9.54) (13.9) (6.90) (11.1) (4.31) (9.1) + Item 16, enjoyed (18.04) DIF ( 185.6 ) DIF (0.05) (3.1) (7.18) (2.8) (2.80) (68.9) DIF Item 17, crying (42.06) DIF (60.8) DIF (7.58) (7.6) (17.98) DIF (59.6) DIF (7.12) (2.1) Item 18, sad (1.60) (14.7) (1.04) (2.5) (1.32) (5.8) (6.32) (12.9) Item 19, disliked (53.59) DIF (48.2) DIF (9.80) (11.6) (10.99) (6.2) (5.46) (5.0) Item 20, get going (1.98) (30.3) DIF (0.86) (1.8) (2.36) (6.7) (1.29) (2.5) Total # DIF Items 6 13 0 2 1 1 0 4

PAGE 101

favored those with lower educational attainment, indicating it was harder for lower educated people to respond to the positive side of these items. Education DIF: Item #17, 'crying spells'0.00.51.01.52.0-3.0-2.0-1.00.01.02.03.0Depressive symptoms (Theta)Item Information Total Low Education Total High Education Black Low Education Black High Education Figure 4. Item information function for Item 17 (I had crying spells) showing Education-DIF in the total sample and Black race/ethnicity. Educational attainment was confounded with Black race/ethnicity on the crying item. As shown in Figure 4, low educated Blacks were more likely than high educated Blacks to endorse the crying item at lower to fairly high levels of depressive symptoms (theta = -3 to +2.1). Interestingly, however, at the highest levels of depression scores (theta +2.2), Blacks with higher educational attainment showed a greater tendency to endorse the crying item than Blacks with lower educational attainment. Similar response patterns were observed among total lower and higher education groups: those 92

PAGE 102

93 with low educational attainment were ge nerally more likely than those with high educational attainment to endorse the crying item (theta = -3 to +2.7), whereas at very severe levels of depression (theta +2.8), those with high edu cational attainment were more likely than those with low educational attainment to endorse the crying item. Compared to the total sample, both low and high educated Blacks were less likely to respond the crying item across all le vels of depressive symptom scores. Discussion With the increasing recogn ition of the importance of understanding mental health not only across racial/ethnic groups, but within such groups, has come a corresponding recognition of the importance of investigating the measur ement equivalence of popular screening tools such as the CES-D. Few if any studies have examined item bias in the CES-D across sociodemographic st rata in racially/ethnically diverse elderly populations. This Study 2 evaluated sociodemographic-rela ted measurement equivalence of the CESD in three racial/ethnic elderly groups draw n from two national datasets, the New Haven and Hispanic EPESE. In addition, given the importance of using multiple analytic strategies for measurement equivalence research (Hambleton, 2006; Wang & Russell, 2005), the present dissertation study used two different analytic methods, CFA and IRT, to detect item bias on the CES-D more accurately. Across all sociodemographic subgroups of age, gender, educational attainment, six of the twenty CES-D items revealed sta tistically significant DIF in both CFA and IRT methods: two showed gender bias (people were unfriendly and cry ing spells) and five showed bias associated with educational attainment (hopeful, happy, enjoyed, crying spells, and people disliked me). Notably, the crying item was not only biased

PAGE 103

94 by both gender and educational attainment cases but there were also interactions between race/ethnicity and both gender and educationa l attainment. This clearly suggests that crying was the most biased item across all the sociodemographic strata in diverse racial/ethnic elderly groups. The six DIF items included th e two interpersonal relation items, three of the four positive feeling items, and one depressive affect item. When using both CFA and IRT methods, no evidence of age DIF was observed in the present samples, all of which included only persons aged 65 and over. With regard to gender bias in the crying spells item, the present finding supported previous item bias studies on the CE S-D (e.g., Cole et al., 2000; Stommel et al., 1993) showing a greater propensity to endorse the crying spells item among females. This can be explained by the concept of womens permission to cry in society (e.g., Hammen & Padeskym, 1977), wh ich suggests that crying may be viewed as socially acceptable behavior for women in many circum stances. Similarly, it also has been suggested that greater tendencies to repor t mental health issues among women are consistent with socialization patterns that allow women to re port discomfort more readily and to appear less stoical than men (Mechanic, 1978; Nathanson, 1975). One novel finding related to the crying spel ls item was that at the highest level of depressive symptoms (here, at 2.5 or higher theta), men were more likely than women to endorse this item. Unlike previous studi es that has only identified womens higher endorsement on this item (e.g., Cole et al., 2000; Stommel et al., 1993), the unique response pattern reported here clearly indicate s that men were more likely to cry than women when their depressive symptoms were severe. At lower to moderately high depression scores (here, at th eta ranged -3 to 2.4), men ha d a tendency to inhibit their

PAGE 104

95 reporting of the crying spells, perhaps as a re sult of social desirability. A number of previous studies have noted that crying doe s not indicate depressed mood for men (e.g., Ross & Mirowsky, 1984). This finding suggests that this latter conclusion should be revisited and that further inve stigation on the crying item with different samples is warranted. No clear evidence of an age bias in the CES-D items was found in the present study, in agreement with two published DIF studies on the CES-D items among older adults (Cole et al., 2000; Yang & Jones, in pr ess). As suggested by Cole and colleagues (2000), this study also suspect th at the absence of an age bias may result either from an actual lack of bias due to age or my use of a restricted age range. Given the fact that the present DIF analyses showed six age biased items in the CES-D that were detected by one, but not both, of the two DIF methods, th e results suggest future research should include younger or middle-aged adults for comparisons with older adults. The crucial role of educational attainment on the CES-D items was identified in this dissertation study of older adults. This may be the first study to evaluate educational bias in the CES-D items among racially/ethnically diverse older adults. Five items (three reverse-coded positive affect items hopeful, happy, enjoyed, one depressive affect item crying, and one interpersonal prob lem item people disliked me) revealed consistently higher endorsement by the lowe r educated group. Edu cational attainment may be also a proxy for more fundamental di fferences that lead to DIF (McHorney & Fleishman, 2006). That is, higher educationa l attainment generally provides more resources and choices (Krause, 2007) and generally provides more opportunities for experiences that promote general well-b eing (Mirowsky & Ross, 2003). It may be

PAGE 105

96 therefore possible for in dividuals with higher levels of e ducation to express themselves in a more positive way, which in turn may be a ssociated with their lower endorsement of depressive symptom items. One of the most intriguing findings was that highly educated individuals had a greater propensity to endors e positive affect items (hopefu l, happy, and enjoyed,). In other words, compared to those with lower educational attainment, it was easier for those with higher educational attainment to report such positive fee lings as feeling good about themselves, feeling hopeful about the futu re, and feeling happy. This is compatible with research suggesting that higher educati onal level is associated with greater selfexpressiveness (Krause, 2007), which is connected to a greater positiv e affect and greater satisfaction with life (Bettencourt & Sheldon, 2001). This may eventually lead to lower endorsement of depressive symptoms. It should be emphasized that the cryi ng item was the only one biased in the subgroups divided by gender and by educationa l attainment, as well as the only one confounded with race/ethnicity. Results indica ted two interaction e ffects: 1) gender and Mexican American race/ethnicity and 2) educat ional attainment and Bl ack race/ethnicity. In essence, Mexican American women and lower educated Blacks had greater propensities to endorse the crying item compared with their counterparts (Mexican American men and higher educated Blacks, respectively). In terms of the interaction effect of gender, Mexican American women were more likely than Mexican American men to endorse the crying spells item, especially at higher levels of depressive symptom scores (here, at .7 or hi gher theta). The abovementioned greater propensity to repor t crying item among female respondents

PAGE 106

97 was clearly confounded with Mexican American culture, where for women crying may be a more acceptable behavior than it is fo r Whites and Blacks (Azocar, Aren, Miranda, & Mu oz, 2001; Golding, Aneshensel, & Hough, 1991). Compared with other White and Blacks women, Mexican American women showed the highest score on the crying spells item only at higher depressive symptom scores (at theta 1.2 or higher). With regard to the intera ction between educational level and Black race/ethnicity, less educated Blacks in general were more lik ely than more educated Blacks to endorse the crying item at lower to fairly high levels of depression. This finding also shows the effect of educational attainment on self-e xpressiveness among lower educated Blacks, suggesting the connections between a lowe r self-expressiveness and a lower positive affect, which in turn may lead to higher le vels of depressive sy mptoms (Bettencourt & Sheldon, 2001; Krause, 2007). Another interesting respons e pattern was observed in this interaction effect: Blacks with higher educational attainment showed a greater propensity to express their crying spells when they had severe depres sive symptoms (at theta 2.2 or higher). However, when their levels of depression were not severe, higher educated Blacks appeared to inhibit their expres sion or at least the reporting of the crying spells item. It is also worth mentioning that Blacks of both lower and hi gher education groups had the lowest scores on the crying item compared with all other groups, wh ich clearly suggests Blacks are underreportin g on this item. Some survey re searchers have also found Blacks to underreport socially stigmatizing behaviors and viewed this respons e pattern as a part of displaying their social desirabili ty (e.g., Johnson & van de Vijver, 2003).

PAGE 107

98 Noteworthily, both of the interpersonal relation items in the CES-D showed bias: people were unfriendly was gender biased a nd people dislike me wa s education biased. Compared with women and people with highe r educational attainment, men and people with lower educational attainme nt had greater propensities to endorse both interpersonal problem items across all levels of respondents depressive symptoms. This may reflect self-perceptions of discrimination experien ced by those with male gender and lower educational attainment. In fact, there is so me evidence that groups with lower power, as reflected by lower educational attainment, are likely to view the world as chaotic and catastrophic and to distrust the world outside family and friends (e.g., Briones et al., 1990; Hoppe & Heller, 1975), which may in tu rn lead to their self-perceptions of discrimination against them in everyday life. In fact, the two interpersonal problem items have revealed race/ethnicity -related item bias among Mexican Americans (e.g., Kim et al., 2007) as well as Blacks (e.g., Blazer et al., 1998; Cole et al., 2000). From a methodological standpoint, using multip le analytic strategies to detect DIF was of particular interest in this diss ertation study. Following the suggestion of researchers (e.g., Hambleton, 2006; Hamblet on & Rogers, 1989; Wang & Russell, 2005), this dissertation study applie d two of the most common DIF methods, CFA and IRT. DIF items that showed up consistently across the two procedures were reported in order to render conservative conclusions as to which items showed DIF and which ones did not. In addition, this study followed a unified stra tegy of DIF detection suggested by Stark and colleagues (2006), which can provide highe r power and lower Type I error rates. CFA and IRT methods yielded similar results although CFA identified more DIF items than did IRT. The results from CFA and IR T gave us a great amount of information on

PAGE 108

99 sociodemographic-related DIF on the CES-D, which also provided methodological and practical implications for future research. However, from a practical point of view, questions still remain as to how to deal with DIF once it is identified, as well as how to interpret uncommonly detected DI F items. Clearly, more work needs to be done in health disparities and measurement research to develo p clear guidelines to deal with DIF, such as removing consistent DIF items and adjusting cut-scores. Overall the results from Study 2 suggest that when self-re port instruments are used to screen and assess for depressi on in diverse sociodemographic populations, researchers and practitioners s hould be aware of the risk that individuals from different sociodemographic and cultural backgrounds ma y tend to be misclassified, which can directly lead to misdiagnosis as well as mi streatment for depression. Use of inaccurate measures could also lead to misguided public policies. Therefor e, in light of the abovementioned consequences of using nonequi valent measures, researchers should pay careful attention to making m easures more reliable and so cioculturally appropriate, as well as to establishing measurement equivale nce of the existing depression measures. The latter is the first and crucial step befo re diverse sociodemographic and racial/ethnic groups can be compared.

PAGE 109

100 CHAPTER FOUR: STUDY 3 ACCULTURATIONAND LANGUAGE-REL ATED MEASUREMENT BIAS Introduction Study 3 focuses on Mexican Americans, the largest subgroup of Hispanics in the United States. Mexican Americans are themse lves a culturally diverse group. A number of researchers have shown that Mexican Americans have different characteristics depending on their place of birth (e.g., Chiriboga, 2004; Sundquist & Winkleby, 2000), socioeconomic status (e.g., Krau se & Markides, 1985), and the level of acculturation (e.g., Chiriboga, Jang, Banks, & Kim, 2007; Gonzl ez, Haan, & Hinton, 2001). Of particular interest in the present investigation are di fferences in levels of acculturation because acculturation is considered one of the key dime nsions for understanding health disparities in diverse populations (Stewart & Npoles-Springer, 2003). A dynamic and ongoing cultural process, acculturation has been referred to as the degree to which people change when faced w ith the challenge of living in a cultural context differing from their own (Trimble, 2003). Studies examining the link between level of acculturation and health outcomes have shown mixed results, with some reporting positive connections (e.g., Berry & Kim, 1989; Chiriboga, Black, Aranda, & Markides, 2002; Gonzlez et al., 2001) a nd some reporting negative (e.g., Krause & Goldenhar, 1992; Sundquist & Winkleby, 2000). Despite the mixed results, what most health disparities researchers agree upon is that the level of acculturation may influence peoples life styles, behaviors, attitudes, and general experien ces to the host culture (e.g.,

PAGE 110

101 Stewart & Npoles-Springer, 2003). Theref ore, acculturation may be an important construct that can explain social and cultura l differences within Me xican Americans. With respect to the last point, Mexica n Americans who are more acculturated may be more likely to adopt ways of thinking and fe eling that characterize the host culture. In contrast, Mexican Americans who are less a cculturated may be less able to accept or adopt the new ways of thinking and expres sing themselves, and instead hold onto the values and behaviors that reflect their cu lture of origin. Th ese differences in acculturation may influence the ways in wh ich psychological symptoms are organized and expressed, as well as holding implications for the levels of measurement bias evident in mental health instruments such as screeni ng tools for depressive symptoms in here. The most frequently used depression scr eening tool in the United States is the Center for Epidemiological Studies Depression Scale (CES-D; Radloff, 1977). Originally developed for European American populations, good internal consistency and general usefulness to assess depressive sympto ms have been observed when the CES-D is applied to Mexican Americans (e.g., Gonzl ez et al., 2001; Liang, Van Tran, Krause, & Markides, 1989). The Spanish version of th e CES-D has been also widely applied to Mexican Americans and shows good comparability to the English version of the CES-D (e.g., Perczek, Caver, Price, & Pozo-Kader man, 2000). However, one missing piece of psychometric information is whether item responses to the CES-D among Mexican Americans are systematically influenced by factors, such as level of acculturation and language of assessment that are unrelated to depression. If the items in the CES-D do not function equivalently across subgroups of Mexican Americans, the CES-D may fail to capture depressive symptoms in certain subgroups of Mexican Americans. Under this

PAGE 111

102 condition, estimates of prevalence may be inaccurate and subgroup comparisons within Mexican Americans may be misleading. The effect of acculturation on the measur ement equivalence of the CES-D has not been fully determined in samples of older Mexican Americans, or indeed any other group of Hispanics. One study of older Mexican Americans found that the pattern of factor loadings in the CES-D was different in high and low acculturated groups, suggesting an association between the level of acculturati on and item endorsement (Chiriboga et al., 2007). To date only a single study has used an appropriate differe ntial item functioning (DIF) method to examine item bias. Using a sample of pregnant Hispanic women, Nguyen and colleagues (2007) found that responses to the CES-D differed by acculturation and that the low acculturated group was less likely to endorse somatic symptoms but more likely to endorse positive items than the acculturated group. In their study, the term acculturation was measured w ith respondents language preference, and the total sample was divided into two accult uration groups: people w ho preferred English were considered acculturate d and those who preferred Spanish were considered unacculturated. Although language of prefer ence has been used a proxy for acculturation (e.g., Cabassa, 2003; Zane & Mak, 2003), the latter is in fact a complex construct that include more than simple language ability. In addition, there has been research showing that feelings reported in ones primary la nguage may be expressed with more emotion than those expressed in a s econd language (Cuellar & Robe rts, 1984; Roberts, Vernon, & Rhoades, 1989). None of the existing studies have paid attention to the effects of both the level of acculturation and language on the item bias in the CES-D.

PAGE 112

103 An important issue for the present stu dy of measurement equivalence was to find a methodology that can distinguish a lack of measurement equivalence (i.e., DIF) from true differences in the trait di stributions (i.e., impact) more accurately. Researchers have used two popular methods to detect DIF: c onfirmatory factor an alysis (CFA) and item response theory (IRT). Most studies have used only one of these two methods to identify DIF items. Only a few simulation studies have compared these two DIF methods (e.g., Meade & Lautenschlager, 2004; Raju, Laffitte, & Byrne, 2002). For more accurate results and firmly-grounded conclusions as to which items are classified as DIF, a number of researchers recommended using multiple DIF analytic strategies (e.g., Hambleton, 2006; Wang & Ru ssell, 2005). In addition, one suggested strategy to reduce Type I error rates in DIF results has been to focus on the items that show up consistently across different DI F methods (Hambleton, 2006). Recently, Stark, Chernyshenko, and Drasgow (2006) proposed an d tested a common strategy for detecting DIF with CFA and IRT based on the likeli hood ratio (LR) test. Their method, which involved comparing statistically correct free-baseline models w ith a series of constrained models that simultaneously examined item load ings (for CFA)/discrimination parameters (for IRT) and intercepts (f or CFA)/location parameters (for IRT), showed higher power and low Type I error rates across simulation c onditions for both CFA and IRT. A recent study using this common strategy found it to be a useful approach for detecting DIF items (e.g., Kim, Chiriboga, & Jang, 2007). One of my foci in the pres ent dissertation study is to us e the Stark and colleagues (2006) common strategy and compare the results across both CFA and IRT. The intent was to identify the effects of acculturation and instrument language on the measurement

PAGE 113

104 equivalence of the CES-D in older Mexican Americans. It was expected that item responses to the CES-D would differ de pending on the level of acculturation and language used in responding to the questions. Methods Sample Data were drawn from the first wave of the Hispanic Established Populations for Epidemiologic Studies of the Elderly (H-EPE SE). The H-EPESE is a longitudinal study of Mexican Americans aged 65 and older w ho live in Texas, New Mexico, Colorado, Arizona, and California. In 1993-4, baseli ne interviews were conducted with 3,050 subjects using English and Spanish versions of the interview. Subjects were included in the analyses if they responded to all 20 CE S-D items (N = 2,623). From this listwise deletion, 427 participants (14.0 % of the total sample) were excluded. Those excluded had similar characteristics to those incl uded in terms of gender distribution and educational attainment, but were more likely to be older ( t = 9.42, p < .001). Measures CES-D scale The 20 item Center for Epidemiological Studies-Depression Scale (CES-D; Radloff, 1977) was used in this study and contains sixt een negative items and four positive items. The response format was a 4-point Likert scale, with categories presented in the following order: rarely or none of the time (coded as 0), some or a little of the time (coded as 1), occasionally (coded as 2) , and most or all of the time (coded as 3). The four positive items (Items #4, 8, 12, a nd 16) was reverse-scored and scale scores

PAGE 114

105 were computed by summing across twenty it ems to produce total scores ranging from 0 (no depressive symptoms) to 60 (severe depres sive symptoms). Scores of 16 or higher are typically viewed as evidence of probabl e depression (Andresen, Carter, Malmgren, & Patrick, 1994). A measure with this cutpoint was calculated and reliability in the present sample was satisfactory: = .88. Acculturation Level of acculturation was measured with th irteen items drawn from Hazuda et al. (1988) and Cuellar, Harris, and Jasso (1980) The thirteen items assessed linguistic acculturation including the self-reported abil ity to read, write, and understand English, and language used in conversations with fam ily (spouse, children, and parents), friends, neighbors, and coworkers. Previous research also noted that lingui stic acculturation has accounted for a substantial portion of acculturation status (e.g., Chiriboga et al., 2007; Jang, Kim, Chiriboga, & King-Ka llimanis, 2007). Since several items had relatively high levels of missing data as a result of v acated or never-occupied roles (e.g., language currently spoken with parents in most cas es the parent was deceased), imputation via the Solas statistical program (Statistical Solu tions, 2001) was used as a data substitution method. Detailed information on the imputa tion technique is described in papers by Chiriboga (2004) and Chiriboga et al. (2007). Principal comp onent analysis with oblimin and varimax rotation yielded one factor, and in ternal consistency based on the thirteen items was shown to be good ( = .98). Due to the special inte rest in the eff ects of level of acculturation, the total sample was divided into two acculturation groups using a median score on the linguistic ability factor: lo w acculturated Mexican Americans (N = 1,283) and high acculturated Mexican Americans (N = 1340).

PAGE 115

106 Instrument Language All participants in the H-EPESE study were given the choice of being interviewed in English or Spanish. Approximately 78 % of older Mexican Americans chose to be interviewed in Spanish. For the purpose of this study, the total sample was divided into two instrument language groups: Mexican Amer icans interviewed in English (N = 561) and Spanish (N = 2,062). Acculturation Language To determine whether item bias in the CES-D was more likely to be associated with acculturation or language, Mexican Americans were divided into four acculturation (high/low) language (English/Spanish) groups: high acculturated Mexican Americans interviewed in English (i.e., High-Englis h: N = 503); high ac culturated Mexican Americans interviewed in Spanish (i.e., High-Spanish: N = 837); low acculturated Mexican Americans interviewed in Englis h (i.e., Low-English: N = 58); and low acculturated Mexican Americans interviewed in English (i.e., Low-English: N = 1,225). The Low-English group was excluded in the an alyses due to the small sample size. Statistical Analysis Dimensionality test Since the DIF detection methods used in this investigation assume the unidimensionality of the CES-D (Stark et al., 2006), the dimensionality was evaluated first using a principal component analysis (PCA) via SPSS and confirmatory factor analysis (CFA) via LISREL 8.8. This was done within each acculturation (high vs. low), language (English vs. Spanish), and acculturation x language (high-English vs. highSpanish vs. low-Spanish) group. The PCA results indicated that while the analyses did in

PAGE 116

107 fact produce either three or four factors, the ratio of firs t to second eigenvalue ranged from 3.00 (high acculturated Mexican Americans interviewed in English: 1 st factor = 6.36, % variance = 31.79; 2 nd factor = 2.13, % variance = 10.74; 3 rd factor = 1.17, % variance = 5.88) to 3.19 (low acculturated Mexican Americans: 1 st factor = 7.23, % variance = 36.15; 2 nd factor = 2.27, % variance = 11.38; 3 rd factor = 1.07, % variance = 5.38), which suggests that the data are e ssentially unidimensional (Lord, 1980; Stout, 1990). This result was supported by one-factor confirmatory factor analyses, where goodness-of-fit indices exceeded .90 (except for one .89 fit index for Mexican American group interviewed in Spanish). The low acculturated group showed slightly better goodness-of-fit indices (CFI = .93, NFI = .92, NNF I = .92), while the Mexican American group interviewed in Spanish showed the lo west goodness-of-fit indices (CFI =.91, NFI = .89, NNFI = .90). These results confirmed th e overall unidimensionality of the CES-D for the present analyses. DIF analyses After verifying the essential unidimensiona lity of the CES-D, the application of IRT and CFA DIF detection using the likeli hood ratio tests proceeded. For both methods this study followed the general approach to hypothesis testing descri bed by Stark et al. (2006), since the latter showed high power and low Type I error rates across a wide variety of simulation conditions. In essence, the authors suggest testing for DIF by using free baseline with B onferroni correction that can be implemented in both CFA and IRT. A fully free baseline model (with the exception of a single referent item) was used as the basis for subsequent nineteen nested model comparisons where one item at a time was constrained to be equal across groups. Item parameters were compared simultaneously

PAGE 117

(discriminationloadings and locationsintercepts), using Bonferroni corrected p-values for flagging DIF items. Detailed analytic procedures for CFA and IRT are described below. CFA DIF Detection. CFA DIF analyses involving item loadings and intercepts were conducted using an analogous strategy with LISREL 8.8. Using the free baseline model, where only the parameters of the referent (Item 1) are constrained across groups, baseline and constrained models were run in succession. The chi-square difference statistics for the nested model comparisons were evaluated using a Bonferroni corrected critical p-value. When the observed chi-square difference was greater than the corresponding critical chi-square value (Bonferroni corrected, 2 = 11.88 with 2 degrees of freedom), the item was flagged DIF. IRT DIF Detection. With regard to IRT DIF detection, the Graded Response Model (GRM; Samejima, 1969) was estimated using the MULTILOG program because the CES-D scale is polytomous. For the GRM, each four-category item has one discrimination parameter (a) and three location parameters (b 1 b 2 and b 3 ). The discrimination parameter reflects the extent to which an item differentiates between levels of underlying depression, and items with higher a are generally preferred because they are more informative in a psychometric sense. The location parameters refer to the point on the underlying depression scale in which the probability is 50% for endorsing the first category relative to the last 3 categories (b 1 0 vs. 1, 2, 3), the first 2 categories relative to the last 2 categories (b 2 0, 1 vs. 2, 3), and the first 3 categories relative to the fourth category (b 3 0, 1, 2 vs. 3), respectively. To determine whether the GRM used for parameter estimation adequately fits the data, chi-square fit statistics were assessed using 108

PAGE 118

the MODFIT program. Adjusted chi-squares to degrees of freedom ratios for each item were all less than 3, indicating good model-data fit (Drasgow, Levine, Tsien, Williams, & Mead, 1995). In IRT DIF detection, the concurrent calibration method was subsequently used to put the reference and focal group parameters on a common metric with Item 1 as an anchor item. The designated focal and reference groups are presented in Table 2 and Table 3. In this step, the latent mean of the designated reference group was set to zero, whereas the latent mean of the designated focal group was free to vary. As described for the CFA DIF method, the free-baseline model strategy was also used for each CES-D item, and differences in relative goodness of fit were examined with respect to critical chi-square statistics. Each chi-square difference was compared to Bonferroni corrected p-values (corrected, 2 = 16.31 with 4 degrees of freedom), and items exhibiting DIF were flagged. Results Sample Characteristics Descriptive characteristics of the two acculturation groups are summarized in Table 10. Compared to the low acculturated, the high acculturated Mexican American elders were likely to be younger (t = 3.96, p < .001), male ( 2 = 14.48, p < .001), more educated (t = -30.48, p < .001), and born in the U.S. ( 2 = 399.26, p < .001). As I expected, high acculturated elders were more likely than low acculturated to be interviewed in English ( 2 = 424.95, p < .001). That low acculturated subjects might be interviewed in English may appear contradictory; it resulted from the fact that this study 109

PAGE 119

defined high or low acculturation on the basis of a median split: those closer to the median could function reasonably well in the alternative language. Table 10. Study 3 Descriptive Information of the High and Low Acculturated Mexican Americans High Acculturated Mexican Americans (N = 1,340) Low Acculturated Mexican Americans (N = 1,283) Variable M/SD (%) M/SD (%) t ( 2 ) Age 72.06 /5.93 73.05/6.73 3.96*** Gender (54.6%) (61.9%) (14.48***) Educational Attainment 6.82/3.88 2.88/2.58 -30.48*** Born in the U.S. (76.9%) (38.3%) (399.26***) Instrument Language (424.95***) English (37.5%) (4.5%) Spanish (62.5%) (92.5%) CES-D 9.26/8.67 10.89/9.86 4.51*** Probable Depression (16) (21.4%) (26.5%) (8.83**) ** p < .01, *** p < .001 Of particular relevance is that high acculturated Mexican American elders were significantly less likely to report symptoms on the CES-D scale (t = 4.57, p < .001). 110

PAGE 120

1112 Using the standard cutoff of 16 on the CES-D for evidence of probable depression, however, both high and low acculturated groups exhibited a greater likelihood for depression than levels previously reported for the non-Hispanic White population (e.g., Bromberger et al., 2004; Cornoni-Huntley, Blazer, Lafferty, Everett, Brock, & Farmer, 1990; Swenson et al., 2000). Significant group differences did persist: 26.5% of the low acculturated and 21.4% of the high acculturated fell into the probable depression category ( = 8.83, p < .01). These differences between low and high acculturated groups have been also reported in a number of previous studies of older Hispanic populations (e.g., Gonzalez et al., 2001; Mills & Henretta, 2001). As shown in the first data column of Tables 11 and 12, in the comparison of high and low acculturated groups, CFA flagged eight DIF items (Items # 3, 7, 11, 14, 16, 17, 18, and 19) and IRT identified three DIF items (Items # 4, 16, and 17). Two DIF items (Items #16, the reverse-scored I enjoyed life and #17, I had crying spells) were identified in both CFA and IRT. The (not) enjoyed item favored the low acculturated Acculturation-bias Table 11 and Table 12 summarize results of CFA and IRT DIF analyses. In both CFA and IRT methods, this study used Item 1 as a referent and had nineteen model comparisons in which each of the CES-D items was constrained to be equal across groups in each model. Following suggestions by a number of researchers (Hambleton, 2006; Hambleton & Rogers, 1989), this study reported DIF items that revealed consistently across the two methods as a more conservative strategy that would reduce possible Type I error rates, which is of particular concern in DIF detection. DIF Results

PAGE 121

Table 11. Study 3 DIF Results from Confirmatory Factor Analysis (df =341, df = 2) Acculturation-DIF Language-DIF Acculturation Language-DIF High a vs. Low b English b vs. Spanish a High-English b vs. High-Spanish a High-Spanish b vs. Low-Spanish a High-English b vs. Low-Spanish a Models 2 ( 2 ) 2 ( 2 ) 2 ( 2 ) 2 ( 2 ) 2 ( 2 ) Baseline Model (Referent: Item1) 6384.67 6550.36 3098.15 4937.52 3541.65 Comparison Models Item 2, appetite (7.02) (0.38) (4.87) (0.55) (0.41) Item 3, blues (43.83) DIF (19.54) DIF (3.45) (3.57) (35.73) DIF Item 4, good (4.09) (10.5) (4.32) (1.14) (12.01) DIF Item 5, mind (3.38) (4.16) (5.66) (1.67) (1.50) Item 6, depressed (10.20) (2.57) (0.70) (0.69) (5.19) Item 7, effort (14.20) DIF (0.77) (4.04) (0.46) (3.43) Item 8, hopeful (6.00) (1.66) (4.12) (1.81) (2.70) Item 9, failure (11.68) (7.54) (0.08) (0.83) (9.37) Item 10, fearful (7.47) (1.57) (9.23) (0.03) (0.77) Item 11, sleep (14.81) DIF (19.01) DIF (56.76) DIF (12.7) DIF (4.60) Item 12, happy (11.18) (67.05) DIF (21.99) DIF (0.01) (62.14) DIF Item 13, talked less (0.80) (12.67) DIF (19.01) DIF (2.55) (11.09) Item 14, lonely (20.13) DIF (1.61) (16.25) DIF (5.52) (3.92) Item 15, unfriendly (8.89) (25.84) DIF (7.32) (2.33) (18.16) DIF Item 16, enjoyed (12.37) DIF (127.47) DIF (114.08) DIF (2.14) (136.84) DIF Item 17, crying (53.50) DIF (55.57) DIF (1.59) (9.10) (45.84) DIF Item 18, sad (14.88) DIF (1.49) (20.34) DIF (3.71) (2.13) Item 19, disliked (27.65) DIF (20.73) DIF (1.09) (7.88) (24.61) DIF Item 20, get going (4.14) (0.91) (0.62) (0.60) (2.73) Total # DIF Items 8 8 6 1 7 Note. High = high acculturated group; Low = low acculturated group; English = group interviewed in English; Spanish = group interviewed in Spanish. Items in bold and underlined are common DIF items detected by both CFA and IRT methods. In CFA, DIF flagged if 2 were > 11.88 with 2 degree of freedom. Reverse-coded item. a Reference group. b Focal group 112

PAGE 122

113 Table 12. Study 3 DIF Results from Item Response Theory (df =157, df = 4) Acculturation-DIF Language-DIF Acculturation Language-DIF High a vs. Low b English b vs. Spanish a High-English b vs. High-Spanish a High-Spanish b vs. Low-Spanish a High-English b vs. Low-Spanish a Models 2 ( 2 ) 2 ( 2 ) 2 ( 2 ) 2 ( 2 ) 2 ( 2 ) Baseline Model (Referent: Item1) 41432.5 40341.7 21282.7 33326.3 29721.5 Comparison Models Item 2, appetite (3.0) (3.7) (5.5) (2.6) (4.4) Item 3, blues (3.0) (15.1) (6.0) (4.0) (17.3) DIF Item 4, good (17.4) DIF (14.0) (10.9) (7.5) (21.8) DIF Item 5, mind (5.0) (2.8) (3.2) (3.2) (5.6) Item 6, depressed (12.0) (6.2) (1.6) (7.1) (12.6) Item 7, effort (4.8) (17.4) DIF (20.1) DIF (2.7) (17.4) DIF Item 8, hopeful (6.7) (39.3) DIF (40.0) DIF (3.7) (32.8) DIF Item 9, failure (7.3) (10.9) (7.7) (0.7) (14.2) Item 10, fearful (3.6) (11.6) (11.2) (3.7) (7.8) Item 11, sleep (7.5) (5.9) (10.0) (13.4) (3.3) Item 12, happy (14.9) (37.7) DIF (21.8) DIF (5.8) (32.0) DIF Item 13, talked less (2.2) (4.7) (4.8) (4.7) (3.0) Item 14, lonely (5.6) (3.0) (4.3) (7.8) (3.5) Item 15, unfriendly (7.5) (15.1) (7.4) (2.8) (15.3) Item 16, enjoyed (29.4) DIF (88.1) DIF (61.9) DIF (6.5) (83.6) DIF Item 17, crying (17.4) DIF (14.1) (3.8) (13.3) (14.9) Item 18, sad (8.7) (3.8) (4.3) (12.4) (3.0) Item 19, disliked (15.9) (25.6) DIF (13.1) (6.8) (27.3) DIF Item 20, get going (5.7) (7.6) (4.9) (2.2) (10.3) Total # DIF Items 3 5 4 0 7 Note. High = High Acculturated Group; Low = Low Acculturated Group. Items in bold and underlined are common DIF items detected by both CFA and IRT methods. In IRT, DIF flagged if 2 were > 16.31 with 4 degree of freedom. Reverse-coded item. a Reference group. b Focal group

PAGE 123

114 group over the high acculturated group, sugge sting a greater endorsement of low enjoyment among low acculturated subjects. In other words, high acculturated Mexican Americans were more likely than low acculturated to endorse the positive side of the enjoyed item. For the crying item, low acculturated elders showed a greater tendency to endorse in general, but at severe levels of depre ssive symptoms among Mexican Americans, high acculturated elders showed a greater endorsement. Language-bias As presented in the second data column of Table 11 and 12, in the comparison of groups interviewed in English and Spanish, CFA identified eight DIF items (Items # 3, 11, 12, 13, 15, 16, 17, and 19) and IRT displayed five DIF items (Items # 7, 8, 12, 16, and 19). There were three common DIF items (Items # 12, I was happy, #16, I enjoyed life and #19, people disliked me) favori ng the group interviewed in Spanish, which indicates that those interviewed in Spanish ha d greater propensities to endorse the three DIF items than those interviewed in English. It should be noted that because of reversescoring, responses to the two positive affect items (Items #12, happy and #16, enjoyed) indicate that those interviewed in Spanish were less likely to endorse the happy and enjoyed life end of the items th an were those interviewed in English. Acculturation Language-bias Three sets of acculturation language comparisons were made to evaluate whether item bias was more likely to be a ssociated with acculturation or language: 1) to examine how instrument language (English/Spanish) was associated with item bias within the same level of acculturation (i.e., controlling the acculturation effect), high acculturated Mexican Americans interviewed in English (i.e., High-English) and high

PAGE 124

115 acculturated Mexican Americans interviewed in Spanish (i.e., High-Spanish) were compared; 2) to examine how acculturation a ffected item bias on the CES-D within the same language use condition (i.e., controlli ng the language effect), high acculturated Mexican Americans interviewed in Spanish (i.e., High-Spanish) and low acculturated Mexican Americans interviewed in Spanish (i.e ., Low-Spanish) were compared; and 3) to examine possible interaction effects of acculturation and instrument language, high acculturated Mexican Americans interviewed in English (i.e., High-English) and low acculturated Mexican Americans interviewed in Spanish (i.e., Low-Spanish) were compared. First, in the comparison of High-Eng lish and High-Spanish combinations, the amount of instrument language DIF was sli ghtly decreased by controlling acculturation effects (Table 11 and Table 12, third data co lumn). Compared with the language DIF results, both CFA and IRT flagged slightly less DIF items: CFA identified six DIF items (Items # 11, 12, 13, 14, 16, and 18) and IRT flagged four DIF items (Items # 7, 8, 12, and 16). Two common DIF items (Items #12, I wa s happy and #16, I enjoyed life) were identified in both CFA and IRT. Both were positive affect items and favored the HighSpanish group. Since the two positive items were reverse-coded, this indicates that those who were higher in acculturati on but interviewed in Spanish were less likely to report positive feelings than were those in th e High-English group. Results from this comparison suggest that instrument language affected the measurement equivalence of the CES-D even when the level of acculturation was conditioned at the same level, which also indicates that acculturation was not an im portant factor to explain language DIF.

PAGE 125

116 Second, in the comparison of High-Spanish and Low-Spanish (Tables 11 and 12, fourth data column), the amount of accultura tion bias was reduced after controlling the effects of language. The two common DIF items identified from the acculturation DIF results (Items #16 and #17) disappeared in th is comparison after the effect of language was controlled. One DIF item (Item #11) was flagged only in CFA, whereas eight DIF items in CFA and three DIF items in IRT were identified from the acculturation DIF results. Results clearly suggest that the accu lturation-related item bias in the CES-D was explained by language differences among Mexican Americans. Lastly, in the comparison of High-E nglish and Low-Spanish (Tables 11 and 12, last data column), more common DIF items were identified compared to any other comparisons, suggesting possible interaction effects of acculturation and instrument language. Five common DIF items were iden tified (Items #3, I could not shake off the blues, #4, feeling good about myself, #12, I was happy, #16, I enjo yed life, and #19, people disliked me). All five favored Low-Spanish, suggesting that Low-Spanish was more likely to endorse the five items than was High-English (reca ll that Items # 4, 12, and 16 had been reverse-coded, so these three items represent low positive affect). With regard to three posi tive items, it was easier for High-E nglish to respond to the positive side of these items. Discussion The primary purpose of Study 3 was to examine how the level of acculturation and instrument language influences meas urement equivalence of the CES-D among Mexican American elders. The H-EPESE data set used in the analyses was unusual in that relatively large numbers of English proficient subjects were interviewed in Spanish.

PAGE 126

117 The data thus provided a unique opportunity to investigate whether responses to the CESD items were more likely to be associated with acculturation or instrument language by comparing groups after conditioning acculturatio n and language at the same level. Study 3 anticipated that item responses to the CESD would vary by the level of acculturation and language of interview. In addition, the us e of two analytic approaches, CFA and IRT, was designed to provide stronger evid ence for the potential item bias. One major finding was that the level of acculturation and instrument language independently affected measurement bi as in the CES-D among older Mexican Americans. It is worth mentioning that th is may be the first work in gerontology to differentiate the effects of instrument language and accult uration on the CES-D. CFA and IRT identified two accultura tion biased (I enjoyed life and I had crying spells) and three language biased items (I was happy, I enjoyed life and people disliked me). One item (I enjoyed life) was biased by both the level of acculturation and language, which will be discussed later. These results supported previous studies showing different patterns of depressive symptoms associat ed with acculturation (e.g., Jang, Kim, & Chiriboga, 2005; Nguyen et al., 2007) and language (e.g., Guarnaccia et al., 1989). Findings indicate individual di fferences in responding to th e items of the CES-D even within the same racial/ethni c group, which may also reflect different ways of perceive, feel, and express their symp toms depending upon how much people are acculturated and what language people are using. One novel finding was that the effect s of acculturation on the measurement equivalence of the CES-D were entirely explained by language differences. The identified effects of acculturation on item bias in the CES-D were eliminated when the

PAGE 127

118 same language condition was given to high and low acculturated Mexican Americans. In contrast, the identified effects of langua ge on the CES-D were diminished but still persisted even when the same level of accu lturation was given. These results clearly suggest the greater effects of language over acculturation on measurement bias in the CES-D. One possible explanation for these results is that instrument language may explain more fundamental differences that lead to response biases to depressive symptoms and therefore within the same language group (those interviewed in Spanish in the present study), differences between high and low acculturated groups may be diminished. Another possible explanation may be th at people who report in their native language of Spanish are more likely to repor t their symptoms emotionally (e.g., Cuellar & Roberts, 1984) and under this conditi on, their degree of acculturation may not influence their responses to depressive sy mptoms because of their elevated symptom expression. However, it is unfortunate that this Study 3 could not compare high and low acculturated Mexican Americans among those interviewed in English due to the small sample size of low acculturated Mexican Americans who responded in English (N = 58). Since no studies have addresse d this issue before, further investigation is needed to understand this phenomenon. A final explanation is that the biases th at resulted from language of interview could simply stem from the difficulties invol ved in finding culturally equivalent wording in the different languages. While the translat ions may be technically correct, even with respect to colloquial expressi ons, the extended connotations of the words used in the

PAGE 128

119 English and Spanish versions may exert a su btle bias in terms of predisposition to respond. Study 3 found four meaningful DIF items biased by the level of acculturation and language, which may reflect sociocultural diffe rences. First, two positive feeling items were biased. I enjoyed life was the only it em biased by both the level of acculturation and language of interview: Mexican Am ericans who were highly acculturated and interviewed in English had a greater propensity to report the positive side of this item than their counterparts. Another positive item biased by language of interview (I was happy) also showed a greater endorsement of the positive side of this item among subjects interviewed in English. These response patterns may reflect mainstream American culture learned by high accultu rated Mexican Americans and by those interviewed in Englishbut it should be recalled that only 58 low acculturated subjects were interviewed in English. There has been research suggesting that positive feelings are prominent in mainstream American cu lture (Ying, 1989) or at least a greater willingness to report positive feelings (Iw ata & Buka, 2002). However, it should be noted that findings in this dissertation St udy 3 showed disagreement with findings from Nguyen et al (2007), which indicated that low acculturated Hispanic women were predisposed to endorse their positive feelings. These contradictory findings may be due to the use of different samples: their sa mple included young Mexican American women, whereas the present sample included older Me xican Americans. Further investigation with different Mexican American samples, and with a wider age range, is needed. Second, consistent with a recent acculturation DIF study among Hispanic women (Nguyen et al., 2007), this study found crying spells to be acculturation-biased. The

PAGE 129

120 crying spells item was over-endorsed by lo w acculturated Mexican Americans except at the highest depression scores, which may re flect their adherence to Mexican culture, where crying may be an acceptable beha vior reflecting suffering (Azocar, Aren, Miranda, & Mu oz, 2001; Golding, Aneshensel, & Hough, 1991). Interestingly, however, high acculturated elders were more likely to endorse the crying item in the situation where their level of depressive symptomatology was high. Nguyen et al. (2007) has also suggested this nonuniform DIF for the crying spells item across high and low acculturated groups, although their study did not address the question of whether one group was more favored to respond. Lastly, with regard to the people disliked me item biased by language, the present study identified a greater endorsement among thos e interviewed in Spanish. Previous studies have suggested this interp ersonal problem item was more likely to be biased by race/ethnicity, especially among Bl acks (Cole, Kawachi, Maller, & Berkman, 2000) and Mexican Americans (Kim, Chiri boga, & Jang, 2007). I suspect selfperceptions of discrimination experienced by Mexican Americans interviewed in Spanish as a possible source of their pr edisposition to endorse this it em. This response pattern is not surprising given research suggesting that those who are more acculturated (i.e., those with greater usage of Englis h) were less likely to experi ence discrimination than those less acculturated (i.e., those with greater usage of Spanish) (Finch, Kolody, & Vega, 2000). Findings from this Study 3 help to expl ain cultural differences in responses to depressive symptoms among Mexican Americans. The identified differences in response patterns call attention to limitations of curre nt screening for depression using the CES-D

PAGE 130

121 across different cultural groups even within the same raci al/ethnic population. The cutoff point of 16 for probable depression has b een used since CES-Ds initial development on European Americans (Andresen et al., 1994 ; Radloff, 1977) and has not changed. Even though we know that some racial/ethni c groups score higher than non-Hispanic Whites, there has been no serious effort at determining what might be an optimal cut-off point for each diverse group. The results of this paper suggest that researchers and clinicians should carefully consider how th e CES-D is used. In addition, the results suggest at least one possible approach that might be used in future work: using DIF findings to provide weighting systems that vary by age, gender, race/ethnicity, and my particular interest in the present study the level of acculturation as well as language preference as a means of adjusting cut-off scores. There are some limitations of this Study 3 that warrant consideration. As mentioned earlier, since the present dissertation study included a small sample of low acculturated Mexican Americans interviewed in English (N = 58), the effects of acculturation within the English-speaking group was not considered. The examination of differences in the responses to the CES-D also needs to extend beyond the Mexican American population and beyond acculturation and language as a sorting factor. The issue of how to deal with uncommonly iden tified DIF items by CFA and IRT remain unsolved at present. Furthe r investigation will be needed to find a source of these discrepant items, and I hope that psychometricians can develop an analytic framework that will ultimately lead to a cl ear answer to this question. In sum, the present dissertation Study 3 highlights the importance of understanding differences in responses to depressive symptoms within the same

PAGE 131

122 racial/ethnic group. When established measures are used to screen and assess for depression in diverse racial/ethnic groups, re searchers should rec ognize the risk that people from different cultur al background may tend to be misclassified due to their different responses. Culturally appropriate screens for depression should be a high priority in health disparitie s research, and this line of research should be extended to other mental health screeni ng tools to generate meaningful comparisons across diverse groups.

PAGE 132

123 CHAPTER FIVE: DISCUSSION Summary of the Study A critical factor in cross-cu ltural or cross-national research is the comparability of instruments across the diverse groups being st udies. This factor becomes even more critical when the intent of an instrument is to detect problem behaviors that can be mitigated through interventions. The purpose of this dissertation was to examine how culture defined broadly, in the case of the present study, to include racial/ethnic, sociodemographic, accultura tion and language variations influences measurement properties of the CES-D, one of the most commonly used screening tools for the detection of depressive symptomatology and of probable depression. Specifically, this dissertation research focused on identifying race/ethnicity-, sociodemographic-, and acculturation and inst rument language-related measurement bias in the CES-D. The samples, consisting of Whites, Blacks, and Mexican Americans, were drawn from two relatively large multistage st udies. One of these studies, the New Haven EPESE, was funded as part of a set of four studies known collectively as the Established Populations for Epidemiologic Studies of the Elderly (EPESE). The other was a study of Mexican Americans that is often referred to as the Hispanic EPESE (H-EPESE) since it included nearly all of the questions asked in the four EPESE studies, as well as questions specific to a Hispanic population. Overview of Findings A series of three substudies were conduc ted in this dissertation. These studies successively examined issues of measurement bi as in greater and grea ter detail. Thus, in

PAGE 133

124 the first study, the focus was simply on the extent to which there were general issues of measurement bias when the three racial/ethni c groups (i.e., Whites, Blacks, and Mexican Americans) were considered. In the sec ond study, the focus was specifically on how sociodemographic characteristics influence m easurement bias within each racial/ethnic group. Effects of age, gender, and educatio nal attainments on measurement bias in the CES-D were tested in Whites, Blacks, and Mexican Americans separately. In the third study, the focus was on Mexican American elders to investigate issues of measurement bias in the CES-D when the level of accu lturation and instrument language were considered. Results demonstrated the utility of the research design. In essence, Study 1 found a lack of measurement equivalence of the CES-D among Mexican Americans in the comparison with both Whites and Blacks. Race/ethnicity-specific items were also identified in Study 1: two interpersonal rela tion items in Blacks and four positive affect items in Mexican Americans. Study 2 iden tified the crucial role of gender and educational attainment on item bias in the CES-D. The interac tion between gender and educational level and race/ethnicity was also found in Study 2: Mexican American women and lower educated Blacks had a greate r predisposition to endorse the crying item. In Study 3, acculturation and instrument language-biased items were identified in Mexican Americans. Study 3 also suggest ed that acculturation-bias was entirely explained by whether the CES-D was administer ed in the English or the Spanish versions.

PAGE 134

125 Implications Methodological Implications To put these results in context, it may be helpful to review some of the major questions facing the field of differential item functioning (DIF). There are at least three major issues needing to be addressed in DI F research. As already addressed in this dissertation research, perhaps the most importa nt issue in DIF research may be how to identify DIF accurately. A second issue in DIF research may be how to explain the source of DIF with regard to meaningful psyc hological constructs such as cultural values, gender role, and social desirabi lity. A third issue in DIF research may be how to deal with DIF once it is identified. Detecting DIF Regarding the accurate identification of DI F, this dissertation study used multiple analytic strategies to detect DIF more accu rately. A few studies have addressed the importance of using multiple procedures to detect DIF in previous research (e.g., Hambleton, 2006; Hambleton, & Rogers, 1989). Using two of the most common DIF methods, CFA and IRT, three substudies report ed DIF items that showed up consistently across the two analytic procedures in order to make conclusions stronger as to which items showed DIF and which ones did not. In all three substudies, results from CFA and IRT gave us a great amount of information to explain identified DIF items in the CES-D. These results imply that this dissertati on makes a significant contribution to DIF detection strategies in the field of gerontology and meas urement research. Explaining DIF

PAGE 135

126 With regard to the explanation of DIF, each study in this dissertation was able to at least partially explain the identified DIF items in terms of sociocultural and psychological constructs. In Study 1, over-e ndorsement of interpersonal relation items among Blacks were in part explained by percep tions of racial discrimination by Blacks. The differential responses to depressive symptoms among Mexican Americans, when compared to Whites and Blacks, were mostly explained by cultural differences such as less hesitation to admit psychological distress and extreme response style on positive affect items. In Study 2, sociocultural diffe rences between groups such as gender role, self-expressiveness related to educational experience, and so cial desirabi lity explained the identified gender and edu cational bias in the CES-D. In Study 3, sociocultural differences within Mexican Americans e xplained the identified acculturation and instrument language DIF. Most importantl y, Study 3 found that acculturation-bias was mostly explained by differences in instrument language (English or Spanish), which may reflect flaws in the instrument translation process or inadequate item formulations (e.g., complex wording or differences in connot ations and social desirability). Dealing with DIF Perhaps one of the most critical issues for future DIF research may be to develop optimal procedures to deal with DIF once it is identified. In other words, what are implications for research if DIF is found? McHorney and Fleishman (2006) contrasted two phases of research. In the measurem ent development phase, instruments can be easily modified by removing items that manife st DIF and replacing them with different items. However, in a later phase of resear ch, researchers may have limited opportunities to modify instruments.

PAGE 136

127 Especially when secondary data are used to detect DIF, replacing items may not be possible. Removing DIF items, while a perfectly acceptable strategy, may have potential negative consequences. Item removal may sacrifice content validity (McHorney & Fleishman, 2006). For exampl e, in Study 1, sixteen items functioned differentially across Mexican Americans and White s. In other words, only four of the twenty CES-D items were identified to functi on similarly across the two groups. It is not clear whether it is possible to fully capture depressive symp toms using four items when the array of symptomatology is great. In this context, removing DIF items may also adversely affect other psychometric properties such as reliability. In cases that researchers identify DI F items but cannot remove these DIF items, adjusting cutoff scores can be used. For exam ple, the magnitude of DIF in addition to the statistical significance of DIF could be identified first in order to estimate parameters that can provide information on effect size. It s hould be noted that onl y some DIF procedures such as DFIT, MIMIC modeling, and logistic regression can estimate parameters that provide information on effect size (McHor ney & Fleishman, 2006). This dissertation study could not provide adjusted cutoff scores due to the DIF procedures used in the present investigation (i.e., CF A and IRT-Likelihood Ratio test). Much more work needs to be done with regard to establishing ge neral guidelines for ga uging the magnitude of DIF effects, which may eventually link to adjusting cutoff scores. In addition, as suggested by McHorney and Fleishman (2006), as research investig ating DIF continues to accumulate, it will be more important to develop recommendati ons or guidelines for how to proceed once DIF has been detected.

PAGE 137

128 Practice Implications This line of research has important implicat ions with regard to how we screen for depression in different cultural groups, with in the United States and cross-nationally. Basically, this line of research shows that although co re depressive symptoms exist across all cultures, different cu ltures conceptualize the prob lem of depressive symptoms in different ways. In certain non-Western cu ltures, there may be no equivalent concepts for depression, but that does not mean depr ession is absent (Marsella, Sartorius, Jablensky, & Fenton, 1985). Depressive symp toms may be experienced, expressed, and responded to in different manners. Thus, this line of research assumes that cultural differences may be found in conceptualization, meaning, and symptom expression of depression across different cultural groups. Therefore, it is important that researchers, clinicians, and practitioners know that depres sion may present differently across different ethnic groups (Minsky, Vega, Miskimen, Gara & Escobar, 2003). According to the Diagnostic and Statistical Ma nual of Mental Disorders, Fourth Edition, Culture can influence the experience and communi cation of symptoms of depression. Underdiagnosis or misdiagnosis can be redu ced by being alert to ethnic and cultural specificity in the presenting complaints of a major depressive episode. (American Psychiatric Association, 1994, p.324). This line of research argues that within the context of cultural variations in the conceptualization, expression, and experien ce of depressive disorders, screening instruments might be unable to identify depr essive disorder for specific cultural groups, and thus adequate measures are needed. In other words, reliable and culturally valid screening instruments to establish equivalence across cultures will be an essential

PAGE 138

129 component of culturally competen t clinical practice, especia lly when differences in the lay conceptualization of de pression are found within a given culture. Culturally appropriate or equivalent measures will help improve interventions for depressive disorders among racially or ethnically divers e older adults. It may not be absolutely necessary to have a different scale for each cultural group. However, some modifications of instruments may increase specificity and sensitivity in detec ting depression for each cultural group. Also, when the CES-D is used to screen for depression, researchers and clinicians must recognize the risk that peopl e from racially or ethnically diverse groups are more likely to be misclassif ied in epidemiological studies. When we screen for depression in differe nt racial/ethnic groups, a combined use of core depressive symptoms and culture-spe cific symptoms may be useful not only for understanding unique cultural phenomena in specific contexts, but also for enabling comparisons across cultures. For example, in an epidemiologic survey in Puerto Rico (Guarnaccia, Canino, Rubio-S tipec, & Bravo, 1993), one item was added to the Diagnositc Interview Schedule (DIS) as king whether the respondent had ever experienced ataque de nervios (nerves attack). The item was introduced in order to obtain information on an idiom of distress indigenous to Puerto Rican culture. The prevalence of this item showed that 13.8% of the sample reported having had a nervous attack at least once in their lifetime and 63% of these met criteria for one of the DIS diagnoses tested, usually depres sive disorder. This approach may be particularly useful because of the following reasons: 1) the core DIS remained unchanged, permitting crosscultural comparisons with basically the sa me instrument and 2) the additions or modifications to the algorithms that were introduced did not alter th e original algorithms,

PAGE 139

130 thus allowing the evaluation of depressive disorders according to the original DIS procedures as well as according to the modi fications introduced in Puerto Rico. This example shows how epidemiological methods can be augmented with culturally specific research strategies without abandoning the ba sic epidemiological goal of across-group generalizability (Canino et al., 1997). When some cultural groups appear to e xperience different depressive symptom clusters than others, clinicia ns and therapists working w ith those groups may need to adjust their own concepts of depression to permit appropriate diagnosis and treatment. As noted before, they may also need to be aware that depression may present differently across different racial/ethnic groups. We a ll need to view depression as a fuzzy concept or a family of overlapping concepts ra ther than as a single disorder that presents in a uniform way (Crockett et al., 2005). Cultu ral competence of clinicians and therapists is of particular importance b ecause culturally aware therapis ts and clinicians know that body language, goal setting, decision-making styles, and assessment tools are all culturally laden. Sue and colleagues (1991), fo r example, found ethnic match of provider and patient showed longer durati on of mental health treatmen t, as well as better patient response to treatment. Researchers and clin icians need to become more culturally sensitive, and culturally informed researchers and clinicians may incorporate some cultural concepts, such as self-orientation, values, family struct ure, and individualismcollectivism orientations. This line of research may provide us eful insights on how we improve the detection of depression among racially or et hnically diverse olde r adults. Mui and colleagues (2002) also suggest that practitioners should be aware that biases stemming

PAGE 140

131 from poor equivalence may produce errone ous estimates of symptoms, and that adjustments such as detection of culturally inappropriate items and changes in cut-off scores, particularly if false positives ar e a concern, may be wa rranted. Bilingual or bicultural practitioners may be well suited to attend to such issues. Limitations and Future Research Despite the abovementioned implications to research and practice, limitations and directions for future research should be noted. As mentioned briefly, one major question is how to deal with the difference in resu lts yielded from the CFA and IRT methods. Throughout the three studies for dissertation, CFA and IRT did not yield identical results. There may be two possible reasons for these di fferences. First, given that it is more difficult to detect small amount of DIF than large amount of DIF, the different results may be more likely when there is relativel y small amount of DIF rather than large amounts. Another possible explanation for the different results is that observed score differences on the CES-D items across diverse cultural subgroups may play less of a role in the calculation for one of the two methods. However, questions still remain as to how to interpret uncommonly detected DIF items. Clearly, more work needs to be done in health disparities and measurement research to develop clear guidelines to interpret and deal with the different results from various DIF procedures. One challenging part in DIF research is that none of the DIF analytic procedures produce identical results (e.g., Hambleton, 2006). Hambleton also suggested that the more different the procedures, the more likely they will be to produ ce different results. For example, Crane and colleagues (2006) summarized the various DI F results with the Mini-Mental State Examination (MMSE) and found some differences with the same DIF

PAGE 141

132 procedure using interchangeable software programs (e.g., MULTILOG and Parscale). This is why a number of psychometricians are recommending applying multiple DIF detection approaches for more accurate DIF results and for more definitive information concerning which items are showing DIF and which ones are not (e.g., Hambleton, 2006; Wang & Russell, 2005). As mentioned earlier, it has been suggested to use the items that reveal consistent DIF across different me thods (Hambleton, 2006). Based on the results from this dissertation study, I strongly believe that researchers should also apply multiple analytic procedures and compare results for future DIF research in order to provide confidence for findings. It may be also help ful to replicate the DIF findings from one study to another with different DIF analytic procedures. Another limitation is that this disserta tion (Study 1 and Study 2) did not control the potential time and cohort differences be tween the samples from the New Haven EPESE (collected in 1981-1982) and Hispanic EPESE (collected in 1993-4). The over ten year differences between those two samp les may have led to differential response patterns. In addition, Study 1 and 2 included a relatively small sample of Blacks. Both limitations underscore the importance of appropriate nationally representative datasets that can provide enough information to capture racial/ethnic dispar ities in health for future research. Future research should pay more a ttention to identif ying and explaining interaction effects between DIF items in depression screening tools and various exogenous factors such as gender, race/ethni city, and educational attainment. These interaction effects have not been considered in previous research. Results from Study 2 and Study 3 showed several nonuniform DIF ite ms in the CES-D suggesting interaction

PAGE 142

133 effects. For example, in Study 2, an interac tion effect between a c rying spells item and gender was found. Unlike previous findings showing womens higher endorsement on the crying spells item, this interaction eff ect in Study 2 showed evidence that men were more likely to cry than women when their de pressive symptoms were severe. Although this finding should be revisited and further investigation on the c rying item with different samples is warranted, this is very meaningful because it is the first to be addressed in depression research using two sophisticated modern statistical methods (CFA and IRT). Given that identified inter action effects in this dissertation provided research and practice implications, future research should focus on replicating these current findings to ensure and identifying ot her meaningful interaction effects between DIF items in depression screening tools and other unknown exogenous variables. Most importantly, future research should focus on the optimal depression screen and optimal cut-score for the CES-D. Alt hough this dissertation study identified culturebiased items in the CES-D, optimal cutoff sc ores for depression screening tools have not been identified for racially or ethnically dive rse older adults, or ind eed for individuals of any age or gender. Because cut-off scores ar e not 100% sensitive with respect to the gold standard criterion variable against wh ich they were validated, there may be misclassification errors (Teresi & Holmes, 2002). These errors can be compounded in diverse racial/ethnic samples. Previous research has reporte d that when using the same cut-off scores of the CES-D, prevalence rates across racial/e thnic groups varied dramatically, ranging from 3.5% (Germans) to more than 30% (Korean Americans). These findings show that because culturally non-sensitive items may result in more false positives as well as false negatives, cut-off scores for the CES-D will result in higher or

PAGE 143

134 lower prevalence estimates of depression. Thus, at a minimum optimal cut-off scores for the CES-D should be identified to screen raci ally/ethnically diverse older adults with clinically significant depression. Final Thoughts Finally, when we consider culture an d depressive symptoms, the following questions are helpful to refrain from im posing diagnostic categories and criteria developed in one culture on another (Tanaka-Matsumi, 2001): 1) How is depression defined by the profession and by the indigenous culture?; 2) What words and concepts are used to describe depressi on?; 3) Are different words a nd concepts equivalent?; 4) What aspects are known to be culturally sim ilar and variable?; 5) What would account for cultural similarities and differences?; and 6) How does one communicate depression to others in the same culture? These questions immediately call for te sting and establishing the cultural validity of diagnos tic categories and their criteria Application of culturally sensitive and valid assessment of depr ession will produce culturally competent prevention and treatment for depression, and this will be eventually tailored to meet the needs of specific ethnic and cu ltural groups. The field of cros s-cultural depression is in great need of evaluating the utility of culture-accommodating assessment and treatment practice. In sum, we must be careful in making comparisons of depressive symptoms across diverse racial/ethnic groups We learned that there ar e universal core depressive symptoms similar across all cultures, but depressive symptoms are experienced, expressed, and responded to in different manners across cultures. If we want to optimally understand, assess, and diagnose depression, we need to take into account the cultural

PAGE 144

135 contexts in which it operates. It is impossi ble for us to create culture-free depression measures now. But, by improving existing depr ession measures, we can make them more culturally informed or culturally appropriate in different racial/ethnic groups, and that should be our goal.

PAGE 145

136 REFERENCES Alwin, D. F., & Wray, L. A. (2005). A lifespan developmental perspective on social status and health. Journal of Gerontology: Social Sciences, 60B (Special Issues II), 7-14. American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4 th eds). Washington, DC: Author. Andresen, E. M., Carter, W. B., Malmgren, J. A., & Patrick, D. L. (1994). Screening for depression in well older adults: Evalua tion of a short form of the CES-D. American Journal of Preventive Medicine, 10 77-84. Angel, R., & Guarnaccia, P. J. (1989) Mind, body, and culture: Somatization among Hispanics. Social Sciences and Medicine, 28 1229-1238. Azocar, F., Aren, P., Miranda, J., & Mu oz, P. F. (2001). Differential item functioning in a Spanish translation of th e Beck Depression Inventory. Journal of Clinical Psychology, 57, 355-365. Baker, F. M., Velli, S. A., Freidman, J., & Wiley, C. (1995). Screening tests for depression in older Black versus White patients. American Journal of Geriatric Psychiatry, 3 43-51. Beals, J., Manson, S. M., Whitesell, N. R., M itchell, C. M., DKovins, D. K., Simpson, S., & Spicer, P. (2005). Prevalence of major depressive episode in two American Indian reservation populations: Unexpected findings with a stru ctured interview. American Journal of Psychiatry, 162 1713-1722.

PAGE 146

137 Beekman, A. T., Deeg, D. J., VanLimbeek, J., Braam, A. W., DeVries, M. Z., & VanTilburg, W. (1997). Criterion validity of the Center for Epidemiologic Studies Depression scale (CES-D): results from a community-based sample of older subjects in the Netherlands. Psychological Medicine, 27 231-235. Berkman, L. F., Berkman, C. S., Kasl, S., Freeman, D. H., Leo, L., & Ostfeld, A. M., Cornoni-Huntley, J., & Brody, J. A. (1986). Depressive symptoms in relation to physical health and functioning in the elderly. American Journal of Epidemiology, 124, 372-388. Berry, J., & Kim, U. (1988). Acculturation and mental health. In P. Dasenm, J. Berry & N. Sartorius (Eds.), Health and cross-cultural psychology (pp. 207-238), Newbury Park, CA: Sage Publications. Bettencourt, B. A., & Sheldon, K. (2001). Soci al roles as mechanisms for psychological need satisfaction within groups. Journal of Personality and Social Psychology, 81, 1131-1143. Bingenheimer, J. B., Raudenbush, S. W., Leventhal, T., & Brooks-Gunn, J. (2005). Measurement equivalence and differentia l item functioning in family psychology. Journal of Family Psychology, 19 441-455. Blazer, D. G., Landerman, L. R., Hays, J. C., Simonsick, E. M., & Saunders, W. B. (1998). Symptoms of depression among community-dwelling elderly AfricanAmerican and White older adults. Psychological Medicine, 28 1311-1320. Bravo, M. (2003). Instrument development: Cultural adaptations for ethnic minority research. In G. Bernal, J. E. Trimble, A. K. Burlew, & F. T. L. Leong (Eds.),

PAGE 147

138 Handbook of racial and et hnic minority psychology (pp.220-236). Thousand Oaks, CA: Sage Publications. Breslau, J., Aguilar-Gaxiola, S., Kendler, K., Su, M., Williams, D., & Kessler, R. C. (2005). Specifying race-ethnic differences in risk for psychiat ric disorder in a USA national sample. Psychological Medicine, 35 1-12. Briones, D. F., Heller, P. L., Chalfant, P., Roberts, A. E., Aguirre-Hauchbaum, S. F., & Farr, W. F. (1990). Socioeconomic status ethnicity, psychological distress, and readiness to utilize a mental health facility. American Journal of Psychiatry, 147, 10, 1333-1340. Brislin, R. W., Lonner, W. & Thorndike, R. (1973). Cross-cultural methods. New York: Wiley. Byrne, B. M., & Watkins, D. (2003). The i ssue of measurement invariance revisited. Journal of Cross-Cultural Psychology, 34 155-175. Cabassa, L. J. (2003). Measuring acculturation: Where we are and where we need to go. Hispanic Journal of Behavioral Sciences, 25, 127-146. Callahan, C. M., & Wolinsky, F. D. (1994). The effect of gender and race on the measurement properties of the CES-D in older adults. Medical Care, 32 341-356. Canino, G., Lewis-Fernandez, R., & Bravo, M. (1997). Methodological challenges in cross-cultural mental health research. Transcultural Psychiatry, 34 163-184. Chapleski, E. E., Lamphere, J. K., Kaczyns ki, R., Lichtenberg, P. A., & Dwyer, J. W. (1997). Structure of a depression meas ure among American Indian elders: Confirmatory factor analysis of the CES-D scale. Research on Aging, 19, 462-485.

PAGE 148

139 Chvez, L. M., & Canino, G. (2005, April). Toolkit on translating and adapting instruments Retrieved February 6, 2006, from the Evaluation Center at Human Services Research Institute Web site :http://www.tecathsri.org/pub_pickup/pn/pn54.pdf Cheng, A. T. A. (2001). Case definition and culture: Are peop le all the same? British Journal of Psychiatry, 179, 1-3. Cheng, A. T. A., & Williams, P. (1986). The design and development of a screening questionnaire (CHQ) for use in community studies of mental di sorders in Taiwan. Psychological Medicine, 16 415-422. Chiriboga, D. A. (2004). Some thoughts on the measurement of acculturation among Mexican American Elders. Hispanic Journal of Behavioral Sciences, 26 274-292. Chiriboga, D. A., Black, S. A., Aranda, M., & Markides, K. (2002). Stress and depressive symptoms among Mexican American elderly. Journal of Gerontology: Psychological Sciences, 57B P559-P568. Chiriboga, D. A., Jang, Y., Banks, S., & Kim, G. (2007). Acculturation and its effect on symptom structure in a sample of Mexican American elders. Hispanic Journal of Behavioral Sciences, 29, 83-100. Cho, M. J., Nam, J. J., & Suh, G. H. (1998) Prevalence of sympto ms of depression in nationwide sample of Korean adults. Psychiatry Research, 81 341-352. Clarke, I. (2000). Extreme response style in cross-cultural research: An empirical investigation. Journal of Social and Be havioral Personality, 15, 137-152.

PAGE 149

140 Cole, S. R., Kawachi, I., Maller, S. J., & Berkman, L. F. (2000). Test of item-response bias in the CES-D scale: Experien ce from the New Haven EPESE study. Journal of Clinical Epidemiology, 53 285-289. Cornoni-Huntley, J., Blazer, D. G., Lafferty, M. E., Everett, D. F., Brock, D. B., & Farmer, M. E. (1990). Established Populations for Epidemiologic Studies of the Elderly, Volume II: Resource Data Book. NIH Publication No. 90-495. Cornoni-Huntley, J., Ostfeld, A. M., Taylor, J. O., Wallace, R. B., Blazer, D., Berkman, L. F., Evans, D. A., Kohout, F. J., Lemke, J. H., Scherr, P A., & Korper, S. P. (1993). Established Populations for Epidem iologic Studies of the Elderly: Study design and methodology. Aging Clinical and Expe rimental Research, 5 27-37. Coyne, J. C., & Marcus, S. C. (2006). Health disparities in care for depression possibly obscured by the clinical significance criterion. The American Journal of Psychiatry, 163 1577-1579. Crane, P. K., Gibbons, L. E., Jolley, L., & Van Belle, G. (2006). Differential item functioning analysis with ordinal logist ic regression techniques: DIFdetect and difwithpar. Medical Care, 44 (Suppl. 3), S115-S123. Crockett, L. J., Randall, B. A., Shen, Y.-L ., Russel, S. T., & Driscoll, A. K. (2005). Measurement equivalence of the Center for Epidemiological Studies Depression Scale for Latino and Anglo a dolescents: A national study. Journal of Consulting and Clinical Psychology, 73 47-58. Cuellar, J., Harris, L. C., & Jasso, R. (1980). An acculturation scale for Mexican American normal and clinical populations. Hispanic Journal of Behavioral Sciences, 2 199-246.

PAGE 150

141 Drasgow, F., Levine, M. V., Tsien, S., Williams, B., & Mead, A. D. (1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19, 143-165. Dunlop, D. D., Song, J., Lyons, J. S., Manheim, L. M., & Chang, R. W. (2003). Racial/ethnic differences in rates of depression among pre-retirement adults. American Journal of Public Health, 93 1945-1952. Fabrega, H. (1990). Hispanic mental health research: A case for cultural psychiatry. Hispanic Journal of Behavioral Sciences, 12, 339-365. Finch, B. K., Kolody, B., & Vega, W. A. ( 2000). Perceived discrimination and depression among Mexican-Origin adults in California. Journal of Health and Social Behavior, 41 295-313. Foley, K. L., Reed, P. S., Mutran, E. J., & DeVellis, R. F. (2002). Measurement adequacy of the CES-D among a sample of older African-Americans. Psychiatry Research, 109, 61-69. Gatz, M., & Hurwicz, M-L. (1990). Are old people more depr essed? Cross-sectional data on Center for Epidemiological Studies Depression Scale (CES-D) factors. Psychology and Aging, 5, 284-290. Golding, J., Aneshensel, C., & Hough, R. (1991). Responses to depression scale items among Mexican-Americans and non-Hispanic Whites. Journal of Clinical Psychology, 47, 61-75. Gonzalez, H. M., Haan, M. N., & Hinton, L. (2001). Acculturation a nd the prevalence of depression in older Mexican Americans: Baseline results of the Sacramento Area Latino Study on Aging. Journal of the American Geriatrics Society, 49 948-953.

PAGE 151

142 Gonzalez-Roma, V., Hernandez, A. & Gomez-Be nito, J. (2006). Power and Type I error of the mean and covariance structure an alysis model for detecting differential item functioning in graded response items. Multivariate Behavioral Research, 41 29-53. Gregorich, S. E. (2006). Do self-report inst ruments allow meaningful comparisons across diverse population groups?: Testing measurement invariance using the confirmatory factor analysis framework. Medical Care, 44 (Suppl. 3), S78-S94. Guarnaccia, P. J., Angel, R., & Worobey, J. L. (1989). The factor structure of the CES-D in the Hispanic Health and Nutrition Examination Survey: the influences of ethnicity, gender, and language. Social Science and Medicine, 29, 85-94. Guarnaccia, P. J., Canino, G., Rubio-Stipec, M., & Bravo, M. (1993). The prevalence of ataque de nervios in the Puerto Rico di saster study: The ro le of culture in psychiatry epidemiology. Journal of Nervous and Mental Disease, 181 157-165. Haberman, P. (1970). Ethnic differences in ps ychiatric symptoms reported in community surveys. Public Health Reports, 85 495-502. Hambleton, R. K. (2006). Good practices fo r identifying different ial item functioning. Medical Care, 44 (Suppl. 3), S182-S188. Hambleton, R. K., & Rogers, H. J. (1989). Detecting potentially biased test items: Comparison of IRT area and Mantel-Haenszel statistic. Applied Measurement in Education, 2 313-334. Hammen, C., & Padeskym, C. (1977). Sex diffe rences in the expression of depressive responses on the Beck Depression Inventory. Journal of Abnormal Psychology, 36, 609-614.

PAGE 152

143 Haringsma, R., Engels, G. I., Beekman, A. T. F., & Spinhoven, P. (2004). The criterion validity of the Center for Epidemiological Studies Depression Scale (CES-D) in a sample of self-referred elders with depressive symptomatology. International Journal of Geriatric Psychiatry, 19 558-563. Hays, J. C., Landerman, L. R., George, L. K., Flint, E. P., Koening, H. G., Land, K. C. et al. (1998). Social correlates of the dime nsions of depression in the elderly. Journal of Gerontology: Social Sciences, 53, 31-39. Hazuda, H. P, Stern, M. P., & Haffner, S. M. (1988). Accultura tion and assimilation among Mexican Americans: Scal es and population-based data. Social Science Quarterly, 69 687-705. Hoppe, S. K., & Heller, P. L. (1975). Aliena tion, familism and the utilization of health services by Mexican-Americans. Journal of Health and Social Behavior, 16, 304314. Hui, C. H., & Triandis, H. C. (1985). Measurement in cross-cultural psychology. Journal of Cross-Cultural Psychology, 16 131-152. Hui, C., & Triandis, H. (1989). Effects of culture and response format on extreme response style. Journal of Cross-Cultural Psychology, 20 296-309. Hulin, C. L., Drasgow, F., & Parsons, C. K. (1983). Item response theory: Applications to psychological measurement. Homewood, IL: Dow Jones Irwin. Inoba, A., Thoits, P. A., Ueno, K., Gove, W. R., Evenson, R. J., & Sloan, M. (2005). Depression in the United St ates and Japan: Gender, marital status, and SES patterns. Social Sciences and Medicine, 61, 2280-2292.

PAGE 153

144 Irwin, M, Artin, K. H., & Oxman, M. N. ( 1999). Screening for depression in the older adult: Criterion validity of the 10-it em Center for Epidemiologic Studies Depression Scale (CES-D). Archives of Internal Medicine, 159 1701-1704. Iwata, N., & Buka, S. (2002). Race/ethnicity and depressive symptoms: a crosscultural/ethnic comparison among university students in East Asia, North and South America. Social Science and Medicine, 55, 2243-2252. Iwata, N., Turner, R. J., & Lloyd, D. A. ( 2002). Race/ethnicity and depressive symptoms in community-dwelling young adults: a di fferential item functioning analysis. Psychiatry Research, 110 281-289. Jang, Y., Kim, G., & Chiriboga, D. A. (2005). Acculturation and manifestation of depressive symptoms among Kor ean American older adults. Aging and Mental Health, 9, 500-507. Jang, Y., Kim, G., Chiriboga, D. A., & Ki ng-Kallimanis, B. (2007). A bidimensional model of acculturation for Korean American older adults. Journal of Aging Studies, 21, 267-275. Janssen, J., Beekman, A. T. F., Comijs, H. C., & Deeg, D. J. H. (2006). Late-life depression: The differences between earlyand late-onset illness in a communitybased sample. International Journal of Geriatric Psychiatry, 21 86-93. Jenkins, J. H. (1988). Ethnopsychiatric interp retations of schizophr enic illness: The problem of nervios within Mexican-descent families. Culture, Medicine and Psychiatry, 12 303-331. Johnson, T. P. (1998). Approaches to equivale nce in cross-cultural and cross-national survey research. ZUMA-Nachrichten Spezial, 3 1-40. Retrieved February 20,

PAGE 154

145 2006, from http://www.gesis.org/Publikationen/Zeitschriften/ZUMA_Nachrichten _spezial/documents/zns pezial3/znspez3_01_Johnson.pdf Johnson, T., & van de Vijver, F. (2003). Social desirability in crosscultural research. In F. van de Vijver & P. Mohler (Eds.), Cross-Cultural Survey Methods. (pp.193202). New York, NY: Wiley. Jones, R. N. (2006). Identification of m easurement differences between English and Spanish language versions of the Mini-Mental State Examination: Detecting differential item functioning using MIMIC modeling. Medical Care, 44 (Suppl. 3), S124-S133. Jreskog, K. G., & Srbom, D. (2006). LISREL 8.8 for windows [computer software]. Lincolnwood, IL: Scientific Software International. Kaplan, H. I., & Sadock, B. J. (2003). Synopsis of Psychiatry: Behavioral Science/ Clinical Psychiatry (9 th ed.). Philadelphia, PA: Lipincott Williams & Wilkins. Kessler, R. C., & Ustun, T. B. (2004). Th e World Mental Health (WMH) Survey Initiative Version of the World H ealth Organization (WHO) Composite International Diagnosti c Interview (CIDI). International Journal of Methods in Psychiatric Research, 13 93-121. Kessler, R. C., Foster, C., Webster, P. S., & House, J. S. (1992). The relationship between age and depressive symptoms in two national surveys. Psychology and Aging, 7, 119-126. Kim, G., Chiriboga, D. A., & Jang, Y. (2007). Measurement equivalence of the Center for Epidemiological Studies Depressi on Scale among White, Black, and Mexican

PAGE 155

146 American respondents in the New Haven and the Hispanic EPESE Manuscript submitted for publication. Kim, G., Jang, Y., & Chiriboga, D. A. (2006). Response to the CES-D short form among Koreans and Korean-Americans: Variatio ns by age group and the level of acculturation. Unpublished manuscript. Kleinman, A. (2004). Cu lture and depression. New England Journal of Medicine, 351 951-953. Krause, N. (2007). Self-expression and depressive symptoms in late life. Research on Aging, 29, 187-206. Krause, N., & Goldenhar, L. M. (1992). A cculturation and psychological distress in three groups of elderly Hispanics. Journal of Gerontology, 47, S279-S288. Krause, N., & Liang, J. (1992). Cr oss-cultural variations in de pressive symptoms in later life. International Psychiatrics, 4 (Suppl. 2), 185-202. Krause, N., & Liang, J. (1993). Stress, soci al support, and psychological distress among the Chinese elderly. Journal of Gerontology: Psychological Sciences, 48 282-291. Krause, N., & Markides, K. S. (1985). Em ployment and psychological well-being in Mexican American women. Journal of Health and Social Behavior, 26 15-26. Lee, E.E., & Farran, C. J. (2004). Depre ssion among Korean, Korean American, and Caucasian American family caregivers. Journal of Transcultural Nursing, 15 1825 Liang, J. (2002). Assessing cross-cultural co mparability in mental health among older adults. In J. H. Skinner, J. A. Teresi, D. Holmes, S. M. Stahl, & A. L. Stewart

PAGE 156

147 (Eds.), Multicultural measurement in older populations (pp.11-21). New York: Springer Publishing Company. Liang, J., Van Tran, T., Krause, N., & Markid es, K. (1989). Generational differences in the structure of the CES-D scale in Mexican Americans. Journal of Gerontology: Social Sciences, 44 S110-S120. Lord, F. M. (1980). Applications of item response theo ry to practical testing problems. Hillside, NJ: Erlbaum. MacCallum, R. C., & Austin, J. T. (2000). A pplications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201-226. Mackinnon, A., McCallum, J., Andrews, G ., & Anderson, I. (1998). The Center for Epidemiological Studies Depression Scale in older community samples in Indonesia, North Korea, Myanmar, Sri Lanka, and Thailand. Journal of Gerontology: Psychological Sciences, 53B P343-P352. Madianos, M. G., Gournas, G., & Stefanis C. N. (1992). Depressive symptoms and depression among elderl y people in Athens. Acta Psychiatrica Scandinavia, 86 320-326. Manson, S. M., Shore, J. H., & Bloom, J. D. (1985). The depressive experience in American Indian communities: A challenge for psychiatric theory and diagnosis. In A. Kleinman & B. Good (Eds.), Culture and depression (pp. 331-368). Berkeley: University of California Press. Marin, G., Gamba, R. J., & Marin, B. V. (1992). Extreme response style and acquiescence among Hispanics. Journal of Cross-Cultural Psychology, 23 498509.

PAGE 157

148 Marsella, A. J., Sartorius, N., Jablensky, A., & Fenton, F. R. (1985). Cross-cultural studies of depressive disorders: An overv iew. In A. Kleinman & B. Good (Eds.), Culture and depression (pp. 300-323). Los Angeles, CA: University of California Press. Markides, K. S., Liang, J., & Jackson, J. S. (1990). Race, ethnicity, and aging: Conceptual and methodological issues. In R. H. Binstock & L. K. George (Eds.), Handbook of aging and the social sciences (3 rd ed., pp. 112-129). San Diego, CA: Academic Press. McCallum, J., Mackinnon, A., Simons, L., & Simons, J. (1995). Measurement properties of the Center for Epidemiological Studies Depression Scale: An Australian community study of aged persons. Journal of Gerontology: Social Sciences, 50B 182-189. McHorney, C. A., & Fleishman, J. A. (2006). Assessing and understanding measurement equivalence in health outcome measures. Medical Care, 44 (Suppl. 3), S205-S210. McHorney, C. A., & Fleishman, J. A. (2006). Assessing and understanding measurement equivalence in health outcome measures. Medical Care, 44 (Suppl. 3), S205-S210. Meade, A. W., & Lautenschlager, G. J. (2004). A comparison of item response theory and confirmatory factor analytic me thodologies for establishing measurement equivalence/invariance. Organizational Research Methods, 7 361-388. Mechanic, D. (1978). Sex, illness, illness be havior, and the use of health services. Social Sciences and Medicine, 12 207-214. Medina-Mora, M. E., Borges, G., Lara, C., Benj et, C., Blanco, J., Fleiz, C., et al. (2005). Prevalence, service use, and demogr aphic correlates of 12-month DSM-IV

PAGE 158

149 psychiatric disorders in Mexico: Results from the Mexican National Comorbidity Survey. Psychological Medicine, 35, 1-11. Miller, T. Q., Markides, K. S., & Black, S. A. (1997). The factor structure of the CES-D in two surveys of elderly Mexican Americans. Journal of Gerontology: Social Sciences, 52B 259-269. Minsky, S., Vega, W., Miskimen, T., Gara, M., & Escobar, J. (2003). Diagnostic patterns in Latino, African American, and Europ ean American psychiatric patients. Archives of General Psychiatry, 60 637-644. Mirowsky, J., & Ross, C. E. (2003). Education, social status, and health. New York: Aldine de Gruyter. Morris, M. W., Leung, K., Ames, D., & Lickel, B. (1999). Views from inside and outside: Integrating emic and etic insi ghts about culture and justice judgment. Academy of Management Review, 24 781-796. Mui, A. C., Burnette, D., & Chen, L. M. ( 2002). Cross-cultural assessment of geriatric depression: A review of the CES-D and GDS. In J. H. Skinner, J. A. Teresi, D. Holmes, S. M. Stahl, & A. L. Stewart (Eds.), Multicultural measurement in older populations (pp.147-177). New York: Springer Publishing Company. Mui, A. C., Burnette, D., & Chen, L. M. ( 2002). Cross-cultural assessment of geriatric depression: A review of the CES-D and GDS. In J. H. Skinner, J. A. Teresi, D. Holmes, S. M. Stahl, & A. L. Stewart (Eds.), Multicultural measurement in older populations (pp.147-177). New York: Springer Publishing Company. Myers, H., & Rodriguez, N. (2003). Acculturati on and physical health in racial and ethnic minorities. In K. M. Chun, P. B. Organista, & G. Marin (Eds.), Acculturation:

PAGE 159

150 Advances in theory, measurement, and applied research (pp. 163-185). Washington, DC: American Psychological Association. Nathanson, C. A. (1975). Illness and the feminine role: a theoretical review. Social Sciences and Medicine, 9, 57-62. Ng, K.-Y., & Earley, P. C. (2006). Culture + Intelligence: Old constructs, new frontiers. Group and Organization Management, 31, 4-19. Nguyen, H. T., Clark, M., & Ruiz, R. J. (2007) Effects of acculturation on the reporting of depressive symptoms am ong Hispanic pregnant women. Nursing Research, 56, 217-223. Nguyen, H. T., Kitner-Triolo, M., Evans, M. K., & Zonderman, A. B. (2004). Factorial invariance of the CES-D in low socioeconomic status African Americans compared with a nationally representative sample. Psychiatry Research, 126 177187. Norris, M. P., Arnau, R. C., Meagher, M. W., & Bramson, R. (2005). The efficacy of somatic symptoms in assessing depression in older primary care patients. Clinical Gerontologist, 27 43-57. Papassotiropoulos, A., & Heun, R. (1999). Scr eening for depression in the elderly: A study on misclassification by screening instruments and improvement of scale performance. Progress in Neuro-Psychopharmacogy and Biological Psychiatry, 23, 431-446. Parker, G., Gladstone, G. L., & Chee, K. T. (2001). Depression in the planets largest ethnic group: The Chinese. American Journal of Psychiatry, 158, 857-864.

PAGE 160

151 Patel, V., & Mann, A. (1997). Etic and emic cr iteria for non-psychotic mental disorder: The study of the CISR and care provider assessment in Harare. Socical Psychiatry and Psychiatric Epidemiology, 32 84-89. Patel, V., Abas, M., Broadhead, J., Todd, C., & Reeler, A. (2001). Depression in developing countries: Lessons from Zimbabwe. British Medical Journal, 322 482-484. Patel, V., Simunyu, E., Gwanzura, F., Le wis, G., & Mann, A. (1997). The Shona Symptom Questionnaire: The developmen t of an indigenous measure of nonpsychotic mental disorder in Harare. Acta Psychiatrica Scandinavica, 95 469-475. Perczek, R., Caver, C. S., Price, A. A ., & Pozo-Kaderman, C. (2000). Coping, mood, and aspects of personality in Spanish transl ation and evidence of convergence with English versions. Journal of Personality Assessment, 74 63-87. Perkins, A. J., Stump, T. E., Monahan, P. O., & McHorney, C. A. (2006). Assessment of differential item functioning for dem ographic comparisons in the MOS SF-36 health survey. Quality of Life Research, 15, 331-348. Perreira, K. M., Deeb-Sossa, N., Harris, K. M., & Bollen, K. (2005). What are we measuring? An evaluation of the CESD across race/ethnicity and immigrant generation. Social Forces, 83 1567-1602. Pike, K. L. (1954 ). Language in relation to a unified theory of the structure of human behavior. Glendale, CA: Summer Institute of Linguistics. Posner, S. F., Stewart, A. L., Marn, G., & Pr ez-Stable, E. J. (2001). Factor variability of the Center for Epidemiological Studies Depression Scale (CES-D) among urban Latinos. Ethnicity and Health, 6 137-144.

PAGE 161

152 Radloff, L. (1977). The CES-D Scale: a self-re port depression scale for research in the general population. Applied Psychological Measurement, 1 385-401. Rait, G., & Burns, A. (1998). Screening for depression and cognitive impairment in older people from ethnic minorities. Age and Ageing, 27 271-275. Raju, N., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87, 517-529. Ramirez, M., Ford, M. E., Stewart, A. L., & Teresi, J. A. (2005). Measurement issues in health disparities research. Health Services Research, 40, 1640-1657. Ren, X. S., Amick, B. C., & Willimans, D. R. (1999). Racial/ethnic di sparities in health: The interplay between discrimina tion and socioeconomic status. Ethnic Disparities, 9, 151-165. Roberts, R. E. (1980). Reliability of the CES-D scale in different ethnic contexts. Psychiatry Research, 2 125-134. Roberts, R. E., Rhoades, H. M., & Vernon, S. W. (1990). Using the CES-D scale to screen for depression and anxiety: Effects of language and ethnic status. Psychiatry Research, 31, 69-83. Roberts, R. E., Vernon, S. W., & Rhoades, H. M. (1989). Effects of language and ethnic status on reliability and validity of the Center for Epidemiologic StudiesDepression Scale with psychiatric patients. Journal of Nervous and Mental Disease, 177, 581-592. Robison, J., Curry, L., Gruman, C., Covington, T., Gaztambide, S., & Blank, K. (2003). Depression in later-life Puerto Rican prim ary care patients: The role of illness,

PAGE 162

153 stress, social integr ation, and religiosity. International Psychogeriatrics, 15 239251 Ross, C. E., & Mirowsky, J. (1984). Components of depressed mood in married men and women: the Center for Epidemio logic Studies Depression Scale. American Journal of Epidemiology, 119, 997-1004. Saez-Santiago, E., & Bernal, G. (2003). Depression in ethnic minorities: Latinos and Latinas, African Americans, Asian Americans, and Native Americans. In G. Bernal, J. E. Trimble, A. K. Burlew, & F. T. L. Leong (Eds.), Handbook of racial and ethnic minority psychology (pp.401-428). Thousand Oaks, CA: Sage Publications. Samejima, F. (1969). Estimation of latent ab ility using a response pattern of graded scores. Psychometrika Monograph, 17 1-100. Schaffer, B. S., & Riordan, C. M. (2003). A review of cross-cultural methodologies for organizational research: A best-practice approach. Organizational Research Methods, 6, 169-215. Shafer, A. B. (2006). Meta-analysis of th e factor structures of four depression questionnaires: Beck, CES-D, Hamilton, and Zung. Journal of Clinical Psychology, 62, 123-146. Shamasundar, C., Krishna Murthy, S., Prakash, O., Prabhakar, N., & Subbakrishna, D. (1986). Psychiatric morbidity in a general practice in an Indian city. British Medical Journal, 292 1713-1715.

PAGE 163

154 Simon, G. E., Goldberg, D. P., Korff, M. V., & Ustun, T. B. (2002). Understanding crossnational differences in depression prevalence. Psychological Medicine, 32 585594. Somervell, P. D., Beals, J., Kinzie, J. D., Leung, P., Boehnlein, J., Matsunaga, D., & Manson, S. M. (1993). Use of the CES-D in an American Indian village. Culture, Medicine and Psychiatry, 16 503-517. Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91, 1292-1306. Statistical Solutions (2001). Solas 3.0 for missing data analysis: User reference MANUAL. Crosses Green, Cork: Ireland: Statistical Solutions, Ltd. Sternberg, R. J. (2004). Culture and intelligence. American Psychologist, 59 325-338. Stewart, A. L., & Npoles-Springer, A. M. (2003). Advancing health disparities research: can we afford to ignore measurement issues? Medical Care, 41 1207-1220. Stinivasan, T. N., & Suresh, T. R. (1990). Non-specific symptoms and screening of nonpsychotic morbidity in primary care Indian Journal of Psychiatry, 32 77-82. Stommel, M., Given, B. A., Given, C. W., Ka laian, H. A., Schulz, R., & McCorkle, R. (1993). Gender bias in the measurement properties of the Center for Epidemiological Studies Depression Scale (CES-D). Psychiatry Research, 49 239-250. Stout, W. (1990). A new item response theory modeling approach with applications to unidimensionality assessme nt and ability estimation. Psychometrika, 55 293-325.

PAGE 164

155 Sue, S., Fujino, D. C., Hu, L., & Takeuchi, D. T. (1991). Community mental health services for ethnic minority groups: A test of the cultural responsiveness hypothesis. Journal of Consulting and Clinical Psychology, 59 533-540. Sulaiman, S., Bhugra, D., & De Silva, P. (2001). Perception of depression in a community sample in Dubai. Transcultural Psychiatry, 38 201-218. Sundquist, J., & Winkleby, M. (2000). Count ry of birth, acculturation status and abdominal obesity in a national sample of Mexican-American women and men. International Journal of Epidemiology, 29, 470-477. Suthers, K. M., Gatz, M., & Fiske, A. ( 2004). Screening for depr ession: A comparative analysis of the 11-item CES-D and the CIDI-SF. Journal of Mental Health and Aging, 10 209-219. Swenson, C. J., Baxter, J., Shetterly, S. M., Scarbro, S. L., & Hamman, R. F. (2000). Depressive symptoms in Hi spanic and Non-Hispanic White rural elderly: the San Luis Valley Health and Aging Study. American Journal of Epidemiology, 152 1048-1054. Tanaka-Matsumi, J. (2001). Abnormal psychology and culture. In D. Matsumoto (Ed.), The handbook of culture and psychology (pp.265-286). New York: Oxford University Press. Teresi, J. A. (2002). Statistic al methods for examination of differential item functioning (DIF) with applications to cross-cultural measurement of functional, physical and mental health. In J. H. Skinner, J. A. Teresi, D. Holmes, S. M. Stahl, & A. L. Stewart (Eds.), Multicultural measurement in older populations (pp.23-34). New York: Springer Publishing Company.

PAGE 165

156 Teresi, J. A. (2006). Overview of qualit ative measurement methods: equivalence, invariance, and differential item f unctioning in health applications. Medical Care, 44(Suppl. 3), S39-S49. Teresi, J. A., & Holmes, D. (2002). Some methodological guidelines for cross-cultural comparisons. In J. H. Skinner, J. A. Teresi, D. Holmes, S. M. Stahl, & A. L. Stewart (Eds.), Multicultural measurement in older populations (pp.3-10). New York: Springer Publishing Company. Thissen, D. (1991). MULTILOG users guide (Version 6.0). Mooresvi lle, IN: Scientific Software. Triandis, H. C., & Brislin, R. W. (1984). Cross-cultural psychology. American Psychologist, 39 1006-1016. Trimble, J. E. (2003). Introduction: Social chan ge and acculturation. In K. M. Chun, P. B. Organista, & G. Marin (Eds.), Acculturation: Advances in theory, measurement, and applied research (pp. 3-13). Washington, DC: American Psychological Association. Turvey, C. L., Wallace, R. B., & Herzog, R. (1999). A revised CES-D measure of depressive symptoms and a DSM-based m easure of major depressive episodes in the elderly. International Psychogeriatrics, 11 139-148. U.S. Department of Health and Human Services (2001). Mental health: Culture, race, and ethnicity-A supplement to mental he alth: A report of the Surgeon General Rockville, MD: U. S. Department of Health and Human Services, Substance Abuse and Mental Health Services Admi nistration, Center for Mental Health Services.

PAGE 166

157 U.S. Department of Health and Human Services (2005). 2005 National healthcare disparities report (NHDR) Retrieved January 15, 2006, from http://www.qualitytools.ah rq.gov/disparitiesreport/ Van de Vijver, F. (2001). The evolution of cross-cultural research methods. In D. Matsumoto (Ed.), The handbook of culture and psychology (pp.77-97). New York: Oxford University Press. Van de Vijver, F. J. R., & Leung, K. ( 2000). Methodological issues in psychological research on culture. Journal of Cross-Cultural Psychology, 31 33-51. Van Tran, T. (1997). Exploring the equivalence of factor structure in a measure of depression between Black and White women: Measurement issues in comparative research. Research on Social Work Practice, 7 500-517. Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3 4-70. Wang, M., & Russell, S. S. (2005). Measurem ent equivalence of the Job Description Index across Chinese and American worker s: results from confirmatory factor analysis and item response theory. Educational and Psychological Measurement, 65, 709-732. Watson, L. C., Lewis, C. L., Kistler, C. E ., Amick, H. R., & Boustani, M. (2004). Can we trust depression screening instruments in health old-old adults? International Journal of Geriatric Psychiatry, 19 278-285.

PAGE 167

158 Weissman, M. M, Bland, R. C., Canino, G. J ., Faravelli, D., Greenwald, S., Hwu, H. G., et al. (1996). Crossnational epidemiology of ma jor depression and bipolar disorder. Journal of the American Medical Association, 276 293-296. Williams, D. R. (2005). The health of U. S. racial and ethnic population. Journal of Gerontology: Series B, 60B (Special Issue II), 53-62. World Health Organization (1983). Depressive disorders in different cultures: Report of the WHO collaborative study of standardized assessment of depressive disorders Geneva: Author. Wray, L., A., Alwin, D. F., & McCammon, R. J. (2005). Social status and risky health behaviors: results from the Health and Retirement Study. Journal of Gerontology: Series B, 60B (Special Issue II), 85-92. Yang, F. M., & Jones, R. N. (in press). Center for Epidemiologic Studies-Depression scale (CES-D) item response bias fo und with Mantel-Haenszel method was successfully replicated using latent variable modeling. Journal of Clinical Epidemiology Ying, Y. (1989). Nonresponse on the Center for Epidemiological Studies-Depression Scale in Chinese Americans. International Journal of Social Psychiatry, 35 156163. Zich, J. M., Attkisson, C. C., & Greenfield, T. K. (1990). Screening for depression in primary care clinics: The CES-D and the BDI. International Journal of Psychiatry in Medicine, 20 259-277.

PAGE 168

ABOUT THE AUTHOR Giyeon Kim received a Bachelors Degr ee in Human Development and Family Studies from Duksung Womens University in 2000 and a M.A. in Human Development from Ewha Womans University in 2002. She entered the School of Aging Studies at the University of South Florida for her Ph.D. degree in 2003. While in the Ph.D. program at the University of South Florida, Ms. Kim was involved in a number of health disparities studies as a Principal Investigator/co-Principal Investigator and has been devoted to devel oping an understanding of advanced analytic models in the mental health and minority agi ng research. She was a coauthor of thirteen peer-reviewed articles and received The Sout hern Gerontological So cietys Student Paper Award for her work on measurement issu es in multi racial/ethnic society. She is currently working as a Post-Doctoral Fellow at Temple University and conducts research on racial/ethni c disparities in health and healthcare utilization among older adults.


xml version 1.0 encoding UTF-8 standalone no
record xmlns http:www.loc.govMARC21slim xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.loc.govstandardsmarcxmlschemaMARC21slim.xsd
leader nam 2200409Ka 4500
controlfield tag 001 001988971
005 20090216140945.0
007 cr bnu|||uuuu
008 090216s2007 flu s 000 0 eng
datafield ind1 8 ind2 024
subfield code a E14-SFE0002205
035
(OCoLC)307662148
040
FHM
c FHM
049
FHMM
090
HQ1061 (Online)
1 100
Kim, Giyeon.
0 245
Measurement equivalence of the center for epidemiological studies depression scale in racially/ethnically diverse older adults
h [electronic resource] /
by Giyeon Kim.
260
[Tampa, Fla] :
b University of South Florida,
2007.
500
Title from PDF of title page.
Document formatted into pages; contains 158 pages.
Includes vita.
502
Dissertation (Ph.D.)--University of South Florida, 2007.
504
Includes bibliographical references.
516
Text (Electronic dissertation) in PDF format.
520
ABSTRACT: This dissertation study was designed to examine measurement equivalence of the Center for Epidemiological Studies Depression (CES-D) Scale across White, African American, and Mexican American elders. Specific aims were to identify race/ethnicity-, sociodemographic-, and acculturation and instrument language-related measurement bias in the CES-D. Three studies were conducted in this dissertation to accomplish these aims. Two existing national datasets were used: the New Haven Established Populations for Epidemiologic Studies of the Elderly (EPESE) for the White and African American samples and the Hispanic Established Populations for Epidemiologic Studies of the Elderly (H-EPESE) for the Mexican-American sample. Differential item functioning (DIF) analyses were conducted using both confirmatory factor analysis (CFA) and item response theory (IRT) methods. Study 1 focused on the role of race/ethnicity on the measurement bias in the CES-D.Results from Study 1 showed a lack of measurement equivalence of the CES-D among Mexican Americans in the comparison with both Whites and Blacks. Race/ethnicity-specific items were also identified in Study 1: two interpersonal relation items in Blacks and four positive affect items in Mexican Americans. Study 2 focused on identifying sociodemographic-related measurement bias in responses to the CES-D among diverse racial/ethnic groups. Results from Study 2 showed that gender and educational attainment affected item bias in the CES-D. The interaction between gender and educational level and race/ethnicity was also found in Study 2: Mexican American women and lower educated Blacks had a greater predisposition to endorse the 'crying' item. Focusing on Mexican American elders, Study 3 examined how level of acculturation and language influence responses to the CES-D. In Study 3, acculturation and instrument language-biased items were identified in Mexican American elders.Study 3 also suggested that acculturation-bias was entirely explained by whether the CES-D was administered in the English or the Spanish versions. Possible reasons for item bias on the CES-D are discussed in the context of sociocultural differences in each substudy. Findings from this dissertation provide a broader understanding of sociocultural group differences in depressive symptom measures among racially/ethnically diverse older adults and yield research and practice implications for the use of standard screening tools for depression.
538
Mode of access: World Wide Web.
System requirements: World Wide Web browser and PDF reader.
590
Co-advisor: David A. Chiriboga, Ph.D.
Co-advisor: Yuri Jang, Ph.D.
653
Depressive symptoms
Measurement equivalence
Health disparities
Differential item functioning
CES-D
690
Dissertations, Academic
z USF
x Aging Studies
Doctoral.
773
t USF Electronic Theses and Dissertations.
4 856
u http://digital.lib.usf.edu/?e14.2205