Ye, Xin. Development of models for understanding causal relationships among activity and travel variables [electronic resource] / by Xin Ye. [Tampa, Fla.]: University of South Florida, 2006.
Dissertation (Ph.D.), University of South Florida, 2006. Includes bibliographical references. Text (electronic dissertation) in PDF format. System requirements: World Wide Web browser and PDF reader. Mode of access: World Wide Web. Title from PDF of title page. Document formatted into pages; contains 203 pages. Includes vita.
Adviser: Ram M. Pendyala, Ph.D.
Subject terms: Travel behavior; Discrete choice model; Econometric modeling; Endogenous variable; Mixed logit model; Discrete-continuous model. Dissertations, Academic, USF, Civil Engineering, Doctoral.
Series: USF Electronic Theses and Dissertations.
Online: http://digital.lib.usf.edu/?e14.1842

Development of Models for Understanding Causal Relationships Among Activity and Travel Variables

by

Xin Ye

A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Civil and Environmental Engineering
College of Engineering
University of South Florida

Major Professor: Ram M. Pendyala, Ph.D.
John J. Lu, Ph.D., P.E.
Manjriker Gunaratne, Ph.D., P.E.
Xuehao Chu, Ph.D.
Gabriel Picone, Ph.D.

Date of Approval:
May 30, 2006

Keywords: travel behavior, discrete choice model, econometric modeling, endogenous variable, mixed logit model, discrete-continuous model

Copyright 2006, Xin Ye

Acknowledgments

I am indebted to my academic advisor, Dr. Ram M. Pendyala, for his guidance, patience, encouragement and support throughout my five-year Ph.D. study.
I thank Dr. John Lu, Dr. Elaine Chang, Dr. Manjriker Gunaratne, Dr. Xuehao Chu, Dr. Gabriel Picone and Dr. Steve Polzin for serving on my committee and providing valuable suggestions. I am grateful to Dr. Chandra R. Bhat and Mr. Abdul Pinjari in the Department of Civil, Architectural & Environmental Engineering at the University of Texas at Austin. Abdul used to be one of my colleagues and close friends at USF. After he joined the Ph.D. program at UT Austin, we have kept in touch and continued our friendship. At the early stage of this study, I benefited greatly from his help and from discussions with him. I also want to express my gratitude to my colleague, Amlan Banerjee, and my schoolmate, Pan Liu. Pan gave me an important suggestion at the final stage of this study. In addition, I would like to acknowledge Dr. Giovanni Gottardi of Jenni+Gottardi AG, Zurich, Switzerland, for providing me with the comprehensive data used in this study. Finally, I dedicate my dissertation effort to my parents, Bangjie Ye and Rong Chen, and my wife, Wenjia Guan. Without their support, this work would never have been completed.
Table of Contents

List of Tables
List of Figures
Abstract
Chapter One: Introduction
1.1 Background
1.2 Objectives
1.3 Scope
1.4 Outline
Chapter Two: Modeling Methodology
2.1 Background
2.1.1 Review of Linear Regression Model
2.1.2 Review of Simultaneous Equations Model (Joint Relationship Among Continuous Variables)
2.1.3 Review of Structural Equations Model (Causal Relationship Among Continuous Variables)
2.1.4 Review of Discrete Choice Model
2.1.5 Endogenous Variable in Discrete Choice Model
2.2 Modeling Methodology for Causal Analysis in Discrete Choices (Discrete Choice Modeling Methodology with Endogenous Variable)
2.2.1 Recursive Bivariate Probit Model (Causal Relationship Between Two Binary Choice Variables)
2.2.2 Simultaneous Equations Model Using Lee Transformation (Causal Relationship Between One Continuous Variable and One Multinomial Choice)
2.2.3 Mixed Simultaneous Equations Model Using Flexible Error Structure
2.2.3.1 Mixed Discrete-continuous Model (Causal Relationship Between One Continuous Variable and One Multinomial Choice Variable)
2.2.3.2 Mixed Binary-multinomial Choice Model (Causal Relationship Between One Binary Choice Variable and One Multinomial Choice Variable)
2.3 Non-nested Test for Choosing Alternative Causal Structure
2.3.1 Cox Test for Separate Families of Hypothesis
2.3.2 Non-nested Test in Discrete Choice Model
2.3.3 Extension to Discrete-continuous Model System
2.4 Monte Carlo Study for Bivariate Probit Model and Lee Model
2.4.1 Introduction
2.4.2 Monte Carlo Studies for Bivariate Probit Model
2.4.3 Monte Carlo Studies for Recursive Bivariate Probit Model
2.4.4 Monte Carlo Studies for Lee Model
2.4.5 Summary
Chapter Three: Dataset Preparation and Description
3.1 Introduction to Swiss Travel Survey
3.2 Dataset Description at Household Level
3.3 Dataset Description at Person Level
3.4 Dataset Description at Trip Level
Chapter Four: Empirical Estimation Results
4.1 Causal Models Between Trip Chaining and Mode Choice (Recursive Bivariate Probit Model)
4.1.1 Background
4.1.2 Dataset Preparation and Description for Modeling Analysis
4.1.3 Model Estimation Results
4.1.3.1 Estimation Results for Non-work Tours
4.1.3.2 Estimation Results for Work Tours
4.1.4 Model Performance Comparisons Based on Non-nested Test
4.1.5 Discussions and Conclusions
4.2 Causal Models Between Activity Timing and Activity Duration (Mixed Discrete-continuous Model and Lee Model)
4.2.1 Background
4.2.2 Data Preparation and Description for Model Analysis
4.2.3 Model Estimation Results
4.2.3.1 Estimation Results for Non-commuters
4.2.3.2 Estimation Results for Commuters
4.2.4 Model Performance Comparisons Based on Non-nested Test
4.2.5 Discussions and Conclusions
4.3 Causal Models Between Trip Timing and Mode Choice (Mixed Binary-multinomial Choice Model)
4.3.1 Background
4.3.2 Dataset Preparation and Description for Modeling Analysis
4.3.3 Model Estimation Results
4.3.3.1 Estimation Results for Non-commuters
4.3.3.2 Estimation Results for Commuters
4.3.4 Model Performance Comparison Based on Non-nested Test
4.3.5 Discussions and Conclusions
Chapter Five: Conclusions and Discussions
5.1 Contribution to the Field
5.1.1 Methodological Contribution
5.1.2 Behavioral Contribution
5.1.3 Practical Contribution
5.1.4 Empirical Contribution
5.2 Future Research Direction
References
Bibliography
Appendices
Appendix A: Gauss Code for Generating and Storing Halton Sequences
Appendix B: Gauss Code of Mixed Discrete-continuous Model (Exemplified by Non-commuter Model Where Time-of-day Choice Affects Activity Duration)
Appendix C: Gauss Code of Discrete-continuous Model Based on Lee Transformation (Exemplified by Non-commuter Model Where Time-of-day Choice Affects Activity Duration)
Appendix D: Gauss Code of Mixed Binary-multinomial Choice Model ($g_i$ Fixed at 1, Exemplified by Non-commuter Model Where Binary Time-of-day Choice Affects Multinomial Mode Choice)
Appendix E: Gauss Code of Mixed Binary-multinomial Choice Model ($f_i = g_i$, Exemplified by Non-commuter Model Where Binary Time-of-day Choice Affects Multinomial Mode Choice)
About the Author

List of Tables

Table 1.1 Summary of Bias in $\tilde{\beta}_1$ When $x_2$ is Omitted
Table 2.1 Statistics of Estimators for Bivariate Probit Model
Table 2.2 Statistics of Estimators for Recursive Bivariate Probit Model
Table 2.3 Statistics for Non-nested Test Application
Table 2.4 True Values of Parameters in the Model
Table 2.5 Statistics of Estimators from Joint Estimation Procedure (without Endogenous Variables)
Table 2.6 Statistics of Estimators from Recursive Estimation Procedure (without Endogenous Variables)
Table 2.7 Statistics of Estimators from Joint Estimation Procedure (with Endogenous Variables)
Table 2.8 Statistics of Estimators from Recursive Estimation Procedure (with Endogenous Variables)
Table 3.1 Household Characteristics of Swiss Travel Microcensus 2000
Table 3.2 Person Characteristics of Swiss Travel Microcensus 2000
Table 3.3 Trip Characteristics of Swiss Travel Microcensus 2000 (Trip Purpose Distribution by Trip Mode)
Table 3.4 Trip Characteristics of Swiss Travel Microcensus 2000 (Trip Mode Distribution by Trip Purpose)
Table 4.1 Household Characteristics of Swiss Travel Microcensus 2000 and Zurich Subsamples
Table 4.2 Person Characteristics of Swiss Travel Microcensus 2000 and Zurich Subsamples
Table 4.3 Cross-tabulation of Mode Choice and Tour Type for Non-work Tours
Table 4.4 Cross-tabulation of Mode Choice and Tour Type for Work Tours
Table 4.5 Non-work-tour Model Variable Description and Statistics (N = 4901)
Table 4.6 Non-work-tour Model
Table 4.7 Work-tour Model Variable Description and Statistics (N = 1711)
Table 4.8 Work-tour Model
Table 4.9 Comparisons of Goodness-of-fit of Recursive Bivariate Probit Models
Table 4.10 Household Characteristics of Swiss Travel Microcensus 2000 and Sample for Model of Maintenance Activity Duration and Time-of-day Choice
Table 4.11 Person Characteristics of Swiss Travel Microcensus 2000 and Sample for Model of Activity Duration and Time-of-day Choice
Table 4.12 Description of Endogenous Variables in Non-commuter Sample
Table 4.13 Description of Endogenous Variables in Commuter Sample
Table 4.14 Description and Definition of Variables in Timing-duration Model
Table 4.15 Non-commuter Model (Duration → Time-of-day)
Table 4.16 Non-commuter Model (Time-of-day → Duration)
Table 4.17 Commuter Model (Duration → Time-of-day)
Table 4.18 Commuter Model (Time-of-day → Duration)
Table 4.19 Simulation-based Hypothesis Test for Error Covariance of Identified Mixed Discrete-continuous Models
Table 4.20 Comparison of Goodness-of-fit of Timing-duration Models
Table 4.21 Household Characteristics of Swiss Travel Microcensus 2000 and Sample for Model of Mode Choice and Time-of-day Choice
Table 4.22 Person Characteristics of Swiss Travel Microcensus 2000 and Sample for Model of Mode Choice and Time-of-day Choice
Table 4.23 Cross-tabulation of Mode Choice and Time-of-day Choices for Non-commuters
Table 4.24 Cross-tabulation of Mode Choice and Time-of-day Choices for Commuters
Table 4.25 Variable Description in Timing-mode Choice Model
Table 4.26 Non-commuter Model (Mode → Time-of-day)
Table 4.27 Non-commuter Model (Time-of-day → Mode)
Table 4.28 Commuter Model (Mode → Time-of-day)
Table 4.29 Commuter Model (Time-of-day → Mode)
Table 4.30 Comparison of Goodness-of-fit of Timing-mode Choice Models

List of Figures

Figure 1.1 Joint Relationship Between Mode Choice and Destination Choice
Figure 1.2 Causal Relationship Between Mode Choice and Trip Chain Type Choice
Figure 2.1 Distribution of z (N = 930 and N1 = 1000)
Figure 4.1 Diagram of Consistent Causal Relationship Identified by Joint Timing-duration Model for Non-commuters
Figure 4.2 Diagram of Consistent Causal Relationship Identified by Joint Timing-duration Model for Commuters
Figure 4.3 Diagram of Causal Relationship of Mixed Binary-multinomial Choice Models for Non-commuters
Figure 4.4 Diagram of Causal Relationship of Mixed Binary-multinomial Choice Models for Commuters

Development of Models for Understanding Causal Relationships Among Activity and Travel Variables

Xin Ye

ABSTRACT

Understanding joint and causal relationships among multiple endogenous variables has been of much interest to researchers in the field of activity and travel behavior modeling. Structural equation models have been widely developed for modeling and analyzing the causal relationships among travel time, activity duration, car ownership, trip frequency and activity frequency. In these models, travel time and activity duration are treated as continuous variables, while car ownership, trip frequency and activity frequency are treated as ordered discrete variables. However, many endogenous variables of interest in travel behavior are neither continuous nor ordered discrete but unordered discrete in nature, such as mode choice, destination choice, trip chaining pattern and time-of-day choice (which can be classified into a few categories such as AM peak, midday, PM peak and off-peak). A modeling methodology that accommodates unordered discrete variables is highly desirable for better understanding the causal relationships among these variables. Against this background, this dissertation seeks an appropriate modeling methodology that aids in identifying the causal relationships among activity and travel variables, including unordered discrete variables.
In this dissertation, the proposed modeling methodologies are applied to model the causal relationships between three pairs of endogenous variables: trip chaining pattern vs. mode choice, activity timing vs. duration, and trip departure time vs. mode choice. The data used for the modeling analysis are extracted from the Swiss Travel Microcensus 2000. Such models provide rigorous criteria for selecting a reasonable application sequence of submodels in an activity-based travel demand model system.

Chapter One: Introduction

1.1 Background

Understanding joint and causal relationships among multiple dependent variables has been of much interest to researchers in the field of activity and travel behavior modeling (Fujii and Kitamura, 2000; Golob, 2003). An important reason for this interest is the transition of travel demand models from the trip-based approach to the activity-based approach and from the aggregate level to the disaggregate level. The trip-based approach, namely the four-step travel demand model consisting of trip generation, trip distribution, mode split and network assignment, has played a key role in transportation planning over the past decades. Nowadays, as the planning emphasis shifts from infrastructure construction to transportation system management, travel demand management (TDM) and transportation control measures (TCM), the trip-based model, which focuses solely on trip frequencies, is insufficient for evaluating emerging transportation planning policies. The performance of such policies can hardly be measured by the number of trips suppressed or induced, which is what the trip-based modeling system estimates. By contrast, the activity-based model is considered a powerful tool for evaluating these policies within a reasonable framework. Built upon the theory that travel is a demand derived from activities, activity-based approaches directly model and estimate the activity variables (e.g., activity timing and duration) and then derive the travel variables.
Thus, the activity-based model reflects changes in activity patterns as well as in travel patterns in response to planning policies. People's activity patterns are considered a better measure for evaluating policies than trip frequency because of their potential to reflect people's lifestyle and quality of life. The other trend in the development of travel demand models in developed countries over the past decades is the move from the aggregate level to the disaggregate level. The most conventional travel demand models were developed at the aggregate level, with the traffic analysis zone (TAZ) as the basic analytical unit. In such models, the independent variables generally incorporate demographic, socioeconomic and geographical characteristics at the zone level, while the dependent variable is the total number of trips originating from the TAZ. The general argument against aggregate-level models is that zones do not make trips; only the people living in them do. A model that uses the total number of trips made by the people in a zone as the dependent variable lacks a sound behavioral basis because it disregards the interactions and constraints among the trips made by the same individual or by individuals in the same household. On the other hand, the disaggregate-level travel demand model directly treats each household or individual traveler, instead of the TAZ, as the basic analytical unit, so the rules governing people's travel behavior can be reflected in the model. Combined with the activity-based approach, a disaggregate-level model can be developed at the individual level to realize a microsimulation of travelers' activity and travel patterns with adequate consideration of the interactions and constraints among activity and travel decisions. Albatross by Timmermans and Arentze (2000), CEMDAP by Bhat et al. (2004) and FAMOS by Pendyala et al.
(2005) are typical examples of the daily activity-travel pattern microsimulators that have been developed recently.

Similar to the traditional trip-based model, the practical activity-based model incorporates a series of submodels, such as an activity generation model, an activity timing and duration model, a mode split model and a destination choice model. The sequence in which the submodels are applied is a critical issue in microsimulation, because only a model that truly reflects travelers' decision-making process can provide accurate estimates of their activity and travel patterns, mode and destination choice behavior, etc. However, in most cases, no theory explicitly identifies the ambiguous decision-making process underlying travel behavior. For example, if a discretionary activity's timing and duration are estimated in separate submodels, it is not straightforward to determine the sequence of model implementation. People may first schedule the activity and then decide its duration conditional on the timing, or they may decide the duration first and then schedule the activity given the predetermined duration. Extensive data could be collected to answer these questions, but conventional trip diary data are usually the only resource available to travel behavior analysts and travel demand modelers. Relying solely on conventional trip diary data, this dissertation aims at a better understanding of the ambiguous decision-making sequence in travel behavior by analyzing activity and travel variables within an advanced econometric modeling framework.

Statistical and econometric models contain dependent variables and independent variables, both of which are random variables. Independent variables are predetermined and influence the mathematical expectation or the expected occurrence probability (e.g., in a discrete choice model or count data model) of the dependent variables.
Hereinafter, for brevity, "expectation" refers to either the mathematical expectation or the expected occurrence probability. In an activity-based travel demand model, the dependent variables include activity and travel variables such as activity timing and duration, activity location choice, trip timing, mode choice and route choice. People's demographic characteristics (e.g., age and gender), socioeconomic characteristics (e.g., income and employment) and geographical characteristics (e.g., land use pattern and transportation system) serve as independent variables. Given different values of the independent variables, the expectations of the dependent variables differ. For example, females are expected to allocate more time to shopping activities than males; here gender is an independent variable and shopping activity duration is a dependent variable.

In most econometric models, a random disturbance must be specified to account for unobserved or unspecified independent variables. To estimate the impact of a specified independent variable on the dependent variable accurately, that independent variable must be assumed to be uncorrelated with the random disturbance. When this assumption is violated, the independent variable is called an endogenous variable; otherwise, it is called an exogenous variable. In the econometric literature, instrumental variables are usually adopted to estimate the impact of an endogenous variable accurately. The details of this approach will be reviewed in Chapter 2.

In many cases, particularly in the context of travel behavior modeling, dependent variables are not mutually independent. In this dissertation, two types of relationships among dependent variables are defined: the joint relationship and the causal relationship.
The joint relationship between two dependent variables indicates the existence of common unobservable variables that simultaneously influence the expectations of both dependent variables. For example, as shown in Figure 1.1, car ownership is an exogenous variable for shopping mode choice, and the distance between home and the shopping center is an exogenous variable for both shopping mode choice and shopping destination choice. However, there are probably unobserved variables that simultaneously influence these two dependent variables. For instance, in US household travel surveys, household income data collection suffers from a rather low response rate (usually lower than 70%) owing to considerable privacy concerns. Thus, income is rarely specified in an applicable travel demand model, yet income may simultaneously influence shopping mode choice and shopping destination choice. In that case, the two variables representing shopping mode choice and shopping destination choice have a joint relationship according to the previous definition.

A causal relationship between two dependent variables indicates that one dependent variable is predetermined and then influences the expectation of the other dependent variable. Herein, the causal relationship does not refer to a deterministic cause-effect relationship, in which a cause must lead to an effect. Instead, a causal relationship in this dissertation specifically indicates that one endogenous variable exerts an impact on the expectation of the other endogenous variable. Figure 1.2 illustrates the causal relationship: car ownership and household size serve as exogenous variables for mode choice, and household size serves as an exogenous variable for stop making within a home-based trip chain (tour).
There are probably some unobservable variables, such as income, habit and preference, which are not specified in the model but simultaneously influence the expectations of both dependent variables. Meanwhile, one may conjecture that there is a causal relationship between the two dependent variables: mode choice and stop making in the home-based trip chain. On one hand, making a multi-stop trip chain is more dependent on the auto mode, given its convenience and flexibility (the causal direction is described by the arrow in Figure 1.2). This means that if people first decide to pursue a multi-stop trip chain, the expectation of selecting the auto mode becomes higher. On the other hand, auto usage may stimulate the desire to make a multi-stop trip chain. Travelers using an auto can serve passengers, thereby making more stops within a trip chain (the causal direction is described by the dashed arrow in Figure 1.2). This means that if people first decide to use the auto mode, the expectation of making a multi-stop trip chain becomes higher.

Joint and causal relationships exist ubiquitously among travel and activity variables owing to the complexity of travel behavior. Similar relationships exist between travel time and activity duration (Kitamura et al., 1996; Golob et al., 2000), activity timing and duration (Pendyala and Bhat, 2004), trip chaining and mode choice (Bhat and Sardesai, 2006; Ye et al., 2006), travel timing and mode choice (Tringides et al., 2004), and the activity-travel and time use patterns of household members (Meka and Pendyala, 2002).

From the viewpoint of econometrics, the impact of exogenous variables on continuous endogenous variables can be consistently estimated using the Ordinary Least Squares (OLS) method in a linear regression model.
However, the impact of one dependent variable on another dependent variable cannot be consistently estimated using OLS because of common unobservable variables (see Chapter 2 for details).

The Structural Equations Model (SEM), a simultaneous equations modeling system, has been a full-fledged approach for consistently and efficiently estimating coefficients and modeling the causal relationships among continuous variables and ordered discrete variables represented by continuous latent variables. In travel behavior studies, SEM has been widely used to model and analyze the causal relationships among auto ownership, travel time, activity duration, trip frequency and activity frequency. Golob (2003) gives a comprehensive review of SEM applications in travel behavior research. In those models, travel time and activity duration are treated as continuous variables, whereas auto ownership, trip frequency and activity frequency are usually treated as ordered discrete variables. The mechanism of SEM will be reviewed in detail in Chapter 2.

Figure 1.1 Joint Relationship Between Mode Choice and Destination Choice (diagram elements: Car Ownership; Distance from Home; Unobservable Variables; Shopping Mode (Drive, Carpool or Transit); Shopping Center A vs. Shopping Center B)

However, many variables of interest in travel behavior studies are neither continuous nor ordinal but unordered discrete in nature, such as mode choice, destination choice, trip chaining pattern and time-of-day choice (if the timing can be classified into a few categories such as AM peak, midday, PM peak and off-peak). A modeling methodology that accommodates unordered discrete variables is highly desirable for better understanding the ambiguous decision sequence and the causal relationships among travel and activity variables.
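The OLS inconsistency noted above can be illustrated with a small simulation. The Python sketch below is not the dissertation's estimation code. It assumes a hypothetical two-equation recursive system in which y1 causally affects y2 and the two disturbances are correlated, standing in for common unobservables such as income. All coefficients, the error correlation rho and the sample size are arbitrary illustrative values. The exogenous variable x1 is assumed to enter only the y1 equation, so it can serve as an instrumental variable for the endogenous y1.

```python
import numpy as np

def simulate(n=200_000, gamma=0.5, rho=0.6, seed=0):
    """Hypothetical recursive system with correlated disturbances:
         y1 = 1.0 * x1 + e1
         y2 = gamma * y1 + 1.0 * x2 + e2,   corr(e1, e2) = rho
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    e = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    y1 = 1.0 * x1 + e[:, 0]
    y2 = gamma * y1 + 1.0 * x2 + e[:, 1]
    return x1, x2, y1, y2

def ols(X, y):
    # Solve the normal equations (X'X) b = X'y
    return np.linalg.solve(X.T @ X, X.T @ y)

def iv(X, Z, y):
    # Just-identified instrumental-variable estimator: b = (Z'X)^{-1} Z'y
    return np.linalg.solve(Z.T @ X, Z.T @ y)

x1, x2, y1, y2 = simulate()
X = np.column_stack([y1, x2])   # regressors of the y2 equation (y1 is endogenous)
Z = np.column_stack([x1, x2])   # x1 instruments y1; x2 instruments itself
gamma_ols = ols(X, y2)[0]
gamma_iv = iv(X, Z, y2)[0]
print(f"OLS gamma: {gamma_ols:.3f}, IV gamma: {gamma_iv:.3f}, true gamma: 0.5")
```

With these settings, the OLS coefficient on y1 converges to gamma + rho * Var(e1) / Var(y1) = 0.5 + 0.6 / 2 = 0.8 rather than the true 0.5, because y1 carries e1, which is correlated with e2. The IV estimate converges to 0.5. The exclusion of x1 from the y2 equation is an assumption of the sketch. Without such an exclusion restriction, this simple instrument is not available.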
1.2 Objectives

This dissertation proposes a modeling methodology that integrates unordered multinomial discrete endogenous variables into the framework of the structural equation model, which previously allowed causal analysis only among continuous and ordered discrete endogenous variables. There is more than one approach to realizing this objective, and different assumptions lead to different model structures and estimation results. This dissertation focuses on two types of models: the causal model based on Lee's transformation (Lee, 1992) and the mixed causal model. The performance of these two types of models will be compared in the context of activity-travel behavior analysis.

Figure 1.2 Causal Relationship Between Mode Choice and Trip Chain Type Choice (diagram elements: Car Ownership; Household Size; Unobservable Variables; Mode Choice (Drive, Carpool or Transit); Number of Stops in Home-based Trip Chain (One Stop or Multiple Stops))

1.3 Scope

The proposed modeling methodology will be applied to help understand the causal relationships between the following pairs of travel and activity variables:
• Timing and duration of maintenance activities
• Trip chaining pattern and mode choice
• Trip departure timing and mode choice

Causal analysis between the timing and duration of maintenance activities is conducted using the joint modeling methodology based on Lee's transformation and the mixed discrete-continuous model. A simulation-based hypothesis test will be proposed to examine the significance of error correlation in mixed discrete-continuous models. A recursive bivariate probit model is adopted to analyze the causal relationship between trip chaining pattern and mode choice. The causal relationship between trip departure timing and mode choice is analyzed in the proposed mixed binary-multinomial choice model, in which trip departure timing is treated as a binary choice (peak vs.
non-peak) and mode choice as a multinomial choice (single-occupancy vehicle, high-occupancy vehicle, transit and non-motorized mode). In addition, appropriate statistical tests will be applied to compare rigorously the performance of competing models under alternative causal structures. The non-nested test for discrete choice models (Ben-Akiva and Lerman, 1983) can be applied directly to compare recursive bivariate probit models and mixed binary-multinomial choice models under alternative causal structures. However, it is inappropriate to apply it directly to joint discrete-continuous models. In this dissertation, an extension of the non-nested test will be proposed for rigorously comparing non-nested discrete-continuous models.

1.4 Outline

The remainder of the dissertation is organized as follows. Chapter 2 first reviews the existing modeling methodology for causal analysis among continuous and ordered-discrete endogenous variables, and then proposes a modeling methodology for causal analysis involving unordered discrete variables. Next, the non-nested test and its extension are formulated, followed by a series of Monte Carlo studies of the modeling methodology and the statistical test. Chapter 3 briefly introduces the Swiss Travel Microcensus 2000, from which the dataset for modeling analysis is extracted, and describes the datasets. Chapter 4 presents the empirical model estimation results for the causal analyses between trip chaining pattern and mode choice, between activity timing and duration, and between trip departure timing and mode choice. Chapter 5 summarizes the conclusions and contributions and provides recommendations for future research.
Chapter Two: Modeling Methodology

2.1 Background

2.1.1 Review of Linear Regression Model

In econometrics, the linear regression model is a standard model for quantifying the impacts of exogenous variables on the expectation of a continuous endogenous variable. Assume the random continuous variable y can be expressed as:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + u, \qquad (2.1.1)$$

where $x_1, x_2, \ldots, x_n$ are random exogenous variables of interest, $\beta_0, \beta_1, \ldots, \beta_n$ are constant coefficients associated with the exogenous variables, and u is a random variable that accounts for all other unspecified or unobserved factors influencing y. A critical assumption of the linear regression model is that the expectation of u given all the exogenous variables is zero, denoted as $E(u \mid x_1, x_2, \ldots, x_n) = 0$. The impact of exogenous variable $x_i$ on the expectation of y can be measured by $\partial E(y \mid x_1, \ldots, x_n)/\partial x_i = \beta_i$. In other words, estimating the impact of an exogenous variable on the endogenous variable amounts to estimating $\beta_i$ accurately.

Based on a random sample from the population with respect to the random variables y and $x_1, x_2, \ldots, x_n$, the Ordinary Least Squares (OLS) method offers statistically consistent estimators of $\beta_i$ by minimizing the sum over the observations of $[y - (\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n)]^2$. The estimators can be derived simply and expressed in matrix form. Let $Y = X\beta + u$, where Y is the column vector $(y_1, y_2, \ldots, y_N)'$, N represents the sample size, u is the column vector $(u_1, u_2, \ldots, u_N)'$, and X is the N × n matrix of observations on the exogenous variables (augmented with a leading column of ones when the intercept $\beta_0$ is estimated):

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nn} \end{pmatrix}.$$

The objective function is

$$Z = \sum_{i=1}^{N} [y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_n x_{in})]^2 = (Y - X\beta)'(Y - X\beta). \qquad (2.1.2)$$

To minimize Z, let $\partial Z/\partial \beta = 0$. It is easy to show that $-2X'Y + 2(X'X)\hat{\beta} = 0$, so that

$$\hat{\beta} = (X'X)^{-1}X'Y,$$

and $\partial^2 Z/\partial \beta\,\partial \beta' = 2(X'X)$, which is positive definite when X has full column rank, ensuring that Z is minimized.
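The normal-equation solution above can be sketched in a few lines of Python with NumPy. The sample size, true coefficients and noise scale below are arbitrary assumptions for illustration, not values from this dissertation. The sketch solves $(X'X)\hat{\beta} = X'Y$ directly instead of forming the inverse, which is algebraically identical but numerically preferable. It checks the second-order condition through the eigenvalues of $2X'X$ and also evaluates the variance expression $\sigma^2(X'X)^{-1}$ that appears in the derivation that follows, with $\sigma^2$ estimated from the residuals.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 5_000
beta_true = np.array([1.0, 2.0, -0.5])   # hypothetical beta_0, beta_1, beta_2

# Design matrix: a leading column of ones for the intercept plus two exogenous variables
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
u = rng.normal(scale=1.0, size=N)        # drawn independently of X, so E(u | X) = 0
Y = X @ beta_true + u

# Normal equations: (X'X) beta_hat = X'Y
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ Y)

# Second-order condition: the Hessian of Z is 2 X'X, positive definite if X has full column rank
hessian_eigenvalues = np.linalg.eigvalsh(2.0 * XtX)

# Estimated Var(beta_hat | X) = sigma^2 (X'X)^{-1}, with sigma^2 estimated from the residuals
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (N - X.shape[1])
var_beta_hat = sigma2_hat * np.linalg.inv(XtX)

print("beta_hat:", np.round(beta_hat, 3))
print("std. errors:", np.round(np.sqrt(np.diag(var_beta_hat)), 3))
```

The estimates agree with a general least-squares solver such as `np.linalg.lstsq` up to floating-point error. With N = 5,000 and unit-variance noise, they should lie close to the assumed true coefficients.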
Since $\hat\beta = (X'X)^{-1}X'Y$ and $Y = X\beta + u$, it can be shown that

$$\hat\beta = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u.$$

Taking expectations of both sides conditional on X gives

$$E(\hat\beta \mid X) = \beta + (X'X)^{-1}X'E(u \mid X) = \beta,$$

which indicates that $\hat\beta$ is an unbiased estimator of β. If $Var(u \mid X)$ is denoted $\sigma^2 I$, then

$$Var(\hat\beta \mid X) = E[(\hat\beta - \beta)(\hat\beta - \beta)' \mid X] = (X'X)^{-1}X'E(uu' \mid X)X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$$

The following simple example discusses the properties of the estimators when one exogenous variable is unobservable. Suppose the population model is $y = X_1\beta_1 + X_2\beta_2 + u$, but $X_2$ is unobservable. One can then specify only $X_1$ in the model, $y = X_1\beta_1 + u^*$, where $u^* = X_2\beta_2 + u$. Applying OLS to the misspecified model yields

$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'y. \qquad (2.1.3)$$

Since the true population model is $y = X_1\beta_1 + X_2\beta_2 + u$, Equation (2.1.3) can be rewritten as

$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + u) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'u. \qquad (2.1.4)$$

Taking expectations of both sides, given the assumption that $E(u \mid x_1, x_2) = 0$, gives $E(\hat\beta_1) = \beta_1 + E[(X_1'X_1)^{-1}X_1'X_2]\beta_2$. The extent to which $\hat\beta_1$ is biased (and inconsistent) depends on the second term, which involves $\beta_2$ and the correlation between $x_1$ and $x_2$. Table 1.1 summarizes the bias in $\hat\beta_1$ when $x_2$ is omitted from the model specification.

Table 1.1 Summary of Bias in $\hat\beta_1$ When $x_2$ Is Omitted

Sign of β2    Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0        Positive bias       Negative bias
β2 < 0        Negative bias       Positive bias

The bias in estimators caused by the omission of exogenous variables is called omission bias. Such bias is ubiquitous when the linear regression model is applied to travel behavior analysis, particularly in models aimed at the causal relationships among activity and travel variables. More advanced estimation techniques have been developed for consistently estimating the impact of endogenous variables; these approaches are reviewed in Section 2.1.3.

2.1.2 Review of Simultaneous Equations Model (Joint Relationship Among Continuous Variables)

The linear regression model can accommodate only one dependent variable.
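A small Monte Carlo sketch can reproduce the sign pattern in Table 1.1. Under the assumptions of this illustration (x1 and x2 standard normal with correlation ρ, β1 = 1, zero-mean variables so the short regression has no intercept, and a hypothetical helper `short_slope`), the slope from regressing y on x1 alone converges to β1 + β2ρ, so the bias has the sign of β2 times Corr(x1, x2).

```python
import random

def short_slope(beta1, beta2, corr, n=20000, seed=1):
    """Simulate y = beta1*x1 + beta2*x2 + u, then regress y on x1 alone
    (x2 omitted) and return the OLS slope; all variables have mean zero,
    so the regression is run through the origin."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        # x2 has unit variance and correlation `corr` with x1
        x2 = corr * x1 + (1 - corr ** 2) ** 0.5 * rng.gauss(0, 1)
        y = beta1 * x1 + beta2 * x2 + rng.gauss(0, 1)
        sxy += x1 * y
        sxx += x1 * x1
    return sxy / sxx

for b2 in (1.0, -1.0):
    for r in (0.5, -0.5):
        bias = short_slope(1.0, b2, r) - 1.0
        print(f"beta2={b2:+}, corr={r:+}: bias {bias:+.3f} (theory {b2 * r:+.3f})")
```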
In activity-based travel demand models, there is usually more than one continuous dependent variable of interest (e.g., travel time and activity duration). Exogenous variables that are unspecified or unobservable but influence these dependent variables simultaneously induce correlations among the random disturbances of the individual linear regression models. As defined previously, such dependent variables have joint relationships. The Seemingly Unrelated Regression (SUR) model is an appropriate modeling framework for multiple continuous dependent variables because it accommodates their joint relationships through the correlations among the random errors. The SUR model takes the form $Y_i = X_i\beta_i + u_i$, $i = 1, 2, \ldots, M$, where $Y_i$ is the i-th continuous dependent variable, $X_i$ is a matrix of exogenous explanatory variables for $Y_i$, $\beta_i$ is a vector of model parameters for $X_i$, and $u_i$ is the random disturbance in the i-th model. The set of equations may be written as

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_M \end{pmatrix} = \begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_M \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_M \end{pmatrix}. \qquad (2.1.5)$$

Let $u = [u_1', u_2', \ldots, u_M']'$ and assume that $E[u \mid X_1, X_2, \ldots, X_M] = 0$ and

$$E[uu' \mid X_1, X_2, \ldots, X_M] = V = \begin{pmatrix} \sigma_{11}I & \sigma_{12}I & \cdots & \sigma_{1M}I \\ \sigma_{21}I & \sigma_{22}I & \cdots & \sigma_{2M}I \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{M1}I & \sigma_{M2}I & \cdots & \sigma_{MM}I \end{pmatrix}, \qquad (2.1.6)$$

where I represents the identity matrix. Since each equation is a classical linear regression model, the OLS estimators of the parameters in each equation are consistent but inefficient: equation-by-equation OLS does not use the cross-equation error correlation in the data and therefore yields less precise estimators. Instead, the Generalized Least Squares (GLS) method provides estimators of all the parameters in the model system that are both consistent and efficient:

$$\hat\beta_{GLS} = [X'V^{-1}X]^{-1}[X'V^{-1}Y]. \qquad (2.1.7)$$

Let

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1M} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{M1} & \sigma_{M2} & \cdots & \sigma_{MM} \end{pmatrix};$$

then $V = \Sigma \otimes I$ (where $\otimes$ denotes the Kronecker product) and $V^{-1} = \Sigma^{-1} \otimes I$. Thus,

$$\hat\beta_{SUR} = [X'(\Sigma^{-1} \otimes I)X]^{-1}[X'(\Sigma^{-1} \otimes I)Y], \qquad Var(\hat\beta_{SUR}) = [X'(\Sigma^{-1} \otimes I)X]^{-1}. \qquad (2.1.8)$$

A feasible estimator of the matrix Σ can be obtained by estimating each of the M equations separately by OLS and using the residuals to estimate $\sigma_{ij}$, i.e., $\hat\sigma_{ij} = e_i'e_j/T$, where T is the sample size and $e_i$, $e_j$ are the OLS residuals of single equations i and j, respectively. In summary, the GLS method, which accommodates the joint relationships among the continuous variables through the variance-covariance matrix of the random error terms, needs to be applied to obtain estimators that are both consistent and efficient in the SUR model; equation-by-equation estimation using OLS provides consistent but inefficient estimators.

2.1.3 Review of Structural Equations Model (Causal Relationship Among Continuous Variables)

Similar to the SUR model, the Structural Equations Model (SEM) is a set of linear regression models, but it differs from the SUR model in that dependent variables are included as explanatory variables in the model system. This characteristic makes SEM a powerful methodology for analyzing causal relationships among continuous endogenous variables. A typical structural equations model with G continuous endogenous variables is defined by the matrix equation system

$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_G \end{pmatrix} = B \begin{pmatrix} Y_1 \\ \vdots \\ Y_G \end{pmatrix} + \Gamma X + \begin{pmatrix} u_1 \\ \vdots \\ u_G \end{pmatrix}. \qquad (2.1.9)$$

This equation can be rewritten as

$$Y = BY + \Gamma X + u \qquad (2.1.10)$$

or

$$Y = (I - B)^{-1}(\Gamma X + u), \qquad (2.1.11)$$

where Y is a column vector of dependent variables, B is a matrix of parameters associated with the right-hand-side endogenous variables, X is a column vector of exogenous variables, Γ is a matrix of parameters associated with the exogenous variables, and u is a column vector of error terms associated with the dependent variables. SEM specifies the dependent variables Y as explanatory variables alongside the exogenous explanatory variables, and estimates the parameter matrix B to capture the inherent causal relationships among the dependent variables Y.
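Equation (2.1.11) also shows why right-hand-side dependent variables are correlated with u: every element of Y depends on every element of u through $(I - B)^{-1}$. The Python sketch below (hypothetical coefficients, standard library only, illustrative helper `reduced_form`) solves a two-equation system through its reduced form and shows that y2 is correlated with u1 even though u1 and u2 are drawn independently. Analytically, for the values used, $Cov(y_2, u_1) = b_{21}/(1 - b_{12}b_{21}) = 0.3/0.85 \approx 0.353$.

```python
import random

def reduced_form(B, gx, u):
    """Solve the 2-equation structural system Y = B Y + Gamma X + u for Y,
    i.e. Y = (I - B)^{-1} (Gamma X + u), using the closed-form 2x2 inverse.
    gx holds the two elements of Gamma X for one observation."""
    a, b = 1 - B[0][0], -B[0][1]
    c, d = -B[1][0], 1 - B[1][1]
    det = a * d - b * c
    r0, r1 = gx[0] + u[0], gx[1] + u[1]
    return ((d * r0 - b * r1) / det, (-c * r0 + a * r1) / det)

# Hypothetical system: y1 = 0.5*y2 + x1 + u1,  y2 = 0.3*y1 + x2 + u2
B = [[0.0, 0.5], [0.3, 0.0]]
rng = random.Random(3)
n, s_y2u1, max_resid = 50000, 0.0, 0.0
for _ in range(n):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    u1, u2 = rng.gauss(0, 1), rng.gauss(0, 1)  # independent disturbances
    y1, y2 = reduced_form(B, (x1, x2), (u1, u2))
    # the reduced-form solution satisfies the first structural equation
    max_resid = max(max_resid, abs(y1 - (0.5 * y2 + x1 + u1)))
    s_y2u1 += y2 * u1
print(max_resid < 1e-12, round(s_y2u1 / n, 3))  # sample Cov(y2, u1) is clearly nonzero
```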
The nonzero correlation between Y and u induced by the simultaneous equations violates the OLS assumption that the regressors are uncorrelated with the disturbances (here, $E(Y'u) = 0$), so applying OLS to each single regression equation will not yield consistent estimators of the parameter matrix B. Based on the Instrumental Variable (IV) approach, econometricians developed the Two-Stage Least Squares (2SLS) and Three-Stage Least Squares (3SLS) methods to obtain consistent estimators of the parameters associated with endogenous variables. The 3SLS estimator is more efficient than the 2SLS estimator because the former accounts for the covariance of the disturbances across equations, whereas 2SLS estimates each equation separately.

The mechanics of 2SLS are illustrated here with a simple example. Suppose one has the following structural equations model with two dependent variables:

$$y_1 = x_1\beta_1 + y_2\gamma + u_1, \qquad y_2 = x_2\beta_2 + u_2, \qquad (2.1.12)$$

where $y_2$ is specified in the model for $y_1$, and $x_1$ and $x_2$ are exogenous variables, so that $x_1$ and $x_2$ are uncorrelated with $u_1$ and $u_2$. The disturbances $u_1$ and $u_2$ are correlated because of common unspecified variables. Because $y_2$ is correlated with $u_2$, and $u_2$ is correlated with $u_1$, $y_2$ is correlated with $u_1$. Thus, $y_2$ is an endogenous variable in the model for $y_1$, and OLS cannot yield a consistent estimator of γ. A consistent estimator may be obtained as follows. Let $Z = [x_1\ x_2]$, $X = [x_1\ y_2]$, $b = (\beta_1, \gamma)'$, and $\hat b = (Z'X)^{-1}Z'y_1$, where Z is called the matrix of instrumental variables. Then

$$\hat b = (Z'X)^{-1}Z'y_1 = (Z'X)^{-1}Z'(Xb + u_1) = b + (Z'X)^{-1}Z'u_1,$$

$$\text{plim}_{N\to\infty}\,\hat b = b + \left[\text{plim}_{N\to\infty}\frac{Z'X}{N}\right]^{-1}\text{plim}_{N\to\infty}\frac{Z'u_1}{N}.$$

Since $x_1$ and $x_2$ are uncorrelated with $u_1$, and $y_2$ is correlated with $x_2$, $\text{plim}(Z'u_1/N) = 0$ and $\text{plim}(Z'X/N)$ is nonsingular. Thus $\text{plim}\,\hat b = b$, which indicates that $\hat b = (Z'X)^{-1}Z'y_1$ is a consistent estimator of both $\beta_1$ and γ. Alternatively, it can be shown that the same consistent estimator can be obtained by the two-stage least squares (2SLS) method.
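The consistency argument for $\hat b = (Z'X)^{-1}Z'y_1$ can be illustrated by simulation. The sketch below, in Python with hypothetical parameter values (β1 = 1, γ = 0.5, β2 = 1, Corr(u1, u2) = 0.6) and zero-mean variables so that no intercept is needed, computes both the OLS estimator and the IV estimator for system (2.1.12). For these values OLS should drift toward γ + Cov(y2, u1)/Var(y2) = 0.5 + 0.6/2 = 0.8, while the IV estimate should stay close to 0.5. The helper `two_by_two` is an illustrative name.

```python
import random

def two_by_two(w_cols, x_cols, y):
    """(W'X)^{-1} W'y for two-column W and X, each given as a list of columns."""
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    m = [[dot(w, x) for x in x_cols] for w in w_cols]
    v = [dot(w, y) for w in w_cols]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det)

# Hypothetical parameters for system (2.1.12)
rng = random.Random(7)
n, beta1, gamma, beta2, rho = 50000, 1.0, 0.5, 1.0, 0.6
x1, x2, y1, y2 = [], [], [], []
for _ in range(n):
    a, b, u2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    u1 = rho * u2 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)  # corr(u1, u2) = rho
    x1.append(a)
    x2.append(b)
    y2.append(beta2 * b + u2)
    y1.append(beta1 * a + gamma * y2[-1] + u1)

ols_b = two_by_two([x1, y2], [x1, y2], y1)  # W = X: OLS, inconsistent for gamma
iv_b = two_by_two([x1, x2], [x1, y2], y1)   # W = Z = [x1, x2]: IV estimator
print("OLS (beta1, gamma):", [round(v, 3) for v in ols_b])
print("IV  (beta1, gamma):", [round(v, 3) for v in iv_b])
```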
In the first stage, regress $y_2$ on $x_2$ to obtain $\hat\beta_2 = (x_2'x_2)^{-1}x_2'y_2$ and $\hat y_2 = x_2\hat\beta_2 = x_2(x_2'x_2)^{-1}x_2'y_2$. In the second stage, regress $y_1$ on $x_1$ and $\hat y_2$ to obtain consistent estimators of $\beta_1$ and γ. (When the first stage regresses $y_2$ on all the exogenous variables, $x_1$ and $x_2$, the second-stage estimates coincide exactly with the IV estimator $\hat b$.)

In addition to the Least Squares (LS) approach, the Maximum Likelihood (ML) method can also be applied to estimate the parameters of an SEM consistently. The Limited-Information Maximum Likelihood (LIML) and Full-Information Maximum Likelihood (FIML) estimators are the ML counterparts of 2SLS and 3SLS, respectively. With normally distributed disturbances, FIML is not only consistent but also asymptotically efficient. Other advanced estimation approaches, such as Asymptotically-Distribution-Free Weighted Least Squares (ADF or ADF-WLS), which does not require distributional assumptions on the random terms, have been developed and applied in the literature (see Golob, 2003, for a more detailed review of SEM estimation methods).

The endogenous variables in SEM can be either continuous or ordered in nature. For an ordered discrete variable x, it is assumed that there is a latent continuous variable u that is normally distributed with zero mean and unit variance. The connection between x and u is that x = i is equivalent to $\tau_{i-1} < u < \tau_i$, where $\tau_0 = -\infty$, $\tau_1 < \tau_2 < \cdots < \tau_{k-1}$, and $\tau_k = +\infty$. The $\tau_i$ are called threshold values; if there are k categories, there are k - 1 unknown thresholds. Essentially, the procedure is the same as estimating an ordered probit model without any explanatory variables. The following conditions can be established:

If $\tau_0 < u < \tau_1$, then x = 1 and $\Pr(x = 1) = \Phi(\tau_1) - \Phi(-\infty) = \Phi(\tau_1)$;
If $\tau_1 < u < \tau_2$, then x = 2 and $\Pr(x = 2) = \Phi(\tau_2) - \Phi(\tau_1)$;
...
If $\tau_{k-1} < u < \tau_k$, then x = k and $\Pr(x = k) = \Phi(+\infty) - \Phi(\tau_{k-1}) = 1 - \Phi(\tau_{k-1})$.

One can then formulate the likelihood function for all the observations as

$$L = \prod_{i=1}^{N} \Phi(\tau_1)^{\mathbf{1}(x_i = 1)}\,[\Phi(\tau_2) - \Phi(\tau_1)]^{\mathbf{1}(x_i = 2)} \cdots [1 - \Phi(\tau_{k-1})]^{\mathbf{1}(x_i = k)}, \qquad (2.1.13)$$

where $\mathbf{1}(x_i = j)$ equals 1 if observation i falls in category j and 0 otherwise. The thresholds $\tau_i$ are estimated by maximizing the log-likelihood function.
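Because this model has no explanatory variables, the maximization has a closed-form solution: the cell probabilities $\Phi(\tau_j) - \Phi(\tau_{j-1})$ are a one-to-one reparameterization of a multinomial distribution, so the ML estimate is $\hat\tau_j = \Phi^{-1}$(share of observations in categories 1, ..., j). The Python sketch below (hypothetical category counts; `thresholds_mle` and `log_likelihood` are illustrative names) computes these thresholds and confirms that moving a threshold away from them lowers the log-likelihood (2.1.13).

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def thresholds_mle(counts):
    """ML thresholds tau_1..tau_{k-1} for k ordered categories with no covariates:
    tau_j = Phi^{-1}(cumulative share of categories 1..j).
    Every category must contain at least one observation."""
    n, cum, taus = sum(counts), 0, []
    for c in counts[:-1]:
        cum += c
        taus.append(PHI.inv_cdf(cum / n))
    return taus

def log_likelihood(counts, taus):
    """Log of Equation (2.1.13), grouping identical observations by category."""
    cdf = [0.0] + [PHI.cdf(t) for t in taus] + [1.0]
    return sum(c * math.log(cdf[j + 1] - cdf[j]) for j, c in enumerate(counts))

counts = [25, 50, 25]                 # hypothetical category frequencies
taus = thresholds_mle(counts)
print([round(t, 4) for t in taus])    # [-0.6745, 0.6745]
print(log_likelihood(counts, taus) > log_likelihood(counts, [taus[0] + 0.1, taus[1]]))  # True
```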
Based on the estimated thresholds, this procedure transforms ordered discrete variables into continuous variables, which serve as the endogenous variables in SEM in place of the original ordinal variables.

SEM has been widely applied in travel behavior analysis, and the literature is briefly reviewed as follows. Kitamura et al. (1992) and Golob et al. (1994) are the first known applications of SEM to joint activity duration and travel time data. Kitamura (1996) and Pas (1996) are two overviews that include discussions of the role of SEM in activity and time-use modeling. Lu and Pas (1997) present an SEM of in-home activities, out-of-home activities (by type), and travel (measured in various ways), conditional on socioeconomic variables. Estimation is by normal maximum likelihood, and the emphasis is on interpreting direct and indirect effects. The data are derived from the Greater Portland, Oregon, metropolitan area. Golob and McNally (1997) present an SEM of the interaction of household heads in activity and travel demand, with data from Portland. Activities are divided into three types, and SEM results are compared using maximum likelihood (ML) and generalized least squares (GLS) estimation methods. They conclude that GLS methods should be used to estimate SEM when it is applied to activity participation data. Fujii and Kitamura (2000) studied the latent demand effects of the opening of new freeways. The authors used an SEM to determine the effects of commute duration and scheduling variables on after-work discretionary activities and their trips, with data collected in the Osaka-Kobe region of Japan. Kuppam and Pendyala (2000) presented three SEMs estimated by GLS using data from Washington, DC. The models focused on relationships between (1) activity duration and trip generation, (2) durations of in-home and out-of-home activities, and (3) activity frequency and trip chain generation.
Simma and Axhausen (2001) developed an SEM that captured relationships between male and female heads of household with regard to activity and travel demands. The dependent variables included car ownership, distances traveled by males and females, and male and female trips for two types of activities, using data from Upper Austria. Meka and Pendyala (2002) used SEM with Southeast Florida data to investigate the interaction between two adults in one household in terms of their travel and activity time allocation. An interesting trade-off in non-work travel time and non-work activity time between the two adults was quantified, and the interaction of travel decisions between household members was verified. Golob (2003) pointed out a current limitation: SEM estimation methods support only dichotomous and ordered polychotomous categorical variables. This implies that a multinomial discrete choice variable must be represented as a multivariate choice model by breaking it down into component dichotomous variables linked by free error covariances (Muthen, 1979). In that case, however, the discrete choice model embedded in the SEM system is inconsistent with utility maximization theory.

2.1.4 Review of Discrete Choice Model

In travel demand modeling, unordered discrete dependent variables, typified by mode choice, are usually modeled in a discrete choice framework. McFadden (1973) derived the multinomial logit model from random utility and utility maximization theory. This methodology is briefly reviewed as follows. Given that each individual n has a feasible choice set denoted by $C_n$, define $J_n \le J$ to be the number of feasible choices.
Decision maker n chooses alternative i in $C_n$ if and only if the random utility $U_{in}$ corresponding to alternative i is the greatest among all $U_{jn}$, $j \in C_n$, $j \ne i$. The choice probability is therefore

$$P_n(i) = \Pr(U_{in} \ge U_{jn},\ \forall j \in C_n,\ j \ne i)$$

or, equivalently,

$$P_n(i) = \Pr[U_{in} \ge \max_{j \in C_n,\ j \ne i}(U_{jn})].$$

Let $U_{in} = V_{in} + \varepsilon_{in}$, where $V_{in}$ is the systematic component of random utility. In practice, $V_{in}$ is usually parameterized as a linear combination of explanatory variables, $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$, the same specification as in the linear regression model. $\varepsilon_{in}$ is the random component of utility, which accounts for unobserved factors that influence the utility value. In the linear regression model, the random component is assumed to be normally distributed, whereas in the multinomial logit model the random components $\varepsilon_{in}$ are assumed to be i.i.d. standard Gumbel distributed. This distribution is selected in place of the normal distribution in order to derive a simple probability function, taking advantage of two properties of the Gumbel distribution:

1. The maximum of a number of independent Gumbel random variables with identical scale parameter is still Gumbel distributed.
2. The difference between two independent Gumbel random variables with identical scale parameter is logistically distributed.

It can be shown that

$$P_n(i) = \frac{\exp(V_{in})}{\sum_{j=1}^{J_n}\exp(V_{jn})}.$$

The parameters $\beta_i$ in the systematic components can be estimated easily by the Maximum Likelihood Estimation (MLE) method.

One serious problem of the multinomial logit model is its IIA (Independence of Irrelevant Alternatives) property. The IIA property holds if, for a specific individual, the ratio of the choice probabilities of any two alternatives is entirely unaffected by the systematic utilities of any other alternatives. The IIA problem can be expressed in terms of the cross-elasticities of logit probabilities.
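The closed-form logit probability can be checked against its random utility derivation. This Python sketch (hypothetical utilities; standard library only; illustrative helper names) draws i.i.d. standard Gumbel errors by inverse transform, $\varepsilon = -\ln(-\ln U)$, chooses the alternative with the highest utility, and compares the simulated shares with $\exp(V_i)/\sum_j \exp(V_j)$.

```python
import math
import random

def mnl_probs(v):
    """Multinomial logit probabilities exp(V_i) / sum_j exp(V_j), computed stably."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def gumbel(rng):
    """Standard Gumbel draw -ln(-ln U) with U uniform on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_shares(v, draws=100000, seed=3):
    """Choice shares when U_i = V_i + eps_i with i.i.d. standard Gumbel eps_i
    and the alternative with the largest utility is chosen."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(draws):
        u = [vi + gumbel(rng) for vi in v]
        counts[u.index(max(u))] += 1
    return [c / draws for c in counts]

V = [1.0, 0.5, -0.2]  # hypothetical systematic utilities
print([round(p, 3) for p in mnl_probs(V)])
print([round(p, 3) for p in simulated_shares(V)])
```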
The multinomial logit model has uniform cross-elasticities: the cross-elasticities of all alternatives with respect to a change in an attribute affecting only the utility of alternative j are equal for all alternatives other than j. This is unrealistic in many real cases. The derivation of the probability function relies critically on the assumption that the random components are independently and identically distributed, and this assumption is the root of the IIA problem; if it is violated, such a simple probability function cannot be obtained. In practice, one is usually unable to specify all the explanatory variables in the systematic components, because many variables are unobservable or unquantifiable. Omitted influential variables are absorbed into the random components. If two random components absorb common omitted variables, they will be correlated rather than independent. This situation is analogous to that of the SUR model discussed in Section 2.1.2. However, correlation among the random error terms is more harmful in a discrete choice model than in multiple linear regression models. In multiple linear regression models, the estimators of the coefficients of exogenous variables remain consistent even if the correlation is not accommodated. In the multinomial logit model, by contrast, ignored correlations or unequal variances among the random error terms lead to inconsistent estimators of the coefficients, because the probability function used for estimation is incorrectly formulated under the false assumption that the random error terms are independent and identically distributed. Since the unobserved factors influencing such discrete choices are mutually correlated, these choices should be considered to have joint relationships according to the definition used in this dissertation. The nested logit model, widely adopted in travel demand modeling, relaxes the IIA restriction by accommodating the joint relationships among discrete choices.
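The uniform cross-elasticity property can be seen numerically. In this Python sketch the utilities take the hypothetical form $V_k = \beta x_k$ with a single cost-like attribute; finite-difference elasticities of every $P_i$ with respect to $x_j$ are compared with the analytic multinomial logit cross-elasticity $-\beta x_j P_j$, which does not depend on i. The last lines illustrate IIA: the ratio $P_0/P_1$ is unchanged when only alternative 2's utility changes. All names and values are illustrative.

```python
import math

def mnl_probs(v):
    """Multinomial logit probabilities, computed stably."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def elasticities_wrt(beta, x, j, h=1e-6):
    """Forward-difference elasticities of every P_i with respect to x_j,
    for hypothetical utilities V_k = beta * x_k."""
    base = mnl_probs([beta * xk for xk in x])
    x_up = list(x)
    x_up[j] += h
    up = mnl_probs([beta * xk for xk in x_up])
    return [(up[i] - base[i]) / h * x[j] / base[i] for i in range(len(x))]

beta, x = -0.1, [10.0, 20.0, 15.0]   # hypothetical cost coefficient and costs
p = mnl_probs([beta * xk for xk in x])
el = elasticities_wrt(beta, x, j=2)
print(el[0], el[1], -beta * x[2] * p[2])  # the first two match -beta * x_j * P_j

def ratio_01(v2):
    """P_0 / P_1 when alternative 2's utility is set to v2."""
    q = mnl_probs([beta * x[0], beta * x[1], v2])
    return q[0] / q[1]

print(ratio_01(-1.5), ratio_01(2.0))  # equal: the IIA property
```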
It assumes a sequential choice process in which decision makers first select a choice combination (nest) composed of two or more alternatives with correlated random error terms and then select an alternative within the nest. The utility function for alternative i in nest n can be formulated as $U_i = V_i + \varepsilon_n + \varepsilon_i$, where the random component of the utility function consists of two parts, $\varepsilon_n$ and $\varepsilon_i$: $\varepsilon_n$ is a common random component appearing in all the utility functions in the nest, and $\varepsilon_i$ is i.i.d. With the presence of $\varepsilon_n$, the correlation among the alternatives in one nest is accommodated in the model. Assume that $\varepsilon_i$ is standard Gumbel distributed with scale parameter 1 and that $\varepsilon_n$ is distributed such that $\varepsilon_n + \varepsilon_i$ is Gumbel distributed with a positive scale parameter μ. Since the variance of $(\varepsilon_n + \varepsilon_i)$ must be greater than that of $\varepsilon_i$, and the variance of a Gumbel distribution with scale parameter μ is $\pi^2/(6\mu^2)$, μ must fall in the range from 0 to 1. Checking whether the estimated μ lies in this range has therefore become the standard way to examine whether the selected alternatives belong to the same nest when the nested logit model is applied.

A disadvantage of the nested logit model is that the correlation between two alternatives in different nests is completely ignored: there is no substitution pattern between alternatives in two different nests, which is sometimes unrealistic. To accommodate such substitution patterns, the cross-nested logit model can be adopted; it belongs to the GEV (Generalized Extreme Value) family proposed by McFadden (1974), and its mathematical form is therefore not unique. Applications of the cross-nested logit model can be found, though not widely, in the literature on travel behavior analysis and travel demand modeling (Vovsha, 1995). A multinomial probit model can be formulated if the random error terms in the utility functions are assumed to be multivariate normally distributed instead of Gumbel (extreme value) distributed.
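As an illustration of how the nest structure changes substitution, the Python sketch below uses one common textbook parameterization of the two-level nested logit, in which each nest m has a dissimilarity parameter $\lambda_m$ in (0, 1] that plays a role analogous to μ above (treating the two as equivalent is an assumption of this sketch). With $\lambda_m = 1$ the model collapses to the multinomial logit; with $\lambda_m < 1$, improving one alternative draws proportionally more probability from its nest-mates than from alternatives in other nests. The utilities, nest assignment and function name are hypothetical.

```python
import math

def nested_logit_probs(v, nests, lam):
    """Two-level nested logit probabilities P(i) = P(i | m) P(m) with
    P(i | m) = exp(V_i / lam_m) / sum_{j in m} exp(V_j / lam_m),
    P(m) = exp(lam_m I_m) / sum_l exp(lam_l I_l), I_m = ln sum_{j in m} exp(V_j / lam_m).
    `nests` lists the alternative indices in each nest; each lam_m lies in (0, 1]."""
    inclusive = [math.log(sum(math.exp(v[i] / lam[m]) for i in nest))
                 for m, nest in enumerate(nests)]
    weights = [math.exp(lam[m] * inclusive[m]) for m in range(len(nests))]
    total = sum(weights)
    probs = [0.0] * len(v)
    for m, nest in enumerate(nests):
        for i in nest:
            # P(m) * P(i | m)
            probs[i] = weights[m] / total * math.exp(v[i] / lam[m] - inclusive[m])
    return probs

V = [0.5, 0.2, 0.0, -0.3]   # hypothetical: drive alone, shared ride, transit, walk
nests = [[0, 1], [2, 3]]    # hypothetical auto and non-auto nests
p = nested_logit_probs(V, nests, [0.5, 0.5])
V_up = V[:]
V_up[1] += 0.5              # improve shared ride
q = nested_logit_probs(V_up, nests, [0.5, 0.5])
print(q[0] / p[0], q[2] / p[2])  # drive alone loses proportionally more than transit
```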
Owing to the desirable properties of the normal distribution, the variances and covariances associated with the utility functions can be accommodated through multivariate normally distributed random error terms. In the past decade, considerable advances have been made in estimation techniques for the multinomial probit model. The GHK simulator (2000) was developed for computing the likelihood of the multinomial probit model, particularly when the number of alternatives in the choice set is greater than three. However, applications of the multinomial probit model are rare in the travel demand literature, presumably because of the difficulty of computation and estimation. Bhat (1995) proposed the heteroskedastic logit model, which assumes that the random error terms in the utility functions are still independently Gumbel distributed but with unequal variances. Besides accommodating correlations among the random error terms, assuming unequal variances is an alternative way to relax the IIA property of the multinomial logit model. The likelihood function of the heteroskedastic logit model does not have a closed form, so a numerical method, such as the Gauss-Laguerre quadrature method adopted in that study, is required to approximate the log-likelihood function and estimate the model parameters.

The mixed logit model is a generalization of the multinomial logit model. It involves integrating the multinomial logit formula over the distribution of random parameters. The typical probability function of the mixed logit model is

$$P_{qi} = \int \frac{\exp(\beta' x_{qi})}{\sum_{j=1}^{I}\exp(\beta' x_{qj})}\, f(\beta \mid \theta)\, d\beta, \qquad (2.1.14)$$

where β represents random parameters whose density function is $f(\beta \mid \theta)$, and θ is the vector of constant parameters of the density function. In practice, only a subvector of β is randomized and the rest remain constant.
If the $\beta_0$ terms (alternative-specific constants) in the utility functions are randomized, the mixed logit model allows a flexible random error structure that comprehensively accommodates the substitution patterns among the alternatives. The probability function (2.1.14) does not have a closed form, so a numerical method is required to approximate the probability values. Train and McFadden (2000) use the Maximum Simulated Likelihood Estimation (MSLE) method to estimate a mixed logit model, in which Monte Carlo integration with a pseudo-random sequence is applied to approximate the likelihood function. Bhat (1996) first adopted the Halton sequence, a quasi-random sequence that covers the distributional domain more evenly, to approximate the likelihood function of the mixed logit model and estimate the model parameters. MSLE with a Halton sequence takes much less time than with a pseudo-random sequence to reach the same level of estimation accuracy. With these advances in simulation-based estimation, the mixed logit model has become prevalent in travel behavior modeling and is considered a new and promising generation of discrete choice models (Walker, 2002).

2.1.5 Endogenous Variable in Discrete Choice Model

A discrete choice model may also contain endogenous variables, that is, variables specified in the utility function that are correlated with the random error terms. As in the linear regression model, the coefficients of endogenous variables cannot be estimated consistently in a conventional discrete choice model, and, similarly, instrumental variables can be used to estimate them consistently. There are two types of endogenous variables, continuous and discrete (endogenous dummy variables), and correspondingly two different approaches may be needed.
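The simulation step of MSLE can be sketched for a single random coefficient. In this Python example (hypothetical attribute values and coefficient distribution β ~ N(-0.5, 1); `halton` and `mixed_logit_probs` are illustrative names, not the estimation code used in the cited studies), the integral (2.1.14) is approximated by averaging logit probabilities over draws of β obtained through the inverse normal CDF, once with 200 Halton points in base 2 and once with 100,000 pseudo-random points.

```python
import math
import random
from statistics import NormalDist

def halton(index, base):
    """Element number `index` (index >= 1) of the one-dimensional Halton sequence in a prime base."""
    f, r, i = 1.0, 0.0, index
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def logit_probs(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def mixed_logit_probs(x, b_mean, b_sd, uniforms):
    """Simulated version of (2.1.14) with utilities V_j = beta * x_j and a single random
    coefficient beta ~ N(b_mean, b_sd^2): each uniform draw in (0, 1) is mapped to beta
    through the inverse normal CDF, and the logit probabilities are averaged."""
    to_beta = NormalDist(b_mean, b_sd).inv_cdf
    total = [0.0] * len(x)
    for u in uniforms:
        beta = to_beta(u)
        for j, p in enumerate(logit_probs([beta * xj for xj in x])):
            total[j] += p
    return [t / len(uniforms) for t in total]

x = [1.0, 2.0, 3.0]  # hypothetical attribute levels
p_halton = mixed_logit_probs(x, -0.5, 1.0, [halton(r, 2) for r in range(1, 201)])
rng = random.Random(11)
pseudo = [u for u in (rng.random() for _ in range(100000)) if u > 0.0]
p_pseudo = mixed_logit_probs(x, -0.5, 1.0, pseudo)
print([round(p, 3) for p in p_halton])   # 200 Halton draws
print([round(p, 3) for p in p_pseudo])   # 100,000 pseudo-random draws
```

In a full MSLE routine the same draws would be reused for every evaluation of the log-likelihood while the optimizer updates the mean and standard deviation of β.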
As with linear regression, a joint model system can be adopted to overcome the endogeneity problem in discrete choice models: as long as the error correlations are accommodated in the model system, the coefficients of endogenous variables can be estimated consistently. Such a methodology is highly desirable for causal analysis among discrete choices, which entails specifying endogenous variables in the utility functions. This methodology is proposed in the next section (Section 2.2).

2.2 Modeling Methodology for Causal Analysis in Discrete Choices (Discrete Choice Modeling Methodology with Endogenous Variable)

2.2.1 Recursive Bivariate Probit Model (Causal Relationship Between Two Binary Choice Variables)

The causal relationship between two binary choices can be modeled in a bivariate probit framework. The methodology is presented here in the context of causal analysis between trip chaining pattern and mode choice. The term "trip chain" refers to a sequence of trips that begins at home, involves visits to one or more other places, and ends at home. Depending on the number of places visited within the tour or chain, the tour may be classified into two patterns: simple and complex. A tour or chain with a single stop or activity outside the home location is defined as a simple tour, whereas a tour or chain with more than one stop outside the home location is defined as a complex tour. Depending on whether an auto is used in the tour, the tour is classified into two modes: auto and non-auto. If tour complexity/simplicity and auto/non-auto mode choice are treated as two binary choices, a bivariate probit model can be formulated at the tour level to analyze their probabilities simultaneously while accommodating the random error correlation. Allowing for correlation between the random error terms in the model system is very important.
Analogous to FIML estimation of a linear regression model system, the coefficient of the endogenous variable in the two binary probit models can be estimated consistently as long as the correlation between the two error terms is accommodated. The general formulation is as follows:

$$M_q^* = \beta' z_q + \gamma T_q + \varepsilon_q, \qquad T_q^* = \theta' x_q + \delta M_q + \xi_q, \qquad (2.2.1)$$

where
q is an index for observations of tours (q = 1, 2, ..., Q);
$M_q^*$ is a latent variable representing the mode choice for tour q;
$T_q^*$ is a latent variable representing the complexity of tour q;
$M_q = 1$ if $M_q^* > 0$ and $M_q = 0$ otherwise, i.e., $M_q$ is a dummy variable indicating whether tour q uses the auto mode;
$T_q = 1$ if $T_q^* > 0$ and $T_q = 0$ otherwise, i.e., $T_q$ is a dummy variable indicating whether tour q is complex;
$z_q$ is a vector of explanatory variables for $M_q^*$;
$x_q$ is a vector of explanatory variables for $T_q^*$;
β and θ are vectors of model coefficients associated with the explanatory variables $z_q$ and $x_q$, respectively;
γ is a scalar coefficient of $T_q$ measuring the impact of tour complexity on mode choice;
δ is a scalar coefficient of $M_q$ measuring the impact of mode choice on the choice of tour complexity;
$\varepsilon_q$ and $\xi_q$ are random error terms that are standard bivariate normally distributed with zero means, unit variances, and correlation ρ, i.e., $(\varepsilon_q, \xi_q) \sim \Phi_2(0, 0, 1, 1, \rho)$.

Based on this normality assumption, one can derive the probability of each possible combination of binary choices for tour q:

$$\Pr(M_q = 0, T_q = 0) = \Phi_2[-\beta' z_q, -\theta' x_q, \rho] \qquad (2.2.2)$$

$$\Pr(M_q = 1, T_q = 0) = \Phi_1[-(\theta' x_q + \delta)] - \Phi_2[-\beta' z_q, -(\theta' x_q + \delta), \rho] \qquad (2.2.3)$$

$$\Pr(M_q = 0, T_q = 1) = \Phi_1[-(\beta' z_q + \gamma)] - \Phi_2[-(\beta' z_q + \gamma), -\theta' x_q, \rho] \qquad (2.2.4)$$

$$\Pr(M_q = 1, T_q = 1) = 1 - \Phi_1[-(\beta' z_q + \gamma)] - \Phi_1[-(\theta' x_q + \delta)] + \Phi_2[-(\beta' z_q + \gamma), -(\theta' x_q + \delta), \rho] \qquad (2.2.5)$$

where $\Phi_1[\cdot]$ is the cumulative distribution function of the standard univariate normal distribution and $\Phi_2[\cdot]$ is the cumulative distribution function of the standard bivariate normal distribution.
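The four probabilities can be evaluated numerically to see why a restriction on γ and δ is needed. The Python sketch below computes $\Phi_2$ by integrating $\phi(t)\,\Phi((b - \rho t)/\sqrt{1 - \rho^2})$ with Simpson's rule (a quadrature choice made for this illustration, not the method used by LIMDEP) and evaluates (2.2.2) through (2.2.5) for hypothetical index values $\beta' z_q = 0.3$ and $\theta' x_q = -0.2$. With γ = 0 the probabilities sum to one; with both γ and δ nonzero the sum departs from one by the probability that $(\varepsilon_q, \xi_q)$ falls in the rectangle bounded by the two pairs of thresholds (it exceeds one when γ and δ have the same sign).

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def bvn_cdf(a, b, rho, steps=2000):
    """Phi_2(a, b; rho) = integral over t from -inf to a of phi(t) * Phi((b - rho t) / sqrt(1 - rho^2)),
    evaluated with composite Simpson's rule on [-10, a]; requires |rho| < 1 and even `steps`."""
    lo = -10.0
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lo) / steps
    f = lambda t: N01.pdf(t) * N01.cdf((b - rho * t) / s)
    total = f(lo) + f(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def four_probs(bz, tx, gamma, delta, rho):
    """Probabilities (2.2.2)-(2.2.5), with bz = beta'z_q and tx = theta'x_q."""
    F1 = N01.cdf
    p00 = bvn_cdf(-bz, -tx, rho)
    p10 = F1(-(tx + delta)) - bvn_cdf(-bz, -(tx + delta), rho)
    p01 = F1(-(bz + gamma)) - bvn_cdf(-(bz + gamma), -tx, rho)
    p11 = (1 - F1(-(bz + gamma)) - F1(-(tx + delta))
           + bvn_cdf(-(bz + gamma), -(tx + delta), rho))
    return p00, p10, p01, p11

print(sum(four_probs(0.3, -0.2, 0.0, 0.6, 0.5)))  # gamma = 0: equals 1 up to rounding
print(sum(four_probs(0.3, -0.2, 0.8, 0.6, 0.5)))  # gamma, delta both nonzero: differs from 1
```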
The sum of the probabilities of the four combinations of the two binary choices should equal one, i.e.,

$$\Pr(M_q = 0, T_q = 0) + \Pr(M_q = 1, T_q = 0) + \Pr(M_q = 0, T_q = 1) + \Pr(M_q = 1, T_q = 1) = 1. \qquad (2.2.6)$$

Substituting Equations (2.2.2) through (2.2.5) into Equation (2.2.6), it can be shown that

$$\Phi_2[-\beta' z_q, -\theta' x_q, \rho] + \Phi_2[-(\beta' z_q + \gamma), -(\theta' x_q + \delta), \rho] = \Phi_2[-(\beta' z_q + \gamma), -\theta' x_q, \rho] + \Phi_2[-\beta' z_q, -(\theta' x_q + \delta), \rho]. \qquad (2.2.7)$$

This equation does not hold unless either γ or δ is equal to zero. This requirement, known as the logical consistency condition (Maddala, 1983), leads to two different recursive simultaneous structures suggesting two different causal relationships:

1. γ = 0, δ ≠ 0 (Mode Choice → Tour Complexity)

$$M_q^* = \beta' z_q + \varepsilon_q, \qquad T_q^* = \theta' x_q + \delta M_q + \xi_q \qquad (2.2.8)$$

In this structure, mode choice is predetermined as per the first functional relationship. The choice of mode is then specified as a dummy variable in the second functional relationship for tour complexity, directly measuring the impact of mode choice on the complexity of the trip chain or tour.

2. γ ≠ 0, δ = 0 (Tour Complexity → Mode Choice)

$$M_q^* = \beta' z_q + \gamma T_q + \varepsilon_q, \qquad T_q^* = \theta' x_q + \xi_q \qquad (2.2.9)$$

Conversely, one may consider the alternative structure in which tour complexity is predetermined as per the second functional relationship, and the complexity of the tour is specified as an explanatory variable influencing mode choice as per the first functional relationship.

Thus, the feature of the bivariate probit model that the coefficients of the two endogenous dummy variables cannot coexist in both functional relationships provides an appropriate framework for analyzing unidirectional causality between tour complexity and mode choice. To facilitate formulating the likelihood functions, Equations (2.2.2) through (2.2.5) can be rewritten in a form involving only the cumulative distribution function of the standard bivariate normal distribution.
$$\Pr(M_q = 0, T_q = 0) = \Phi_2[-\beta' z_q, -\theta' x_q, \rho] \qquad (2.2.10)$$

$$\Pr(M_q = 1, T_q = 0) = \Phi_2[\beta' z_q, -(\theta' x_q + \delta), -\rho] \qquad (2.2.11)$$

$$\Pr(M_q = 0, T_q = 1) = \Phi_2[-(\beta' z_q + \gamma), \theta' x_q, -\rho] \qquad (2.2.12)$$

$$\Pr(M_q = 1, T_q = 1) = \Phi_2[\beta' z_q + \gamma, \theta' x_q + \delta, \rho] \qquad (2.2.13)$$

Equations (2.2.10) through (2.2.13) and the corresponding likelihood functions can be summarized by the following general formulations for the two unidirectional causal structures (Greene, 2003):

1. γ = 0, δ ≠ 0 (Mode Choice → Tour Complexity)

$$\text{prob}_q = \Phi_2[q_{Mq}\,\beta' z_q,\ q_{Tq}(\theta' x_q + \delta M_q),\ q_{Mq}q_{Tq}\rho] \qquad (2.2.14)$$

$$L = \prod_{q=1}^{Q}\Phi_2[q_{Mq}\,\beta' z_q,\ q_{Tq}(\theta' x_q + \delta M_q),\ q_{Mq}q_{Tq}\rho] \qquad (2.2.15)$$

2. γ ≠ 0, δ = 0 (Tour Complexity → Mode Choice)

$$\text{prob}_q = \Phi_2[q_{Mq}(\beta' z_q + \gamma T_q),\ q_{Tq}\,\theta' x_q,\ q_{Mq}q_{Tq}\rho] \qquad (2.2.16)$$

$$L = \prod_{q=1}^{Q}\Phi_2[q_{Mq}(\beta' z_q + \gamma T_q),\ q_{Tq}\,\theta' x_q,\ q_{Mq}q_{Tq}\rho] \qquad (2.2.17)$$

where $q_{Mq} = 2M_q - 1$ and $q_{Tq} = 2T_q - 1$.

Because the likelihood function of the recursive bivariate probit model is virtually identical to that of the common bivariate probit model, parameter estimation can be accomplished with readily available software such as LIMDEP 8.0 (Greene, 2002): the endogenous nature of one of the dependent variables in the simultaneous equation system can be ignored in formulating the likelihood function. Analogous to the multiple linear regression model system for continuous variables, the exogenous variables in the utility function without the endogenous dummy variable serve as instrumental variables for the endogenous dummy variable. This methodology is also suitable for better estimating the impact of an endogenous dummy variable in a binary choice model, as long as good instrumental variables are available. It has been frequently adopted in the economics literature. For example, Greene (1998) applied it to quantify the impact of the inclusion of a women's studies program on the offering of gender economics courses in liberal arts colleges. Rhine et al. (2006) used it to estimate the influence of being unbanked (not having a checking and/or savings account) on the probability of obtaining financial services from currency exchanges.
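The signed form (2.2.14) is convenient in code because one expression covers all four outcomes. The Python sketch below (hypothetical names `prob_structure1` and `log_likelihood`, hypothetical index values, and $\Phi_2$ computed by Simpson quadrature as an illustrative choice) evaluates the likelihood contribution for the Mode Choice → Tour Complexity structure and checks that it agrees with the difference form (2.2.3) when γ = 0. An actual estimation would maximize this log-likelihood over β, θ, δ and ρ, for example in LIMDEP as noted above.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def bvn_cdf(a, b, rho, steps=2000):
    """Standard bivariate normal CDF via composite Simpson's rule on [-10, a]; |rho| < 1."""
    lo = -10.0
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lo) / steps
    f = lambda t: N01.pdf(t) * N01.cdf((b - rho * t) / s)
    total = f(lo) + f(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def prob_structure1(M, T, bz, tx, delta, rho):
    """Equation (2.2.14), gamma = 0 (Mode Choice -> Tour Complexity):
    Phi_2[qM * beta'z, qT * (theta'x + delta * M), qM * qT * rho] with qM = 2M - 1, qT = 2T - 1."""
    qM, qT = 2 * M - 1, 2 * T - 1
    return bvn_cdf(qM * bz, qT * (tx + delta * M), qM * qT * rho)

def log_likelihood(tours, delta, rho):
    """Log of Equation (2.2.15); each tour is (M_q, T_q, beta'z_q, theta'x_q)."""
    return sum(math.log(prob_structure1(M, T, bz, tx, delta, rho)) for M, T, bz, tx in tours)

tours = [(1, 0, 0.3, -0.2), (0, 1, -0.4, 0.5), (1, 1, 0.8, 0.1), (0, 0, -0.1, -0.6)]  # hypothetical
print(log_likelihood(tours, delta=0.6, rho=0.5))

bz, tx, delta, rho = 0.3, -0.2, 0.6, 0.5
total = sum(prob_structure1(M, T, bz, tx, delta, rho) for M in (0, 1) for T in (0, 1))
diff_form = N01.cdf(-(tx + delta)) - bvn_cdf(-bz, -(tx + delta), rho)  # (2.2.3) with gamma = 0
print(total, prob_structure1(1, 0, bz, tx, delta, rho), diff_form)
```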
2.2.2 Simultaneous Equations Model Using Lee Transformation (Causal Relationship between One Continuous Variable and One Multinomial Choice)

In travel behavior analysis, we often encounter situations with two dependent variables, one continuous and the other multinomial unordered discrete. For example, out-of-home activity type choice and activity duration are two such dependent variables. An out-of-home activity can be shopping, recreation, or service (taking children to school, driving friends to the airport, etc.), so the variable indicating activity type choice is unordered discrete, while the activity duration for each activity type can be treated as a continuous dependent variable. People make decisions on activity type and activity duration jointly, but one usually cannot observe all the factors influencing both. Thus, a modeling methodology is required to accommodate this kind of joint relationship. Analogous to the SUR model for continuous dependent variables, one may introduce correlation between the random errors in the joint model system. However, discrete choice is usually modeled in a logit-based framework, where the random error terms must be Gumbel distributed. Unlike the normal case, there is no unique way to introduce a correlation between two Gumbel distributions, or between a Gumbel distribution and a normal distribution. From the perspective of multivariate statistics, there is an infinite number of possible joint distributions with two given Gumbel marginal distributions and a constant correlation between them, or with one Gumbel marginal distribution, one normal marginal distribution and a constant correlation between them.
In other words, one cannot derive a unique joint distribution, and hence an identifiable likelihood function, that allows correlation between the Gumbel error terms of a logit-based discrete choice model and the normal error terms of a linear regression model for continuous dependent variables. Since the bivariate normal distribution allows a constant correlation between its two univariate normal marginals, Lee (1983) proposed a transformation that converts Gumbel error terms into normal error terms, so that a bivariate normal distribution can be established between the discrete choices and the continuous variable. Bhat (1998) applied this discrete-continuous methodology based on the Lee transformation to jointly model travelers' activity-type choice for participation, home-stay duration before participation in an out-of-home activity, and out-of-home activity duration. In that study, activity-type choice is modeled as an unordered discrete variable using a multinomial logit model, while home-stay duration and out-of-home activity duration are modeled as continuous variables in two log-linear regression models. In addition, Bhat (2001) used the same methodology to jointly model commuters' activity type choice, activity duration, and the deviation of travel time to the activity location relative to the direct travel time from work to home. Pendyala and Bhat (2004) extended this framework by specifying endogenous unordered discrete variables and endogenous continuous variables as explanatory variables in each other's model functions, so as to quantify the causal relationships between them.
If the model developed in Bhat (1998) is considered an extension of the SUR model system that integrates unordered discrete and continuous variables, then, by analogy, the model in Pendyala and Bhat (2004), with endogenous variables as explanatory variables, is an extension of the SEM model involving both continuous and unordered discrete variables within a causal modeling system. As in the SUR model, the joint estimation technique adopted in Bhat (1998) improves the efficiency of the parameter estimators but does not affect their consistency, which holds under either the recursive or the joint estimation approach (see Section 2.4.2 for an examination). However, as in the SEM model, joint estimation is necessary to consistently estimate the parameters of endogenous variables. The following presents the model formulation and estimation method of the discrete-continuous methodology adopted in Pendyala and Bhat (2005).

Let $i$ be an index for alternatives in the discrete choice set ($i = 1, 2, \dots, I$) and let $q$ be an index for observations ($q = 1, 2, \dots, Q$). Consider the following equation system:

$$ u_{qi} = \beta_i z_{qi} + \alpha_i a_q + \varepsilon_{qi}, \qquad a_q = \gamma x_q + \delta' D_q + \eta_q, \tag{2.3.1} $$

$$ \varepsilon_{qi} \sim \text{i.i.d. Gumbel}(0,1), \qquad \eta_q \sim N(0, \sigma^2). $$

$u_{qi}$ is the indirect (latent) utility associated with the $i$th choice for the $q$th observation; $z_{qi}$ and $x_q$ are exogenous explanatory variables with coefficients $\beta_i$ and $\gamma$; $D_q$ is a vector of dummy variables of length $I$ representing the discrete choice; $\delta$ is a column vector of coefficients representing the effects of the different discrete choices on activity duration; $\varepsilon_{qi}$ is a standard extreme-value (Gumbel) error term assumed to be independently and identically distributed (i.i.d.) across alternatives and observations; and $a_q$ is the continuous variable, with coefficient $\alpha_i$ in the utility function. The error term $\eta_q$ is assumed to be i.i.d. normally distributed across observations with a mean of zero and a variance of $\sigma^2$.
In Equation (2.3.1), alternative $i$ is chosen (i.e., $D_{qi} = 1$) if its utility is the maximum among the $I$ alternatives. Defining

$$ v_{qi} = \max_{j=1,\dots,I;\ j \neq i} u_{qj} - \varepsilon_{qi}, \tag{2.3.2} $$

the utility-maximizing condition for the choice of the $i$th alternative may be written as: $D_{qi} = 1$ if and only if $\beta_i z_{qi} + \alpha_i a_q > v_{qi}$. Let $F_i(v_{qi})$ represent the marginal distribution function of $v_{qi}$ implied by the assumed i.i.d. extreme value distribution of the error terms $\varepsilon_{qi}$ ($i = 1, 2, \dots, I$). Using the properties that the maximum of identically distributed extreme value random terms is extreme value distributed and that the difference of two identically distributed extreme value terms is logistically distributed, the implied distribution of $v_{qi}$ may be derived as:

$$ F_i(y) = \Pr(v_{qi} < y) = \frac{\exp(y)}{\exp(y) + \sum_{j \neq i} \exp(\beta_j z_{qj} + \alpha_j a_q)}. \tag{2.3.3} $$

Therefore,

$$ \Pr(D_{qi} = 1) = F_i(\beta_i z_{qi} + \alpha_i a_q), \tag{2.3.4} $$

$$ \Pr(D_{qi} = 0) = 1 - F_i(\beta_i z_{qi} + \alpha_i a_q). \tag{2.3.5} $$

Both $F_i(y)$ and $\Phi^{-1}(y)$ (the inverse of the standard normal cumulative distribution function) are monotone increasing functions, so

$$ \Pr(D_{qi} = 1) = \Pr[v_{qi} < \beta_i z_{qi} + \alpha_i a_q] = \Pr\{\Phi^{-1}[F_i(v_{qi})] < \Phi^{-1}[F_i(\beta_i z_{qi} + \alpha_i a_q)]\}. \tag{2.3.6} $$

Let $v^*_{qi} = \Phi^{-1}[F_i(v_{qi})]$; then

$$ \Pr(D_{qi} = 1) = \Pr\{v^*_{qi} < \Phi^{-1}[F_i(\beta_i z_{qi} + \alpha_i a_q)]\}. \tag{2.3.7} $$

Because $F_i$ is the distribution function of $v_{qi}$, $F_i(v_{qi})$ is uniformly distributed on (0, 1), so $v^*_{qi}$ is standard normally distributed. One can then introduce a new latent variable

$$ D^*_{qi} = \Phi^{-1}[F_i(\beta_i z_{qi} + \alpha_i a_q)] - v^*_{qi}, \tag{2.3.8} $$

which indicates the binary response $D_{qi}$ since

$$ \Pr(D^*_{qi} > 0) = \Pr\{\Phi^{-1}[F_i(\beta_i z_{qi} + \alpha_i a_q)] - v^*_{qi} > 0\} = \Pr(D_{qi} = 1), \tag{2.3.9} $$

$$ \Pr(D^*_{qi} \le 0) = \Pr\{\Phi^{-1}[F_i(\beta_i z_{qi} + \alpha_i a_q)] - v^*_{qi} \le 0\} = \Pr(D_{qi} = 0). \tag{2.3.10} $$

Equation system (2.3.1) may now be rewritten as:

$$ D^*_{qi} = \Phi^{-1}[F_i(\beta_i z_{qi} + \alpha_i a_q)] - v^*_{qi}, \quad D_{qi} = 1 \text{ if } D^*_{qi} > 0, \quad D_{qi} = 0 \text{ if } D^*_{qi} \le 0; \qquad a_q = \gamma x_q + \delta' D_q + \eta_q. \tag{2.3.11} $$

A correlation $\rho_i$ between the error terms $v^*_{qi}$ and $\eta_q$ is allowed to accommodate common unobserved factors influencing both the discrete choice and the continuous variable. Since $a_q$ is partially determined by $\eta_q$, and $v^*_{qi}$ is correlated with $\eta_q$ whenever $\rho_i \neq 0$, $a_q$ is correlated with the random error term $v^*_{qi}$ in the first equation.
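The two properties of the Lee transformation used above can be checked numerically. The Python sketch below is illustrative and not part of the original estimation code; the utility values, sample size and seed are assumptions. It confirms that $\Phi$ applied to the threshold $\Phi^{-1}[F_i(\beta_i z_{qi} + \alpha_i a_q)]$ recovers the multinomial logit probability, and that $v^*_{qi} = \Phi^{-1}[F_i(v_{qi})]$ has mean near 0 and standard deviation near 1 when $v_{qi}$ is simulated from Gumbel errors as in Equation (2.3.2).

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def mnl_cdf(y, other_utilities):
    """F_i(y) of Eq. (2.3.3): exp(y) / (exp(y) + sum over j != i of exp(V_qj))."""
    return math.exp(y) / (math.exp(y) + sum(math.exp(v) for v in other_utilities))


def gumbel_draw(rng):
    """Standard Gumbel(0,1) draw by inverting its CDF exp(-exp(-e))."""
    u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep away from 0 and 1 to avoid log(0)
    return -math.log(-math.log(u))


# Hypothetical systematic utilities V_qj = beta_j z_qj + alpha_j a_q (I = 3).
V = [0.4, -0.2, 0.1]
i = 0
others = [v for j, v in enumerate(V) if j != i]

# (1) Phi(Phi^{-1}[F_i(V_qi)]) reproduces the multinomial logit probability of i.
threshold = STD_NORMAL.inv_cdf(mnl_cdf(V[i], others))
p_lee = STD_NORMAL.cdf(threshold)
p_mnl = math.exp(V[i]) / sum(math.exp(v) for v in V)

# (2) Simulate v_qi = max_{j != i}(V_qj + eps_qj) - eps_qi and transform it.
rng = random.Random(2006)
v_star = []
for _ in range(20000):
    eps = [gumbel_draw(rng) for _ in V]
    v_qi = max(V[j] + eps[j] for j in range(len(V)) if j != i) - eps[i]
    v_star.append(STD_NORMAL.inv_cdf(mnl_cdf(v_qi, others)))

print(f"Phi(threshold) = {p_lee:.6f}, MNL probability = {p_mnl:.6f}")
print(f"mean(v*) = {fmean(v_star):.3f}, sd(v*) = {pstdev(v_star):.3f}")
```

The simulation only approximates normality of $v^*_{qi}$; the exact argument is the probability-integral transform stated in the text.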
Similarly, $D_q$ is correlated with the random error term $\eta_q$ in the second equation. The endogeneity of $D_q$ and $a_q$ requires the full-information maximum likelihood method to jointly estimate their parameters; limited-information (sequential) maximum likelihood estimation does not provide consistent estimators for the coefficients of the endogenous variables. Replacing $a_q$ in Equation (2.3.4) with the second equation of (2.3.1), one obtains:

$$ \Pr(D_{qi} = 1) = F_i(\beta_i z_{qi} + \alpha_i \gamma x_q + \alpha_i \delta_i). \tag{2.3.12} $$

Similarly, it can be shown that

$$ \Pr(D_{qi} = 0) = 1 - F_i(\beta_i z_{qi} + \alpha_i \gamma x_q + \alpha_i \delta_j) \quad \text{if } D_{qj} = 1. \tag{2.3.13} $$

Since $\Pr(D_{qi} = 1) + \Pr(D_{qi} = 0) = 1$, then

$$ \alpha_i \delta_i = \alpha_i \delta_j. \tag{2.3.14} $$

Three possible restrictions, known as logical consistency conditions, may be imposed on the model coefficients to satisfy Equation (2.3.14):

1. $\alpha_i \neq 0$ and $\delta_i = \delta_j \neq 0$, which implies that the continuous variable appears on the right-hand side of the equation for the discrete choice and a vector of dummy variables corresponding to the discrete choice also appears in the model for the continuous variable. However, the coefficients on the dummy variables must then be identical. A specification constrained in this way is practically meaningless, since different discrete choices ought to have different impacts on the continuous variable and thus unequal coefficients.
2. $\alpha_i \neq 0$ and $\delta_i = \delta_j = 0$, which implies that the continuous variable appears in the utility function of the discrete choice, but the discrete choice does not appear in the model for the continuous variable. This restriction leads to a recursive structure in which the continuous variable is predetermined and then influences the discrete variable.
3. $\alpha_i = 0$, in which case Equation (2.3.14) is always satisfied and $\delta_i$ and $\delta_j$ can take any unequal values. This restriction leads to the other recursive structure, in which the discrete variable is predetermined and then influences the continuous variable.
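The logical consistency requirement of Equation (2.3.14) can be illustrated with a two-alternative case. The Python sketch below assumes $I = 2$ and holds the systematic utility of the competing alternative $j$ fixed at a hypothetical value `v_other` (normalized to zero), so that $F_i$ reduces to a binary logistic function; all parameter values are hypothetical. It evaluates $\Pr(D_{qi} = 1) + \Pr(D_{qi} = 0)$ from Equations (2.3.12) and (2.3.13), which equals one under each of the three restrictions but not when $\alpha_i \neq 0$ and $\delta_i \neq \delta_j$.

```python
import math


def binary_logit_cdf(y, v_other=0.0):
    """F_i(y) when I = 2 and alternative j's utility is fixed: exp(y) / (exp(y) + exp(v_other))."""
    return 1.0 / (1.0 + math.exp(v_other - y))


def total_probability(beta_z, alpha, gamma_x, delta_i, delta_j, v_other=0.0):
    """Pr(D_qi = 1) + Pr(D_qi = 0) built from Eqs. (2.3.12) and (2.3.13)."""
    p_one = binary_logit_cdf(beta_z + alpha * gamma_x + alpha * delta_i, v_other)
    p_zero = 1.0 - binary_logit_cdf(beta_z + alpha * gamma_x + alpha * delta_j, v_other)
    return p_one + p_zero


# Hypothetical values: beta_i z_qi = 0.3 and gamma x_q = 1.0.
print(total_probability(0.3, 0.5, 1.0, 0.8, 0.8))   # restriction 1: delta_i = delta_j
print(total_probability(0.3, 0.5, 1.0, 0.0, 0.0))   # restriction 2: delta = 0
print(total_probability(0.3, 0.0, 1.0, 0.8, -0.4))  # restriction 3: alpha_i = 0
print(total_probability(0.3, 0.5, 1.0, 0.8, -0.4))  # violates (2.3.14): sum differs from 1
```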
Accordingly, the logical consistency condition allows only two alternative recursive structures. The first is the case $\alpha \neq 0$ and $\delta = 0$ (continuous variable → discrete variable), where the continuous endogenous variable $a_q$ is predetermined by the linear model and appears in the utility functions $u_{qi}$ as an explanatory variable for the discrete choice. The full-information likelihood function for estimating the parameters in this case is:

$$ L = \prod_{q=1}^{Q} \prod_{i=1}^{I} \left[ \frac{1}{\sigma}\, \phi(l_q)\, \Phi(b_{qi}) \right]^{D_{qi}}, \tag{2.3.15} $$

where $\phi(\cdot)$ is the standard normal density function, and $l_q$ and $b_{qi}$ are defined as follows:

$$ l_q = \frac{a_q - \gamma x_q}{\sigma}, \qquad b_{qi} = \frac{\Phi^{-1}[F_i(\beta_i z_{qi} + \alpha_i a_q)] - \rho_i l_q}{\sqrt{1 - \rho_i^2}}. \tag{2.3.16} $$

The second case is $\alpha = 0$ and $\delta \neq 0$ (discrete variable → continuous variable), where the vector of discrete variables $D_q$ is predetermined by the utility functions $u_{qi}$ and then serves as explanatory variables in the linear model for the continuous variable $a_q$. The full-information likelihood function is the same as Equation (2.3.15), but here

$$ l_q = \frac{a_q - \gamma x_q - \delta' D_q}{\sigma}, \qquad b_{qi} = \frac{\Phi^{-1}[F_i(\beta_i z_{qi})] - \rho_i l_q}{\sqrt{1 - \rho_i^2}}. \tag{2.3.17} $$

A statistical test is required to identify the dominant causal relationship between the discrete and continuous variables; such a test is proposed in Chapter 3 to select the causal model indicating the dominant causal relationship in the population.

It is necessary to further discuss the underlying problem of the discrete-continuous model system based on Lee's transformation. The model system is:

$$ D^*_{qi} = \Phi^{-1}[F_i(\beta_i z_{qi} + \alpha_i a_q)] - v^*_{qi}, \quad D_{qi} = 1 \text{ if } D^*_{qi} > 0, \quad D_{qi} = 0 \text{ if } D^*_{qi} \le 0; \qquad a_q = \gamma x_q + \delta' D_q + \eta_q, \tag{2.3.18} $$

$$ v^*_{qi} = \Phi^{-1}[F_i(v_{qi})], \tag{2.3.19} $$

$$ v_{qi} = \max_{j=1,\dots,I;\ j \neq i} u_{qj} - \varepsilon_{qi}. \tag{2.3.20} $$

The correlation $\rho_i$ between $v^*_{qi}$ and $\eta_q$ is caused by common unobservable variables in the random error terms $\varepsilon_{qi}$ and $\eta_q$, but $\rho_i$ is not equal to the correlation between the random error term $\varepsilon_i$ in utility function $i$ and the random error term $\eta$ in the linear regression model. $\rho_i$ is a nonlinear function of not only $\mathrm{corr}(\varepsilon_i, \eta)$ but also $\mathrm{corr}(\varepsilon_j, \eta)$, as can be seen by plugging Equation (2.3.20) into Equation (2.3.19):
$$ v^*_{qi} = \Phi^{-1}\left[ F_i\left( \max_{j=1,\dots,I;\ j \neq i} u_{qj} - \varepsilon_{qi} \right) \right]. $$

Hence $\rho_i$ does not represent the correlation between $\varepsilon_i$ and $\eta$, and therefore $\rho_i$ does not have a straightforward behavioral interpretation. Indeed, Schmertmann (1994) shows that the Lee model places substantial restrictions on the covariance between the continuous variable and the discrete choice models. Using a Monte Carlo study, he further found that the Lee model is significantly biased when this assumption is violated. In the following section (Section 2.2.3.1), we propose an alternative modeling methodology, called the mixed discrete-continuous model, which directly accommodates the correlation between the random error term in each utility function and the random error term in the continuous model without a nonlinear transformation.

2.2.3 Mixed Simultaneous Equations Model Using Flexible Error Structure

2.2.3.1 Mixed Discrete-continuous Model (Causal Relationship Between One Continuous Variable and One Multinomial Choice Variable)

The Gumbel random error term adopted in the utility function of the discrete choice model does not allow correlation with the random error term in the continuous model or in the other utility functions. One alternative for accommodating such correlations between discrete choices and a continuous variable is to employ a multinomial probit model for the discrete choice, where the error terms are multivariate normally distributed instead of Gumbel distributed. However, logit-based discrete choice models are applied much more widely than the multinomial probit model because they are easier to apply, so the logit model is consistently adopted for modeling discrete choice in this dissertation. Similar to the nested logit model (see Section 2.1.4), one may assume that the random error term in the utility functions consists of two independent random components: one represents heterogeneity and is normally distributed, and the other is standard Gumbel distributed as usual.
Such a methodology for discrete choice is called the mixed logit model (see Section 2.1.4). If the variance of the heterogeneity differs across the utility functions, one obtains a heteroskedastic logit model, which avoids the IIA pitfall (Bhat, 1995). However, Bhat (1995) uses Gumbel-distributed random error terms with unequal variances rather than the mixed normal and Gumbel error terms of the mixed heteroskedastic logit model. Meanwhile, one may assume that the random error term of the continuous model is normally distributed; as shown below, it can be decomposed into $I + 1$ normally distributed random components. The model system can be formulated as:

$$ u_{qi} = \beta_i' z_{qi} + \alpha_i a_q + f_i n_{qi} + \varepsilon_{qi}, \qquad a_q = \gamma' x_q + \delta' D_q + k\, m_q, \tag{2.4.1} $$

where $\varepsilon_{qi} \sim$ i.i.d. Gumbel(0,1), and $m_q$ and $n_{qi}$ are multivariate normally distributed with zero expectations and unit variances. The correlations among the $n_{qi}$ are zero, and the correlation between $n_{qi}$ and $m_q$ is $\rho_i$. $f_i$ and $k$ represent the standard deviations of the normal random components in the utility functions and the linear regression model, respectively. In this study, we emphasize the correlation between the discrete choices and the continuous variable but ignore the correlation among the discrete choices. Under the multivariate normality assumption, one may rewrite

$$ m_q = \sum_{j=1}^{I} \rho_j n_{qj} + \sqrt{1 - \sum_{j=1}^{I} \rho_j^2}\; u_q, \tag{2.4.2} $$

where $u_q$ is a new random variable that is standard normally distributed and independent of $n_{qi}$ and $\varepsilon_{qi}$. The model system can then be rewritten as:

$$ u_{qi} = \beta_i' z_{qi} + \alpha_i a_q + f_i n_{qi} + \varepsilon_{qi}, \qquad a_q = \gamma' x_q + \delta' D_q + \sum_{j=1}^{I} k \rho_j n_{qj} + k \sqrt{1 - \sum_{j=1}^{I} \rho_j^2}\; u_q. \tag{2.4.3} $$

By replacing $k\rho_j$ with $g_j$ and $k\sqrt{1 - \sum_{j=1}^{I} \rho_j^2}$ with $\sigma$, the mixed joint modeling system reduces to

$$ u_{qi} = \beta_i' z_{qi} + \alpha_i a_q + f_i n_{qi} + \varepsilon_{qi}, \qquad a_q = \gamma' x_q + \delta' D_q + \sum_{j=1}^{I} g_j n_{qj} + \sigma u_q, \tag{2.4.4} $$

where $n_{qi} \sim$ Normal(0,1), $u_q \sim$ Normal(0,1) and $\varepsilon_{qi} \sim$ i.i.d. Gumbel(0,1). This implies that a univariate normal heterogeneity appearing simultaneously in both the latent utility function and the continuous model, with unequal standard deviations, performs as well as multivariate normal heterogeneities for consistently estimating the coefficient $\alpha$ or $\delta$ of the endogenous variables.
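The reparameterization from $(k, \rho_1, \dots, \rho_I)$ in Equation (2.4.1) to $(g_1, \dots, g_I, \sigma)$ in Equation (2.4.4) can be checked numerically. The Python sketch below uses hypothetical parameter values: it computes $g_j = k\rho_j$ and $\sigma = k\sqrt{1 - \sum_j \rho_j^2}$, confirms that the total error variance $\sum_j g_j^2 + \sigma^2$ equals $k^2$, and uses a small Monte Carlo simulation to confirm that the correlation between the continuous-model error and $n_{q1}$ is close to $\rho_1$.

```python
import math
import random


def reparameterize(k, rho):
    """Map (k, rho_1..rho_I) of Eq. (2.4.1) to (g_1..g_I, sigma) of Eq. (2.4.4)."""
    s = sum(r * r for r in rho)
    if s >= 1.0:
        raise ValueError("the squared correlations must sum to less than 1")
    g = [k * r for r in rho]
    sigma = k * math.sqrt(1.0 - s)
    return g, sigma


# Hypothetical values for I = 3 alternatives.
k, rho = 1.5, [0.5, -0.3, 0.2]
g, sigma = reparameterize(k, rho)
total_var = sum(gj ** 2 for gj in g) + sigma ** 2  # should equal k**2

# Monte Carlo check: error = sum_j g_j n_qj + sigma * u_q, all terms independent N(0,1).
rng = random.Random(42)
draws, cross, square = 50000, 0.0, 0.0
for _ in range(draws):
    n = [rng.gauss(0.0, 1.0) for _ in rho]
    err = sum(gj * nj for gj, nj in zip(g, n)) + sigma * rng.gauss(0.0, 1.0)
    cross += err * n[0]  # n_q1 and err have mean 0, so this accumulates the covariance
    square += err * err
corr_1 = (cross / draws) / math.sqrt(square / draws)  # Var(n_q1) = 1

print(f"g = {g}, sigma = {sigma:.4f}, total variance = {total_var:.4f} (k^2 = {k ** 2})")
print(f"simulated Corr(error, n_q1) = {corr_1:.3f} (target rho_1 = {rho[0]})")
```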
As in Section 2.2.2, either $\alpha$ or $\delta$ needs to be zero, which leads to two alternative causal structures: 1) $\alpha = 0$ and $\delta \neq 0$, discrete choice → continuous variable; and 2) $\alpha \neq 0$ and $\delta = 0$, continuous variable → discrete choice. In this joint model system, the correlation between the latent utility function and the continuous model can be calculated as

$$ \mathrm{Corr}(u_{qi}, a_q) = \frac{f_i g_i}{\sqrt{\left(f_i^2 + \pi^2/6\right)\left(\sum_{j=1}^{I} g_j^2 + \sigma^2\right)}}. \tag{2.4.5} $$

As $f_i$ and $g_i$ both approach positive infinity, or both approach negative infinity, $\lim \mathrm{Corr}(u_{qi}, a_q) = 1$; as $f_i$ approaches positive (or negative) infinity and $g_i$ approaches negative (or positive) infinity, $\lim \mathrm{Corr}(u_{qi}, a_q) = -1$. Thus, theoretically, this specification of heterogeneity can accommodate any degree of correlation between the latent utility function and the continuous model. The correlation also has a reasonable behavioral interpretation: a positive or negative correlation explicitly indicates the same or the opposite impact of unobserved or unspecified common variables on the latent utility function and the continuous dependent variable. In this respect, the mixed discrete-continuous model is better than the Lee model. In addition, the mixed discrete-continuous model specifies a heteroskedastic logit model for the discrete choice, which avoids the IIA problem of the multinomial logit model. On the other hand, it might be conjectured that the coefficient estimates for endogenous variables will be very sensitive to the covariance structure of the random error terms. An appropriate specification of the random error terms is critical for accurately estimating the impact of endogenous variables, which helps us better understand the underlying causal relationships in people's activity and travel behavior. Based on the derived joint model system, we need to derive the probability function for each observation and use maximum likelihood estimation to estimate the parameters.
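Equation (2.4.5) and its limits can be evaluated directly. In the Python sketch below, the values of $f$, $g$ and $\sigma$ are hypothetical, and very large values of $|f_i|$ and $|g_i|$ serve as a numerical stand-in for the limits, with the remaining $g_j$ and $\sigma$ held fixed.

```python
import math

GUMBEL_VAR = math.pi ** 2 / 6.0  # variance of a standard Gumbel(0,1) error


def corr_utility_continuous(i, f, g, sigma):
    """Corr(u_qi, a_q) from Eq. (2.4.5) for alternative i (0-based index)."""
    numerator = f[i] * g[i]
    denominator = math.sqrt((f[i] ** 2 + GUMBEL_VAR) * (sum(gj ** 2 for gj in g) + sigma ** 2))
    return numerator / denominator


# Hypothetical estimates for I = 3 alternatives.
f = [0.8, 0.5, 0.0]
g = [0.6, -0.4, 0.0]
sigma = 1.0
print([round(corr_utility_continuous(i, f, g, sigma), 4) for i in range(3)])

# Numerical stand-in for the limits: same signs -> near +1, opposite signs -> near -1.
big = 1e6
print(corr_utility_continuous(0, [big, 0.5, 0.0], [big, -0.4, 0.0], sigma))
print(corr_utility_continuous(0, [big, 0.5, 0.0], [-big, -0.4, 0.0], sigma))
```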
Conditional on $n_{qi}$, the probability of each observation is equal to the product of the probability of the discrete choice observation and the probability density of the continuous observation:

$$ \mathrm{Prob}(D_q, a_q \mid n_{q1}, n_{q2}, \dots, n_{qI}) = \prod_{i=1}^{I} \left[ L_{qi}(n_{q1}, n_{q2}, \dots, n_{qI}) \right]^{D_{qi}} F(n_{q1}, n_{q2}, \dots, n_{qI}), \tag{2.4.6} $$

where

$$ L_{qi}(n_{q1}, n_{q2}, \dots, n_{qI}) = \frac{\exp(\beta_i' z_{qi} + \alpha_i a_q + f_i n_{qi})}{\sum_{j=1}^{I} \exp(\beta_j' z_{qj} + \alpha_j a_q + f_j n_{qj})} \tag{2.4.7} $$

and

$$ F(n_{q1}, n_{q2}, \dots, n_{qI}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{a_q - \gamma' x_q - \delta' D_q - \sum_{i=1}^{I} g_i n_{qi}}{\sigma} \right)^2 \right]. \tag{2.4.8} $$

To obtain the unconditional probability, one needs to integrate over the distributional domain of the $n_{qi}$:

$$ \mathrm{Prob}(D_q, a_q) = \int_{n_{q1}} \int_{n_{q2}} \cdots \int_{n_{qI}} \mathrm{Prob}(D_q, a_q \mid n_{q1}, \dots, n_{qI})\, d\Phi(n_{q1})\, d\Phi(n_{q2}) \cdots d\Phi(n_{qI}). \tag{2.4.9} $$

Here, $\Phi(\cdot)$ represents the cumulative distribution function of the standard normal distribution. The likelihood function can be formed as

$$ L = \prod_{q=1}^{Q} \mathrm{Prob}(D_q, a_q). \tag{2.4.10} $$

Because the likelihood function does not have a closed form, we apply the Maximum Simulated Likelihood Estimation (MSLE) method to estimate the model parameters. The idea is to draw a set of random values from a known distribution and substitute them into the probability function to approximate the integral. Bhat (2000) found that a quasi-random sequence, the Halton sequence, covers the distributional domain better than a conventional random sequence (called a pseudo-random sequence). It was found that, for a one-dimensional integral, the error measures with as few as 50 Halton draws are smaller than those from 1000 pseudo-random draws, and those from 75 Halton draws are much smaller than those from 2000 pseudo-random draws. To save computational time, we employ the Halton sequence to generate random values uniformly distributed between 0 and 1 and use $\Phi^{-1}(\cdot)$ to convert them into standard normal values. The generation of Halton draws is explicitly presented in Train (1999), so the procedure is not repeated in this dissertation.
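The simulation step can be sketched in Python. This is not the Gauss code of Appendix A but an illustrative re-implementation under stated assumptions: Halton points are produced by the standard radical-inverse construction with one prime base per dimension, the first `skip` elements are discarded (a common practice; the value 10 is an assumption), and each point is converted to standard normal values with $\Phi^{-1}$. The function `simulated_prob` then averages the conditional probability of Equations (2.4.6) to (2.4.8) over the $R$ draws to approximate the integral in Equation (2.4.9). All observation values are hypothetical, and the allocation of draws across observations is not shown.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def radical_inverse(index, base):
    """Element `index` (1, 2, ...) of the one-dimensional Halton sequence in a prime base."""
    value, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * scale
        scale /= base
    return value


def first_primes(count):
    """The first `count` primes, one Halton base per dimension."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def halton_normal_draws(num_draws, dims, skip=10):
    """Standard normal draws n_r = Phi^{-1}(Halton point r), skipping the first `skip` points."""
    bases = first_primes(dims)
    return [[STD_NORMAL.inv_cdf(radical_inverse(r, b)) for b in bases]
            for r in range(skip + 1, skip + num_draws + 1)]


def conditional_prob(n, utilities, f, chosen, a_q, mean_a, g, sigma):
    """Prob(D_q, a_q | n) of Eq. (2.4.6): logit kernel (2.4.7) times normal density (2.4.8).

    utilities: beta_j' z_qj + alpha_j a_q for each alternative j
    mean_a:    gamma' x_q + delta' D_q
    """
    expo = [math.exp(v + fj * nj) for v, fj, nj in zip(utilities, f, n)]
    logit = expo[chosen] / sum(expo)
    mu = mean_a + sum(gj * nj for gj, nj in zip(g, n))
    return logit * NormalDist(mu, sigma).pdf(a_q)


def simulated_prob(draws, **obs):
    """MSL approximation sum_r P_r / R of the integral in Eq. (2.4.9)."""
    return sum(conditional_prob(n, **obs) for n in draws) / len(draws)


draws = halton_normal_draws(100, dims=3)  # R = 100 draws, as in the text
obs = dict(utilities=[0.4, -0.2, 0.1], f=[0.8, 0.5, 0.0], chosen=0,
           a_q=2.0, mean_a=1.5, g=[0.6, -0.4, 0.0], sigma=0.8)
print(simulated_prob(draws, **obs))
```

Setting every $f_i$ and $g_i$ to zero removes the heterogeneity, and the simulated value then collapses to the multinomial logit probability times the normal density, which is a convenient check on an implementation.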
Appendix A contains the code, written in the Gauss programming language (Aptech, 2005), for generating Halton sequences; it is the same as the standard code for the mixed logit model by Train (1996). The standard normal values of the $r$th draw, denoted $n_r$, are substituted into $\mathrm{Prob}(D_q, a_q \mid n_{qi})$ to calculate $P_r$. After repeating this procedure $R$ times and accumulating the $P_r$ values, one can approximate $\mathrm{Prob}(D_q, a_q)$ by $\sum_{r=1}^{R} P_r / R$. The maximum likelihood procedure can then be followed to consistently estimate the parameters, including $\beta_i$, $f_i$, $g_i$, and $\alpha$ or $\delta$. In this dissertation, $R$ is set to 100. The Gauss programming language (Aptech, 2005) is used to code the likelihood function and its first-order derivatives for the likelihood maximization (see Appendix B for details). As stated by Walker (2002), a small number of quasi-random draws can mask an underidentification problem and yield erroneous estimators. Therefore, the heterogeneity in the following mixed joint model system has to be specified carefully:

$$ u_{qi} = \beta_i' z_{qi} + \alpha_i a_q + f_i n_{qi} + \varepsilon_{qi}, \qquad a_q = \gamma' x_q + \delta' D_q + \sum_{i=1}^{I} g_i n_{qi} + \sigma u_q. \tag{2.4.11} $$

Because of the slight difference between the normal and Gumbel distributions, the standard deviation $f_i$ of the normal heterogeneity can be identified from the differences between each pair of latent utility functions. However, in the continuous model, the random error term $u_q$ and the heterogeneity $n_{qi}$ are both normally distributed, without even a slight difference. A sum of independent normal random variables is still normal, with expectation equal to the sum of the expectations and variance equal to the sum of the squared standard deviations. Thus, estimation of $g_i$ depends on the identification of $f_i$ in the latent utility functions: without the identification of $f_i$, $g_i$ would be absorbed into $\sigma u_q$ and become unidentifiable. The reason why $g_i$ is identifiable is straightforward.
The identification of $f_i$ through the latent utility functions does not depend on information from the continuous model. The continuous model itself yields an estimator of the standard deviation of its random error term, since its dependent variable is directly observable. Finally, in the joint model system, the covariance between each latent utility function and the continuous model provides additional information for estimating $g_i$.

By examining the variance-covariance matrix of utility differences, Walker (2002) established criteria for specifying a flexible error structure in the mixed logit model with respect to identification and normalization. She found that a mixed heteroskedastic logit model with $M$ ($M > 2$) alternatives allows at most $M - 1$ identifiable heterogeneities, and that the valid normalization is to fix the smallest variance of heterogeneity at zero. In practice, one may use a small number of quasi-random draws to estimate an unidentified mixed logit model and obtain preliminary estimation results, and then impose a zero restriction on the smallest among all the estimated heterogeneity variances. In our mixed discrete-continuous model system, once one of the $f_i$ is fixed at zero for identification and normalization, the corresponding $g_i$ becomes unidentifiable, so it needs to be fixed at zero as well.

In the MSLE procedure, a t-test can be obtained for each single parameter from its estimate and its standard error (the square root of the corresponding diagonal element of the estimated variance-covariance matrix). However, in this study, modelers are more concerned about the significance of the product $f_i g_i$ than about the single parameters $f_i$ and $g_i$, since $f_i g_i$ represents the covariance between the two random components, which indicates the sign (+ or −) and magnitude of the correlation.
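The simulation-based test described in the remainder of this subsection can be sketched as follows. The Python function below is an illustrative sketch, not the dissertation's code. Its inputs are the MSLE estimates of $f_i$ and $g_i$, their standard errors and their estimated correlation; it generates bivariate normal pairs through $x = \Phi^{-1}(U_1)$ and $y = \rho x + \sqrt{1 - \rho^2}\,\Phi^{-1}(U_2)$ and returns $1 - N_+/N$ when the point estimate of $f_i g_i$ is positive (or $1 - N_-/N$ when it is negative). The dissertation uses $N$ = 5,000,000; the smaller default here only keeps the run short, and the estimates in the usage line are hypothetical.

```python
import math
import random
from statistics import NormalDist


def covariance_sign_test(mean_f, se_f, mean_g, se_g, rho, n_draws=100_000, seed=0):
    """Simulated significance level for the sign of the covariance f_i * g_i.

    (f, g) pairs are drawn from a bivariate normal distribution whose means,
    standard deviations and correlation are the MSLE estimates. Returns
    1 - N+/N if mean_f * mean_g > 0, and 1 - N-/N if it is negative.
    """
    std_normal = NormalDist()
    rng = random.Random(seed)
    point_positive = mean_f * mean_g > 0
    agree = 0  # N+ (or N-): products with the same sign as the point estimate
    for _ in range(n_draws):
        x = std_normal.inv_cdf(rng.uniform(1e-12, 1.0 - 1e-12))
        z = std_normal.inv_cdf(rng.uniform(1e-12, 1.0 - 1e-12))
        y = rho * x + math.sqrt(1.0 - rho ** 2) * z  # Corr(x, y) = rho
        product = (mean_f + se_f * x) * (mean_g + se_g * y)
        same_sign = product > 0 if point_positive else product < 0
        if same_sign:
            agree += 1
    return 1.0 - agree / n_draws


# Hypothetical estimates: f_i = 0.9 (s.e. 0.3), g_i = 0.7 (s.e. 0.25), correlation 0.2.
print(covariance_sign_test(0.9, 0.3, 0.7, 0.25, 0.2))
```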
One may need to test the following null hypothesis ($H_0$) against the alternative hypothesis ($H_1$):

$$ H_0:\ \mathrm{Cov}(u_i, a) = f_i g_i > 0 \ \text{(positive covariance)}; \qquad H_1:\ \mathrm{Cov}(u_i, a) = f_i g_i \le 0 \ \text{(negative covariance)}. $$

As the MSLE estimators $\hat f_i$ and $\hat g_i$ are essentially maximum likelihood estimators (MLE), they should be asymptotically normally distributed. The correlation between $\hat f_i$ and $\hat g_i$ can be calculated from the corresponding off-diagonal element of the estimated variance-covariance matrix. Thus, $\hat f_i$ and $\hat g_i$ should be bivariate normally distributed. One approach to calculating the probability of a type-I error (i.e., rejecting the null hypothesis when it is correct) is to first derive the cumulative distribution function of $\hat f_i \hat g_i$ and then calculate the probability directly. However, it is rather challenging to derive a tractable cumulative distribution function for this product. In this dissertation, a simulation approach, called the simulation-based hypothesis test, is adopted to approximate the probability and to determine the significance level of the estimated error covariance.

Since the expectations and variances of the estimators of $f_i$ and $g_i$ and the correlation between them have been estimated in the MSLE procedure, the Monte Carlo method can be applied to generate a large number of pairs of random values that are bivariate normally distributed with the estimated expectations, variances and correlation. Let $U_1$ and $U_2$ be two sets of pseudo-random values that are independently and uniformly distributed between 0 and 1. Let $x = \Phi^{-1}(U_1)$ and $y = \rho x + \sqrt{1 - \rho^2}\,\Phi^{-1}(U_2)$; then

$$ (x, y) \sim \phi_2(0, 0, 1, 1, \rho). $$

Let $f = E(\hat f_i) + \mathrm{std}(\hat f_i)\, x$ and $g = E(\hat g_i) + \mathrm{std}(\hat g_i)\, y$; then

$$ (f, g) \sim \phi_2\left[ E(\hat f_i), E(\hat g_i), \mathrm{std}^2(\hat f_i), \mathrm{std}^2(\hat g_i), \rho \right], $$

where $\Phi^{-1}(\cdot)$ is the inverse of the cumulative distribution function of the standard normal distribution; $\rho$ is the estimated correlation between $\hat f_i$ and $\hat g_i$, calculated from the corresponding off-diagonal element of the estimated variance-covariance matrix; $\phi_2$ is the probability density function of the bivariate normal distribution; $E(\cdot)$ is the expectation of a random variable; and $\mathrm{std}(\cdot)$ is the standard deviation of a random variable. One may calculate the product of each pair of $f$ and $g$ and then count the number of positive products, denoted $N_+$. The probability of making a type-I error, i.e., the significance level, can be approximated by $1 - N_+/N$, where $N$ is the total number of random draws. Similarly, if $E(\hat f_i)E(\hat g_i)$ is negative, the null hypothesis that $\mathrm{Cov}(u_i, a)$ is negative needs to be tested, and one may approximate the significance level by $1 - N_-/N$, where $N_-$ is the count of negative products among the pairs of $f$ and $g$. In this dissertation study, we use 5,000,000 pseudo-random draws (i.e., $N$ = 5,000,000) to accurately estimate the significance level of the error covariance estimator $\hat f_i \hat g_i$.

2.2.3.2 Mixed Binary-multinomial Choice Model (Causal Relationship Between One Binary Choice Variable and One Multinomial Choice Variable)

We presented the modeling methodology for the causal relationship between two binary choices in Section 2.2.1. Binary choices can be modeled by the binary probit model, in which the random error term of the latent utility function is normally distributed. The correlation between the random error terms of the two latent utility functions can easily be accommodated under the assumption that the two random error terms are bivariate normally distributed. However, in travel behavior analysis, choices are usually multinomial in nature.
Travel mode choice in an urban area is a typical example, in which people need to choose the most appropriate travel mode from origin to destination among all the available alternatives, possibly including auto, transit, bicycle or walk. In this case, the recursive bivariate probit model cannot be used to model the causal relationship among discrete variables indicating multinomial unordered choices. In this section, a mixed binary-multinomial choice model is proposed to allow causal analysis involving multinomial unordered choices. Similar to Equation (2.4.11), one may have the following model system for causal analysis between two discrete choices:

$$ u_{qi} = \beta_i' z_{qi} + \alpha_i A_q + f_i n_{qi} + \varepsilon_{qi}, \qquad v_q = \gamma' x_q + \delta' D_q + \sum_{i=1}^{I} g_i n_{qi} + u_q \qquad (g_i \text{ needs to be fixed at } 1), \tag{2.4.12} $$

where $q$ is the index of observations and $i$ is the index of alternatives in a choice set $C_I$ consisting of $I$ alternatives. $u_{qi}$ is the latent utility associated with the $i$th choice in $C_I$; $v_q$ is the latent utility associated with the binary choice in the other choice set $C_K$, which consists of two alternatives; $u_q$ is an idiosyncratic random error that is i.i.d. standard normally distributed; $n_{qi}$ represents the heterogeneity shared by the utility functions $u_{qi}$ and $v_q$; $D_q$ is a vector of dummy variables indicating the multinomial choice; and $A_q$ is a dummy variable indicating the binary choice. As in the mixed discrete-continuous model, it is unnecessary, and unidentifiable, to specify bivariate normally distributed heterogeneity in the model system. Instead, common univariate normally distributed heterogeneities are sufficient to accommodate the correlation between each latent utility function for the multinomial choice and the latent utility function for the binary choice. In the mixed discrete-continuous model, $g_i$ is specified to allow unequal standard deviations of heterogeneity.
However, in the current binary-multinomial choice model, the normal heterogeneity is assumed to have an identical standard deviation, which needs to be fixed at one. In a preliminary study, $g_i$ was specified in the joint model system. Unfortunately, maximum likelihood estimation on a real dataset never converged with $g_i$ included: the $g_i$ values grew unreasonably large, and the procedure did not converge even after 1000 iterations. A plausible explanation is that the second model in the joint system is basically a binary probit model, in which the dependent variable is an unobservable latent variable rather than the observable continuous dependent variable of a linear regression model. In a binary probit model, the standard deviation of the random error term is not identifiable. Walker (2002) found that the standard deviation of heterogeneity in a binary mixed heteroskedastic logit model is unidentifiable; analogously, the standard deviation of heterogeneity in a binary mixed probit model is not identifiable either. One alternative for dealing with this problem is adopted in Eluru and Bhat (2005), where seat belt usage and accident severity are modeled in a joint model consisting of a binary logit model for seat belt usage and an ordered logit model for accident severity. In that work, the common heterogeneity in both latent utility functions is assumed to be normally distributed with an identical standard deviation, but with an alternative sign (+ or −) in front of that standard deviation in the random component of the model. In a joint modeling system consisting of a binary logit model and an ordered logit model, the common standard deviation can be identified in both utility functions. However, a positive or negative correlation between the random error terms cannot be accommodated naturally in this specification.
The investigators need to empirically test the models with both + and − signs to determine whether a positive or negative correlation is more appropriate according to goodness-of-fit measures. Because of the symmetry of the normal distribution, only two trials are needed to obtain an appropriate estimator of the common standard deviation in the mixed binary-ordered model, by comparing the fit of the two models. (The sign combinations +/− and +/+ are equivalent to −/+ and −/−, respectively, because the normal distribution is symmetric.) However, it would be very cumbersome to explore all possible sign combinations if normal heterogeneities with equal standard deviations were specified in the mixed binary-multinomial choice model to accommodate the correlation between each utility function for the multinomial choice and that for the binary choice. With 4 alternatives for the multinomial choice, one would have to try all $2^4 = 16$ possible combinations and select the best-fitting among these specifications. This approach is inconvenient in practice, so the specification in Equation (2.4.12) is adopted first in this study. If all the $g_i$ are fixed at 1, the sign of the correlation is attributable to the sign of $f_i$ in the latent utility function for the multinomial choice. The correlation can be expressed as

$$ \mathrm{Corr}(u_{qi}, v_q) = \frac{f_i}{\sqrt{\left(f_i^2 + \pi^2/6\right)(I + 1)}}. \tag{2.4.13} $$

As $f_i$ approaches positive or negative infinity, $\lim \mathrm{Corr}(u_{qi}, v_q)$ equals $1/\sqrt{I + 1}$ or $-1/\sqrt{I + 1}$, respectively. If $I = 4$, $-0.447 < \mathrm{Corr}(u_{qi}, v_q) < 0.447$. In other words, the error structure in specification (2.4.12) cannot accommodate a correlation greater than 0.447 or less than −0.447. This is a disadvantage of specification (2.4.12), but it helps identify the sign of the correlation between each utility function for the multinomial choice and that for the binary choice.
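The bound implied by Equation (2.4.13) can be verified numerically. The short Python sketch below evaluates $\mathrm{Corr}(u_{qi}, v_q)$ with all $g_i$ fixed at 1 and compares it with the limit $1/\sqrt{I + 1}$; the $f_i$ values are hypothetical, and a very large $f_i$ stands in for the limit.

```python
import math


def corr_binary_multinomial(f_i, num_alternatives):
    """Corr(u_qi, v_q) from Eq. (2.4.13), with every g_i fixed at 1."""
    return f_i / math.sqrt((f_i ** 2 + math.pi ** 2 / 6.0) * (num_alternatives + 1))


def corr_limit(num_alternatives):
    """Limit of |Corr(u_qi, v_q)| as |f_i| grows without bound: 1 / sqrt(I + 1)."""
    return 1.0 / math.sqrt(num_alternatives + 1)


for f_i in (0.5, 2.0, 10.0, 1e6):
    print(f_i, round(corr_binary_multinomial(f_i, 4), 4))
print("limit for I = 4:", round(corr_limit(4), 4))
```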
According to the sign of the correlation estimated from specification (2.4.12), we specify (2.4.14), in which the standard deviations of the common normal heterogeneity are identical:

$$ u_{qi} = \beta_i' z_{qi} + \alpha_i A_q \pm f_i n_{qi} + \varepsilon_{qi}, \qquad v_q = \gamma' x_q + \delta' D_q + \sum_{j=1}^{I} f_j n_{qj} + u_q, \tag{2.4.14} $$

where the + or − sign in front of $f_i$ is imposed according to whether $\mathrm{Corr}(u_{qi}, v_q)$ is estimated to be positive or negative in the first step. In the current specification,

$$ \mathrm{Corr}(u_{qi}, v_q) = \frac{\pm f_i^2}{\sqrt{\left(f_i^2 + \pi^2/6\right)\left(\sum_{j=1}^{I} f_j^2 + 1\right)}}. \tag{2.4.15} $$

As $f_i$ approaches positive or negative infinity, $\lim |\mathrm{Corr}(u_{qi}, v_q)|$ equals 1. Together with the imposed sign of $f_i$, specification (2.4.14) can theoretically accommodate any degree of correlation from −1 to 1. This kind of specification is expected to yield more accurate estimates of the impact of endogenous variables. As before, either $\alpha$ or $\delta$ needs to be zero, which leads to two alternative causal structures: 1) $\alpha = 0$ and $\delta \neq 0$, multinomial choice → binary choice; and 2) $\alpha \neq 0$ and $\delta = 0$, binary choice → multinomial choice. The probability function for each observation and the likelihood function can be formulated following the procedure in Section 2.2.3.1. Conditional on $n_{qi}$, the probability of each observation is equal to the product of the probability of the multinomial choice observation and the probability of the binary choice observation:

$$ \mathrm{Prob}(D_q, A_q \mid n_{q1}, n_{q2}, \dots, n_{qI}) = \prod_{i=1}^{I} \left[ L_{qi}(n_{q1}, n_{q2}, \dots, n_{qI}) \right]^{D_{qi}} F_q(n_{q1}, n_{q2}, \dots, n_{qI}), \tag{2.4.16} $$

where

$$ L_{qi}(n_{q1}, n_{q2}, \dots, n_{qI}) = \frac{\exp(\beta_i' z_{qi} + \alpha_i A_q \pm f_i n_{qi})}{\sum_{j=1}^{I} \exp(\beta_j' z_{qj} + \alpha_j A_q \pm f_j n_{qj})} \tag{2.4.17} $$

and

$$ F_q(n_{q1}, n_{q2}, \dots, n_{qI}) = \left[ \Phi\left( \gamma' x_q + \delta' D_q + \sum_{i=1}^{I} f_i n_{qi} \right) \right]^{A_q} \left[ 1 - \Phi\left( \gamma' x_q + \delta' D_q + \sum_{i=1}^{I} f_i n_{qi} \right) \right]^{1 - A_q}. \tag{2.4.18} $$

To obtain the unconditional probability, one needs to integrate over the distributional domains of the $n_{qi}$:

$$ \mathrm{Prob}(D_q, A_q) = \int_{n_{q1}} \int_{n_{q2}} \cdots \int_{n_{qI}} \mathrm{Prob}(D_q, A_q \mid n_{q1}, \dots, n_{qI})\, d\Phi(n_{q1})\, d\Phi(n_{q2}) \cdots d\Phi(n_{qI}), \tag{2.4.19} $$

where $\Phi(\cdot)$ represents the cumulative distribution function of the standard normal distribution.
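The conditional probability in Equations (2.4.16) to (2.4.18) can be sketched in the same way as for the mixed discrete-continuous model. The Python code below is illustrative, with hypothetical inputs: `utilities` holds $\beta_i' z_{qi} + \alpha_i A_q$, `signs` holds the + or − imposed on each $f_i$ in Equation (2.4.14), and `v_index` holds $\gamma' x_q + \delta' D_q$. Pseudo-random normal draws stand in for the Halton-based draws used in the dissertation. A useful check is that, for any fixed draw, the conditional probabilities over all $I \times 2$ outcome combinations sum to one.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_prob(n, utilities, f, signs, chosen, a_q, v_index):
    """Prob(D_q, A_q | n_q1..n_qI) of Eq. (2.4.16) under specification (2.4.14)."""
    # Logit kernel of Eq. (2.4.17) with the sign-restricted heterogeneity.
    expo = [math.exp(v + s * fj * nj) for v, s, fj, nj in zip(utilities, signs, f, n)]
    logit = expo[chosen] / sum(expo)
    # Binary probit probability of Eq. (2.4.18); the same f_j load on v_q.
    p_one = STD_NORMAL.cdf(v_index + sum(fj * nj for fj, nj in zip(f, n)))
    probit = p_one if a_q == 1 else 1.0 - p_one
    return logit * probit


def simulated_prob(draws, **obs):
    """Average of the conditional probability over R draws, approximating Eq. (2.4.19)."""
    return sum(conditional_prob(n, **obs) for n in draws) / len(draws)


rng = random.Random(7)
draws = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(100)]
obs = dict(utilities=[0.4, -0.2, 0.1], f=[0.7, 0.3, 0.5], signs=[1, -1, 1],
           chosen=0, a_q=1, v_index=0.2)
print(simulated_prob(draws, **obs))
```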
The likelihood function can be formulated as

$$ L = \prod_{q=1}^{Q} \mathrm{Prob}(D_q, A_q). \tag{2.4.20} $$

Because the likelihood function does not have a closed form, we again apply MSLE to estimate the model parameters. The Halton sequence is again adopted to generate quasi-random values uniformly distributed between 0 and 1, which are then converted to standard normal values using $\Phi^{-1}(\cdot)$ (the inverse of the standard normal CDF). The standard normal values of the $r$th draw, denoted $n_r$, are substituted into $\mathrm{Prob}(D_q, A_q \mid n_{qi})$ to calculate $P_r$. After repeating this procedure $R$ times and accumulating the $P_r$ values, $\mathrm{Prob}(D_q, A_q)$ can be approximated as $\sum_{r=1}^{R} P_r / R$. As mentioned before, $R$ is set to 100. The maximum likelihood procedure can then be followed to estimate the parameters, including $\beta_i$, $f_i$, and $\alpha$ or $\delta$. The Gauss programming language (Aptech, 2005) is used to code the likelihood function and its first-order derivatives for maximization (see Appendix D for details).

2.3 Non-nested Test for Choosing Alternative Causal Structure

2.3.1 Cox Test for Separate Families of Hypotheses

A rigorous statistical test is required for comparing and selecting models under alternative causal structures, in order to identify the dominant causal relationship in travel behavior in the population. Causal models under alternative causal structures are non-nested, so classical statistical tests for nested structures, such as the likelihood ratio test, cannot be applied for this purpose. Two models are nested if and only if one model can be reduced to the other by imposing restrictions on the parameters. Cox (1961, 1962) initially proposed a statistical test for comparing models from separate families of hypotheses.
Horowitz (1982) simplified this test in the context of discrete choice models, deriving a more compact and more readily applicable form for comparing non-nested discrete choice models. Ben-Akiva and Swait (1986) restated the Horowitz test in terms of the Akaike Information Criterion (AIC), and the test is also presented in the book by Ben-Akiva and Lerman (1985). Pendyala and Bhat (2004) drew their conclusions on the basis of this non-nested test. However, after carefully reviewing the original paper (Horowitz, 1982), we consider it inappropriate to apply this test directly to the non-nested discrete-continuous model. An appropriate test is required for a rigorous comparison between non-nested discrete-continuous models.

2.3.2 Non-nested Test in Discrete Choice Model

It is necessary to review the original paper by Horowitz (1983) that proposed the non-nested test for discrete choice models. In that paper, the following goodness-of-fit measures are adopted instead of the standard adjusted ρ̄²:

$$\bar\rho_g^2 = 1 - \frac{L_g - K_g/2}{L^*} \qquad (2.5.1)$$

$$\bar\rho_f^2 = 1 - \frac{L_f - K_f/2}{L^*} \qquad (2.5.2)$$

where L_g and L_f are the log-likelihood values of model g and model f, respectively, which have non-nested structures; K_g and K_f are the numbers of estimated parameters in model g and model f, respectively; and L* is the log-likelihood value of the model without any explanatory variables or parameters (L* must be negative, since probabilities lie between 0 and 1). Then

$$\bar\rho_g^2 - \bar\rho_f^2 = \frac{(L_f - L_g) - (K_f - K_g)/2}{L^*} \qquad (2.5.3)$$

$$L_f - L_g = L^*\left(\bar\rho_g^2 - \bar\rho_f^2\right) + \frac{K_f - K_g}{2} \qquad (2.5.4)$$

According to the test of separate families of hypotheses (Cox, 1961), when model g is true,

$$L_f - L_g \sim \mathrm{Normal}\left(\frac{K_f - K_g}{2} - \frac{Nu}{2},\; Nu\right) \qquad (2.5.5)$$

where N is the sample size and u is a variance term, which is always positive. Thus,

$$\frac{(L_f - L_g) - (K_f - K_g)/2 + Nu/2}{\sqrt{Nu}} \sim \mathrm{Normal}(0, 1) \qquad (2.5.6)$$

Substituting Equation (2.5.4) into Equation (2.5.6) gives

$$\frac{L^*\left(\bar\rho_g^2 - \bar\rho_f^2\right) + (K_f - K_g)/2 - (K_f - K_g)/2 + Nu/2}{\sqrt{Nu}} \sim \mathrm{Normal}(0, 1) \qquad (2.5.7)$$

$$\frac{L^*\left(\bar\rho_g^2 - \bar\rho_f^2\right) + Nu/2}{\sqrt{Nu}} \sim \mathrm{Normal}(0, 1) \qquad (2.5.8)$$

Therefore, supposing z is positive (and recalling that L* < 0),

$$\Pr\left[\bar\rho_f^2 - \bar\rho_g^2 \ge z\right] = \Pr\left[\mathrm{Normal}(0,1) \ge \frac{-zL^* + Nu/2}{\sqrt{Nu}}\right] = \Phi\left[-\frac{-zL^* + Nu/2}{\sqrt{Nu}}\right] \qquad (2.5.9)$$
Since −zL* > 0 and Nu > 0, the inequality of arithmetic and geometric means gives

$$\frac{-zL^* + Nu/2}{\sqrt{Nu}} \ge \sqrt{-2zL^*},$$

with equality when u = −2zL*/N. Hence

$$\Pr\left[\bar\rho_f^2 - \bar\rho_g^2 \ge z\right] \le \Phi\left[-\left(-2zL^*\right)^{1/2}\right] \qquad (2.5.10)$$

In this procedure, the term u, which is intractable in empirical work, has been eliminated from the equation. For a discrete choice model without any explanatory variables or parameters, the log-likelihood value is L* = N ln(1/J), where J represents the number of alternatives in the choice set. Substituting this into Equation (2.5.10), one obtains

$$\Pr\left[\bar\rho_f^2 - \bar\rho_g^2 \ge z\right] \le \Phi\left[-\left(2zN\ln J\right)^{1/2}\right] \qquad (2.5.11)$$

Since ρ̄_f² and ρ̄_g² as defined in (2.5.1) and (2.5.2) are not standard output of statistical or econometric software, it is inconvenient to apply the Horowitz test directly. Ben-Akiva and Swait (1986) replaced them with the standard adjusted likelihood ratio indices by slightly adjusting the Horowitz test as follows:

$$\Pr\left(\bar\rho_2^2 - \bar\rho_1^2 \ge z\right) \le \Phi\left\{-\left[2zN\ln J + (K_2 - K_1)\right]^{1/2}\right\}, \quad z > 0 \qquad (2.5.12)$$

where

$$\bar\rho_1^2 = 1 - \frac{L(\hat\beta_1) - K_1}{L(0)} \qquad (2.5.13)$$

$$\bar\rho_2^2 = 1 - \frac{L(\hat\beta_2) - K_2}{L(0)} \qquad (2.5.14)$$

ρ̄_1²: adjusted likelihood ratio index for model g;
ρ̄_2²: adjusted likelihood ratio index for model f;
L(β̂_1): log-likelihood value at convergence in model g;
L(β̂_2): log-likelihood value at convergence in model f;
L(0): log-likelihood value at zero [= N ln(1/J)];
K_1 and K_2: the numbers of parameters in model g and model f, respectively.

The probability that the adjusted likelihood ratio index of model f exceeds that of model g by some z > 0, given that model g is the true model, is asymptotically bounded by the right-hand side of Equation (2.5.12). If the model with the greater ρ̄² is selected, this bounds the probability of erroneously choosing the incorrect model over the true specification. With this test, joint discrete choice models under alternative causal structures can be compared against one another.

2.3.3 Extension to Discrete-continuous Model System

The mathematical derivation shows that the non-nested test originates from the Cox test of separate families of hypotheses without any additional assumptions.
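The derivation in Section 2.3.2 can be checked numerically. The Python sketch below (standard library only; the function names are ours) evaluates the exact tail probability (2.5.9) over a grid of values of the unknown variance term u and confirms that it never exceeds the bound (2.5.10), which it reaches at u = −2zL*/N. The setting N = 1000, J = 3 and z = 0.001 is hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

def exact_tail(z, L_star, N, u):
    """Right-hand side of (2.5.9) for a given value of the unknown variance term u > 0."""
    return PHI(-(-z * L_star + N * u / 2.0) / math.sqrt(N * u))

def horowitz_bound(z, L_star):
    """Bound (2.5.10): Phi(-sqrt(-2 z L*)), for z > 0 and L* < 0."""
    if z <= 0 or L_star >= 0:
        raise ValueError("requires z > 0 and L* < 0")
    return PHI(-math.sqrt(-2.0 * z * L_star))

def ben_akiva_swait_bound(z, L0, K1, K2):
    """Bound (2.5.12), with L(0) passed in and K1, K2 the parameter counts of models g and f."""
    arg = -2.0 * z * L0 + (K2 - K1)
    if z <= 0 or arg < 0:
        raise ValueError("requires z > 0 and a non-negative argument")
    return PHI(-math.sqrt(arg))

# Hypothetical setting: N = 1000 observations, J = 3 alternatives, z = 0.001.
N, J, z = 1000, 3, 0.001
L_star = N * math.log(1.0 / J)                  # log-likelihood with no parameters
bound = horowitz_bound(z, L_star)
worst = max(exact_tail(z, L_star, N, k * 1e-5) for k in range(1, 2001))
print(bound, worst, exact_tail(z, L_star, N, -2.0 * z * L_star / N))
```

When K_1 = K_2, the Ben-Akiva and Swait form (2.5.12) reduces to the same number as (2.5.11), since −2zL(0) = 2zN ln J.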
The Cox test can therefore be applied not only to discrete choice models but also to any model estimated by the maximum likelihood method, and the discrete-continuous model adopted in this dissertation is no exception. Suppose we have a linear regression model y = β_0 + βx + u. A basic model with the minimum number of parameters is required to provide the L* value in Equation (2.5.10). Unlike the discrete choice model, the linear regression model needs at least two parameters: the constant β_0 and the standard deviation σ of the normal error term. The basic model is then y_i = β_0 + σn, n ~ Normal(0, 1). For this model, the ML estimator of β_0 coincides with the OLS estimator, so

$$\hat\beta_0 = \bar y = \frac{1}{N}\sum_{i=1}^{N} y_i \qquad (2.5.15)$$

$$\hat\sigma = \left[\frac{\sum_{i=1}^{N} (y_i - \bar y)^2}{N - 1}\right]^{1/2} \qquad (2.5.16)$$

(Strictly, the ML estimator of σ² divides by N rather than N − 1; the difference is negligible in large samples.) Under the normality assumption on the random error term, the probability density and log-probability density for each observation i can be expressed as

$$f_i = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(y_i - \beta_0)^2}{2\sigma^2}\right] \qquad (2.5.17)$$

$$\ln f_i = -\frac{(y_i - \beta_0)^2}{2\sigma^2} - \ln\left(\sqrt{2\pi}\,\sigma\right) \qquad (2.5.18)$$

By replacing the parameters with the OLS/ML estimates and summing the log-probability density over the sample, one obtains

$$L(\text{continuous observations}) = \sum_{i=1}^{N}\ln f_i = -\frac{1}{2\hat\sigma^2}\sum_{i=1}^{N}(y_i - \bar y)^2 - N\ln\left(\sqrt{2\pi}\,\hat\sigma\right) = -\frac{N-1}{2} - N\ln\left(\sqrt{2\pi}\,\hat\sigma\right) \qquad (2.5.19)$$

The log-likelihood value of the naive discrete choice model is the same as before:

$$L(\text{discrete observations}) = N\ln(1/J) \qquad (2.5.20)$$

$$L(\text{total}) = L(\text{continuous observations}) + L(\text{discrete observations}) = -\frac{N-1}{2} - N\ln\left(\sqrt{2\pi}\,\hat\sigma J\right) \qquad (2.5.21)$$

By substituting L(total) for L* in (2.5.10), we obtain

$$\Pr\left[\bar\rho_f^2 - \bar\rho_g^2 \ge z\right] \le \Phi\left(-\left\{2z\left[\frac{N-1}{2} + N\ln\left(\sqrt{2\pi}\,\hat\sigma J\right)\right]\right\}^{1/2}\right) \qquad (2.5.22)$$

By replacing ρ̄_f² and ρ̄_g² with the standard adjusted likelihood ratio indices, we have

$$\Pr\left[\bar\rho_2^2 - \bar\rho_1^2 \ge z\right] \le \Phi\left(-\left\{2z\left[\frac{N-1}{2} + N\ln\left(\sqrt{2\pi}\,\hat\sigma J\right)\right] + (K_2 - K_1)\right\}^{1/2}\right) \qquad (2.5.23)$$

where σ̂ is given by (2.5.16); N = sample size; J = number of alternatives in the discrete choice set; y_i = i-th observation on the continuous dependent variable; ȳ = sample mean of the y_i;

$$\bar\rho_1^2 = 1 - \frac{L(\hat\beta_1) - K_1}{L(0)} \qquad (2.5.24)$$

$$\bar\rho_2^2 = 1 - \frac{L(\hat\beta_2) - K_2}{L(0)} \qquad (2.5.25)$$

ρ̄_1²: adjusted likelihood ratio index for model g;
ρ̄_2²: adjusted likelihood ratio index for model f;
L(β̂_1): log-likelihood value at convergence in model g;
L(β̂_2): log-likelihood value at convergence in model f;
L(0): log-likelihood value at zero (no parameters for the discrete choice model, and two parameters, β_0 and σ, for the linear regression model);
K_1 and K_2: the numbers of parameters in model g and model f, respectively.

The probability that the adjusted likelihood ratio index of model f exceeds that of model g by some z > 0, given that model g is the true model, is asymptotically bounded by the right-hand side of Equation (2.5.23). If the model with the greater ρ̄² is selected, this bounds the probability of erroneously choosing the incorrect model over the true specification. With this procedure, discrete-continuous models under alternative causal structures can be compared against one another.

2.4 Monte Carlo Study for Bivariate Probit Model and Lee Model

2.4.1 Introduction

In the statistical and econometric literature, Monte Carlo studies are widely used to illustrate the properties of estimators and to compare estimators obtained from different estimation methods. A synthetic random dataset is generated from pseudo-random sequences, given parameters and model formulations. The proposed estimation method is then applied to estimate the parameters from this synthetic dataset. One may compare the estimates with the true parameter values, which are set in advance, and examine the statistical properties of the estimators over a large number of simulation experiments. In this dissertation, the consistency of the estimators from both the recursive and the joint estimation procedures, and the efficiency of the estimators from the joint estimation procedure, are of interest. Two questions are addressed. The first is whether joint estimation of the causal model yields a consistent estimator for the coefficient of the endogenous variable. The second is whether the non-nested test is valid for comparing competing causal structures.
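Returning to the extension in Section 2.3.3, the Python sketch below (standard library only; names are ours, data hypothetical) computes the naive log-likelihood (2.5.21) of a discrete-continuous system and the bound (2.5.23), written as Φ(−sqrt(−2zL(0) + (K_2 − K_1))) with L(0) = L(total). One guard is our addition rather than part of the derivation: because a normal density can exceed 1, L(0) is not guaranteed to be negative when the continuous variable has a small spread, and the sketch raises an error in that case instead of returning a number.

```python
import math
from statistics import NormalDist

def naive_loglik_dc(y, n_alts):
    """L(total) of (2.5.21): a constant-only regression for y plus equal shares 1/J."""
    N = len(y)
    ybar = sum(y) / N                                                   # (2.5.15)
    sigma = math.sqrt(sum((v - ybar) ** 2 for v in y) / (N - 1))        # (2.5.16)
    L_cont = -(N - 1) / 2.0 - N * math.log(math.sqrt(2.0 * math.pi) * sigma)  # (2.5.19)
    L_disc = N * math.log(1.0 / n_alts)                                 # (2.5.20)
    return L_cont + L_disc

def dc_bound(z, L0, K1, K2):
    """Bound (2.5.23), written as Phi(-sqrt(-2 z L(0) + (K2 - K1)))."""
    if L0 >= 0:
        # Guard added in this sketch: a normal density can exceed 1, so L(0) is not
        # guaranteed to be negative for a continuous variable with a small spread.
        raise ValueError("L(0) must be negative")
    arg = -2.0 * z * L0 + (K2 - K1)
    if z <= 0 or arg < 0:
        raise ValueError("requires z > 0 and a non-negative argument")
    return NormalDist().cdf(-math.sqrt(arg))

y = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 2.2, 3.8]     # hypothetical continuous observations
L0 = naive_loglik_dc(y, n_alts=3)
print(L0, dc_bound(0.01, L0, K1=5, K2=5))
```

Summing the log-density directly with the fitted mean and the N − 1 standard deviation reproduces the closed form (2.5.19), which is a quick way to check an implementation.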
Monte Carlo studies will be conducted in the context of the bivariate probit model to illustrate the consistency of the joint estimation results, and to validate the bounding probability given by the non-nested test for comparing recursive bivariate probit models under alternative causal structures. The proposed mixed model is not used in the Monte Carlo studies because its estimation is very time-consuming (one successful estimation of a mixed model using 100 Halton draws takes 3 to 4 hours on a personal computer with a 3.0 GHz Pentium IV CPU). In addition, the estimators from Lee's discrete-continuous model are examined on a synthetic dataset whose random error covariance structure is inconsistent with the Lee model's assumption. It is found that, except for the coefficients of exogenous variables, all parameters are seriously biased when the assumption on the covariance structure of the random error terms is violated.

2.4.2 Monte Carlo Studies for Bivariate Probit Model

This section compares the performance of recursive estimation and joint estimation of the bivariate probit model. Assume the bivariate probit model takes the following form:

$$M_q^{*} = \alpha_1 + \alpha_2 z_q + \varepsilon_q, \qquad T_q^{*} = \beta_1 + \beta_2 x_q + \xi_q \qquad (2.6.1)$$

There is one constant and only one explanatory variable in each utility function. Let α_1 = 0.1, α_2 = 0.2, β_1 = −0.3 and β_2 = 0.15; the explanatory variables z_q and x_q are uniformly distributed as R(0, 3); the random error terms ε_q and ξ_q are standard bivariate normal with zero means, unit variances and correlation ρ, i.e. (ε_q, ξ_q) ~ Φ_2(0, 0, 1, 1, ρ). Let ρ = −0.4 and ρ = −0.8, respectively, to examine how error correlations of different magnitude affect the estimators. The following procedure is employed to generate bivariate normal random draws.
Generate t1 and t2, which are independently and uniformly distributed as R(0, 1). Let ε_q = Φ^{-1}(t1) and u_q = Φ^{-1}(t2); then ε_q and u_q are both standard normal, N(0, 1), and mutually independent. If ξ_q = ρε_q + (1 − ρ²)^{1/2} u_q, it is easy to show that ε_q and ξ_q are standard bivariate normal with zero means, unit variances and correlation ρ, i.e. Φ_2(0, 0, 1, 1, ρ).

Once ε, z, x and ξ are determined, the latent variables M*_q and T*_q can be calculated directly. Let M_q = 1 if M*_q > 0 and M_q = 0 otherwise, and T_q = 1 if T*_q > 0 and T_q = 0 otherwise. This yields a simulated dataset with four variables: M_q, T_q, z_q and x_q. Both the recursive and the joint estimation methods are applied to estimate the parameters, whose values were set in advance, which allows a direct comparison between the true parameters and their estimates. Repeating this procedure thousands of times reveals the statistical properties of the parameter estimators. In this study, the sample size is 1000 and the procedure is repeated 1000 times. The statistics of the estimators are listed in Table 2.1. The upper block of the table gives the statistics for ρ = −0.4: the left-hand block shows the results from recursive estimation, in which the parameters are estimated as two separate binary probit models, and the right-hand block shows those from joint estimation by the full-information maximum likelihood method. The lower block gives the corresponding statistics for ρ = −0.8, again with recursive estimation on the left and joint estimation on the right. The number of successful joint estimations under the higher correlation is 949, while the other three counts are all 1000.
This is because the likelihood maximization failed to converge in 51 (about 5%) of the 1000 repetitions; excluding these 51 cases is unlikely to affect the statistical distribution of the estimators. In the table, the Min., Max., Mean and SD columns give the minimum, maximum, mean and standard deviation of the sample of estimates. Tr. Par is the true parameter value set in advance. The ESD(·) rows give the standard deviations of the estimators as estimated by the maximum likelihood procedure. For convenience, these standard deviations are obtained from the outer product of the estimated first-order derivative (gradient) vectors rather than from the estimated Hessian matrix: in some cases the Hessian is not invertible, whereas the outer product of the gradients can be used to approximate the Hessian at convergence. Let R1 = (Mean − Tr.Par)/Tr.Par, which measures the relative bias in the parameter estimators. Let R2 = [Mean(ESD) − SD]/SD, which measures the relative bias in the estimated standard deviation of the estimators. Let R3 = (SD_r − SD_j)/SD_j, which measures the relative difference between the standard deviations of the estimators from recursive (r) and joint (j) estimation. For ρ = −0.4, the R1 values for all parameters in both recursive and joint estimation are below 5% in absolute value, which indicates the consistency of the estimators from both methods. In addition, no R2 or R3 value on either side exceeds 5% in absolute value, which implies that joint estimation does not substantially improve the efficiency of the estimators relative to recursive estimation when the magnitude of the correlation is as low as 0.4. For ρ = −0.8, the R1 values still suggest consistency of the estimators on both sides.
However, the R2 values are 6.2% and 9.3% for the recursive estimators of the two constants in the model, so the asymptotic estimates of the standard deviations do not fit the observed standard deviations very well; instead, they contain considerable bias. The R3 values for all recursive estimators exceed 14%, indicating that joint estimation substantially improves the efficiency of the parameter estimators when the magnitude of the correlation is as high as 0.8. The Monte Carlo studies thus clearly illustrate the benefit of the bivariate probit model's accommodation of error correlation: the greater the absolute value of the correlation, the greater the efficiency gain of the bivariate probit model relative to recursive binary probit models. Nevertheless, as the Monte Carlo studies show, both estimation methods yield consistent estimators of the model parameters.

Table 2.1 Statistics of Estimators for Bivariate Probit Model

ρ = −0.4
Estimator | Recursive Estimation (Sample Size = 1000): Min. Max. Mean SD Tr.Par R1 R2 R3 | Joint Estimation (Sample Size = 1000): Min. Max. Mean SD Tr.Par R1 R2
M*: constant | 0.113 0.359 0.097 0.082 0.100 0.029 0.000 0.023 | 0.128 0.341 0.098 0.080 0.100 0.024 0.010
M*: z | 0.078 0.339 0.202 0.047 0.200 0.012 0.022 0.031 | 0.078 0.353 0.202 0.045 0.200 0.011 0.020
T*: constant | 0.581 0.078 0.302 0.078 0.300 0.006 0.034 0.023 | 0.542 0.082 0.302 0.076 0.300 0.006 0.024
T*: x | 0.009 0.313 0.151 0.045 0.150 0.004 0.018 0.031 | 0.025 0.310 0.151 0.044 0.150 0.004 0.016
Correlation | (not estimated) | 0.541 0.242 0.399 0.044 0.400 0.002 0.036
ESD rows | Min. Max. Mean SD | Min. Max. Mean SD
ESD(M*: constant) | 0.077 0.086 0.082 0.002 | 0.073 0.084 0.079 0.002
ESD(M*: z) | 0.045 0.051 0.048 0.001 | 0.043 0.050 0.046 0.001
ESD(T*: constant) | 0.077 0.086 0.081 0.001 | 0.074 0.084 0.078 0.002
ESD(T*: x) | 0.044 0.049 0.046 0.001 | 0.042 0.048 0.045 0.001
ESD(Correlation) | (not estimated) | 0.041 0.050 0.046 0.001

ρ = −0.8
Estimator | Recursive Estimation (Sample Size = 1000): Min. Max. Mean SD Tr.Par R1 R2 R3 | Joint Estimation (Sample Size = 949): Min. Max. Mean SD Tr.Par R1 R2
M*: constant | 0.199 0.353 0.100 0.081 0.100 0.005 0.062 0.155 | 0.082 0.295 0.102 0.070 0.100 0.024 0.007
M*: z | 0.063 0.373 0.201 0.048 0.200 0.007 0.014 0.205 | 0.080 0.306 0.201 0.040 0.200 0.004 0.011
T*: constant | 0.563 0.051 0.302 0.078 0.300 0.006 0.093 0.148 | 0.560 0.097 0.303 0.068 0.300 0.011 0.035
T*: x | 0.028 0.307 0.150 0.046 0.150 0.001 0.006 0.202 | 0.016 0.253 0.150 0.038 0.150 0.003 0.007
Correlation | (not estimated) | 0.868 0.716 0.798 0.025 0.800 0.002 0.025
ESD rows | Min. Max. Mean SD | Min. Max. Mean SD
ESD(M*: constant) | 0.081 0.094 0.086 0.002 | 0.065 0.077 0.070 0.002
ESD(M*: z) | 0.045 0.052 0.048 0.001 | 0.036 0.044 0.040 0.001
ESD(T*: constant) | 0.080 0.091 0.085 0.002 | 0.064 0.077 0.070 0.002
ESD(T*: x) | 0.044 0.049 0.046 0.001 | 0.035 0.042 0.038 0.001
ESD(Correlation) | (not estimated) | 0.020 0.032 0.026 0.002

2.4.3 Monte Carlo Studies for Recursive Bivariate Probit Model

Suppose the recursive bivariate probit model describes two binary choices made sequentially, and let two dummy variables M_q and T_q indicate these two binary choices for person q: M_q = 1 if M is selected by person q and M_q = 0 otherwise; T_q = 1 if T is selected by person q and T_q = 0 otherwise. Person q first makes the choice decision on M_q and then on T_q. The sequential manner has two implications: (1) the choice decision on M is made before the choice decision on T; and (2) the predetermined choice on M affects the choice decision on T. Assume there exists a latent continuous variable M*_q underlying the dummy variable M_q, so that M_q = 1 if M*_q > 0 and M_q = 0 otherwise. Let

$$M_q^{*} = \alpha_1 + \alpha_2 z_q + \varepsilon_q,$$

where α_1 = 0.1 and α_2 = 0.2, and the explanatory variable z_q is uniformly distributed as R(0, 3).
Let ε_q = Φ^{-1}(t1), where t1 ~ R(0, 1), so that ε_q ~ N(0, 1). After M_q is determined, person q decides on T_q conditional on M_q according to another latent continuous variable T*_q:

$$T_q^{*} = \beta_1 + \beta_2 x_q + \gamma M_q + \xi_q,$$

where β_1 = −0.3, β_2 = 0.15 and γ = 0.9. The explanatory variable x_q is uniformly distributed as R(0, 3). ξ_q ~ N(0, 1) and is correlated with ε_q because ε_q and ξ_q contain common unobserved variables. Let the correlation be ρ = −0.4 and let ξ_q = ρε_q + (1 − ρ²)^{1/2} Φ^{-1}(t2), t2 ~ R(0, 1); then (ε_q, ξ_q) ~ Φ_2(0, 0, 1, 1, ρ). T*_q can now be calculated to determine the choice T_q: T_q = 1 if T*_q > 0 and T_q = 0 otherwise. Finally, one obtains a simulated dataset with four variables: M_q, T_q, z_q and x_q.

The two alternative recursive bivariate probit models described in Section 2.2.1 are both applied to estimate the parameters. The causal structure (T → M) is a wrong model specification for the simulated dataset, whereas the causal structure (M → T) is the correct one. In addition to the parameter estimates, the adjusted ρ̄² (adjusted likelihood ratio index) values under both causal structures are recorded in each estimation to examine the performance of the non-nested test. The procedure of simulating the dataset and estimating the parameters is repeated 1000 times for each of several sample sizes (1000, 2000, 3000 and 5000) in order to find an appropriate sample size for applying the non-nested test. In the upper half of Table 2.2, the left part shows the estimation results for the two alternative causal structures using the recursive estimation procedure (with the error correlation restricted to zero), where the sample size of the synthetic random data (N1) is 1000. R1 = (Mean − Tr.Par)/Tr.Par is again used to measure the relative bias in the parameter estimators.
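The data-generating steps of Sections 2.4.2 and 2.4.3 can be sketched in Python (standard library only). Setting gamma = 0 gives the bivariate probit design (2.6.1), and gamma = 0.9 gives the recursive design in which the realized M_q shifts T*_q. The negative values of β_1 and ρ used as defaults follow the reported minimum, maximum and mean values in Tables 2.1 and 2.2; treat the defaults, the seed handling and the function names as assumptions rather than the dissertation's code.

```python
import math
import random
from statistics import NormalDist

INV_PHI = NormalDist().inv_cdf

def open_uniform(rng):
    """Uniform draw strictly inside (0, 1); random() can return exactly 0.0."""
    t = rng.random()
    while t == 0.0:
        t = rng.random()
    return t

def correlated_normals(rho, rng):
    """(eps, xi) ~ Phi_2(0, 0, 1, 1, rho) via xi = rho*eps + sqrt(1 - rho^2)*u."""
    eps = INV_PHI(open_uniform(rng))    # eps = Phi^-1(t1)
    u = INV_PHI(open_uniform(rng))      # u = Phi^-1(t2), independent of eps
    return eps, rho * eps + math.sqrt(1.0 - rho ** 2) * u

def simulate(n, a1=0.1, a2=0.2, b1=-0.3, b2=0.15, gamma=0.9, rho=-0.4, seed=1):
    """Rows (M, T, z, x). gamma = 0 gives the bivariate probit design (2.6.1);
    gamma != 0 lets the realized M shift T* (recursive design)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z, x = rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0)
        eps, xi = correlated_normals(rho, rng)
        m = int(a1 + a2 * z + eps > 0)               # M = 1 if M* > 0
        t = int(b1 + b2 * x + gamma * m + xi > 0)    # T = 1 if T* > 0
        rows.append((m, t, z, x))
    return rows

rows = simulate(1000)                                # one synthetic dataset, N1 = 1000
share_m = sum(m for m, _, _, _ in rows) / len(rows)
print(len(rows), round(share_m, 3))
```

Each Monte Carlo replication would call `simulate` with a new seed and then fit both causal structures; the estimation step itself is not sketched here.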
A t-test can be conducted to compare the mean of the estimators with the true parameters, owing to the asymptotic normality of the maximum likelihood estimator (MLE). N stands for the number of estimates summarized in each panel. For example, N = 930 for the joint estimation procedure means that 930 of the 1000 simulation experiments reached convergence. The remaining parts of Table 2.2 provide similar statistics for synthetic datasets with sample sizes of 2000, 3000 and 5000. Across all panels, none of the joint estimators is rejected by the t-test, whereas the recursive estimators of the coefficient of the endogenous variable and of the constant in the same latent function are rejected as inconsistent by the t-test. This implies that joint estimation, rather than recursive estimation, is necessary when an endogenous variable is present. Notably, the relative bias in the recursive bivariate probit model is substantially greater than that in the bivariate probit model. For example, the R1 value for the joint estimator of the endogenous-variable coefficient (true value 0.9) is as high as 13.9% in absolute value when N1 = 1000, whereas in the bivariate probit model with N1 = 1000 most relative biases are below 5%. As N1 increases from 1000 to 5000, the absolute value of R1 for this coefficient decreases from 0.139 to 0.029. These results suggest that a large sample is required to estimate the endogenous coefficient as accurately as the coefficients in the bivariate probit model. Joint estimation under the wrong causal structure generally yields inconsistent estimators of the constants. When N1 = 3000 or 5000, the estimators of the coefficients of exogenous variables appear consistent even if the causal structure is wrong. For the row of adjusted ρ̄²_0, R1 = (ρ̄²_0 − ρ̄²_0′)/ρ̄²_0, where ρ̄²_0 is the adjusted likelihood ratio index for the model under the true causal structure and ρ̄²_0′ is that under the wrong causal structure.
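The summary measures used in Tables 2.1 and 2.2 are simple functions of the replication statistics; the Python sketch below writes them out. The 1.96 cutoff is our assumption of a conventional two-sided 5% level, since the text does not state the critical value. Plugging in rounded table entries reproduces the reported magnitudes (for example 7.23 for the recursive estimator of the endogenous coefficient at N1 = 1000), which suggests that the tabulated t-statistics divide by the standard deviation of the estimates themselves rather than by the standard error of their mean.

```python
def r1(mean_est, true_par):
    """Relative bias of the estimator: (Mean - Tr.Par) / Tr.Par."""
    return (mean_est - true_par) / true_par

def r2(mean_esd, sd):
    """Relative bias of the estimated standard deviation: (Mean(ESD) - SD) / SD."""
    return (mean_esd - sd) / sd

def r3(sd_recursive, sd_joint):
    """Relative SD gap between recursive and joint estimation: (SD_r - SD_j) / SD_j."""
    return (sd_recursive - sd_joint) / sd_joint

def t_stat(mean_est, true_par, sd_est):
    """(Mean - true) / SD, with SD the spread of the estimates across replications."""
    return (mean_est - true_par) / sd_est

def rejects_consistency(t, cutoff=1.96):
    """Two-sided test at an assumed 5% level."""
    return abs(t) > cutoff

# First constant, rho = -0.8 block of Table 2.1 (rounded entries).
print(round(r2(0.086, 0.081), 3), round(r3(0.081, 0.070), 3))   # 0.062 0.157
# Endogenous coefficient (true 0.9), N1 = 1000 in Table 2.2: joint vs recursive.
print(round(t_stat(0.775, 0.9, 0.514), 2), round(t_stat(0.271, 0.9, 0.087), 2))  # -0.24 -7.23
```

The R3 value of 0.157 differs slightly from the tabulated 0.155 only because the inputs here are the rounded table entries.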
Regardless of the dataset size (N1), the relative difference in the adjusted likelihood ratio index between the true model and the wrong model is only 0.007 or 0.008. The statistical results indicate that this seemingly slight difference in the goodness-of-fit measures under alternative model structures is informative enough to identify the model under the true causal structure. The non-nested test has been introduced to identify the true causal structure between two alternatives. To examine the power of the non-nested test, we apply it to each simulation experiment. For each simulation experiment, let z = ρ̄²_2 − ρ̄²_1, where ρ̄²_2 is the adjusted likelihood ratio index for the wrong model (denoted model 2) and ρ̄²_1 is that for the true model (denoted model 1). If z < 0, model 1 performs seemingly better than model 2, so we establish the null hypothesis that model 1 is true and calculate the bounding probability BP = Φ{−[−2|z|L(0) + (K_2 − K_1)]^{1/2}} (see Equation 2.5.12 in Section 2.3.2). If BP < 0.05, model 1 is selected; otherwise, the test result is recorded as inconclusive. In other words, a 0.05 significance level is used. If z > 0, model 2 performs seemingly better than model 1; the null hypothesis that model 2 is true is established and BP is calculated in the same way. Similarly, if BP < 0.05, model 2 is selected; otherwise, the result is recorded as inconclusive. Figure 2.1 shows the distribution of z from the simulation experiments with N1 = 1000. The distribution is concentrated on the negative side, which is consistent with the expectation that the true model fits the data better in most simulation experiments. Table 2.3 shows the statistical results of applying the non-nested test under various sample sizes.

[Figure 2.1 Distribution of z (N = 930 and N1 = 1000). Horizontal axis: z in bins of width 1 from −9 to 5 (unit: 10E−3); vertical axis: percent (%).]

Table 2.2 Statistics of Estimators for Recursive Bivariate Probit Model
(Each row: estimator (true value) | Min. Max. Mean SD R1 t-test for the first sample size | the same for the second sample size)

Joint Estimation | N1 = 1000 and N = 930 | N1 = 2000 and N = 940
M*: constant (0.1) | 0.154 0.377 0.091 0.077 0.087 0.12 | 0.105 0.307 0.094 0.060 0.061 0.10
M*: z (0.2) | 0.071 0.362 0.205 0.045 0.025 0.11 | 0.061 0.347 0.203 0.036 0.016 0.08
T*: constant (−0.3) | 0.964 0.946 0.225 0.362 0.251 0.21 | 0.880 0.855 0.261 0.288 0.129 0.14
T*: x (0.15) | 0.010 0.329 0.147 0.048 0.019 0.06 | 0.054 0.288 0.150 0.037 0.000 0.00
T*: M (0.9) | 0.812 1.709 0.775 0.514 0.139 0.24 | 0.751 1.650 0.832 0.407 0.075 0.17
Correlation (−0.4) | 0.977 0.620 0.326 0.328 0.185 0.23 | 0.947 0.532 0.362 0.261 0.094 0.15
Adj. ρ̄²_0 | 0.067 0.146 0.1064 0.012 0.53 0.079 0.140 0.1076 0.010 0.079

Recursive Estimation | N1 = 1000 and N = 1000 | N1 = 2000 and N = 1000
M*: constant (0.1) | 0.156 0.403 0.097 0.079 0.025 0.04 | 0.105 0.341 0.098 0.062 0.018 0.03
M*: z (0.2) | 0.050 0.363 0.201 0.047 0.005 0.02 | 0.063 0.347 0.201 0.037 0.004 0.03
T*: constant (−0.3) | 0.249 0.476 0.125 0.100 1.417 4.25 | 0.145 0.434 0.128 0.075 1.426 5.71
T*: x (0.15) | 0.011 0.336 0.158 0.049 0.050 0.16 | 0.054 0.288 0.159 0.038 0.062 0.24
T*: M (0.9) | 0.002 0.575 0.271 0.087 0.699 7.23 | 0.029 0.477 0.266 0.070 0.704 9.06

Joint Estimation in Wrong Causal Direction | N1 = 1000 and N = 908 | N1 = 2000 and N = 950
M*: constant (0.1) | 1.192 1.030 0.034 0.499 0.658 0.13 | 1.128 0.989 0.075 0.395 0.247 0.06
M*: z (0.2) | 0.035 0.342 0.184 0.047 0.080 1.79 | 0.054 0.343 0.188 0.037 0.061 2.38
T*: constant (−0.3) | 0.044 0.509 0.291 0.080 1.969 2.39 | 0.060 0.464 0.296 0.060 1.988 3.27
T*: x (0.15) | 0.019 0.335 0.162 0.046 0.079 1.35 | 0.064 0.276 0.160 0.037 0.066 1.62
T*: M (0.9) | 1.211 1.761 0.100 0.696 1.231 1.647 0.043 0.556 0.10
Correlation (−0.4) | 0.952 0.994 0.095 0.426 1.237 0.01 | 0.909 0.938 0.128 0.340 1.319 0.08
Adj. ρ̄²_0 | 0.067 0.145 0.1056 0.012 0.008 | 0.059 0.139 0.1068 0.010 0.007

Table 2.2 (Continued)

Joint Estimation | N1 = 3000 and N = 950 | N1 = 5000 and N = 939
M*: constant (0.1) | 0.059 0.272 0.096 0.045 0.039 0.09 | 0.004 0.269 0.095 0.035 0.054 0.14
M*: z (0.2) | 0.110 0.317 0.203 0.026 0.013 0.12 | 0.121 0.263 0.203 0.021 0.017 0.14
T*: constant (−0.3) | 0.862 0.523 0.266 0.226 0.112 0.15 | 0.751 0.327 0.287 0.169 0.045 0.08
T*: x (0.15) | 0.060 0.228 0.148 0.027 0.015 0.07 | 0.091 0.219 0.151 0.021 0.006 0.05
T*: M (0.9) | 0.314 1.602 0.851 0.316 0.055 0.16 | 0.055 1.503 0.874 0.238 0.029 0.11
Correlation (−0.4) | 0.910 0.393 0.373 0.200 0.066 0.14 | 0.825 0.158 0.386 0.153 0.035 0.09
Adj. ρ̄²_0 | 0.087 0.137 0.1084 0.007 | 0.089 0.131 0.1086 0.006

Recursive Estimation | N1 = 3000 and N = 1000 | N1 = 5000 and N = 1000
M*: constant (0.1) | 0.058 0.256 0.099 0.046 0.013 0.02 | 0.005 0.268 0.097 0.036 0.026 0.08
M*: z (0.2) | 0.110 0.318 0.201 0.027 0.005 0.04 | 0.122 0.263 0.202 0.022 0.009 0.09
T*: constant (−0.3) | 0.046 0.304 0.133 0.057 1.445 7.60 | 0.029 0.272 0.127 0.044 1.423 9.70
T*: x (0.15) | 0.062 0.234 0.156 0.028 0.040 0.21 | 0.095 0.229 0.159 0.021 0.057 0.43
T*: M (0.9) | 0.065 0.428 0.265 0.051 0.706 12.45 | 0.158 0.373 0.267 0.039 0.703 16.23

Joint Estimation in Wrong Causal Direction | N1 = 3000 and N = 984 | N1 = 5000 and N = 998
M*: constant (0.1) | 1.089 0.932 0.077 0.342 0.232 0.07 | 0.876 0.786 0.088 0.267 0.118 0.04
M*: z (0.2) | 0.104 0.303 0.190 0.028 0.050 0.36 | 0.107 0.255 0.193 0.022 0.036 0.32
T*: constant (−0.3) | 0.171 0.443 0.304 0.046 2.012 13.13 | 0.164 0.418 0.299 0.036 1.996 16.64
T*: x (0.15) | 0.075 0.233 0.155 0.027 0.033 0.19 | 0.095 0.230 0.157 0.021 0.048 0.33
T*: M (0.9) | 1.135 1.618 0.043 0.478 | 1.017 1.370 0.025 0.376
Correlation (−0.4) | 0.913 0.859 0.127 0.293 1.318 1.80 | 0.665 0.775 0.140 0.229 1.349 2.36
Adj. ρ̄²_0 | 0.087 0.136 0.1076 0.007 0.007 | 0.088 0.130 0.1078 0.006 0.007
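The classification rule described above can be sketched in Python (standard library only). Here k_diff stands for K_2 − K_1 in the bound; both causal structures in this experiment have the same number of parameters, so the default is zero. The L(0) value in the usage lines is hypothetical, and clamping a negative square-root argument to zero (which yields BP = 0.5 and hence an inconclusive result) is a convention of this sketch, not of the dissertation.

```python
import math
from statistics import NormalDist

def nonnested_decision(rho1, rho2, L0, k_diff=0, alpha=0.05):
    """Classify one simulation experiment.

    rho1, rho2: adjusted likelihood ratio indices of model 1 (true) and model 2 (wrong);
    L0: log-likelihood at zero (negative); k_diff: K_2 - K_1 (zero by default)."""
    z = rho2 - rho1
    arg = max(-2.0 * abs(z) * L0 + k_diff, 0.0)   # sketch convention: clamp at zero
    bp = NormalDist().cdf(-math.sqrt(arg))        # bounding probability BP
    if bp >= alpha:
        return "inconclusive", bp
    return ("model 1" if z < 0 else "model 2"), bp

L0 = 1000 * math.log(0.25)    # hypothetical L(0)
for rho_true, rho_wrong in [(0.1064, 0.0984), (0.1060, 0.1065), (0.100, 0.110)]:
    print(rho_true, rho_wrong, nonnested_decision(rho_true, rho_wrong, L0))
```

Tallying the returned labels over all experiments, split by the sign of z, gives counts of the kind reported in Table 2.3.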
When N1 = 1000, among the 847 effective experiments (those in which convergence was reached under both causal structures), 237 give a better goodness-of-fit for the wrong model than for the true model. Among these 237 experiments, only 29 yield a BP value below 0.05, so that the wrong model 2 is judged to be the true model. This is a Type II error, in which the wrong model is incorrectly accepted. In this study, the probability of making this error is estimated as 29/847 ≈ 3.4%, which is below 5% (the significance level selected for the non-nested test). This result supports the validity of the non-nested test for identifying alternative causal structures. In addition, 265 (31.3%) experiments yield a conclusive, correct judgement and 553 (65.3%) an inconclusive one. As N1 increases, more experiments can be conclusively identified. For example, if the sample size for model estimation increases to 5000, 89.9% of the experiments are correctly identified by the non-nested test (as shown in Table 2.3). Thus, the simulation study strongly recommends a large sample size for estimating the recursive bivariate probit model.

2.4.4 Monte Carlo Studies for Lee Model

In this section, a Monte Carlo study is conducted to examine the robustness of the discrete-continuous simultaneous equations model based on the Lee transformation. As long as the distribution function is known and all of its parameters are identified, maximum likelihood estimators are consistent and asymptotically efficient under standard regularity conditions. In practice, however, the distributional assumption can easily be violated. Robustness here refers to the consistency of the estimators when the distributional assumption is violated. For example, the linear regression model is considered robust because the consistency of the OLS estimators does not depend on the distributional assumption on its random error term. The procedure for generating the synthetic datasets is presented as follows.
Table 2.3 Statistics for Non-nested Test Application

N1 = 1000 | Inconclusive | Conclusive | Total
True model is seemingly better | 345 (40.7%) | 265 (31.3%) | 610 (72.0%)
Wrong model is seemingly better | 208 (24.6%) | 29 (3.4%) | 237 (28.0%)
Total | 553 (65.3%) | 294 (34.7%) | 847 (100%)

N1 = 2000 | Inconclusive | Conclusive | Total
True model is seemingly better | 287 (32.1%) | 465 (52.0%) | 752 (84.0%)
Wrong model is seemingly better | 118 (13.2%) | 25 (2.8%) | 143 (16.0%)
Total | 405 (45.3%) | 490 (54.8%) | 895 (100%)

N1 = 3000 | Inconclusive | Conclusive | Total
True model is seemingly better | 112 (12.0%) | 746 (79.9%) | 858 (91.9%)
Wrong model is seemingly better | 48 (5.1%) | 28 (3.0%) | 76 (8.1%)
Total | 160 (17.1%) | 774 (82.9%) | 934 (100%)

N1 = 5000 | Inconclusive | Conclusive | Total
True model is seemingly better | 67 (7.1%) | 843 (89.9%) | 910 (97.0%)
Wrong model is seemingly better | 22 (2.3%) | 6 (0.6%) | 28 (3.0%)
Total | 89 (9.4%) | 849 (90.5%) | 938 (100%)

1. Generation of Error Terms

Suppose there are three alternatives (I = 3) in the discrete choice set. Let z_1, z_2 and z_3 be independently and uniformly distributed as R(0, 1), and let ε_1 = −ln[−ln(z_1)], ε_2 = −ln[−ln(z_2)] and ε_3 = −ln[−ln(z_3)]. Then ε_1, ε_2 and ε_3 are i.i.d. Gumbel(0, 1). To induce correlation between normal draws and each Gumbel error ε_i, let η_1 = Φ^{-1}(z_1), η_2 = Φ^{-1}(z_2) and η_3 = Φ^{-1}(z_3); thus η_i ~ N(0, 1) and corr(η_i, ε_i) ≈ 1 (because of the nonlinear transformation of the same random draws, the correlation is not exactly 1; it is, however, a constant approximately equal to 0.97). Generate z_4 ~ R(0, 1) independently of the z_i and let η_4 = Φ^{-1}(z_4). Then η_4 ~ N(0, 1) is independent of η_i (i = 1, 2, 3). Let

$$\xi = \frac{f_1\eta_1 + f_2\eta_2 + f_3\eta_3 + f_4\eta_4}{\sqrt{f_1^2 + f_2^2 + f_3^2 + f_4^2}} \qquad (2.6.2)$$

where the f_i are arbitrary constant coefficients that control the correlation between ξ and ε_i. It can be shown that ξ ~ N(0, 1) and

$$\mathrm{corr}(\xi, \eta_i) = \frac{f_i}{\sqrt{f_1^2 + f_2^2 + f_3^2 + f_4^2}} \qquad (2.6.3)$$

This process results in the generation of three i.i.d. standard Gumbel random draws ε_i and one standard normal random draw ξ, with

$$\mathrm{corr}(\xi, \varepsilon_i) = \mathrm{corr}(\eta_i, \varepsilon_i)\,\frac{f_i}{\sqrt{f_1^2 + f_2^2 + f_3^2 + f_4^2}} = c_i.$$

Note that c_i is constant, which meets the requirement noted in the preceding discussion.

2. Generation of Parameters and Explanatory Variables

Consider model equations (2.6.4), with no endogenous variable, and model equations (2.6.5), with the endogenous continuous variable in the utility functions:

$$u_{qi} = \alpha_{0i} + \alpha_{1i} y_{qi} + \varepsilon_{qi}, \qquad a_q = \beta_0 + \beta_1 x_q + \sigma\xi_q \qquad (2.6.4)$$

$$u_{qi} = \alpha_{0i} + \alpha_{1i} y_{qi} + \gamma_i a_q + \varepsilon_{qi}, \qquad a_q = \beta_0 + \beta_1 x_q + \sigma\xi_q \qquad (2.6.5)$$

Let f_1 = 0.5, f_2 = 0.4, f_3 = 0.0 and f_4 = 0.5; then c_1 ≈ 0.60 and c_2 ≈ 0.48.

Table 2.4 True Values of Parameters in the Model
α_01 = −0.15 | α_02 = 0.25 | α_03 = 0 | β_0 = 1.0
α_11 = −0.1 | α_12 = 0.3 | α_13 = 0 | β_1 = 1.5
γ_1 = 0.2 | γ_2 = −0.2 | σ = 2

Let the explanatory variables be y_1 ~ R(0, 3), y_2 ~ R(0, 2) and x ~ R(0, 4), where R(·) represents a uniform distribution.

3. Generation of Dependent Variables

Let a = 1.0 + 1.5x + 2ξ. Then calculate the u_i from the model formulated as Equation (2.6.5):

u_1 = −0.15 − 0.1y_1 + 0.2a + ε_1,
u_2 = 0.25 + 0.3y_2 − 0.2a + ε_2,
u_3 = ε_3.

For the model formulated as Equation (2.6.4), we calculate only

u_1 = −0.15 − 0.1y_1 + ε_1,
u_2 = 0.25 + 0.3y_2 + ε_2,
u_3 = ε_3.

Then calculate the dummy variables D_1, D_2 and D_3 indicating the discrete choices:

D_1 = 1 if u_1 ≥ u_2 and u_1 ≥ u_3; D_1 = 0 otherwise;
D_2 = 1 if u_2 ≥ u_1 and u_2 ≥ u_3; D_2 = 0 otherwise;
D_3 = 1 if u_3 ≥ u_1 and u_3 ≥ u_2; D_3 = 0 otherwise.

This completes the development of a synthetic random dataset consisting of the discrete choice indicators D_1, D_2 and D_3, the continuous variable a, and the explanatory variables y_1, y_2 and x. The sample size is set to 3000.

4. Simulation Results

Following this procedure, a synthetic random dataset is generated 500 times, and the parameters are estimated by maximizing the log-likelihood function. The statistical results are shown in Table 2.5 and Table 2.6.
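Steps 1 to 3 can be sketched in Python (standard library only). The signs of the intercept −0.15 and slope −0.1 in u_1 and of the coefficient −0.2 on a in u_2 follow our reading of Tables 2.4, 2.5 and 2.7 together with the utility equations in step 3; the function names, the dictionary layout and the seeds are illustrative assumptions. The usage lines compare the empirical correlations with (2.6.3) and with c_1.

```python
import math
import random
from statistics import NormalDist

INV_PHI = NormalDist().inv_cdf
F = (0.5, 0.4, 0.0, 0.5)            # f_1..f_4 from step 2

def open_uniform(rng):
    t = rng.random()
    while t == 0.0:                 # keep draws strictly inside (0, 1)
        t = rng.random()
    return t

def lee_observation(rng, endogenous=True):
    z = [open_uniform(rng) for _ in range(4)]
    eps = [-math.log(-math.log(zi)) for zi in z[:3]]     # standard Gumbel errors
    eta = [INV_PHI(zi) for zi in z]                      # normals from the same draws
    xi = sum(f * e for f, e in zip(F, eta)) / math.sqrt(sum(f * f for f in F))  # (2.6.2)
    y1, y2, x = rng.uniform(0, 3), rng.uniform(0, 2), rng.uniform(0, 4)
    a = 1.0 + 1.5 * x + 2.0 * xi                         # continuous equation
    g1, g2 = (0.2, -0.2) if endogenous else (0.0, 0.0)   # (2.6.5) vs (2.6.4)
    u = [-0.15 - 0.1 * y1 + g1 * a + eps[0],
         0.25 + 0.3 * y2 + g2 * a + eps[1],
         eps[2]]
    best = max(range(3), key=lambda i: u[i])             # D_i = 1 for the largest u_i
    return {"D": [int(i == best) for i in range(3)], "a": a, "y1": y1, "y2": y2,
            "x": x, "eps": eps, "eta": eta, "xi": xi}

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(xs, ys))
    sxx = sum((p - mx) ** 2 for p in xs)
    syy = sum((q - my) ** 2 for q in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2006)
sample = [lee_observation(rng) for _ in range(3000)]     # one dataset of size 3000
xi_s = [o["xi"] for o in sample]
print(round(corr(xi_s, [o["eta"][0] for o in sample]), 3),  # near f_1 / sqrt(sum f^2)
      round(corr(xi_s, [o["eps"][0] for o in sample]), 3))  # near c_1
```

Because the continuous error ξ is built from normal rather than Gumbel components, the resulting covariance structure departs from the Lee model's assumption, which is the point of the experiment.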
Table 2.5 gives the statistics of the estimators from joint estimation for the model without endogenous variables. Table 2.6 gives the results from recursive estimation, which uses a multinomial logit model for the discrete choice and a linear regression for the continuous variable. In both tables, the t-test fails to reject the null hypothesis that the estimators for exogenous variables are consistent.

However, the t-test rejects the consistency of the estimator of σ from the joint estimation method. The mean value of the estimator of ρ_i is far from c_i, which indicates that ρ_i cannot truly represent the correlation c_i. This can lead to an inconsistent estimate of σ.

Notably, the estimators in Table 2.5 are slightly more efficient than those in Table 2.6, as evidenced by their smaller standard deviations. This is because joint estimation that accommodates error correlations still improves the efficiency of the estimators for exogenous variables, even when the correlation structure is misspecified.

Table 2.5 Statistics of Estimators from Joint Estimation Procedure (without Endogenous Variables)
Discrete-continuous Simultaneous Equation System: Joint Estimation (N = 500)

Parameter     Minimum  Maximum  Mean          True Parameter  Std Dev  Difference  t-stat
β_01          0.452    0.078    0.159         0.150           0.088    0.009       0.104
β_11 (y_1)    0.286    0.052    0.102         0.100           0.047    0.002       0.048
β_02          0.023    0.461    0.239         0.250           0.071    0.011       0.152
β_12 (y_2)    0.119    0.494    0.312         0.300           0.057    0.012       0.212
α_0           0.712    1.258    0.985         1.000           0.085    0.015       0.182
α_1 (x)       1.415    1.585    1.500         1.500           0.026    0.000       0.006
σ             1.817    1.956    1.888         2.000           0.025    0.112       4.548
c_1 and ρ_1   0.664    0.796    ρ_1 = 0.726   c_1 = 0.600     0.022
c_2 and ρ_2   0.708    0.496    ρ_2 = 0.619   c_2 = 0.597     0.034
c_3 and ρ_3   0.051    0.199    ρ_3 = 0.077   c_3 = 0.477     0.044

Table 2.7 gives the statistics of the estimators from the joint estimation method for the model with an endogenous continuous variable in the utility functions. Table 2.8 gives the results from the recursive estimation method.
Since ρ_i does not truly represent c_i, consistent estimators of the coefficients γ_i on the endogenous variable a were not obtained. This is evidenced by the t-tests, which strongly reject the null hypothesis that the expectation of the estimator equals the true parameter value. Inconsistency in the estimator for the endogenous variable leads to inconsistency in all of the estimators of the constant terms in the model.

The t-test fails to reject consistency only for the estimators of exogenous variables. This is reasonable because the coefficients for exogenous variables can be consistently estimated even without accommodating random error correlations, as in Table 2.6.

Table 2.6 Statistics of Estimators from Recursive Estimation Procedure (without Endogenous Variables)
Discrete-continuous Simultaneous Equation System: Recursive System Estimation (N = 500)

Parameter     Minimum  Maximum  Mean          True Parameter  Std Dev  Difference  t-stat
β_01          0.488    0.130    0.156         0.150           0.096    0.006       0.061
β_11 (y_1)    0.271    0.053    0.097         0.100           0.053    0.003       0.054
γ_1 (a)       0.002    0.522    0.252         0.250           0.077    0.002       0.029
β_02          0.113    0.485    0.300         0.300           0.064    0.000       0.004
β_12 (y_2)    0.799    1.236    1.001         1.000           0.073    0.001       0.010
γ_2 (a)       1.415    1.586    1.501         1.500           0.031    0.001       0.031
α_0           1.921    2.068    2.000         2.000           0.026    0.000       0.018
α_1 (x)       0.488    0.130    0.156         0.150           0.096    0.006       0.061
σ             0.271    0.053    0.097         0.100           0.053    0.003       0.054
c_1 and ρ_1   0.000    0.000    ρ_1 = 0.000   c_1 = 0.600
c_2 and ρ_2   0.000    0.000    ρ_2 = 0.000   c_2 = 0.597
c_3 and ρ_3   0.000    0.000    ρ_3 = 0.000   c_3 = 0.477

Table 2.7 Statistics of Estimators from Joint Estimation Procedure (with Endogenous Variables)
Discrete-continuous Simultaneous Equation System: Joint Estimation (N = 500)

Parameter     Minimum  Maximum  Mean          True Parameter  Std Dev  Difference  t-stat
β_01          0.875    0.075    0.492         0.150           0.154    0.342       2.223
β_11 (y_1)    0.185    0.058    0.082         0.100           0.034    0.018       0.526
γ_1 (a)       0.146    0.359    0.271         0.200           0.030    0.071       2.329
β_02          0.876    0.171    0.502         0.250           0.117    0.752       6.420
β_12 (y_2)    0.152    0.562    0.325         0.300           0.074    0.025       0.341
γ_2 (a)       0.106    0.030    0.047         0.200           0.017    0.153       9.044
α_0           1.214    2.743    2.158         1.000           0.264    1.158       4.382
α_1 (x)       1.216    1.673    1.411         1.500           0.077    0.089       1.150
σ             2.059    2.455    2.273         2.000           0.071    0.273       3.830
c_1 and ρ_1   0.738    0.967    ρ_1 = 0.912   c_1 = 0.600     0.033
c_2 and ρ_2   0.292    0.073    ρ_2 = 0.069   c_2 = 0.480     0.060
c_3 and ρ_3   0.076    0.373    ρ_3 = 0.224   c_3 = 0.000     0.051

Table 2.8 gives the estimation results for a recursive system, which is obtained by imposing zero values on the correlations ρ_i. The results indicate that the constants and the coefficients γ_i on the endogenous variable a in the utility functions are not consistently estimated. However, all of the parameters in the continuous model appear consistent with the true parameter values. This is because there are no endogenous variables in the continuous model, so every parameter in that equation can be consistently estimated in a recursive system. In the joint estimation procedure, the constant in the continuous model is inconsistently estimated because of the inconsistent estimator of σ. This is not a problem in the recursive estimation procedure.

2.4.5 Summary

The Monte Carlo studies in this section reveal the following:

1. Joint estimators for the bivariate probit model are more efficient than recursive estimators, particularly when the error correlation is high. However, the estimators are consistent under both recursive and joint estimation.

2. The joint estimator of the coefficient for the endogenous dummy variable in the recursive bivariate probit model is consistent, whereas the recursive estimator of that coefficient is inconsistent. The estimators of the coefficients for exogenous variables are consistent under both recursive and joint estimation, and under both the correct and the wrong causal structure.

3. The nonnested test offers a valid upper bound on the probability that the model with the wrong causal structure will be conclusively accepted. This holds when the test is applied to compare recursive bivariate probit models under alternative causal structures.

4. The estimation of the coefficient for the endogenous variable in the discrete-continuous model is highly sensitive to the covariance structure of the random error terms. An arbitrary specification of error correlations does not yield consistent estimates of the coefficient for endogenous variables. However, the coefficients for exogenous variables can be consistently estimated under both recursive and joint estimation. Joint estimators for exogenous variables are also more efficient than recursive estimators, even if the covariance structure of the random error terms is misspecified.

Table 2.8 Statistics of Estimators from Recursive Estimation Procedure (with Endogenous Variables)
Discrete-continuous Simultaneous Equation System: Recursive System Estimation (N = 500)

Parameter     Minimum  Maximum  Mean          True Parameter  Std Dev  Difference  t-stat
β_01          0.317    0.907    0.654         0.150           0.106    0.804       7.623
β_11 (y_1)    0.243    0.005    0.115         0.100           0.043    0.015       0.355
γ_1 (a)       0.054    0.072    0.003         0.200           0.016    0.197       12.014
β_02          1.178    0.364    0.760         0.250           0.130    1.010       7.752
β_12 (y_2)    0.160    0.608    0.354         0.300           0.080    0.054       0.671
γ_2 (a)       0.057    0.096    0.001         0.200           0.020    0.201       9.942
α_0           0.724    1.211    1.003         1.000           0.078    0.003       0.037
α_1 (x)       1.386    1.621    1.499         1.500           0.034    0.001       0.018
σ             1.910    2.068    1.998         2.000           0.027    0.002       0.072
c_1 and ρ_1   0.000    0.000    ρ_1 = 0.000   c_1 = 0.600
c_2 and ρ_2   0.000    0.000    ρ_2 = 0.000   c_2 = 0.480
c_3 and ρ_3   0.000    0.000    ρ_3 = 0.000   c_3 = 0.000

Chapter Three: Dataset Preparation and Description

3.1 Introduction to Swiss Travel Survey

The dataset used for analysis and model estimation is extracted from the Swiss Travel Microcensus 2000. A detailed description of the survey and the survey sample can be found in Ye and Pendyala (2003). The survey respondent sample consists of 27,918 households from 26 cantons in Switzerland.
The person sample was formed by randomly selecting respondents from each household. One person over six years old was selected from each household with fewer than four members. Two persons over six years old were selected from each household with four or more members. As a result of this sampling scheme, the person respondent sample consisted of 29,407 persons.

All persons in the sample were asked to report their travel in a one-day trip diary. The resulting trip dataset includes 103,376 trips reported by the 29,407 interviewed persons, some of whom may have made zero trips on the survey day. The household, person and trip characteristics of these samples are shown in Table 3.1 through Table 3.4, respectively.

3.2 Dataset Description at Household Level

Table 3.1 shows the household characteristics of the Swiss sample. The average household size is 2.43. Single-person households constitute 27.5% of all households, whereas large households with four or more members constitute 23.4%. Household monthly income has a low response rate, as indicated by 24.9% missing values. With exclusion of missing values, around 27.7% …

Table 3.1 Household Characteristics of Swiss Travel Microcensus 2000
Characteristic           Swiss Sample
Sample Size              27918
Household Size (Mean)    2.43
  1 person               27.5%
  2 persons              35.1%
  3 persons              14.0%
  ≥4 persons             23.4%
Monthly Income
  Low (…

… the potentially positive correlation between low response rate and low income.

The average number of household cars in Switzerland is 1.17. As expected, the proportion of households without automobiles (19.8%) in this Swiss sample is substantially higher than in a typical sample from the United States. This may reflect the higher level of public transport service in Switzerland, which enables mobility and accessibility without the same level of auto dependence. As a result, one might expect the automobile to play a smaller role in the Swiss travel environment than in the US environment.
Of all households, 43.6% are composed of married couples with or without children. In addition, 10.6% of households have children under 6 years old, and 22.5% have children aged 6 to 17. In Switzerland, 42.4% of households are located in major cities and 30.4% in the surrounding areas of cities, while only 1.1% are located in rural areas.

3.3 Dataset Description at Person Level

Table 3.2 presents the person characteristics of the Swiss respondent sample. The average age in the person sample is 43.9 years, and the proportion of people aged 60 or older is as high as 25.5%. It is well known that Switzerland faces a serious problem of population aging.

Of the respondents, 46.3% are male and 53.7% are female. This is unlikely to be a true reflection of the gender distribution in the population. The higher female proportion in the sample is probably caused by a higher response rate among females.

In the sample, 48.4% of persons are not employed, 37.3% are employed full time and 14.3% are employed part time. Only 67.4% of respondents in the Swiss sample are licensed to drive, which is much lower than in a typical US sample. On average, Swiss people make 3.51 trips per day, of which 0.46 trips are for work purposes and 3.06 trips are for nonwork purposes.

Table 3.2 Person Characteristics of Swiss Travel Microcensus 2000
Characteristic           Swiss Sample
Sample Size              29407
Age (in years, mean)     43.9
  Young (6~29)           26.8%
  Middle (30~59)         47.6%
  Old (≥60)              25.5%
Sex
  Male                   46.3%
  Female                 53.7%
Employment Status
  Full time              37.3%
  Part time              14.3%
  Not employed           48.4%
Licensed                 67.4%
#Trips/day               3.51
  Work trips             0.46
  Nonwork trips          3.06

3.4 Dataset Description at Trip Level

Table 3.3 and Table 3.4 illustrate the trip characteristics of the Swiss sample through the distribution of trips by purpose and by mode. Of the 103,376 trips in total, 101,783 are selected for analysis, after excluding trips with missing values for purpose or mode.
Table 3.3 gives the distribution of trip purposes for each trip mode. Overall, 25.1% of trips are made for leisure, 13.2% for work and 39.4% for returning home.

To some degree, the share of back-home trips reflects the prevalence of trip chaining. In the absence of trip chaining (home → single destination → home), back-home trips would account for about 50% of all trips. The observed 39.4% is considerably lower than 50%, which indicates that trip chaining behavior cannot be ignored in the Swiss context.

The distribution of trip purposes by mode is generally consistent with expectations. For example, 18.0% of auto-driver trips are for work, but only 5.4% of auto-passenger trips are for work. Intuitively, workers do not tend to commute as auto passengers. Similarly, only 0.8% of transit trips are for service, compared with 6.3% of auto-driver trips. Intuitively, auto drivers are more likely than transit riders to serve other people, for example by picking up or dropping off passengers.
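The back-home argument can be made concrete with a short Python sketch under simplifying assumptions that are not stated in the survey description. Every tour is assumed to start and end at home, each trip is assumed to carry its destination purpose, and each tour is assumed to contain exactly one back-home trip. Under these assumptions, the back-home share equals 1 / (trips per tour). A share of 50% therefore corresponds to single-stop tours only, and the observed 39.4% implies roughly 2.5 trips, or about 1.5 stops, per tour on average. The helper names are hypothetical.

```python
def back_home_share(stops_per_tour):
    """Share of 'back home' trips if every tour makes `stops_per_tour` out-of-home stops."""
    trips_per_tour = stops_per_tour + 1   # one trip to each stop, plus the trip home
    return 1.0 / trips_per_tour


def implied_trips_per_tour(observed_share):
    """Average trips per tour implied by an observed back-home share (one return per tour)."""
    return 1.0 / observed_share


print(back_home_share(1))                        # simple tours only: 0.5
print(round(implied_trips_per_tour(0.394), 2))   # Swiss sample: about 2.54
```

The inversion holds for the average even when tour lengths vary, because each tour contributes exactly one back-home trip under the stated assumption.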
Table 3.3 Trip Characteristics of Swiss Travel Microcensus 2000 (Trip Purpose Distribution by Trip Mode)
Mode                 All Purposes  Work(%)  School(%)  Shopping(%)  Business(%)  Leisure(%)  Service(%)  Back Home(%)  All(%)
Auto Driver          39059         18.0     0.5        11.9         4.3          19.8        6.3         39.1          100
Auto Passenger       11671         5.4      2.5        10.4         1.5          32.9        4.1         43.3          100
Bicycle/Motorcycle   9297          14.6     8.2        9.9          1.3          21.4        1.3         43.4          100
Pedestrian           29052         8.1      6.5        14.1         1.1          32.0        1.8         36.3          100
Transit              12704         16.2     8.0        10.5         3.0          20.8        0.8         40.6          100
Trip Numbers         101783        13.2     4.1        12.0         2.6          25.1        3.6         39.4          100

Table 3.4 Trip Characteristics of Swiss Travel Microcensus 2000 (Trip Mode Distribution by Trip Purpose)
Mode                     All Purposes  Work   School  Shopping  Business  Leisure  Service  Back Home
Auto Driver (%)          38.4          52.3   5.1     38.0      62.4      30.4     66.8     38.1
Auto Passenger (%)       11.5          4.7    6.9     9.9       6.6       15.1     12.9     12.6
Bicycle/Motorcycle (%)   9.1           10.1   18.1    7.5       4.4       7.8      3.3      10.1
Pedestrian (%)           28.5          17.6   45.5    33.6      12.4      36.4     14.1     26.3
Transit (%)              12.5          15.3   24.3    11.0      14.1      10.3     2.9      12.9
All (%)                  100           100    100     100       100       100      100      100
Trip Numbers             101783        13436  4181    12217     2671      25509    3704     40065

Table 3.4 gives the distribution of trip modes for each trip purpose. Overall, auto trips, made either as driver or as passenger, constitute around half of all trips (49.9%). In addition, 12.5% of trips use public transit, 28.5% are made on foot and 9.1% use a bicycle or motorcycle. These statistics illustrate a multimodal travel environment in the Swiss context.

The mode distribution for each trip purpose is also consistent with common sense. For example, 52.3% of work trips are made as auto drivers, but only 4.7% as auto passengers. Among all trip purposes, school trips have the lowest share of auto use and the highest shares of transit, cycling and walking.
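Table 3.4 is a column-wise view of the same data that Table 3.3 presents row-wise. One column of Table 3.4 can therefore be approximately recovered from Table 3.3 by weighting each mode's purpose percentage by its trip count. The Python sketch below does this for the work purpose. The dictionary names and the helper function are hypothetical. Small discrepancies with Table 3.4 are expected, because the percentages in Table 3.3 are rounded to one decimal place.

```python
# Trip counts by mode and work-purpose percentages, copied from Table 3.3.
TRIPS_BY_MODE = {
    "Auto Driver": 39059,
    "Auto Passenger": 11671,
    "Bicycle/Motorcycle": 9297,
    "Pedestrian": 29052,
    "Transit": 12704,
}
WORK_PCT_BY_MODE = {
    "Auto Driver": 18.0,
    "Auto Passenger": 5.4,
    "Bicycle/Motorcycle": 14.6,
    "Pedestrian": 8.1,
    "Transit": 16.2,
}


def mode_shares_for_purpose(trips_by_mode, purpose_pct_by_mode):
    """Convert 'purpose % within each mode' into 'mode % within one purpose'."""
    counts = {m: trips_by_mode[m] * purpose_pct_by_mode[m] / 100.0 for m in trips_by_mode}
    total = sum(counts.values())
    return {m: 100.0 * c / total for m, c in counts.items()}


work_shares = mode_shares_for_purpose(TRIPS_BY_MODE, WORK_PCT_BY_MODE)
for mode, share in work_shares.items():
    print(f"{mode:20s} {share:5.1f}%")
```

With these inputs, the auto-driver share of work trips comes out at about 52.3%, and the transit share at about 15.3%, in line with the Work column of Table 3.4.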
Chapter Four: Empirical Estimation Results

4.1 Causal Models Between Trip Chaining and Mode Choice (Recursive Bivariate Probit Model)

4.1.1 Background

Over the past few decades, there has been considerable research on people's trip chaining patterns, i.e., the propensity to link a series of activities into a multistop tour or journey (Shiftan, 1998; Dissanayake and Morikawa, 2002). The analysis of trip chaining may lead to a better understanding of travel behavior. It may also provide a more appropriate framework for examining various transportation policy issues (Strathman and Dueker, 1995). Indeed, the profession has seen tour-based models developed and increasingly applied in travel demand forecasting. They are replacing the more traditional trip-based models, which do not reflect trip chaining behavior and tour formation.

The terms trip chain and tour are used synonymously to refer to a sequence of trips that begins at home, involves visits to one or more other places, and ends at home. Depending on the number of places visited within the tour, a tour may be classified as simple or complex. A tour with a single stop or activity outside the home is defined as a simple tour. A tour with more than one stop outside the home is defined as a complex tour. Thus a tour of the form home → shop → home is considered simple, while a tour of the form home → work → shop → home is considered complex.

People's activity patterns are becoming increasingly complex and involve interactions with other household and nonhousehold members. Because time is also a finite resource, it may be conjectured that trip chains will become increasingly complex over time. The ability to chain multiple activities together in a single tour may provide greater efficiency and convenience than a series of single-stop simple tours (Hensher and Reyes, 2000).
There are at least two reasons why this has significant traffic and policy implications.

First, complex tours may lead to an increase in automobile usage. A traveler who needs to pursue complex tours benefits from the flexibility of the private automobile. Pursuing multiple activities in a single journey is rather difficult under the schedules, routes and uncertainty associated with public transportation. Thus, complex trip chaining may contribute to increased auto dependency and, consequently, to more automobile traffic.

Second, for workers (commuters), the formation of complex trip chains may entail linking nonwork activities with the work trip (commute). Nonwork trips that could have taken place outside the peak periods then occur in the peak periods simply because they are tied to the commute. Thus, complex trip chaining patterns may contribute to an increase in peak-period travel demand.

This discussion points to a possible interdependency among trip chaining, auto usage and trip timing. Strathman and Dueker (1995), in an analysis of the 1990 NPTS, found that complex trip chains tend to be more auto-oriented. However, the direction of causality is not self-evident, because the availability of an automobile may itself provide the flexibility and convenience that contribute to the formation of complex trip chains. The flexibility of the automobile may stimulate the desire to undertake additional activities in one tour. For example, the lower travel times typically associated with the auto mode may relax time constraints and lead to more stop-making (Bhat, 1997). Moreover, shared rides make up a portion of the total auto mode share. They are more likely to involve complex tours because the driver and passengers may have a variety of trip purposes and destinations.
The central question being addressed is: does mode choice influence the complexity of trip chaining patterns, or does the complexity of trip chaining patterns influence mode choice?

Previously, Strathman and Dueker (1994) analyzed the probability of an individual engaging in a complex work tour using a binary logit model, in which the complexity or simplicity of a tour was modeled as a binary choice. One may also adopt a binary choice formulation for mode choice at the tour level, i.e., auto vs. nonauto. The investigation of the mutual influence and causal relationship between tour complexity and mode choice can thus be reduced to a problem involving two binary discrete choice variables.

The nested logit model is often applied to problems of this nature. Based on the assumption of a conditional choice mechanism, nested logit models representing two alternative tree structures can be formed. The more plausible structure supported by the data may then be identified in two ways. One can check the reasonableness of the estimated inclusive value coefficients, or compare goodness-of-fit measures between the models of the two structures, or both. Hensher and Reyes (2000) used the nested logit formulation to understand how trip chaining acts as a barrier to the use of public transport modes.

This section seeks to further clarify the relationship between mode choice and tour complexity using the recursive bivariate probit model (see Section 2.2.1). This model explicitly allows the impact of one choice dimension on another to be quantified. In other words, it models the causal relationship between the complexity of trip chains and mode choice.

4.1.2 Dataset Preparation and Description for Modeling Analysis

In this study, the unit of analysis is the tour or trip chain.
A trip chain is defined as a complete home-to-home journey in which the origin of the first trip and the destination of the last trip are both home. No intermediate home stop is present within the trip chain, so a chain is completed whenever the home location is reached. A tour-level dataset was formed by aggregating the trip dataset to the tour level, and all person and household characteristics were merged into it.

In most cases, a single mode prevailed for the whole trip chain. Where multiple modes were used within the same chain, a single mode was assigned based on whether or not the auto mode was used. If the auto mode was used for any segment of the chain, the chain was assigned the auto mode; otherwise it was assigned the nonauto mode. One may argue that a main mode should be defined to represent mode choice at the tour level, but defining the mode of a chain is a complex issue. The definition adopted here reflects the main concern of this study. That concern is not the main mode of the chain, but whether the auto mode was used for any part of the chain and thus potentially contributed to the formation of a multistop complex chain. Each tour was classified as simple or complex depending on whether it had one intermediate stop or more than one.

Data for respondents from the Canton of Zurich were extracted to reduce the data to a more manageable size and to control for possible area-specific effects. Tables 4.1 and 4.2 include summary statistics for the Zurich subsamples, in addition to those of the overall Swiss sample. The Zurich sample contains 3,293 persons from 2,998 households who report at least one nonwork tour. It also contains 1,466 persons from 1,438 households who report at least one work tour. Note that these two samples are not mutually exclusive, as some individuals report both a work tour and a nonwork tour.
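The tour formation rules just described can be sketched in Python:

- split each person's trips into chains whenever home is reached;
- assign the auto mode if any segment uses a car;
- classify the chain as complex if it has more than one out-of-home stop.

This is an illustrative reconstruction, not the processing code used for the Microcensus. Several details are assumptions. The `Trip` record, the mode labels in `AUTO_MODES` and the function names are hypothetical. Each diary day is assumed to start at home, and an unfinished chain at the end of the day is dropped.

```python
from dataclasses import dataclass

# Hypothetical mode labels; a chain counts as "auto" if any trip uses one of them.
AUTO_MODES = {"auto driver", "auto passenger"}


@dataclass
class Trip:
    purpose: str   # activity at the destination; "home" marks a return home
    mode: str


def build_chains(day_trips):
    """Split one person's trips (assumed to start at home) into home-to-home chains."""
    chains, current = [], []
    for trip in day_trips:
        current.append(trip)
        if trip.purpose == "home":     # reaching home closes the chain
            chains.append(current)
            current = []
    return chains                      # an unfinished trailing chain is dropped


def describe_chain(chain):
    """Tour-level attributes: number of stops, simple/complex type, and auto use."""
    stops = len(chain) - 1             # out-of-home stops; the last trip returns home
    return {
        "stops": stops,
        "complex": stops > 1,
        "auto": any(t.mode in AUTO_MODES for t in chain),
    }


day = [Trip("work", "auto driver"), Trip("shopping", "auto driver"),
       Trip("home", "auto driver"), Trip("leisure", "pedestrian"),
       Trip("home", "pedestrian")]
for chain in build_chains(day):
    print(describe_chain(chain))
```

In this example, the first chain (home → work → shopping → home) is complex and auto-based, and the second chain (home → leisure → home) is simple and nonauto. Treating both auto driver and auto passenger as auto mirrors the study's choice not to distinguish driving alone from riding with others.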
As expected, households with work tour makers report higher income levels than households with nonwork tour makers, presumably because work-tour-maker households consistently include wage-earning workers. The average household size is a little over two persons per household, and vehicle ownership is a little over one vehicle per household. As expected, a very small percentage of households in the Zurich subsamples report a rural residence, presumably owing to the urban nature of Zurich and its immediate surroundings.

Person characteristics also show similarities between the overall Swiss sample and the Zurich subsamples. As expected, the nonwork tour maker sample contains a greater proportion of elderly (retired) and young persons than the work tour maker sample. On average, work tour makers make about 1.17 trip chains (complete home-to-home tours) per day, and nonwork tour makers make about 1.49. Work tour makers make 4.46 trips per day, while nonwork tour makers make fewer, at 4.11 trips per day. These trip rates are substantially higher than the trip rate for the overall Swiss sample, partly because zero-trip persons are excluded from the Zurich subsamples.

As the model estimation was performed only on the Zurich subsamples, all further analysis in this section pertains only to them. The Zurich subsamples include 4,901 nonwork tours and 1,711 work tours.
Table 4.1 Household Characteristics of Swiss Travel Microcensus 2000 and Zurich Subsamples
Characteristic           Swiss Sample   Zurich Nonwork Tour Makers   Zurich Work Tour Makers
Sample Size              27918          2998                         1438
Household Size (Mean)    2.43           2.42                         2.33
  1 person               27.5%          29.9%                        31.5%
  2 persons              35.1%          33.2%                        33.5%
  3 persons              14.0%          11.6%                        12.0%
  ≥4 persons             23.4%          25.4%                        22.9%
Monthly Income
  Low (…

Tables 4.3 and 4.4 offer simple cross-tabulations of tour complexity against mode choice: Table 4.3 for nonwork tours and Table 4.4 for work tours. The column percentages in Table 4.3 indicate that about 28 percent of simple nonwork tours use the automobile as the primary mode. This value is considerably higher, at 44 percent, for complex nonwork tours. Thus there appears to be at least a correlation between mode choice and tour complexity. Clearly, the auto mode is used to a greater degree in complex multistop trip chains.

Table 4.2 Person Characteristics of Swiss Travel Microcensus 2000 and Zurich Subsamples
Characteristic            Swiss Sample   Zurich Nonwork Tour Makers   Zurich Work Tour Makers
Sample Size               29407          3293                         1466
Age (in years, mean)      43.9           44.2                         41.8
  Young (6~29)            26.83%         28.1%                        18.3%
  Middle (30~59)          47.64%         42.1%                        74.6%
  Old (≥60)               25.48%         29.8%                        7.2%
Sex
  Male                    46.31%         45.6%                        60.8%
  Female                  53.69%         54.4%                        39.2%
Employment Status
  Full time               37.34%         29.4%                        76.9%
  Part time               14.27%         14.5%                        20.2%
  Not employed            48.39%         56.0%                        2.9%
Licensed                  67.43%         64.1%                        87.9%
#Chains/day               1.33           1.49                         1.17
#Trips/day                3.51           4.11                         4.46
  Work trips              0.46           0.21                         1.54
  Nonwork trips           3.06           3.90                         2.92
Work Trip Mode Share
  Auto                    55.84%         49.6%                        52.0%
  NonAuto                 44.16%         50.4%                        48.0%
Nonwork Trip Mode Share
  Auto                    48.92%         43.5%                        54.1%
  NonAuto                 51.08%         56.5%                        45.9%
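The column and row percentages of a cross-tabulation follow directly from the cell frequencies. The small Python sketch below recomputes the nonwork percentages from the Table 4.3 counts (Nonauto: 2685 simple, 661 complex; Auto: 1030 simple, 525 complex). The function name and the nested-dictionary layout are hypothetical.

```python
def crosstab_percentages(freq):
    """Row and column percentages for a nested dict freq[row][col] of counts."""
    rows = list(freq)
    cols = list(next(iter(freq.values())))
    col_tot = {c: sum(freq[r][c] for r in rows) for c in cols}
    row_pct = {r: {c: 100.0 * freq[r][c] / sum(freq[r].values()) for c in cols}
               for r in rows}
    col_pct = {r: {c: 100.0 * freq[r][c] / col_tot[c] for c in cols} for r in rows}
    return row_pct, col_pct


# Frequencies for nonwork tours, from Table 4.3 (rows: mode choice, columns: tour type).
NONWORK = {"Nonauto": {"Simple": 2685, "Complex": 661},
           "Auto": {"Simple": 1030, "Complex": 525}}
row_pct, col_pct = crosstab_percentages(NONWORK)
print(f"auto share of simple tours:  {col_pct['Auto']['Simple']:.1f}%")
print(f"auto share of complex tours: {col_pct['Auto']['Complex']:.1f}%")
print(f"simple share of nonauto tours: {row_pct['Nonauto']['Simple']:.1f}%")
```

This yields about 27.7% and 44.3% for the column percentages, and about 80.2% for the nonauto row percentage, matching the rounded figures quoted in the text.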
Similarly, the row percentages show that 80 percent of nonwork nonauto tours are simple (involve only one stop), whereas only 66 percent of nonwork auto tours are simple. Thus nonauto tours appear to be simpler than auto-based tours.

Table 4.3 Cross-tabulation of Mode Choice and Tour Type for Nonwork Tours
                     Simple    Complex   Total
Frequency
  Nonauto            2685      661       3346
  Auto               1030      525       1555
  Total              3715      1186      4901
Column Percent
  Nonauto            72.3%     55.7%     68.3%
  Auto               27.7%     44.3%     31.7%
  Total              100.0%    100.0%    100.0%
Row Percent
  Nonauto            80.2%     19.8%     100.0%
  Auto               66.2%     33.8%     100.0%
  Total              75.8%     24.2%     100.0%

Table 4.4 Cross-tabulation of Mode Choice and Tour Type for Work Tours
                     Simple    Complex   Total
Frequency
  Nonauto            436       355       791
  Auto               397       523       920
  Total              833       878       1711
Column Percent
  Nonauto            52.3%     40.4%     46.2%
  Auto               47.7%     59.6%     53.8%
  Total              100.0%    100.0%    100.0%
Row Percent
  Nonauto            55.1%     44.9%     100.0%
  Auto               43.2%     56.8%     100.0%
  Total              48.7%     51.3%     100.0%

Table 4.4 offers similar indications, although the tendencies are not as strong as those in Table 4.3. For work tours, a majority of simple tours are nonauto-based (52 percent), while a majority of complex tours are auto-based (60 percent). Similarly, a majority of nonauto-based work tours are simple (55 percent), while a majority of auto-based work tours are complex (57 percent). Once again, these cross-tabulations show a clear correlation between auto use and trip chain complexity. Given the difference in the percentage distributions between work and nonwork tours, it was considered prudent to examine the causal relationship between tour complexity and mode choice separately for work and nonwork tours.

4.1.3 Model Estimation Results

This section presents the estimation results of the recursive bivariate probit model for the causal analysis of trip chain formation and mode choice.
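For reference, a standard textbook form of the recursive bivariate probit model is written out below, for the structure in which tour complexity affects mode choice. The notation here is assumed for illustration and may differ from Section 2.2.1. C_q and A_q are the observed COMPLEX and AUTO indicators for tour q, x_1q and x_2q are the exogenous variables, δ is the coefficient on the endogenous dummy, and ρ is the error correlation. Recursive estimation corresponds to fixing ρ = 0, in which case Φ_2(a, b, 0) = Φ(a)Φ(b) and the likelihood splits into two separate probits. Joint estimation estimates ρ together with δ. The reverse structure, in which mode choice affects tour complexity, swaps the roles of C and A.

```latex
\begin{aligned}
C_q^* &= \mathbf{x}_{1q}'\boldsymbol{\beta}_1 + \varepsilon_{1q}, &
C_q &= \mathbf{1}[C_q^* > 0],\\
A_q^* &= \mathbf{x}_{2q}'\boldsymbol{\beta}_2 + \delta\,C_q + \varepsilon_{2q}, &
A_q &= \mathbf{1}[A_q^* > 0],\\
\begin{pmatrix}\varepsilon_{1q}\\ \varepsilon_{2q}\end{pmatrix}
&\sim N\!\left(\begin{pmatrix}0\\ 0\end{pmatrix},
\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right), &&\\
P(C_q, A_q) &= \Phi_2\!\big(s_{1q}\,\mathbf{x}_{1q}'\boldsymbol{\beta}_1,\;
s_{2q}\,(\mathbf{x}_{2q}'\boldsymbol{\beta}_2 + \delta\,C_q),\;
s_{1q}s_{2q}\,\rho\big), &
s_{1q} &= 2C_q - 1,\quad s_{2q} = 2A_q - 1.
\end{aligned}
```

The zero error correlation reported for the recursive columns of Table 4.6 reflects this restriction, not an estimate.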
All the tables of model estimation results consist of four blocks. The first two blocks give the estimation results for the causal structure in which tour complexity affects mode choice. Of these, the left block is from the recursive estimation method and the right block is from the joint estimation method. The next two blocks give the estimation results for the causal structure in which mode choice affects tour complexity. Again, the left block is from the recursive estimation method and the right block is from the joint estimation method.

4.1.3.1 Estimation Results for Nonwork Tours

Table 4.5 gives the definitions and descriptive statistics of the variables in the nonwork tour model, and Table 4.6 gives the estimation results for the nonwork tour models. In the causal structure where tour complexity affects mode choice, the coefficient for tour complexity in the mode choice model is statistically significant and positive. This holds under both the recursive and the joint estimation methods. It lends credence to the hypothesis that the need to make a complex tour is likely to increase dependency on the auto mode. The coefficient of COMPLEX from joint estimation is larger than that from recursive estimation (1.409 vs. 0.456), because the joint estimation procedure accommodates the negative error correlation (-0.622).

In addition, demographic and socioeconomic characteristics, the tour's primary purpose and time of day were found to significantly influence mode choice and tour complexity. The coefficients of these variables are rather close between recursive estimation and joint estimation.
In the auto mode choice model, the negative coefficient of CAR_0 is consistent with the expectation that tour makers with zero autos are less likely to use the auto mode. Tour makers with more than one auto are more likely to use it, as evidenced by the positive coefficient of CAR_GE2. As expected, those with a driver's license are more prone to using the auto mode, while those with seasonal transit ticket subscriptions are less likely to use it. Transit ticket subscribers are likely to be more transit-oriented, and to have better access to transit services, than nonsubscribers. Tours made by persons living in rural areas are likely to be auto-oriented, presumably because of their limited transit accessibility.

In the tour complexity model, individuals in larger households are found to make less complex tours than individuals in smaller households. One may conjecture that the possibility of task allocation in a multiperson household reduces the need to perform multistop trip chains (Strathman and Dueker, 1994). The young and the elderly are less likely to pursue complex nonwork tours, possibly because they have fewer household obligations than those in the middle age groups.
Table 4.5 Nonwork-tour Model Variable Description and Statistics (N = 4901)
Variable   Variable Description                                   Mean   Std. Dev.
CAR_0      Number of autos in household = 0                       0.21   0.41
AUTOLIC    Person has auto license                                0.62   0.48
H_SUB      Person subscribes to half-price seasonal ticket        0.43   0.49
O_SUB      Person subscribes to other type of seasonal ticket     0.26   0.44
CAR_GE2    Number of autos in household ≥ 2                       0.29   0.45
RURAL      Person lives in rural area                             0.11   0.31
COMPLEX    Tour is complex (multistop)                            0.24   0.43
HHSIZE     Number of household members                            2.72   1.46
YOUNG      Person < 18 years old                                  0.21   0.40
OLD        Person > 60 years old                                  0.27   0.45
SERVICE    Primary purpose of the tour is service                 0.06   0.23
SHOPPING   Primary purpose of the tour is shopping                0.30   0.46
AMPEAK     Tour starts in AM peak period (7:00~8:59)              0.17   0.37
PMPEAK     Tour starts in PM peak period (16:00~17:59)            0.10   0.30
AUTO       Tour uses auto                                         0.32   0.46

Table 4.6 Nonwork-tour Model
Causal structure:         Complex → Auto                          Auto → Complex
                          Recursive Est.      Joint Est.          Recursive Est.      Joint Est.
Variable                  Coef.    t-test     Coef.    t-test     Coef.    t-test     Coef.    t-test
Auto Mode Choice Model
Constant                  2.097    24.24      2.100    24.98      1.991    23.65      1.988    23.86
CAR_0                     1.303    12.59      1.185    12.84      1.290    12.55      1.290    13.32
AUTOLIC                   2.211    25.66      1.933    19.76      2.224    26.04      2.222    26.86
O_SUB                     0.428    7.01       0.391    6.95       0.424    6.98       0.423    6.94
H_SUB                     0.206    4.33       0.178    4.24       0.201    4.26       0.203    4.39
CAR_GE2                   0.200    4.00       0.201    4.36       0.202    4.08       0.197    3.90
RURAL                     0.139    1.94       0.121    1.86       0.117    1.64       0.125    1.75
COMPLEX                   0.456    8.71       1.409    11.42
Complex Tour Choice Model
Constant                  0.285    5.07       0.282    5.31       0.427    7.21       0.384    6.12
HHSIZE                    0.120    6.74       0.130    7.92       0.128    7.14       0.121    6.74
YOUNG                     0.243    3.82       0.210    3.46       0.069    1.02       0.137    1.82
OLD                       0.136    2.71       0.150    3.17       0.099    1.97       0.108    2.12
SERVICE                   0.582    7.05       0.676    8.49       0.495    5.92       0.495    5.95
SHOPPING                  0.266    5.73       0.206    4.73       0.260    5.57       0.267    5.73
AMPEAK                    0.285    5.28       0.258    5.10       0.286    5.27       0.285    5.21
PMPEAK                    0.315    4.24       0.258    3.58       0.326    4.36       0.325    4.36
AUTO                                                              0.363    8.03       0.227    2.60
Error correlation (ρ)     0.000               -0.622   7.65       0.000               0.111    1.81

It is rather interesting that tours undertaken in the AM peak show a greater propensity to involve multiple stops than those undertaken in the PM peak period. For nonwork tours, however, this may be a plausible result: people may combine a series of errands and school activities in the morning and complete their activities by midday. Another possible explanation is that time constraints toward the end of the day (PM period) limit the number of activities that an individual can pursue at that time.

Another interesting finding is that gender does not significantly influence tour complexity for nonwork tours. Other studies have suggested that females tend to make more complex trip chains than males (McGuckin and Murakami, 1999). The analysis in this section does not support that finding in the Swiss travel context.

The tour's primary purpose appears to affect tour complexity. Service (serve passenger) tours tend to be complex, while shopping tours do not. Thus the shopping activity may be more prone to being a standalone activity within a tour.

The error correlation is found to be statistically significant. This supports the assumption that nonwork tour complexity and mode choice should be modeled in a simultaneous equations framework. The negative sign of the error correlation indicates that the unobserved factors influencing these two variables are negatively correlated. This negative sign is not straightforward to interpret, since the unobserved variables associated with complex tour choice and auto mode choice would be expected to be positively correlated.
For example, an unobserved personal preference for efficiency may encourage both auto mode selection and multi-stop tours. Indeed, error correlations were found to be positive in the preliminary analysis in which bivariate probit models were estimated without endogenous dummy variables. The inclusion of the endogenous dummy variable, which is likely to be positively correlated with the unobserved variables, may be contributing to the negative error correlation. The negative error correlation may also be due to the exclusion of unobserved factors from the model, which often happens when data are analyzed at a high level of aggregation. For example, no distinction is made between driving alone and driving/riding with others, both of which entail the use of the auto mode. As a result, person and household correlations are absorbed in the unobserved part of the models, which in turn leads to negative correlations among the error terms in the model formulations. Further analysis is warranted to fully understand the source and interpretation of the negative error correlations. The right two blocks of Table 4.6 provide estimation results for the causal structure where mode choice affects tour complexity for non-work tours, using the recursive and joint estimation methods. Mode choice is found to significantly affect tour complexity: the choice of auto is positively associated with the formation of complex tours. There is no substantial difference between the coefficients of the endogenous dummy variable AUTO (0.363 vs. 0.227) from the two estimation methods, since the error correlation is estimated to be as low as 0.111. Thus it appears from this model that the choice of the automobile mode for a tour contributes positively to the formation of multi-stop trip chains. In addition, the error correlation is positive for this model structure, consistent with expectation.
All of the other indications provided by the model system are similar to those seen in the left block. As all of the estimation results in Table 4.6 offer plausible and similar interpretations, a more rigorous performance comparison must be conducted among the models to potentially identify the causal structure underlying the data set. This performance comparison is presented in Section 4.1.4, following the discussion of the estimation results for the work tour models.

4.1.3.2 Estimation Results for Work Tours

Table 4.7 offers the definition and description of the variables in the work tour model. Estimation results for the work tour models are provided in Table 4.8. As in Table 4.6, the four blocks in Table 4.8 represent two different causal structures, each estimated with the recursive and joint estimation methods. In the causal structure where tour complexity affects mode choice, tour complexity has a positive impact on auto mode choice regardless of whether recursive or joint estimation is used, as indicated by the coefficients of COMPLEX (0.486 vs. 0.915). The error correlation estimated jointly in the bivariate probit model is negative (−0.293), and the COMPLEX coefficient is larger in the joint model than in the recursive model. This finding is consistent with expectations, with trends in the data, and with the models of non-work tours. The model supports the notion that a complex tour or trip chaining pattern contributes to the choice of auto as the mode for the tour. In addition, the error correlation is statistically significant, once again supporting the simultaneous equations formulation of the relationship between tour complexity and mode choice. Similar to the non-work tour model estimation results, auto ownership and the possession of a driver license contribute positively to auto mode selection, whereas seasonal transit ticket subscription contributes negatively to auto mode choice.
With respect to work-related variables, the availability of free parking at the work place and longer commutes are both positively associated with the choice of auto for work tours. All of these findings are consistent with expectations.

Table 4.7 Work-tour Model Variable Description and Statistics (N = 1711)
Variable | Variable Description | Mean | Std. Dev.
HIGH_INC | Monthly household income > Fr. 10,000 | 0.21 | 0.41
OWN_BUS | Person owns enterprise/business | 0.14 | 0.35
SWISS | Person is of Swiss nationality | 0.85 | 0.36
BEG6_8 | Tour starts in time period from 6:00 to 8:59 | 0.67 | 0.47
BEG13_14 | Tour starts in time period from 13:00 to 14:59 | 0.11 | 0.31
END12 | Tour ends in time period from 12:00 to 12:59 | 0.12 | 0.32
AUTO | Tour uses auto | 0.54 | 0.50
CAR_0 | Number of autos in household = 0 | 0.16 | 0.36
CAR_GE2 | Number of autos in household ≥ 2 | 0.32 | 0.47
AUTOLIC | Person has auto license | 0.88 | 0.33
FREEPARK | Reserved parking lot at the work place is free | 0.33 | 0.47
H_SUB | Person subscribes to half-price seasonal ticket | 0.47 | 0.50
O_SUB | Person subscribes to other type of seasonal ticket | 0.25 | 0.43
DIS_WORK | Distance between residence and work place (km) | 11.02 | 15.02
COMPLEX | Tour is complex (multi-stop) | 0.51 | 0.50

In the tour complexity model, persons with higher income are found to be prone to making complex work tours. In addition, individuals owning their own business enterprise are more likely to engage in multi-stop trip chains, possibly because they have occupational characteristics that lead to the formation of complex trip chains. Individuals of Swiss nationality are more likely to engage in complex work tours, possibly because they have a denser network of social contacts and a larger set of activity options. Another interesting finding is that time-of-day indicators play an important role in influencing tour complexity. Tours ending within the lunch hour are less prone to be complex, possibly due to time constraints and the presence of a single lunch stop/destination.
However, tours beginning in the morning period of 6 to 9 AM are more prone to being multi-stop trip chains, possibly due to the linking of a non-work activity with the work activity in the overall tour. A more detailed time-of-day based analysis of trip chain formation is warranted to fully understand the relationship between trip chain complexity and time-of-day choice behavior. Within the context of this study, time-of-day choice is assumed exogenous to the model system. However, one may argue that time-of-day choice is endogenous to trip chain complexity and mode choice. The study of the simultaneous causal relationships among trip chain formation, mode choice, and time-of-day choice (three endogenous entities) remains a future research effort. Note that a simultaneous equations model of the causal relationship between mode choice and time-of-day choice is developed in Section 4.3 of this dissertation using a mixed binary-multinomial choice model. The right two blocks of Table 4.8 give estimation results for the model where the choice of mode affects work tour complexity, using both the recursive and joint estimation methods. In the recursive model, the coefficient of auto mode choice is positive and significant in the tour complexity equation. In the joint model, however, this coefficient is not statistically significant, while the error correlation is positive and statistically significant. This result does not support the hypothesis that auto mode choice positively affects the formation of a complex work tour. It also supports the notion that these choices should be modeled in a simultaneous equations framework, because the recursive model gives the misleading inference that auto mode choice positively affects the formation of complex work tours.
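Both causal structures in Tables 4.6 and 4.8 are recursive bivariate probit models: one binary outcome enters the other outcome's equation as an endogenous dummy variable, and the two error terms are standard normal with correlation ρ. The sketch below computes the log-likelihood contribution of one observed (complex, auto) pair under the complex → auto structure. It is a minimal illustration under stated assumptions, not the estimation code used in this study: the function names are hypothetical, the linear indices `xb1` (complex tour equation) and `xb2` (auto equation, excluding the endogenous dummy) are assumed to be computed beforehand from the coefficients, and the bivariate normal CDF is approximated with Simpson's rule rather than a library routine. Fixing `rho = 0` corresponds to the recursive estimation columns (error correlation 0.000); estimating `rho` corresponds to the joint estimation columns.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bvn_cdf(a: float, b: float, rho: float, n: int = 2000) -> float:
    """P(Z1 <= a, Z2 <= b) for standard bivariate normal with correlation rho (|rho| < 1).

    Integrates phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) over x in [-8, a]
    with Simpson's rule (n even); the mass below -8 is negligible.
    """
    if rho == 0.0:
        return norm_cdf(a) * norm_cdf(b)
    lower = -8.0
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = 0.0
    for k in range(n + 1):
        x = lower + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * norm_pdf(x) * norm_cdf((b - rho * x) / s)
    return total * h / 3.0


def recursive_bp_loglik(y1: int, y2: int, xb1: float, xb2: float,
                        gamma: float, rho: float) -> float:
    """Log-likelihood of one tour under the complex -> auto structure (hypothetical helper).

    y1: tour is complex (0/1); y2: tour uses auto (0/1)
    gamma: coefficient of the endogenous dummy y1 in the auto equation
    """
    q1 = 2 * y1 - 1  # +1 if the outcome occurred, -1 otherwise
    q2 = 2 * y2 - 1
    p = bvn_cdf(q1 * xb1, q2 * (xb2 + gamma * y1), q1 * q2 * rho)
    return math.log(p)
```

Summing the four outcome probabilities for any fixed `xb1`, `xb2`, `gamma`, and `rho` returns 1, which is a quick check that the sign flips (`q1`, `q2`) are applied consistently. The auto → complex structure is obtained by swapping the roles of the two equations.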
4.1.4 Model Performance Comparisons Based on Non-nested Test

The model estimation results presented in Section 4.1.3 generally offer plausible statistical indications for alternative causal paradigms. The only model that may be rejected on qualitative grounds is the work tour model where the mode choice decision precedes the tour complexity decision. The statistically insignificant coefficient associated with the endogenous auto mode choice variable in the tour complexity model implies that the choice of the auto mode does not significantly influence the complexity of work tours. Although this is possible, it is not consistent with the trends noted in the descriptive cross-tabulations or with any of the other models, where the endogenous dummy variables have been consistently statistically significant. Given the preponderance of evidence to the contrary, it is difficult to explain and defend this statistically insignificant coefficient. For all other models, however, the statistical indications are plausible. This section presents a rigorous comparison across models to see whether it is possible to identify the most likely causal structure governing the relationship between mode choice and trip chaining. The non-nested test, mentioned in Section 2.3.2, is adopted to compare the models under alternative causal structures. For the non-work tour models using the joint estimation method, the difference in the adjusted likelihood ratio index between the two alternative causal structures is 0.0023. According to Equation (2.5.12), the calculated bounding probability on the right-hand side of the expression for the comparison between the two causal structures is almost zero.
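Equation (2.5.12) is not reproduced in this section. The sketch below assumes it is the standard non-nested bound given in Ben-Akiva and Lerman's Discrete Choice Analysis: if model 1 is the true model, the probability that the adjusted likelihood ratio index of model 2 exceeds that of model 1 by at least z is bounded by Φ(−√(−2z·LL(0) + (K2 − K1))). The function names are hypothetical, and this is a sketch under that assumption rather than the computation used for the reported values.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def nonnested_bound(z: float, ll_zero: float, k1: int, k2: int) -> float:
    """Assumed bound: Phi(-sqrt(-2 * z * LL(0) + (K2 - K1))).

    z       : observed difference in adjusted rho^2 (z >= 0)
    ll_zero : log-likelihood at zero, LL(0) (negative)
    k1, k2  : number of estimated parameters in the two models
    """
    arg = -2.0 * z * ll_zero + (k2 - k1)
    if arg < 0:
        raise ValueError("bound undefined: -2 z LL(0) + (K2 - K1) is negative")
    return std_normal_cdf(-math.sqrt(arg))


# Non-work tours, joint models: difference 0.0023, LL(0) = -6794.229, 17 parameters each.
print(nonnested_bound(0.0023, -6794.229, 17, 17))
```

For the joint non-work comparison this prints a value on the order of 10^-8, i.e., almost zero. Because the compared models have equal parameter counts, the K2 − K1 term drops out and the bound depends only on z and LL(0).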
Table 4.8 Work-tour Model
(cells: coefficient (t-test); Complex → Auto and Auto → Complex denote the causal structure)
Variable | Complex → Auto, Recursive Estimation | Complex → Auto, Joint Estimation | Auto → Complex, Recursive Estimation | Auto → Complex, Joint Estimation
Auto Mode Choice Model
Constant | 2.010 (7.40) | 2.149 (8.32) | 1.804 (6.80) | 1.768 (7.06)
CAR_0 | 1.287 (7.17) | 1.247 (6.77) | 1.254 (7.07) | 1.259 (7.06)
CAR_GE2 | 0.459 (5.35) | 0.430 (4.89) | 0.471 (5.55) | 0.467 (5.58)
AUTOLIC | 2.067 (7.77) | 1.996 (7.83) | 2.052 (7.81) | 2.040 (8.24)
FREEPARK | 0.812 (9.32) | 0.797 (9.00) | 0.803 (9.36) | 0.790 (9.32)
O_SUB | 1.465 (13.00) | 1.440 (12.41) | 1.400 (12.65) | 1.413 (13.24)
H_SUB | 0.422 (5.30) | 0.421 (5.30) | 0.382 (4.89) | 0.397 (5.05)
DIS_WORK | 0.017 (4.99) | 0.015 (5.00) | 0.018 (5.48) | 0.017 (6.06)
COMPLEX | 0.486 (6.06) | 0.915 (3.42) | – | –
Complex Tour Choice Model
Constant | 0.420 (4.31) | 0.414 (4.24) | 0.553 (5.40) | 0.439 (4.06)
HIGH_INC | 0.292 (3.77) | 0.295 (3.80) | 0.262 (3.36) | 0.277 (3.54)
OWN_BUS | 0.282 (3.09) | 0.304 (3.30) | 0.249 (2.70) | 0.244 (2.64)
SWISS | 0.315 (3.57) | 0.299 (3.38) | 0.316 (3.58) | 0.321 (3.62)
BEG6_8 | 0.324 (4.25) | 0.327 (4.39) | 0.320 (4.18) | 0.310 (4.11)
BEG13_14 | 0.413 (3.55) | 0.389 (3.39) | 0.430 (3.67) | 0.422 (3.59)
END12 | 0.753 (7.49) | 0.760 (7.64) | 0.751 (7.43) | 0.725 (7.24)
AUTO | – | – | 0.275 (4.36) | 0.053 (0.60)
Error correlation (ρ) | 0.000 | −0.293 (−1.63) | 0.000 | 0.246 (3.49)

The corresponding recursive models under the two causal structures give consistent results: the difference in the adjusted likelihood ratio index is 0.0009, and the bounding probability is almost zero. Thus, it may be concluded that the model in the left block of Table 4.6 more closely captures the causal structure underlying the relationship between mode choice and tour complexity. The significantly better goodness-of-fit of the model in the left block suggests that the causal structure where the complexity of the tour affects mode choice (tour complexity → auto mode choice) is statistically, and possibly behaviorally, dominant in the population for non-work tours. For the work tour models, the situation is very similar.
In comparing the joint models, the seemingly better model (tour complexity → auto mode choice) in Table 4.9 has an adjusted likelihood ratio index that is 0.0018 greater than that of the model with the other causal structure (auto mode choice → tour complexity). The bounding probability, per the right-hand side of Equation (2.5.12), is calculated as almost zero. For the recursive models as well, the non-nested test rejects the causal structure in which auto mode choice affects tour complexity, with a difference of 0.0039 and a negligible bounding probability. Also, as mentioned earlier, the statistically insignificant coefficient associated with the endogenous dummy variable suggests that the causal structure where auto mode choice drives complex work tour formation does not capture the trends in the data set. Once again, it may be concluded that the causal structure where the complexity of the tour affects mode choice (tour complexity → auto mode choice) is statistically, and possibly behaviorally, dominant in the population for work tours. From the viewpoint of activity-based travel behavior theory, where travel choices are considered to be derived from activity patterns (and activity needs that are distributed in time and space), the findings of this section may be considered quite consistent with expectations. For both non-work tours and work tours, the model estimation results show that tour complexity (which reflects the activity pattern) drives mode choice. This finding is also consistent with, and confirms, previous results regarding the nature of the relationship between trip chaining and mode choice reported by Hensher and Reyes (2000).
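The goodness-of-fit measures in Table 4.9 can be reproduced from the log-likelihood values alone. The sketch below uses the likelihood ratio index ρ² = 1 − LL(β)/LL(ref) and its adjusted form ρ̄² = 1 − (LL(β) − K)/LL(ref), where LL(ref) is either LL(0) or LL(c) and K is the number of parameters; the function names are illustrative, and the formulas are inferred from the fact that they match the tabulated values.

```python
def rho_squared(ll_model: float, ll_ref: float) -> float:
    """Likelihood ratio index 1 - LL(beta)/LL(ref); ll_ref is LL(0) or LL(c)."""
    return 1.0 - ll_model / ll_ref


def adjusted_rho_squared(ll_model: float, ll_ref: float, n_params: int) -> float:
    """Adjusted index 1 - (LL(beta) - K)/LL(ref), penalizing the K estimated parameters."""
    return 1.0 - (ll_model - n_params) / ll_ref


# Non-work tour models from Table 4.9 (log-likelihoods are negative).
LL0, LLC = -6794.229, -5719.416
models = {
    "recursive complex->auto": (-4585.147, 16),
    "recursive auto->complex": (-4591.132, 16),
    "joint complex->auto": (-4573.906, 17),
    "joint auto->complex": (-4589.533, 17),
}
for name, (ll, k) in models.items():
    print(f"{name}: rho2(0)={rho_squared(ll, LL0):.4f} "
          f"adj rho2(0)={adjusted_rho_squared(ll, LL0, k):.4f} "
          f"rho2(c)={rho_squared(ll, LLC):.4f}")
```

Running the loop reproduces the ρ² at zero, adjusted ρ² at zero, and ρ² at constant entries for the non-work models (for example, 0.3251, 0.3228, and 0.1983 for the recursive complex → auto model). The non-nested test statistic is the difference between two adjusted ρ² at zero values, e.g., 0.3243 − 0.3220 = 0.0023 for the joint models.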
Table 4.9 Comparisons of Goodness-of-fit of Recursive Bivariate Probit Models

Non-work Tour Model (sample size 4901; LL at zero, LL(0) = −6794.229; LL at constant, LL(c) = −5719.416)
Measure | Recursive, Complex → Auto | Recursive, Auto → Complex | Joint, Complex → Auto | Joint, Auto → Complex
# of parameters | 16 | 16 | 17 | 17
LL at convergence | −4585.147 | −4591.132 | −4573.906 | −4589.533
ρ² at zero | 0.3251 | 0.3243 | 0.3268 | 0.3245
Adj. ρ² at zero | 0.3228 | 0.3219 | 0.3243 | 0.3220
ρ² at constant | 0.1983 | 0.1973 | 0.2003 | 0.1976
Adj. ρ² at constant | 0.1955 | 0.1945 | 0.1973 | 0.1946
Non-nested test (prob.) | recursive models: 0.0009 (0.000) | joint models: 0.0023 (0.000)

Work Tour Model (sample size 1711; LL at zero, LL(0) = −2371.950; LL at constant, LL(c) = −2354.272)
Measure | Recursive, Complex → Auto | Recursive, Auto → Complex | Joint, Complex → Auto | Joint, Auto → Complex
# of parameters | 16 | 16 | 17 | 17
LL at convergence | −1780.601 | −1789.797 | −1779.440 | −1783.900
ρ² at zero | 0.2493 | 0.2454 | 0.2498 | 0.2479
Adj. ρ² at zero | 0.2426 | 0.2387 | 0.2426 | 0.2408
ρ² at constant | 0.2437 | 0.2398 | 0.2442 | 0.2423
Adj. ρ² at constant | 0.2369 | 0.2330 | 0.2369 | 0.2351
Non-nested test (prob.) | recursive models: 0.0039 (0.000) | joint models: 0.0018 (0.000)

4.1.5 Discussions and Conclusions

Mode choice behavior is a fundamental element of travel behavior that has significant implications for transportation planning. Estimates of public transit ridership and the use of alternative modes of transportation are largely based on studies of mode choice behavior and modal split models. Public transport agencies face increasing competition from the automobile as automobiles become increasingly affordable and the road infrastructure becomes increasingly ubiquitous. Undoubtedly, the automobile is considered to provide greater flexibility and convenience than public transport modes, which are generally constrained with respect to schedules and routes/destinations. This study examines the interrelationship between the complexity of people's activity-travel patterns and their mode choice. To conduct the analysis, this section examines mode choice behavior in the context of multi-stop (complex) vs. single-stop (simple) trip chains.
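The distinction between complex (multi-stop) and simple (single-stop) tours, and between work tours (at least one work stop) and non-work tours (no work stops), can be illustrated with a small sketch. It assumes, hypothetically, that a person's day is given as the ordered list of trip destination purposes starting from home, and that a tour is every sequence of stops between two returns home; the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tour:
    stops: List[str]  # non-home activity purposes, in visiting order

    @property
    def is_complex(self) -> bool:
        # Multi-stop tour: more than one out-of-home stop before returning home.
        return len(self.stops) > 1

    @property
    def is_work_tour(self) -> bool:
        # Work tour: at least one work stop; otherwise it is a non-work tour.
        return "work" in self.stops


def split_into_tours(destinations: List[str]) -> List[Tour]:
    """Cut a day's sequence of trip destinations into home-based tours.

    Assumes the day starts at home; a chain that never returns home is dropped.
    """
    tours: List[Tour] = []
    current: List[str] = []
    for dest in destinations:
        if dest == "home":
            if current:
                tours.append(Tour(current))
            current = []
        else:
            current.append(dest)
    return tours
```

For example, the sequence work, shopping, home, shopping, home yields one complex work tour followed by one simple non-work tour.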
Using the recursive bivariate probit model, this section presents a rigorous analysis of the most likely causal relationship between these two phenomena at the level of the individual trip chain or tour. It should be emphasized that the analysis does not attempt to replicate causality at the level of the individual traveler, but rather operates at the macroscopic level to identify the causal tendency that appears to be dominant in the population. The recursive bivariate probit models estimated here provide a rigorous framework for analyzing and testing alternative causal structures. For both non-work tours (i.e., tours that do not involve any work stops) and work tours (i.e., tours that involve at least one work stop), the analysis suggests that the causal structure where the complexity of the trip chaining pattern drives mode choice is the dominant behavioral trend in the population. These findings have important implications for public transport service providers who are interested in attracting choice riders. If mode choice decisions precede activity pattern/agenda decisions, then public transport service providers may be able to attract choice riders simply by improving amenities, schedules, route coverage, safety and security, and comfort. On the other hand, if the formation of the activity agenda precedes or drives mode choice decisions, then the public transport industry faces a greater challenge. Trip chaining and tour complexity serve as impediments to public transport usage, as it is generally more burdensome to undertake multi-stop tours using public transportation, where travelers are constrained by routes, schedules, and access/egress issues. The analysis in this section suggests that the dominant relationship in the data set is the one in which tour complexity drives mode choice, both for work and non-work tours.
Then, not only do public transport service providers have to improve service amenities, but they also have to cater to multi-stop, complex activity agendas. This is extremely difficult to do with a fixed-route, fixed-schedule system. As activity-travel patterns and tours become increasingly complex, public transport agencies will likely have to develop new types of services to retain existing riders in addition to attracting new riders. Fixed-route bus and rail services may continue to be useful in serving the longer line-haul portions of multi-stop tours. However, serving shorter multi-stop trips calls for more flexible circulator and paratransit-type services that may use smaller buses and vans than conventional vehicles. Attention also needs to be paid to land use developments around transit stops/stations. Concerted efforts need to be made to promote mixed-use land developments and multipurpose activity centers so that travelers are able to fulfill a variety of activity needs at a single location (without the need to undertake additional trips).

The analysis and findings of this section are also useful in the specification and development of activity-based and tour-based models. Most activity-based and tour-based travel demand model systems consist of hierarchical structures involving, at a minimum, activity agenda or tour formation, mode choice, destination choice, and time-of-day choice. Although many of these model systems use simultaneous equations systems to represent joint choice processes and recognize endogeneity, a causal hierarchy is invariably implied in the specification of the model system. Knowledge about the nature of the relationships among key choice dimensions can aid in specifying activity-based model structures that reflect the dominant behavioral trends in the population.
For example, consider the finding of this section that activity agenda or tour formation drives mode choice for both non-work and work tours. Clearly, this finding suggests that activity-based models should be formulated such that individual activity agendas and tours are formed first, and mode choice is then determined based on the nature of the activity agenda or tour complexity. Such a model would more accurately reflect behavioral changes that might result from a system change, say, the improvement of transit service in a corridor or region. If, for example, one developed an activity-based model system assuming a different causal structure, i.e., one in which mode choice precedes and drives tour formation, then the model would be prone to erroneously overestimating the potential benefits or impacts of the transit service improvement, and it may alter the nature of individual tour patterns in response to the mode shift. According to the results obtained in this section, the dominant relationship is one in which people make decisions regarding their activity agendas or tour complexity first, and this decision drives the mode choice decision. Many individuals with complex tour patterns will not be able to shift modes in response to improvements in transit service; thus, in reality, the impacts of the improved transit service may be substantially lower than those that would be obtained had the reverse causal structure (mode choice drives tour complexity) been assumed. Mode choice can be expanded to consider multinomial modes including SOV, shared ride, public transit, and non-motorized modes. Similarly, tour complexity can be expanded to consider different levels of tour complexity or different tour types, such as those presented in Strathman and Dueker (1995). Another consideration that merits further investigation is the extent to which findings such as those presented here are sensitive to model specification.
It is possible that statistical indicators of model performance will change depending on the model specification chosen. One of the limitations of this study is that detailed level-of-service and price variables were not available for all trips, as many trips had either an origin or a destination outside the Zurich region. While level-of-service variables are available for trips with known origins and destinations within the Zurich region, they are not available when one of the trip ends is outside the region. This problem is exacerbated when analyses and modeling efforts are conducted at the tour level. Limiting the analysis to the subset of trips with known origins and destinations within the Zurich region would have resulted in a very restrictive sample of tours. It is unclear whether the inclusion of such variables would significantly alter the findings reported in this section; therefore, the sensitivity of the findings to model specification merits further investigation.

4.2 Causal Models Between Activity Timing and Activity Duration (Mixed Discrete-continuous Model and Lee Model)

4.2.1 Background

Activity-based approaches to travel demand analysis explicitly recognize the important role played by time in shaping activity and travel patterns (Axhausen and Garling 1992). One of the key advantages of the activity-based approach is that it is capable of explicitly incorporating the time dimension into the travel modeling process (Pas and Harvey 1997). In the new planning context, where travel demand management (TDM) strategies and transportation control measures (TCM) are inherently linked to the time dimension, activity-based approaches that recognize the time dimension offer a stronger behavioral framework for conducting policy analyses and impact studies (Bhat and Koppelman 1999; Harvey and Taylor 2000; Kitamura et al. 1996; Pendyala et al. 1997, 1998; Yamamoto and Kitamura 1999).
Telecommuting is a good example for illustrating the importance of the time dimension. When a worker telecommutes, the commute trip to and from the work place is not made, so he or she has additional time available for pursuing other activities. The elimination of the commute trip influences the duration of travel and/or activity engagement. Besides influencing duration, telecommuting may influence the timing of activity engagement. A worker who used to pursue non-work activities on the way to or from work may now choose to engage in non-work activities at different times of day: without the commute trip, the worker no longer has the need or the opportunity to pursue non-work activities in combination with the commute. Analyzing these temporal changes in activity engagement patterns is important for accurately assessing the impacts of telecommuting on travel demand. As illustrated by the telecommuting example, two key aspects of the temporal dimension play an important role in activity-travel demand modeling (Goulias 1997): the timing of an activity episode and the duration (time allocation) of an activity episode (Mahmassani and Chang 1985; Mahmassani and Stephan 1988; Abkowitz 1981). In other words, activity-based analysis allows one to answer two critical questions: When is an activity pursued? For how long is the activity pursued? In recent years, activity-based research has focused on the analysis of individual activity episodes so that both of these aspects may be studied in detail (Bhat 1996, 1998; Bhat and Misra 1999; Bhat and Singh 2000). Studies that focused on daily time allocations to various activity types were not able to address time-of-day choice in activity engagement (Kasturirangan et al. 2002).
Thus, conducting activity-based analysis at the level of the individual activity episode is crucial to gaining an understanding of the relationships between activity timing and duration (Hamed and Mannering 1993; Hunt and Patterson 1996; Levinson and Kumar 1995; Steed and Bhat 2000). The causal relationship between activity timing and duration is an important component of activity-based travel demand modeling systems that aim to explicitly capture the temporal dimension (Kitamura et al. 2000; Mannering et al. 1994; Pendyala et al. 2002; Wang 1996; Wen and Koppelman 2000). On the one hand, one may hypothesize that the timing of an activity affects its duration: perhaps activity episodes pursued during peak periods are of short duration, while those pursued in off-peak periods are longer. On the other hand, the duration of an activity may affect its timing: perhaps activities of longer duration are scheduled during off-peak periods, while activities of shorter duration are scheduled during peak periods. This section attempts to shed light on this relationship by exploring both causal structures in a simultaneous equations framework. By identifying the causal structure that is most appropriate in different circumstances, one may be able to design activity-based model systems that accurately capture the relationship between activity timing and duration.

4.2.2 Data Preparation and Description for Model Analysis

The data set is derived from the Swiss Travel Microcensus 2000, which was introduced in Chapter 3. The trip file was used to create an out-of-home activity file in which individual activity records were created from the trip records. This activity file included information about activity type, activity timing, activity duration, and other variables pertinent to each activity episode. This section focuses on the relationship between activity timing and duration for maintenance activities.
Maintenance activities included the following two activity (trip) types: shopping and service (passenger or child). These activity records were extracted from the original file to create two maintenance activity record files, one for commuters and one for non-commuters. Commuters were defined as individuals who commuted to a work place on the travel diary day, while non-commuters were defined as those who did not commute to a work place (made zero work trips) on the travel diary day. Note that a worker (employed person) who did not commute on the travel diary day is still classified as a non-commuter for the purpose of this study. Maintenance activities were pursued by 10833 individuals residing in 10554 households. Of these individuals, 2617 were commuters, and they reported 3394 maintenance activities. The remaining 8216 individuals were non-commuters, and they reported 11293 maintenance activities. The commuter and non-commuter maintenance activity episode data sets included complete socioeconomic and activity information for the respective samples. Table 4.10 summarizes the household characteristics of these two modeling samples in comparison with those of the whole Swiss sample. The average household size of the non-commuter and commuter household samples is 2.44 and 2.51 persons, respectively, which is close to the average household size of 2.43 in the whole Swiss sample. Households in the commuter sample report higher income levels than households in the non-commuter sample, presumably because commuters' households consistently include wage-earning workers. Similarly, households in the commuter sample report higher car ownership levels than households in the non-commuter sample. The distributions of the other characteristics are rather consistent across the household samples.
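The preparation steps above (deriving activity episodes from trip records, keeping shopping and service episodes, and splitting people into commuters and non-commuters by whether they made any work trip) can be sketched as follows. The record layout and key names are hypothetical, and the rule that an episode starts at a trip's arrival and ends at the person's next departure is an assumption for illustration; the actual derivation used for the Microcensus file is not described in this section.

```python
from collections import defaultdict

# Maintenance activity types used in this analysis.
MAINTENANCE = {"shopping", "service"}


def maintenance_episodes(trips):
    """Split maintenance episodes into commuter and non-commuter lists.

    `trips` is an iterable of dicts with hypothetical keys:
      person  - person identifier
      purpose - activity type at the trip destination ("work", "shopping", ...)
      depart  - departure time, minutes after midnight
      arrive  - arrival time, minutes after midnight
    Assumption: an episode starts when a trip arrives and ends when the
    person's next trip departs, so the last trip of the day yields no episode.
    """
    by_person = defaultdict(list)
    for trip in trips:
        by_person[trip["person"]].append(trip)

    commuters, non_commuters = [], []
    for person, ptrips in by_person.items():
        ptrips.sort(key=lambda t: t["depart"])
        # A commuter made at least one work trip on the diary day.
        is_commuter = any(t["purpose"] == "work" for t in ptrips)
        target = commuters if is_commuter else non_commuters
        for this, nxt in zip(ptrips, ptrips[1:]):
            if this["purpose"] in MAINTENANCE:
                target.append({
                    "person": person,
                    "type": this["purpose"],
                    "start": this["arrive"],
                    "duration": nxt["depart"] - this["arrive"],
                })
    return commuters, non_commuters
```

Under this classification an employed person with no work trip on the diary day lands in the non-commuter list, matching the definition used in this study.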
Table 4.11 compares the person characteristics of the samples with those of the whole Swiss sample. The major differences between commuters and non-commuters are consistent with expectations. Commuters are predominantly in the 30–59 age group, while 37.8% of non-commuters are 60 years of age or older. 67.4% of commuters are employed full time, compared with only 21.0% of non-commuters. 88.7% of commuters hold a driver license, compared with 67.6% of non-commuters. Finally, commuters make 1.62 work trips and 4.23 non-work trips per day; the latter is almost equal to the non-work trip frequency of non-commuters (4.30 trips per day).

Table 4.10 Household Characteristics of Swiss Travel Microcensus 2000 and Sample for Model of Maintenance Activity Duration and Time-of-day Choice
Characteristic | Swiss Sample | Non-commuters Sample | Commuters Sample
Sample Size | 27918 | 7957 | 2597
Household Size | 2.43 | 2.44 | 2.51
1 person | 27.5% | 29.4% | 28.7%
2 persons | 35.1% | 33.1% | 29.4%
3 persons | 14.0% | 11.5% | 13.1%
≥ 4 persons | 23.4% | 26.0% | 28.8%
Monthly Income Low (

Prior to commencing the model development effort, a descriptive analysis of the potential relationship between activity duration and timing was undertaken. The results are presented in Table 4.12 and Table 4.13.
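The descriptive comparison in Tables 4.12 and 4.13 groups maintenance episodes by the period in which the activity begins and summarizes LN_DUR = ln(1 + activity duration in minutes). A minimal sketch of that grouping follows. The period boundaries (AM peak 6:00–8:59, midday 9:00–15:59, PM peak 16:00–18:59, off peak 19:00–5:59) are the ones defined for this analysis; the input format (start time in minutes after midnight, duration in minutes) and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def time_of_day(start_minute: int) -> str:
    """Map an activity start time (minutes after midnight) to its period."""
    hour = (start_minute // 60) % 24
    if 6 <= hour <= 8:
        return "AMPEAK"   # 6:00-8:59
    if 9 <= hour <= 15:
        return "MIDDAY"   # 9:00-15:59
    if 16 <= hour <= 18:
        return "PMPEAK"   # 16:00-18:59
    return "OFFPEAK"      # 19:00-5:59


def ln_dur(duration_minutes: float) -> float:
    """LN_DUR = ln(1 + activity duration in minutes)."""
    return math.log1p(duration_minutes)


def summarize(episodes):
    """Mean and std. dev. of LN_DUR and count N per period.

    `episodes` is an iterable of (start_minute, duration_minutes) pairs.
    """
    groups = {}
    for start, duration in episodes:
        groups.setdefault(time_of_day(start), []).append(ln_dur(duration))
    return {
        period: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0, len(vals))
        for period, vals in groups.items()
    }
```

Adding 1 before taking the logarithm keeps zero-minute episodes defined (LN_DUR = 0) while compressing the long right tail of the duration distribution.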
Table 4.11 Person Characteristics of Swiss Travel Microcensus 2000 and Sample for Model of Activity Duration and Time-of-day Choice
Characteristic | Swiss Sample | Non-commuters Sample | Commuters Sample
Sample Size | 29407 | 8216 | 2617
Age in years (mean) | 43.9 | 49.6 | 41.0
Young (6~29) | 26.8% | 18.9% | 19.3%
Middle (30~59) | 47.6% | 43.3% | 75.7%
Old (≥ 60) | 25.5% | 37.8% | 5.1%
Sex: Male | 46.3% | 36.1% | 47.0%
Sex: Female | 53.7% | 63.9% | 53.0%
Employment: Full time | 37.3% | 21.0% | 67.4%
Employment: Part time | 14.3% | 14.9% | 30.0%
Employment: Not employed | 48.4% | 64.1% | 2.6%
Licensed | 67.4% | 67.6% | 88.7%
Trips/day | 3.51 | 4.30 | 5.85
Work trips/day | 0.46 | 0.00 | 1.62
Non-work trips/day | 3.05 | 4.30 | 4.23

Based on the time-of-day distribution of all trips in the data set, four distinct time periods in which an activity begins are identified:
AM peak: 6:00 AM – 8:59 AM
Midday: 9:00 AM – 3:59 PM
PM peak: 4:00 PM – 6:59 PM
Off peak: 7:00 PM – 5:59 AM

Table 4.12 and Table 4.13 compare the mean and standard deviation of activity duration across the time-of-day allocations of activities within the non-commuter and commuter samples. To reduce the variance of the dependent variable in the linear regression model, ln(1 + activity duration in minutes), denoted LN_DUR, is specified as the dependent variable in the joint model system. In terms of the mean value of LN_DUR, the four time-of-day choices of non-commuters can be ranked as follows: MIDDAY > AMPEAK > PMPEAK > OFFPEAK. The corresponding sequence for commuters, however, is MIDDAY > PMPEAK > AMPEAK > OFFPEAK, presumably because commuters are less likely than non-commuters to pursue maintenance activities in the AM peak period due to work schedule constraints. Generally speaking, commuters' maintenance activities are of shorter duration than those of non-commuters.

Table 4.12 Description of Endogenous Variables in Non-commuter Sample
Time-of-Day Choice | Mean of LN_DUR | Std. Dev. of LN_DUR | Mean of Duration (min) | Std. Dev. of Duration (min) | N
AMPEAK (6:00–8:59) | 2.90 | 1.38 | 41.24 | 73.13 | 1142
PMPEAK (16:00–18:59) | 2.68 | 1.40 | 37.22 | 80.59 | 1775
MIDDAY (9:00–15:59) | 3.21 | 1.29 | 50.70 | 90.62 | 7911
OFFPEAK (19:00–5:59) | 1.87 | 1.53 | 22.09 | 48.84 | 465
Total | 3.04 | 1.36 | 46.45 | 86.40 | 11293

Table 4.13 Description of Endogenous Variables in Commuter Sample
Time-of-Day Choice | Mean of LN_DUR | Std. Dev. of LN_DUR | Mean of Duration (min) | Std. Dev. of Duration (min) | N
AMPEAK (6:00–8:59) | 1.95 | 1.33 | 17.38 | 46.52 | 359
PMPEAK (16:00–18:59) | 2.77 | 1.19 | 31.58 | 66.47 | 1222
MIDDAY (9:00–15:59) | 2.89 | 1.25 | 35.09 | 55.93 | 1467
OFFPEAK (19:00–5:59) | 1.84 | 1.53 | 21.38 | 47.43 | 346
Total | 2.64 | 1.33 | 30.55 | 58.61 | 3394

4.2.3 Model Estimation Results

4.2.3.1 Estimation Results for Non-commuters

Table 4.14 describes the explanatory variables used in all the causal models for time-of-day choice and duration of maintenance activities. Among these variables, LN_DUR is an endogenous continuous variable, and AMPEAK, PMPEAK and MIDDAY are three endogenous dummy variables indicating time-of-day choices. Table 4.15 provides the estimation results of the non-commuter model under the causal structure where activity duration is predetermined and affects the time-of-day allocation of the activity. For comparison, the table is composed of four blocks. The first block offers estimation results from recursive estimation, i.e., a multinomial logit model for time-of-day choice among the four time periods and a linear regression model for the logarithm of activity duration. The second block offers the estimation results of a recursive unidentified mixed logit model and linear regression model. Here, f_4 in the off-peak choice model is seemingly the smallest among all the f_i; thus f_4 is fixed at zero, and g_4 consequently also needs to be fixed at zero. An identified mixed discrete-continuous model can then be estimated; its estimation results are shown in the third block. The final block shows the estimation results of the discrete-continuous model based on the Lee transformation.
In the Lee block, the gi/ri values represent the correlations between the transformed utility function of the discrete choice and the error term of the linear regression model, whereas in the second and third blocks gi/ri represents gi. Apart from the unidentified mixed model, all three identified models provide similar estimates for the exogenous variables. In particular, the estimates in the Lee models are almost identical to those in the recursive models, presumably because joint estimation mainly improves the efficiency of the estimators for exogenous variables rather than shifting their values. The absolute values of the coefficients in the mixed time-of-day choice model are generally somewhat greater than those in the recursive and Lee models, possibly because additional heterogeneity is introduced into the latent utility functions. All the coefficients of exogenous variables have reasonable behavioral interpretations. In all the models, AGE takes its greatest positive coefficient in the AM peak utility function, which indicates that older non-commuters are most likely to schedule maintenance activities in the AM peak period and least likely in the off-peak period. Older people may take on more responsibility for taking children to school or grocery shopping in the AM peak period. Non-commuters living with more household members tend to pursue maintenance activities in the AM peak period, as evidenced by the positive coefficient of HHSIZE, presumably because they have more responsibility for serving children in that period. The positive coefficient of LOW_INC indicates that low-income non-commuters prefer to engage in maintenance activities in the AM peak period, possibly because their travel is more transit-oriented or more dependent on non-motorized modes and thereby less sensitive to AM peak-period traffic congestion.
Non-commuters without a household car are most likely to pursue maintenance activities in the Midday period and least likely in the off-peak period, as evidenced by the greatest positive coefficient of CAR_0 in the Midday utility function and the smaller positive coefficients in the AM peak and PM peak utility functions. Dependency on transit is a plausible explanation: public transit may be least congested in the midday period and unavailable or least secure in the off-peak period. The negative coefficients of PMALE in the PM peak and Midday utilities indicate that male non-commuters are less inclined than females to schedule maintenance activities in the PM peak and Midday periods, possibly because female non-commuters tend to stay at home in the morning and at night for household obligations. High-income non-commuters prefer to schedule maintenance activities in the PM peak period, as evidenced by the positive coefficient of HIGH_INC.

Age and the square of age appear significant in the log-linear regression model for maintenance activity duration. The negative coefficient on AGE and the positive coefficient on AGE_SQ imply a nonlinear effect of age on activity duration, probably because middle-aged non-commuters are more sensitive to time expenditure and less willing to spend much time on maintenance activities than younger and older ones. Relative to female non-commuters, male non-commuters allocate less time to daily maintenance activities. The negative coefficient of HHSIZE indicates that people living with more household members spend less time on maintenance activities than those living in small households, presumably because shopping obligations can be shared among more family members in a large household. High-income non-commuters are expected to spend less time on maintenance activities, as indicated by the negative coefficient of HIGH_INC, possibly because of greater concern about their time budgets.
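The quadratic age specification can be read off directly: with AGE measured in units of 100 years (Table 4.14), the age contribution to LN_DUR is b1·AGE + b2·AGE², which is extremal at AGE* = -b1 / (2·b2). The Python sketch below computes this turning point. The coefficient magnitudes are the recursive estimates in Table 4.15 (non-commuters) and Table 4.17 (commuters); their signs follow the text's description (negative AGE and positive AGE_SQ for non-commuters, the reverse for commuters). The function and constant names are illustrative only.

```python
def turning_point_years(b_age: float, b_age_sq: float) -> float:
    """Age in years at which b_age*AGE + b_age_sq*AGE**2 reaches its extremum.

    AGE is in units of 100 years, so AGE* = -b_age / (2 * b_age_sq)
    is multiplied by 100 to convert it back to years.
    """
    if b_age_sq == 0:
        raise ValueError("AGE_SQ coefficient is zero: no turning point")
    return -b_age / (2.0 * b_age_sq) * 100.0


def age_effect(b_age: float, b_age_sq: float, years: float) -> float:
    """Contribution of age to LN_DUR for a person of the given age in years."""
    a = years / 100.0
    return b_age * a + b_age_sq * a * a


# Recursive estimates (AGE, AGE_SQ); signs as described in the text.
NONCOMMUTER = (-0.760, 0.821)  # Table 4.15: convex in age, turning point is a minimum
COMMUTER = (2.601, -2.853)     # Table 4.17: concave in age, turning point is a maximum

if __name__ == "__main__":
    for label, (b1, b2) in [("non-commuters", NONCOMMUTER), ("commuters", COMMUTER)]:
        kind = "minimum" if b2 > 0 else "maximum"
        print(f"{label}: {kind} duration at about {turning_point_years(b1, b2):.1f} years")
```

Both turning points fall in the mid-forties (about 46.3 and 45.6 years), which matches the reading that middle-aged non-commuters spend the least time, and middle-aged commuters the most time, on maintenance activities.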
Car ownership (CAR_GE2) appears significant in the recursive model but insignificant in the other three types of models; it has therefore been excluded from them.

The coefficients of the endogenous variable are the most important outputs of the estimation. In the recursive models, LN_DUR, the logarithm of activity duration, appears positively significant in all three utility functions for time-of-day choice, with the greatest coefficient in the Midday utility function. The estimates in the Lee model are rather close to those in the recursive models despite the accommodation of error correlations, possibly because the correlations r1 (0.088) and r2 (0.171) are rather small, albeit statistically significant. In the unidentified mixed logit model, f2 appears smallest among all the fi. Thus f2 and g2 are fixed at zero in the identified mixed model, where different estimates are found for the endogenous variable: LN_DUR does not appear significant in the AM peak utility function. Table 4.19 presents the simulation-based hypothesis test results for the error covariance estimators in all the identified mixed models. In the current model, the significance level of the positive covariance f1g1 is 0.209, which is not high but is considerable for a mixed model, since a very large sample is required to estimate the standard deviations of heterogeneities in a mixed logit model accurately. The corresponding correlation Corr(u1, a) is calculated as 0.231 according to Equation (2.4.5). The statistical result indicates that activity duration does not have a significant impact on the utility of AM peak choice. Without accommodating the direct positive correlation between the AM peak utility and activity duration, the recursive model overestimates the coefficient of LN_DUR in the AM peak utility function as 0.495. Corr(u3, a) and Corr(u4, a) are estimated as 0.057 and 0.004, which are almost negligible.
That is why the coefficients of the endogenous variable LN_DUR in the PM peak and Midday utility functions of the mixed model do not differ substantially from those in the recursive model. Finally, the current mixed model supports the hypothesis that non-commuters' maintenance activities of longer duration are more likely to be pursued in the PM peak and Midday periods.

In the linear regression model for duration, which contains no endogenous variables under this causal structure, the standard deviation (σ) of the normal random disturbance estimated from the recursive model and the Lee model is rather consistent (1.357 vs. 1.348). In the identified mixed model, one obtains

$$\sqrt{g_1^2 + g_3^2 + g_4^2 + \sigma^2} = \sqrt{0.364^2 + 0.240^2 + 0.284^2 + 1.254^2} = 1.358 \approx 1.357.$$

That is, in the mixed linear regression model, the random component has been divided into a linear combination of four parts: the first three parts are individually correlated with the AM peak, Midday and off-peak utility functions (g2 being fixed at zero), and the last part is an idiosyncratic random error uncorrelated with the utility functions. This result coincides with the a priori assumption of the mixed discrete-continuous modeling system.

Table 4.16 offers the estimation results of the non-commuter model under the causal structure in which time-of-day choices affect activity duration. All the exogenous variables take coefficients similar to those in Table 4.15, and the coefficients of exogenous variables in the Lee model are fairly close to those in the recursive model. Except for a substantial reduction in the coefficient of the endogenous dummy variable MIDDAY (1.308 vs. 1.044), there is no considerable change for AMPEAK and PMPEAK. This may be explained by the estimate of r3 (-0.307). Note that ri represents corr(v*_qi, η_q), where

$$v_{qi}^{*} = \Phi^{-1}\left[F\left(\max_{j=1,2,\ldots,I;\; j \neq i} u_{qj} - \varepsilon_{qi}\right)\right]$$

and η_q is the random component of the continuous model. The error ε_qi is not normally but Gumbel distributed, which is asymmetric.
Consequently, ri is essentially negatively related to the correlation between the Gumbel error of utility function i and the random component of the continuous model, so the negative r3 value implies a positive correlation between the Midday utility error and the duration error. This explains why the positive coefficient of the third endogenous dummy variable, MIDDAY, is reduced in the Lee model, which to some degree accommodates the positive correlation between the third utility function and the continuous model.

In the unidentified mixed logit model, f1 takes the smallest absolute value among all the fi, so f1 and g1 are fixed at zero. The coefficients of the endogenous dummy variables are close to those in the recursive and Lee models. According to Equation (2.4.5), Corr(u2*, a) is 0.073, Corr(u3*, a) is 0.104 and Corr(u4*, a) is 0.091. These error correlations are too slight to influence the coefficients of the endogenous dummy variables substantially, although the simulation-based hypothesis test indicates that the covariances f3g3 and f4g4 are relatively significant, with significance levels of 0.089 and 0.072. In the continuous linear regression model, the standard deviations (σ) of the normal random disturbance estimated from the recursive and Lee models are rather consistent (1.323 vs. 1.346). In the identified mixed model, the standard deviation of the whole random component is 1.328, which is close to 1.323. All three dummy variables indicating AM peak, PM peak and Midday choice appear positively significant in all types of models. These statistical results strongly support the hypothesis that the time-of-day choice of maintenance activities affects activity duration for non-commuters. Except for the Lee model, both the recursive and mixed models yield the greatest positive coefficient on MIDDAY, a smaller one on AMPEAK and the smallest on PMPEAK.
This result is consistent with the descriptive analysis in Table 4.12, where the time-of-day categories rank MIDDAY > AMPEAK > PMPEAK > OFFPEAK in terms of average activity duration. In summary, for non-commuters, the AM peak, Midday and PM peak choices of maintenance activities positively affect activity duration. In these periods, non-commuters have sufficient time for maintenance activities without institutional constraints such as shopping center closing times. Conversely, maintenance activity duration positively affects the Midday choice; in other words, maintenance activities of longer duration tend to be scheduled in the midday period. Intuitively, non-commuters who intend to undertake longer maintenance activities probably prefer to start them at midday to have sufficient time and to avoid peak-period congestion and institutional constraints.

4.2.3.2 Estimation Results for Commuters

Table 4.17 offers the estimation results of the commuter model in which activity duration affects time-of-day choices. The exogenous variables take almost identical coefficients in all the models. AGE takes positive coefficients in all three utility functions, the greatest being in the PM peak utility. This indicates that older commuters prefer to schedule maintenance activities in the PM peak rather than the off-peak period, presumably because they are used to pursuing maintenance activities on the way home from work. Compared with female commuters, male commuters are more likely to undertake maintenance activities in the off-peak period, as evidenced by the negative coefficients of PMALE in the utility functions for the other three periods. This is probably because women are less willing to leave home at night or in the early morning for safety reasons.
Commuters living alone are less likely to undertake maintenance activities in the AM peak and midday periods, as evidenced by the negative coefficients of HHSIZE1 in both utility functions. This is possibly because they have no obligation to take children to school in the AM peak period, nor do they need to undertake maintenance activities at midday in the absence of urgent household obligations. Commuters with no car in the household are more likely to undertake maintenance activities in the AM peak, PM peak and midday periods, as indicated by the positive coefficients of CAR_0. This is possibly because commuters with cars are more likely to pursue activities in the off-peak period, since their schedules are not constrained by transit availability then. Low-income commuters are more likely to undertake maintenance activities at midday, as shown by a positive coefficient of LOW_INC that is marginally significant. As expected, low-income commuters have more spare time for maintenance activities in the middle of the workday. Commute distance negatively affects AM peak engagement in maintenance activities: uncertainty in commute time increases with commute distance, so commuters are unwilling to add activities to their commute in the AM peak period. Total daily work time negatively affects midday engagement in maintenance activities, which is consistent with expectation: the more time commuters spend at work, the less time is available for maintenance activities at midday.

Table 4.14 Description and Definition of Variables in Timing-duration Model

Variable Name | Variable Description | Commuters' Activity Sample: Mean | Std. Dev. | Non-commuters' Activity Sample: Mean | Std. Dev.
Sample Size | | 3394 | | 11293 |
AGE | Age (in units of 100 years) | 0.41 | 0.12 | 0.49 | 0.20
AGE_SQ | Square of age (100² years²) | 0.18 | 0.10 | 0.28 | 0.20
PMALE | Person is male | 0.46 | 0.50 | 0.35 | 0.48
HHSIZE1 | Single-member household | 0.27 | 0.44 | - | -
HIGH_INC | Monthly household income > Fr 10000 | - | - | 0.10 | 0.30
LOW_INC | Monthly household income < Fr 4000 | 0.10 | 0.31 | 0.24 | 0.43
CAR_0 | Household does not own a car | 0.13 | 0.34 | 0.20 | 0.40
LN_DISW | Ln(1 + commute distance in kilometers) | 1.91 | 1.02 | - | -
WORKDUR | Daily total work time (100 mins) | 4.15 | 1.66 | - | -
HHSIZE | Total number of household members | 2.56 | 1.33 | 2.55 | 1.39
LN_DUR | Ln(1 + activity duration in minutes) | 2.64 | 1.33 | 3.04 | 1.36
AMPEAK | Activity is scheduled in AM peak | 0.11 | 0.31 | 0.10 | 0.30
PMPEAK | Activity is scheduled in PM peak | 0.36 | 0.48 | 0.16 | 0.36
MIDDAY | Activity is scheduled in Midday | 0.43 | 0.50 | 0.70 | 0.46

Table 4.15 Non-commuter Model (Duration → Time-of-day)

Variable | Recursive Models: Coeff. (t-test) | Unidentified Mixed Models: Coeff. (t-test) | Identified Mixed Models: Coeff. (t-test) | Lee Models: Coeff. (t-test)
Activity Time-of-Day Choice Model
AM Peak Choice Model
Constant | 2.502 (11.66) | 12.635 (1.93) | 2.873 (1.33) | 2.226 (10.15)
AGE | 3.760 (11.84) | 8.984 (2.48) | 4.557 (2.34) | 3.768 (11.88)
HHSIZE | 0.129 (4.43) | 0.485 (1.87) | 0.155 (1.30) | 0.127 (4.36)
LOW_INC | 0.186 (2.36) | 0.695 (1.53) | 0.271 (1.05) | 0.185 (2.35)
CAR_0 | 0.678 (3.35) | 0.447 (1.02) | 0.677 (2.80) | 0.647 (3.24)
LN_DUR | 0.495 (12.20) | 0.310 (1.76) | 0.044 (0.04) | 0.377 (8.92)
f1 | - | 7.369 (1.90) | 2.167 (0.60) | -
PM Peak Choice Model
Constant | 0.023 (0.16) | 0.030 (0.18) | 0.009 (0.02) | 0.144 (0.94)
AGE | 1.206 (4.35) | 1.263 (4.35) | 1.206 (4.13) | 1.192 (4.30)
PMALE | 0.234 (3.25) | 0.383 (3.65) | 0.260 (1.96) | 0.229 (3.18)
HIGH_INC | 0.146 (1.77) | 0.000 | 0.000 | 0.156 (1.89)
CAR_0 | 0.565 (2.87) | 0.545 (2.75) | 0.557 (2.79) | 0.535 (2.75)
LN_DUR | 0.384 (10.22) | 0.393 (9.67) | 0.390 (1.93) | 0.305 (7.73)
f2 | - | 0.018 (0.03) | 0.000 | -
Midday Choice Model
Constant | 0.003 (0.02) | 0.105 (0.34) | 0.193 (0.17) | 0.380 (2.70)
AGE | 2.474 (9.52) | 2.976 (3.79) | 2.514 (6.43) | 2.475 (9.56)
PMALE | 0.199 (3.46) | 0.380 (3.33) | 0.224 (1.39) | 0.205 (3.56)
CAR_0 | 0.792 (4.22) | 0.882 (3.57) | 0.803 (4.03) | 0.762 (4.13)
LN_DUR | 0.665 (18.81) | 0.792 (4.29) | 0.741 (1.83) | 0.514 (14.02)
f3 | - | 1.460 (1.12) | 0.440 (0.25) | -
Off-Peak Choice Model
f4 | - | 0.179 (0.22) | 0.029 (0.02) | -
Activity Duration Model
Constant | 3.404 (56.48) | 3.407 (41.66) | 3.439 (40.96) | 3.524 (43.38)
AGE | 0.760 (3.53) | 0.793 (2.73) | 0.959 (3.08) | 0.782 (2.71)
AGE_SQ | 0.821 (3.75) | 0.863 (2.92) | 1.038 (3.27) | 0.764 (2.61)
PMALE | 0.147 (7.42) | 0.147 (5.46) | 0.146 (5.40) | 0.132 (4.94)
HHSIZE | 0.056 (6.46) | 0.060 (5.29) | 0.060 (5.30) | 0.057 (5.09)
HIGH_INC | 0.136 (4.14) | 0.148 (3.38) | 0.141 (3.02) | 0.144 (3.34)
CAR_GE2 | 0.045 (1.96) | 0.000 | 0.000 | 0.000
g1/r1 | - | 0.000 | 0.364 (1.18) | 0.088 (4.66)
g2/r2 | - | 0.000 | 0.000 | 0.171 (9.53)
g3/r3 | - | 0.000 | 0.240 (0.29) | 0.000
g4/r4 | - | 0.000 | 0.284 (1.08) | 0.326 (14.92)
sigma | 1.357 | 1.358 (150.29) | 1.254 (10.18) | 1.348 (151.10)

Table 4.16 Non-commuter Model (Time-of-day → Duration)

Variable | Recursive Models: Coeff. (t-test) | Unidentified Mixed Models: Coeff. (t-test) | Identified Mixed Models: Coeff. (t-test) | Lee Models: Coeff. (t-test)
Activity Time-of-Day Choice Model
AM Peak Choice Model
Constant | 1.360 (7.13) | 2.650 (2.74) | 1.267 (5.19) | 1.360 (7.14)
AGE | 3.805 (12.29) | 11.200 (3.33) | 3.916 (12.05) | 3.800 (12.27)
HHSIZE | 0.132 (4.56) | 0.413 (2.85) | 0.130 (4.44) | 0.133 (4.54)
LOW_INC | 0.185 (2.34) | 0.467 (1.76) | 0.171 (2.15) | 0.178 (2.27)
CAR_0 | 0.795 (3.95) | 1.937 (3.02) | 0.801 (3.92) | 0.814 (4.04)
f1 | - | 2.437 (2.98) | 0.000 | -
PM Peak Choice Model
Constant | 0.826 (6.74) | 2.293 (1.38) | 0.826 (4.32) | 0.831 (6.77)
AGE | 1.237 (4.59) | 2.955 (1.33) | 1.313 (4.63) | 1.220 (4.53)
PMALE | 0.239 (3.33) | 0.000 | 0.000 | 0.238 (3.29)
HIGH_INC | 0.180 (2.20) | 1.312 (1.69) | 0.181 (2.18) | 0.185 (2.26)
CAR_0 | 0.667 (3.40) | 0.957 (1.25) | 0.706 (3.56) | 0.678 (3.45)
f2 | - | 14.032 (3.00) | 0.143 (0.73) | -
Midday Choice Model
Constant | 1.686 (15.22) | 6.612 (3.48) | 1.708 (8.84) | 1.694 (15.28)
AGE | 2.521 (10.14) | 8.258 (3.14) | 2.617 (9.68) | 2.497 (10.05)
PMALE | 0.238 (4.20) | 0.000 | 0.000 | 0.238 (4.19)
CAR_0 | 0.915 (4.93) | 2.623 (3.42) | 0.963 (5.08) | 0.926 (4.98)
f3 | - | 5.939 (3.25) | 0.366 (1.35) | -
Off-Peak Choice Model
f4 | - | 4.684 (3.01) | 0.552 (1.46) | -
Activity Duration Model
Constant | 2.313 (31.03) | 2.314 (23.46) | 2.331 (15.09) | 2.357 (8.82)
AGE | 0.897 (4.17) | 0.915 (3.22) | 0.935 (3.28) | 0.822 (2.88)
AGE_SQ | 0.785 (3.59) | 0.808 (2.80) | 0.853 (2.96) | 0.772 (2.68)
PMALE | 0.121 (6.08) | 0.121 (4.59) | 0.117 (4.49) | 0.134 (5.01)
HHSIZE | 0.052 (6.06) | 0.055 (4.93) | 0.053 (4.76) | 0.059 (5.33)
HIGH_INC | 0.126 (3.84) | 0.132 (3.10) | 0.134 (3.14) | 0.142 (3.31)
CAR_GE2 | 0.023 (1.01) | 0.000 | 0.000 | 0.000
AMPEAK | 1.016 (18.30) | 1.017 (13.85) | 0.840 (5.96) | 1.099 (2.59)
PMPEAK | 0.794 (15.20) | 0.795 (11.50) | 0.740 (3.79) | 0.802 (2.56)
MIDDAY | 1.308 (27.19) | 1.309 (20.58) | 1.313 (8.71) | 1.044 (3.96)
g1/r1 | - | 0.000 | 0.000 | 0.063 (0.44)
g2/r2 | - | 0.000 | 0.876 (10.57) | 0.031 (0.36)
g3/r3 | - | 0.000 | 0.504 (5.26) | 0.307 (5.80)
g4/r4 | - | 0.000 | 0.305 (3.32) | 0.018 (0.20)
sigma | 1.323 | 1.323 (150.29) | 0.805 (10.13) | 1.346 (109.44)

In the log-linear regression model for maintenance activity duration, age, gender, household size, total daily work time and car ownership are found to be significant contributing factors. A quadratic term in age is specified to capture the nonlinear impact of age on activity duration. In contrast to the non-commuter model, AGE takes a positive coefficient and AGE_SQ a negative one. These results indicate that middle-aged commuters are expected to undertake longer maintenance activities than younger and older ones, since middle-aged commuters have to take on many more responsibilities. As with non-commuters, male commuters' activity durations are expected to be shorter than female commuters', as evidenced by the negative coefficient of PMALE. Commuters living with more household members tend to have shorter maintenance activities, as evidenced by the negative coefficient of HHSIZE, similar to non-commuters. Total daily work time negatively affects maintenance activity duration, as indicated by the negative coefficient of WORKDUR: the more time commuters spend at work, the less time is available for maintenance activities. The coefficient of CAR_0 indicates that commuters without cars in the household tend to allocate more time to maintenance activities than those with household cars. These commuters presumably depend heavily on public transit, so the fixed schedule of transit service may lengthen their activity duration.
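The Lee correlations ri in Tables 4.15 to 4.18 are defined on a transformed error, and the argument that a negative ri signals a positive correlation between the utility error and the duration error can be checked numerically. For alternative i, η_i = max_{j≠i} u_j − ε_i is mapped to a standard normal variate J_i = Φ^{-1}[F_i(η_i)]; under the multinomial logit assumption F_i(z) = exp(z) / (exp(z) + Σ_{j≠i} exp(V_j)), so alternative i is chosen exactly when J_i < Φ^{-1}(P_i). The Python sketch below simulates this with hypothetical systematic utilities V and a hypothetical duration error e built to have correlation 0.6 with ε_3 (index 2 in the code). It is a minimal illustration under those assumptions, not the estimation code behind the tables; it relies only on the standard library (`statistics.NormalDist`, Python 3.8+).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()            # standard normal: provides cdf, pdf and inv_cdf
EULER_GAMMA = 0.5772156649015329     # mean of a standard Gumbel variate
GUMBEL_SD = math.pi / math.sqrt(6)   # standard deviation of a standard Gumbel variate


def gumbel(rng: random.Random) -> float:
    """Draw a standard Gumbel (type I extreme value) error by inversion."""
    u = rng.uniform(1e-12, 1 - 1e-12)
    return -math.log(-math.log(u))


def lee_transform(v, u, i):
    """J_i = Phi^{-1}[F_i(max_{j != i} u_j - eps_i)] under the MNL assumption.

    v: systematic utilities, u: full utilities (v plus Gumbel errors), i: alternative.
    """
    eps_i = u[i] - v[i]
    eta = max(uj for j, uj in enumerate(u) if j != i) - eps_i
    others = sum(math.exp(vj) for j, vj in enumerate(v) if j != i)
    f = 1.0 / (1.0 + others * math.exp(-eta))  # MNL-implied CDF of eta, evaluated at eta
    f = min(max(f, 1e-12), 1 - 1e-12)          # keep inv_cdf strictly inside (0, 1)
    return STD_NORMAL.inv_cdf(f)


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(v=(0.2, 0.1, 0.5, 0.0), i=2, rho=0.6, n=20000, seed=1):
    """Return corr(eps_i, e), corr(J_i, e) and the number of choice-rule violations."""
    rng = random.Random(seed)
    p_i = math.exp(v[i]) / sum(math.exp(x) for x in v)  # MNL probability of i
    cutoff = STD_NORMAL.inv_cdf(p_i)
    eps_list, j_list, e_list = [], [], []
    mismatches = 0
    for _ in range(n):
        u = [vj + gumbel(rng) for vj in v]
        z = (u[i] - v[i] - EULER_GAMMA) / GUMBEL_SD       # standardized eps_i
        e_list.append(rho * z + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0))
        eps_list.append(u[i] - v[i])
        j_list.append(lee_transform(v, u, i))
        chosen = max(range(len(u)), key=u.__getitem__) == i
        mismatches += chosen != (j_list[-1] < cutoff)
    return corr(eps_list, e_list), corr(j_list, e_list), mismatches


if __name__ == "__main__":
    c_eps, c_j, bad = simulate()
    print(f"corr(eps_3, e) = {c_eps:.2f}, corr(J_3, e) = {c_j:.2f}, violations = {bad}")
```

With the default seed, the simulated corr(ε_3, e) is positive while corr(J_3, e) comes out negative, and no draw violates the choice rule: because J_i decreases as ε_i increases, a positive correlation between a utility error and the duration error appears as a negative ri.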
As for the estimates of the endogenous variable LN_DUR, no substantial differences are found among the types of models. LN_DUR takes positive coefficients in the utility functions for PM peak and Midday choice and an insignificant coefficient in the AM peak utility function. This implies that longer activity duration makes AM peak or off-peak choice of maintenance activities relatively less likely, probably because longer activities cannot be pursued then owing to the work schedule (e.g., work starts in the morning) or institutional constraints (e.g., shopping centers are closed at night and in the early morning). The minor differences in the endogenous-variable coefficients among the recursive, mixed and Lee models are caused by slight correlations between the random error terms. The unidentified mixed logit model shows that f4 takes the smallest value, so f4 and g4 are fixed at zero in the identified mixed discrete-continuous model. In the identified mixed model, the correlations are calculated as 0.030, 0.129 and 0.013 according to Equation (2.4.5). In the Lee model, r1 (0.285) and r4 (0.261) appear statistically significant and rather considerable, so the coefficient of LN_DUR is somewhat smaller than that from recursive estimation. The standard deviations fi of the normal heterogeneity appear small and insignificant in the mixed discrete choice model. This presumably reflects the fact that commuters' daily activity patterns are constrained by rigid work schedules: the flexibility of maintenance activities is limited for commuters, so there are not many unspecified factors contributing to the time-of-day choice of maintenance activities. In addition, Table 4.19 shows that the significance levels of the error covariances are 0.387, 0.288 and 0.429, which does not provide strong evidence for the existence of error correlations.
This result indicates that there are not many unspecified factors simultaneously affecting commuters' maintenance activity timing and duration, which is consistent with the finding of Pendyala and Bhat (2004). The standard deviations of the random error components in the continuous model are reasonable and fairly consistent across all types of models. The standard deviation of the total random component in the identified mixed model is calculated as 1.306, which is almost equal to its counterpart in the recursive model (1.307) and close to that in the Lee model (1.298).

Table 4.18 offers the estimation results of the commuter model in which time-of-day choices affect activity duration. The exogenous variables take coefficients with the same signs and slight variations in magnitude across all the models. The coefficient of AGE in the PM peak utility function of the Lee model differs greatly from the others because the insignificant AGE_SQ variable is excluded there. All these coefficients for exogenous variables take the same signs as those in Table 4.17. The unidentified mixed logit model indicates that f4 takes the smallest value, so f4 and g4 are fixed at zero in the identified mixed model. The estimates for the endogenous variables differ greatly among the various types of models. The coefficient of AMPEAK is not significant in the recursive model. However, it appears significantly negative (-1.422) in the Lee model, where the error correlation r1 is significantly negative (-0.526), which indicates a strongly positive correlation between the corresponding random error terms. This estimation result is consistent with that of Pendyala and Bhat (2004), who also found that AMPEAK takes a negative coefficient in commuters' activity duration model. However, the coefficient for AMPEAK estimated from the mixed model is insignificant.
This clearly indicates that the estimated coefficients of endogenous dummy variables are highly sensitive to the specification of the error structure in the joint modeling system. PMPEAK takes a significantly positive coefficient (0.867) in the recursive model but insignificant coefficients in the identified mixed and Lee models. In the mixed model, the corresponding Corr(u2, a) is calculated as 0.243, with a significance level of 0.187, while in the Lee model the corresponding error correlation r2 is -0.648, implying a strongly positive correlation between the PM peak utility function and the continuous activity duration model. On this point, the mixed and Lee models yield consistent results: unspecified factors associated with PM peak choice positively affect the duration of maintenance activities, but PM peak choice itself does not have a significant impact on activity duration. MIDDAY takes significantly positive coefficients of 0.960 in the recursive model and 0.813 in the identified mixed model. Corr(u3, a) is calculated as 0.080 and is insignificant (significance level: 0.313), which is why the MIDDAY coefficient in the mixed model does not differ greatly from that in the recursive model. In the Lee model, however, MIDDAY takes a much greater positive coefficient (1.791), while r3 is estimated as 0.726, indicating a highly negative correlation between the midday utility and activity duration. The standard deviation of the total random component in the mixed continuous model is calculated as 1.267, which is close to the standard deviation in the recursive model (1.258) and much smaller than that in the Lee model (1.477). The greater standard deviation of the Lee model is probably related to the positive coefficient of WORKDUR in its log-linear activity duration model. In the recursive model, WORKDUR takes a significantly negative coefficient (-0.042).
In the identified mixed model, WORKDUR still takes a negative coefficient (-0.034), albeit an insignificant one. From a behavioral perspective, total daily work time is expected to negatively affect commuters' maintenance activity duration, so the positive coefficient of WORKDUR in the Lee model is counterintuitive. The distributional assumption on the whole latent utility, including both its systematic and random components, may contribute to this problem in the Lee model. Note that WORKDUR is also specified, and appears significant, in the Midday utility function. The highly negative correlation between the Midday utility function and the activity duration model, both of which include WORKDUR, may produce such a counterintuitive estimate for WORKDUR in the continuous model.

In general, the estimation results are rather consistent with the descriptive analysis in Table 4.13. On average, commuters' maintenance activities starting in the Midday and PM peak periods are longer than those starting in the AM peak and off-peak periods, implying that activity duration is positively correlated with Midday and PM peak beginning times. In summary, only the midday choice of maintenance activities positively affects activity duration for commuters. Owing to fixed work schedules, commuters usually do not have much time for maintenance activities; since the midday period includes lunch time, commuters may prefer to undertake a longer maintenance activity then. Conversely, maintenance activity duration positively affects PM peak and midday choice. Intuitively, commuters who plan longer maintenance activities would like to schedule them in the midday or PM peak period, because midday includes the lunch break and the PM peak period after work is flexible, allowing longer maintenance activities to be undertaken on the way home.
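The composite standard deviations and error correlations quoted in this section (for example 1.328 for Table 4.16, 1.306 and 1.267 for the two commuter models, and correlations such as 0.231, 0.073 and 0.243) can be reproduced in a few lines of Python. The sketch assumes an error-components structure in which utility i carries f_i·ξ_i plus an iid Gumbel error with variance π²/6, and the duration error is a = Σ_i g_i·ξ_i + σ·e with independent standard normal ξ_i and e. Under that assumption the total standard deviation is √(Σ g_i² + σ²) and Corr(u_i, a) = f_i·g_i / √((f_i² + π²/6)(Σ g_i² + σ²)). This is offered as a plausible reading of Equation (2.4.5), not a restatement of it, although it does reproduce the magnitudes reported in the text. Parameter values are the identified mixed-model estimates from Tables 4.15 to 4.18, with parameters fixed at zero omitted.

```python
import math

GUMBEL_VAR = math.pi ** 2 / 6  # variance of the iid logit (Gumbel) error


def total_sd(g, sigma):
    """SD of the duration error a = sum_i g_i*xi_i + sigma*e, with xi_i, e ~ iid N(0, 1)."""
    return math.sqrt(sum(gi * gi for gi in g.values()) + sigma ** 2)


def corr_u_a(f, g, sigma, i):
    """Corr(u_i, a) when the random part of u_i is f_i*xi_i plus a Gumbel error."""
    cov = f.get(i, 0.0) * g.get(i, 0.0)
    sd_u = math.sqrt(f.get(i, 0.0) ** 2 + GUMBEL_VAR)
    return cov / (sd_u * total_sd(g, sigma))


# Identified mixed-model estimates (magnitudes). Alternatives 1-4 are
# AM peak, PM peak, Midday and off peak; parameters fixed at zero are omitted.
MODELS = {
    "Table 4.15": dict(f={1: 2.167, 3: 0.440, 4: 0.029}, g={1: 0.364, 3: 0.240, 4: 0.284}, sigma=1.254),
    "Table 4.16": dict(f={2: 0.143, 3: 0.366, 4: 0.552}, g={2: 0.876, 3: 0.504, 4: 0.305}, sigma=0.805),
    "Table 4.17": dict(f={1: 0.665, 2: 0.210, 3: 0.078}, g={1: 0.086, 2: 1.040, 3: 0.276}, sigma=0.736),
    "Table 4.18": dict(f={1: 0.704, 2: 0.640, 3: 0.403}, g={1: 0.094, 2: 0.689, 3: 0.337}, sigma=1.004),
}

if __name__ == "__main__":
    for name, m in MODELS.items():
        corrs = {i: round(corr_u_a(m["f"], m["g"], m["sigma"], i), 3) for i in sorted(m["f"])}
        print(name, round(total_sd(m["g"], m["sigma"]), 3), corrs)
```

Rounded to three decimals, the sketch gives total standard deviations of 1.358, 1.328, 1.306 and 1.267, and correlations that match the reported values; for the off-peak alternative in Table 4.15 it gives about 0.005 against the reported 0.004, a gap small enough to be explained by rounding of the three-decimal parameters.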
PAGE 144 133 Table 4.17 Commuter Model (Duration Timeofday) Recursive Models Unidentified Mixed Models Identified Mixed Models Lee Models Variable Coeff. ttest Coeff. ttest Coeff. ttest Coeff. ttest Activity TimeofDay Choice Model AM Peak Choice Model Constant 1.050 3.26 5.280 1.15 1.098 2.61 0.900 2.75 AGE 3.610 5.56 7.137 1.66 3.713 5.28 3.592 5.50PMALE 0.400 2.59 0.371 0.91 0.413 2.57 0.471 3.03HHSIZE1 0.405 2.72 1.149 1.23 0.403 2.47 0.419 2.81CAR_0 0.974 3.50 1.681 1.64 0.987 3.44 0.940 3.42LN_DISW 0.105 1.83 0.328 1.09 0.112 1.81 0.118 2.06LN_DUR 0.035 0.58 0.988 0.84 0.019 0.16 0.001 0.02f1 6.088 1.18 0.665 0.85 PM Peak Choice Model Constant 1.347 3.08 1.484 2.82 1.632 2.44 1.012 2.31 AGE 5.590 2.75 6.174 2.61 5.352 2.52 5.419 2.68AGE_SQ 4.290 1.76 4.978 1.75 4.069 1.62 4.117 1.70PMALE 0.311 2.44 0.324 2.42 0.285 2.07 0.363 2.84CAR_0 0.812 3.41 0.824 3.32 0.787 3.23 0.793 3.38LN_DUR 0.530 10.72 0.538 10.02 0.657 2.80 0.410 7.41f2 0.461 0.63 0.210 0.56 Midday Choice Model Constant 0.483 1.81 0.585 1.83 0.476 1.45 0.916 3.35 AGE 2.522 4.68 2.569 4.40 2.514 4.65 2.515 4.68PMALE 0.630 4.94 0.673 4.38 0.629 4.91 0.685 5.37HHSIZE1 0.182 1.98 0.205 1.81 0.166 1.66 0.165 1.79LOW_INC 0.210 1.68 0.241 1.55 0.213 1.68 0.217 1.74CAR_0 0.640 2.63 0.642 2.58 0.632 2.59 0.619 2.58WORKDUR 0.274 12.07 0.307 4.37 0.280 10.72 0.278 12.25LN_DUR 0.582 11.79 0.601 9.14 0.595 7.01 0.418 7.98f3 0.477 0.50 0.078 0.19 OffPeak Choice Model f4 0.120 0.25 0.000 Activity Duration Model Constant 2.713 13.97 2.72010.722.72210.70 2.834 11.36 AGE 2.601 2.85 2.5652.152.5522.13 2.587 2.20AGE_SQ 2.853 2.60 2.8091.962.8081.96 2.871 2.03PMALE 0.196 5.64 0.1964.320.1874.13 0.176 3.92HHSIZE 0.097 7.14 0.0975.470.0975.50 0.097 5.74WORKDUR 0.071 6.77 0.0715.180.0715.21 0.058 4.22CAR_0 0.121 2.31 0.1221.760.1271.83 0.000 g1/r1 0.0000.0860.54 0.285 8.38g2/r2 0.0001.0408.64 0.094 1.76g3/r3 0.0000.2761.90 g4/r4 0.0000.0000.261 7.67sigma 1.307 1.30782.390.7364.98 1.298 82.45 PAGE 145 134 Table 4.18 
Commuter Model (Time-of-Day → Duration)
Variable | Recursive Models | Unidentified Mixed Models | Identified Mixed Models | Lee Models
(each cell: Coeff. (t-test); – = not applicable)

Activity Time-of-Day Choice Model
AM Peak Choice Model
Constant | 0.973 (3.17) | 4.407 (1.95) | 1.161 (0.99) | 0.954 (3.17)
AGE | 3.623 (5.57) | 6.611 (2.89) | 3.699 (4.65) | 3.345 (5.19)
PMALE | 0.387 (2.52) | 0.450 (1.16) | 0.385 (2.36) | 0.277 (1.82)
HHSIZE1 | 0.479 (3.26) | 1.284 (2.12) | 0.519 (2.28) | 0.387 (2.87)
CAR_0 | 1.013 (3.67) | 1.699 (2.45) | 1.036 (3.44) | 0.944 (3.44)
LN_DISW | 0.108 (1.91) | 0.279 (1.55) | 0.115 (1.68) | 0.091 (1.70)
f1 | – | 4.515 (2.34) | 0.704 (0.33) | –
PM Peak Choice Model
Constant | 0.224 (0.53) | 0.778 (0.84) | 0.352 (0.69) | 0.404 (1.85)
AGE | 6.036 (3.00) | 11.085 (2.13) | 6.459 (2.73) | 2.396 (4.61)
AGE_SQ | 4.551 (1.88) | 9.600 (1.65) | 5.096 (1.78) | 0.000 (–)
PMALE | 0.388 (3.12) | 0.515 (2.02) | 0.373 (2.82) | 0.327 (2.70)
CAR_0 | 0.965 (4.10) | 1.486 (2.79) | 0.991 (4.05) | 0.929 (3.97)
f2 | – | 2.218 (2.00) | 0.640 (1.05) | –
Midday Choice Model
Constant | 1.809 (7.45) | 3.696 (2.94) | 1.893 (5.83) | 1.789 (7.53)
AGE | 2.768 (5.25) | 4.419 (2.74) | 2.791 (5.16) | 2.696 (5.23)
PMALE | 0.716 (5.76) | 1.323 (2.62) | 0.728 (5.39) | 0.631 (5.21)
HHSIZE1 | 0.150 (1.64) | 0.316 (1.37) | 0.167 (1.62) | 0.000 (–)
LOW_INC | 0.217 (1.74) | 0.532 (1.48) | 0.236 (1.67) | 0.184 (1.78)
CAR_0 | 0.784 (3.26) | 1.092 (2.29) | 0.784 (3.22) | 0.643 (2.71)
WORKDUR | 0.280 (12.49) | 0.663 (2.53) | 0.301 (5.19) | 0.282 (12.99)
f3 | – | 2.918 (1.77) | 0.403 (0.56) | –
Off-Peak Choice Model
f4 | – | 1.436 (1.56) | 0.000 (–) | –

Activity Duration Model
Constant | 1.863 (9.28) | 1.863 (7.38) | 1.997 (6.27) | 1.571 (3.05)
AGE | 2.350 (2.57) | 2.350 (2.04) | 2.658 (2.23) | 2.339 (2.04)
AGE_SQ | 2.648 (2.41) | 2.648 (1.92) | 3.021 (2.10) | 2.722 (1.98)
PMALE | 0.140 (4.02) | 0.140 (3.19) | 0.136 (2.68) | 0.000 (–)
HHSIZE | 0.083 (6.08) | 0.083 (4.84) | 0.086 (4.93) | 0.075 (4.39)
WORKDUR | 0.042 (3.86) | 0.042 (3.07) | 0.034 (1.49) | 0.045 (2.66)
CAR_0 | 0.088 (1.67) | 0.088 (1.33) | 0.113 (1.63) | 0.158 (2.11)
AMPEAK | 0.071 (0.94) | 0.071 (0.75) | 0.010 (0.04) | 1.422 (2.51)
PMPEAK | 0.867 (14.13) | 0.867 (11.24) | 0.450 (1.50) | 0.279 (0.56)
MIDDAY | 0.960 (15.66) | 0.960 (12.45) | 0.813 (3.01) | 1.791 (3.82)
g1/r1 | – | 0.000 (–) | 0.094 (0.24) | 0.526 (6.10)
g2/r2 | – | 0.000 (–) | 0.689 (1.73) | 0.648 (14.15)
g3/r3 | – | 0.000 (–) | 0.337 (1.57) | 0.726 (15.96)
g4/r4 | – | 0.000 (–) | 0.000 (–) | 0.083 (0.45)
sigma | 1.258 (–) | 1.258 (82.39) | 1.004 (3.57) | 1.477 (34.81)

Table 4.19 Simulation-based Hypothesis Test for Error Covariance of Identified Mixed Discrete-continuous Models
i | 1 | 2 | 3 | 4
Non-commuter Model (Duration → Time-of-Day)
E(f_i) | 2.167 | 0.440 | 0.029 | –
Std(f_i) | 3.612 | 1.760 | 1.450 | –
E(g_i) | 0.364 | 0.240 | 0.284 | –
Std(g_i) | 0.308 | 0.828 | 0.263 | –
Corr(f_i, g_i) | 0.709 | 0.915 | 0.314 | –
Sign.(f_i·g_i) | 0.209 | 0.128 | 0.549 | –
Non-commuter Model (Time-of-Day → Duration)
E(f_i) | 0.143 | 0.366 | 0.552 | –
Std(f_i) | 0.196 | 0.271 | 0.378 | –
E(g_i) | 0.876 | 0.504 | 0.305 | –
Std(g_i) | 0.083 | 0.096 | 0.092 | –
Corr(f_i, g_i) | 0.086 | 0.196 | 0.026 | –
Sign.(f_i·g_i) | 0.232 | 0.089 | 0.072 | –
Commuter Model (Duration → Time-of-Day)
E(f_i) | 0.665 | 0.210 | 0.078 | –
Std(f_i) | 0.782 | 0.375 | 0.411 | –
E(g_i) | 0.086 | 1.040 | 0.276 | –
Std(g_i) | 0.159 | 0.120 | 0.145 | –
Corr(f_i, g_i) | 0.065 | 0.279 | 0.009 | –
Sign.(f_i·g_i) | 0.387 | 0.288 | 0.429 | –
Commuter Model (Time-of-Day → Duration)
E(f_i) | 0.704 | 0.640 | 0.403 | –
Std(f_i) | 2.127 | 0.689 | 0.719 | –
E(g_i) | 0.094 | 0.806 | 0.337 | –
Std(g_i) | 0.387 | 0.399 | 0.215 | –
Corr(f_i, g_i) | 0.522 | 0.548 | 0.004 | –
Sign.(f_i·g_i) | 0.630 | 0.187 | 0.313 | –

4.2.4 Model Performance Comparisons Based on Non-nested Test

The estimation results in Section 4.2.3 generally give plausible statistical support to both alternative causal paradigms. Table 4.20 compares goodness-of-fit measures for the models under the two alternative causal structures. To identify which causal structure is dominant in the population, the study uses the extension of the non-nested test to discrete-continuous models derived in Section 2.3.3. For non-commuters, both the recursive model and the mixed model show that the model in which time-of-day choice affects activity duration fits better, in the sense of a greater adjusted likelihood ratio index. The non-nested test also rejects the model in which activity duration affects time-of-day choice. For commuters, the non-nested test fails to reject the model in which activity duration affects time-of-day choice, so the causal relationship between time-of-day choice and activity duration remains inconclusive for this group. The Lee model reaches the opposite conclusion: it finds that the causal relationship in which duration affects time-of-day choice is dominant in the population.
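The likelihood ratio indices in Table 4.20 can be recomputed from the log-likelihoods and parameter counts. The non-nested probabilities can be reproduced with the standard bound for comparing the adjusted ρ² of two non-nested specifications:

Pr(adj. ρ²_2 − adj. ρ²_1 ≥ z) ≤ Φ(−sqrt(−2·z·LL(0) + (K_2 − K_1)))

The sketch below assumes that the extension in Section 2.3.3 reduces to this standard form. It has three further conventions. First, the helper names are hypothetical. Second, the log-likelihoods are entered as negative numbers. Third, model 2 is taken to be the specification with the higher adjusted index.

```python
import math


def rho2(ll, ll_ref, k=0):
    """Likelihood ratio index 1 - (LL - K) / LL_ref.

    k=0 gives the plain rho-squared; k = number of estimated parameters gives
    the adjusted rho-squared. Log-likelihoods are entered as negative numbers.
    """
    return 1.0 - (ll - k) / ll_ref


def std_normal_cdf(x):
    """Standard normal CDF expressed through the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def nonnested_bound(ll0, model_a, model_b):
    """Upper bound on Pr(adj. rho2 of model 2 exceeds that of model 1 by >= z)
    when model 1 (the one with the lower adjusted index) is the true model.

    model_a and model_b are (LL at convergence, number of parameters).
    Returns (z, bound), with z the gap in adjusted rho-squared at zero.
    """
    (ll_1, k_1), (ll_2, k_2) = model_a, model_b
    r_1, r_2 = rho2(ll_1, ll0, k_1), rho2(ll_2, ll0, k_2)
    if r_1 > r_2:  # make model 2 the specification with the higher index
        ll_1, k_1, r_1, ll_2, k_2, r_2 = ll_2, k_2, r_2, ll_1, k_1, r_1
    z = r_2 - r_1
    # ll0 < 0, so -2*z*ll0 >= 0; a non-positive argument makes the bound uninformative (0.5).
    arg = max(-2.0 * z * ll0 + (k_2 - k_1), 0.0)
    return z, std_normal_cdf(-math.sqrt(arg))


# Commuter recursive models in Table 4.20: Duration -> Time vs. Time -> Duration.
LL0_COMMUTER = -10476.7
z, p = nonnested_bound(LL0_COMMUTER, (-9514.41, 29), (-9513.36, 30))
print(f"adj. rho2 gap z = {z:.6f}, bound = {p:.3f}")
```

For the commuter recursive pair, the sketch gives a gap of about 0.000005 and a bound of about 0.147, which matches the tabulated entry. The same call with the non-commuter log-likelihoods gives a bound that is effectively zero.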
The Lee-model finding for non-commuters is consistent with Pendyala and Bhat (2005), who applied the Lee model to identify the causal relationship between time-of-day choice and maintenance activity duration using survey data from Florida, USA. For commuters, the Lee model likewise identifies duration → time-of-day as the dominant causal relationship, whereas Pendyala and Bhat (2005) did not reach a conclusive result for commuters. It is rather surprising that the specification of the error structure affects not only the coefficient estimates of the endogenous variables but also which causal structure appears dominant. Finally, the Lee model fits the data better overall than the mixed model does.

Table 4.20 Comparison of Goodness-of-fit of Timing-duration Models
Columns: Non-commuter Models (Duration → Time | Time → Duration) | Commuter Models (Duration → Time | Time → Duration). Rows with two entries give one value per sample (non-commuters | commuters).
Sample size | 11293 | 3394
LL at zero: LL(0) | 35185.0 | 10476.7
LL at constant: LL(c) | 29729.8 | 9846.93
Estimated sigma | 1.3640 | 1.3254
Recursive Model
# of parameters | 25 | 25 | 29 | 30
LL at convergence (LL) | 29212.9 | 29193.2 | 9514.41 | 9513.36
ρ² at zero | 0.1697 | 0.1703 | 0.0919 | 0.0920
Adj. ρ² at zero | 0.1690 | 0.1696 | 0.089082 | 0.089087
ρ² at constant | 0.0174 | 0.0180 | 0.0338 | 0.0339
Adj. ρ² at constant | 0.0165 | 0.0172 | 0.0308 | 0.0308
Non-nested test (Prob.) | 0.0006 (0.000) | 0.000005 (0.147)
Unidentified Mixed Model
# of parameters | 27 | 27 | 33 | 33
LL at convergence (LL) | 29211.0 | 29193.2 | 9512.67 | 9511.38
ρ² at zero | 0.1698 | 0.1703 | 0.0920 | 0.0921
Adj. ρ² at zero | 0.1690 | 0.1695 | 0.088867 | 0.088990
ρ² at constant | 0.0175 | 0.0180 | 0.0339 | 0.0341
Adj. ρ² at constant | 0.0165 | 0.0171 | 0.0306 | 0.0307
Non-nested test (Prob.) | 0.0005 (0.001) | 0.00012 (0.056)
Identified Mixed Model
# of parameters | 29 | 28 | 35 | 35
LL at convergence (LL) | 29213.1 | 29196.6 | 9511.55 | 9511.69
ρ² at zero | 0.1697 | 0.1702 | 0.0921 | 0.0921
Adj. ρ² at zero | 0.1689 | 0.1694 | 0.088783 | 0.088770
ρ² at constant | 0.0174 | 0.0179 | 0.0341 | 0.0340
Adj. ρ² at constant | 0.0164 | 0.0170 | 0.0305 | 0.0305
Non-nested test (Prob.) | 0.0005 (0.000) | 0.000013 (0.300)
Lee Model
# of parameters | 27 | 28 | 31 | 31
LL at convergence (LL) | 29078.9 | 29187.8 | 9463.49 | 9497.50
ρ² at zero | 0.1735 | 0.1704 | 0.0967 | 0.0935
Adj. ρ² at zero | 0.1728 | 0.1697 | 0.0938 | 0.0905
ρ² at constant | 0.0219 | 0.0182 | 0.0389 | 0.0355
Adj. ρ² at constant | 0.0210 | 0.0173 | 0.0358 | 0.0323
Non-nested test (Prob.) | 0.0031 (0.000) | 0.0032 (0.000)

4.2.5 Discussions and Conclusions

Figures 4.1 and 4.2 summarize the causal relationships between activity duration and time-of-day choice based on the mixed discrete-continuous model and the Lee model. The two error structures give contradictory results for the effect of AM peak choice on activity duration, so the diagrams show only those causal relationships that both the mixed model and the Lee model consistently indicate. A causal relationship rejected by the non-nested test is not dominant in the population, but it may still exist for part of it. The figures therefore show both dominant and non-dominant causal relationships for comparison. For non-commuters, midday and PM peak choice positively affect activity duration. This is as expected: non-commuters have enough time available for maintenance activities and are not bound by institutional constraints such as shopping-center closing times. In the other direction, maintenance activity duration positively affects midday choice. In other words, longer maintenance activities tend to be scheduled in the midday period. Intuitively, non-commuters who intend to undertake longer maintenance activities probably prefer to start them at midday, which leaves sufficient time and avoids both peak-period congestion and institutional constraints. After long maintenance activities at midday, non-commuters may then need to return home, probably during the PM peak period, to attend to household obligations.
Figure 4.1 Diagram of Consistent Causal Relationship Identified by Joint Timing-duration Model for Non-commuters (nodes: AM Peak, Midday, PM Peak, Activity Duration; solid arrows represent positive impacts)

Figure 4.2 Diagram of Consistent Causal Relationship Identified by Joint Timing-duration Model for Commuters (nodes: AM Peak, Midday, PM Peak, Activity Duration; solid arrows represent positive impacts)

For commuters, only the midday choice for maintenance activities positively affects activity duration. Because of fixed work schedules, commuters usually have little time for maintenance activities. The midday period includes lunch time, however, so commuters may prefer to undertake a longer maintenance activity then. In the other direction, maintenance activity duration positively affects both PM peak choice and midday choice. Intuitively, commuters who plan longer maintenance activities would schedule them in the midday or PM peak period. Midday includes the lunch break, and the PM peak is the after-work commute, during which longer maintenance activities can be undertaken on the way home.

In summary, this section has explored the relationship between activity timing (time-of-day choice) and activity episode duration for maintenance activities such as shopping and services. Joint models of activity timing and duration were estimated separately for commuters and non-commuters, allowing two types of error correlation between the timing and duration model equations. Time-of-day choice was modeled as a discrete choice variable with four alternative periods of the day, and duration was modeled with a log-linear formulation.
Two different causal structures were considered: (1) activity timing (time-of-day choice) affects activity duration; and (2) activity episode duration affects activity timing (time-of-day choice). Both causal structures were estimated on the non-commuter and commuter samples of activity episodes to identify the appropriate causal structure for each group. Identifying such causal relationships between activity engagement phenomena matters for two main reasons. First, knowing the appropriate causal structure will help in developing accurate activity-based travel demand model systems that aim to capture such relationships at the level of the individual traveler and activity episode. Second, knowing the true causal relationships underlying decision processes will support accurate assessment and impact analysis of alternative transportation policies such as variable pricing, parking pricing and telecommuting.

Unfortunately, the two types of models do not consistently identify the dominant causal relationship between timing and duration. For both commuters and non-commuters, the Lee model supports the causal relationship in which activity duration is determined first and then influences time-of-day choice. The mixed discrete-continuous model instead favors the alternative relationship, in which time-of-day choice is determined first and then influences activity duration. It does so conclusively only for non-commuters, since the non-nested test is inconclusive for commuters. Both the mixed model and the Lee model use full information maximum likelihood (FIML) estimation based on a distributional assumption about the error structure of the simultaneous model system. This allows them to consistently estimate the coefficient of the endogenous dummy variable in the continuous model or of the endogenous continuous variable in the discrete choice model. The error structure of the mixed model is more behaviorally interpretable than that of the Lee model.
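In the mixed model, each observation's FIML likelihood contains an integral over the normal heterogeneity terms. MSLE replaces that integral with an average over draws. The sketch below illustrates only this simulation step, not the estimation code used in this study. It assumes a hypothetical binary logit kernel with one normal error component; the utility value and loading are made up. It compares pseudo-random draws with quasi-random (Halton) draws to show how the approximation tightens as the number of draws R grows.

```python
import math
import random
from statistics import NormalDist


def halton(index, base=2):
    """Element `index` (>= 1) of the one-dimensional Halton (van der Corput) sequence."""
    result, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulated_prob(v, f, draws):
    """Monte Carlo estimate of P = E[logistic(v + f*eta)] with eta ~ N(0, 1)."""
    return sum(logistic(v + f * eta) for eta in draws) / len(draws)


def halton_normal_draws(r):
    """Quasi-random N(0, 1) draws: Halton points mapped through the inverse normal CDF."""
    inv = NormalDist().inv_cdf
    return [inv(halton(i)) for i in range(1, r + 1)]


def pseudo_normal_draws(r, seed=42):
    """Pseudo-random N(0, 1) draws from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(r)]


V, F = 0.5, 1.2  # hypothetical systematic utility and heterogeneity loading
reference = simulated_prob(V, F, halton_normal_draws(20000))  # high-R benchmark

for r in (10, 100, 1000):
    err_halton = abs(simulated_prob(V, F, halton_normal_draws(r)) - reference)
    err_pseudo = abs(simulated_prob(V, F, pseudo_normal_draws(r)) - reference)
    print(f"R={r:5d}  |Halton - ref|={err_halton:.5f}  |pseudo - ref|={err_pseudo:.5f}")
```

Each simulated probability is an unbiased estimate of P. Its logarithm, however, is what MSLE sums, and by Jensen's inequality the logarithm is biased downward for any finite R. This is the source of the unavoidable simulation bias discussed in the text, and it shrinks as R increases, at the cost of estimation time.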
The price of this interpretability is that the likelihood function of the mixed model has no closed form, so Monte Carlo integration is needed to approximate it. Maximum simulated likelihood estimation (MSLE) based on Monte Carlo integration is time-consuming with current computing technology, and simulation bias cannot be avoided in MSLE. Using more quasi-random draws reduces the simulation bias, but only at the cost of longer estimation. The Lee model has a closed-form likelihood based on the Lee transformation, and it estimates much faster than the mixed model (a few minutes vs. a few hours). The Lee model also fits the data better, although the main concern here is the coefficients of the endogenous variables rather than the overall fit.

Both the mixed model and the Lee model share one disadvantage: they depend on strong distributional assumptions. Maximum likelihood estimation is consistent and asymptotically efficient as long as the distributional assumption is true and all parameters are identifiable. In many cases, however, the distributional assumption is fragile, particularly when it is introduced to account for unobserved heterogeneity. Because the coefficients of the endogenous variables are highly sensitive to this assumption, a robust specification of the error structure becomes extremely important. Two directions for obtaining more robust estimates could be explored further in travel behavior analysis. The first is to introduce nonparametric heterogeneity into the joint model system. In the econometric literature, Mroz (1987 and 1999) applied a discrete factor approximation to estimate the effect of an endogenous dummy variable in a continuous model. In this approach, sometimes called the mass point method, heterogeneity is nonparametric and discretely distributed rather than parametric and continuously distributed (e.g.
normal or Gumbel distributions). The method is not easy to apply in practice, however, because the resulting log-likelihood function is not globally concave and has multiple peaks. A large number of starting values must be explored to avoid the pitfall of local maxima. The second direction is limited information likelihood estimation (LILE), which is more robust than FIML, albeit less efficient. Dubin and McFadden (1984) developed a two-stage estimation procedure for a joint discrete-continuous model system. In the first stage, a multinomial logit model is estimated. In the second stage, a nonlinear function of the predicted choice probabilities is entered into the continuous model alongside the other explanatory variables. This approach can consistently estimate the coefficients of endogenous dummy variables in the continuous model. It cannot be used directly to estimate the coefficients of endogenous continuous variables in the latent utility function of the discrete choice, because the multinomial logit model must be estimated in advance without any endogenous variables. Nevertheless, the idea of two-stage estimation merits consideration for a more robust modeling framework in which the coefficients of both endogenous continuous variables and endogenous dummy variables can be consistently estimated. Such a framework remains a challenging but interesting topic for future research.

4.3 Causal Models Between Trip Timing and Mode Choice (Mixed Binary-multinomial Choice Model)

4.3.1 Background

Departure time choice and mode choice are important constituents of traveler behavior (Bhat, 1998). Planners increasingly need travel demand models that estimate travel for different periods within the day as well as for the average weekday, so that they can analyze a broad range of transportation policies and initiatives (Cambridge Systematics, 1997). Such models are referred to as time-of-day models.
Besides the temporal dimension, mode choice is another facet of trip making with important implications for transportation policy. Understanding the relationships between these two facets of travel behavior will help planners assess the potential effectiveness of policy measures aimed at alleviating traffic congestion and reducing auto vehicle emissions. Such policies, motivated by recent legislation, call for travel demand models capable of assessing a range of transportation control measures (TCMs) (Stopher, 1993; Weiner and Ducca, 1996). Early studies of departure time choice focused mainly on work or commuting trips. Indeed, commuting contributes directly to morning and afternoon peak-period congestion. This direct link between work trips and peak travel gave researchers (Noland and Small, 1994; Kumar and Levinson, 1994; Lockwood and Demetsky, 1994) the impetus to model the departure time choice of commuters and to examine how it relates to traffic congestion levels. Non-work trips are of interest because they are inherently more flexible than work trips in both time-of-day choice and mode choice. For some non-work activities, such as shopping, departure time flexibility is evident. Travelers may therefore be more inclined to shift departure times than to shift modes in response to transportation control measures (Bhat, 1998). Similarly, social-recreation trips may be pursued at various times of the day, unless the activity involves rigid time and space constraints such as those associated with concerts, sporting events and movies. With respect to mode choice, non-work activities and trips tend to be undertaken jointly with other household members or friends (Steed and Bhat, 2000).
Such joint coupling constraints may make mode switching quite difficult. Departure time shifts, on the other hand, may still be feasible, particularly now that activities can be scheduled in real time using cellular communications technology. The causal direction between departure time choice and mode choice is quite important for transportation planning and policy analysis. If mode choice precedes departure time choice, then strategies aimed at reducing peak-period travel should also focus significantly on people's mode choice behavior, because departure time choice is influenced by mode choice. If instead departure time choice affects (and therefore precedes) mode choice, then strategies aimed at reducing peak-period travel demand can focus primarily on the departure time aspects of behavior. In that case, strategies aimed at reducing SOV use would also have to address departure time choice, because mode choice is affected by it. Beyond the causal relationship, attention must be paid to the potential simultaneity of the two choices: the unobserved factors affecting each of them may be correlated. Modeling the relationship between departure time choice and mode choice therefore requires a rigorous simultaneous equations framework. The proposed mixed binary-multinomial choice model provides such a framework by treating mode choice as a multinomial choice variable and departure time choice as a binary choice variable. The central question addressed is: what is the causal relationship between departure time choice and mode choice for non-work trips?
One may conjecture that people engaging in activities in the non-peak period may choose to travel by automobile because of the reduced traffic congestion and possibly better transit levels of service during such periods. Conversely, people who choose to travel by automobile may arrange their activities so that they can travel in the non-peak periods and avoid congestion. Similar causal relationships may be considered for peak-period travel and/or non-auto travel. One may thus hypothesize two opposite causal relationships between departure time choice and mode choice. This section attempts to shed light on the issue by identifying the causal structure with the proposed mixed binary-multinomial choice model. Peak-period departure is modeled as a binary choice (peak vs. non-peak), and mode choice is modeled as a multinomial choice among four alternatives: SOV (single occupancy vehicle), HOV (high occupancy vehicle), transit, and non-motorized modes (bicycle and walk).

4.3.2 Dataset Preparation and Description for Modeling Analysis

The data set for the modeling analysis is derived from the Swiss Travel Microcensus 2000. Level-of-service (LOS) variables for the travel modes are the most important variables influencing mode choice behavior. These data are available only for the model area of the Canton of Aargau, so all non-work trips made in this area were selected to form a trip file. For each trip, the file includes the LOS variables for its origin-destination pair, the departure time, the revealed mode choice, the trip purpose, and the socioeconomic and demographic characteristics of the trip maker. Two market segments, commuters and non-commuters, are defined and modeled separately to account for the influence of work schedule constraints on commuters.
Commuters were defined as individuals who commuted to a workplace on the travel diary day. Non-commuters were defined as those who did not commute to a workplace (made zero work trips) on the travel diary day. A worker (employed person) who did not commute on the travel diary day is therefore still classified as a non-commuter for the purpose of this study.

Table 4.21 Household Characteristics of Swiss Travel Microcensus 2000 and Sample for Model of Mode Choice and Time-of-day Choice
Characteristic | Swiss Sample | Non-commuters Sample | Commuters Sample
Sample size | 27918 | 2273 | 1753
Household size | 2.43 | 2.63 | 2.47
1 person | 27.5% | 26.7% | 26.7%
2 persons | 35.1% | 29.3% | 33.9%
3 persons | 14.0% | 12.1% | 13.2%
4 or more persons | 23.4% | 31.8% | 26.1%
Monthly Income Low (

In the sample, non-work trips were made by 4260 individuals residing in 4026 households. Of these individuals, 1805 were commuters, who reported 4619 non-work trips, and the remaining 2455 were non-commuters, who reported 7984 non-work trips. Table 4.21 summarizes the household characteristics of these two samples and compares them with those of the whole Swiss sample. The average household size is 2.63 persons for the non-commuter sample and 2.47 persons for the commuter sample.

Table 4.22 Person Characteristics of Swiss Travel Microcensus 2000 and Sample for Model of Mode Choice and Time-of-day Choice
Characteristic | Swiss Sample | Non-commuters Sample | Commuters Sample
Sample size | 29407 | 2455 | 1805
Age (mean, years) | 43.9 | 44.1 | 41.1
Young (6-29) | 26.8% | 32.0% | 21.1%
Middle (30-59) | 47.6% | 32.9% | 72.0%
Old (≥ 60) | 25.5% | 35.1% | 7.0%
Sex: Male | 46.3% | 40.9% | 59.2%
Sex: Female | 53.7% | 59.1% | 40.8%
Employment: Full time | 37.3% | 16.3% | 75.5%
Employment: Part time | 14.3% | 11.9% | 19.3%
Employment: Not employed | 48.4% | 71.7% | 5.2%
Licensed | 67.4% | 56.0% | 88.1%
Trips/day | 3.51 | 4.18 | 4.74
Work trips | 0.46 | 0.07 | 1.63
Non-work trips | 3.05 | 4.11 | 3.11
As expected, households in the commuter sample report higher income levels than those in the non-commuter sample, presumably because commuter households always include wage-earning workers. Commuter households also report higher car ownership levels than non-commuter households. Non-commuters are more likely to live with children aged 6 to 17 years, as shown by a higher percentage than in the whole Swiss sample. The share of households located in rural areas is lower in the current samples than in the whole Swiss sample, probably because the Canton of Aargau has fewer rural areas.

Table 4.22 compares the person characteristics of the samples with those of the whole Swiss sample. The major differences between commuters and non-commuters are as expected. Commuters are predominantly in the 30-59 age group, while 35.1% of non-commuters are 60 years of age or older. Of the commuters, 75.5% are employed full time, compared with only 16.3% of non-commuters. Similarly, 88.1% of commuters hold a driver's license, compared with 56.0% of non-commuters. Finally, commuters make 1.63 work trips and 3.11 non-work trips per day, while non-commuters make 4.11 non-work trips (4.18 trips in total) per day.

Table 4.23 Cross-tabulation of Mode Choice and Time-of-day Choices for Non-commuters
Mode Choice | Non-Peak | Peak | Total
Frequency
SOV | 1070 | 491 | 1561
HOV | 1096 | 542 | 1638
Transit | 647 | 324 | 971
Non-motorized | 2692 | 1122 | 3814
Total | 5505 | 2479 | 7984
Percent of each mode's trips by period (row percent)
SOV | 68.5% | 31.5% | 100.0%
HOV | 66.9% | 33.1% | 100.0%
Transit | 66.6% | 33.4% | 100.0%
Non-motorized | 70.6% | 29.4% | 100.0%
Total | 69.0% | 31.0% | 100.0%
Percent of each period's trips by mode (column percent)
SOV | 19.4% | 19.8% | 19.6%
HOV | 19.9% | 21.9% | 20.5%
Transit | 11.8% | 13.1% | 12.2%
Non-motorized | 48.9% | 45.3% | 47.8%
Total | 100.0% | 100.0% | 100.0%
Before model development began, a descriptive analysis of the potential relationship between mode choice and peak-period departure was undertaken. Trips departing between 6:00 AM and 8:59 AM or between 4:00 PM and 6:59 PM are defined as peak-period trips. Tables 4.23 and 4.24 give simple cross-tabulations of time-of-day choice against mode choice for non-commuters and commuters, respectively.

Table 4.24 Cross-tabulation of Mode Choice and Time-of-day Choices for Commuters
Mode Choice | Non-Peak | Peak | Total
Frequency
SOV | 1107 | 872 | 1979
HOV | 472 | 238 | 710
Transit | 217 | 318 | 535
Non-motorized | 977 | 418 | 1395
Total | 2773 | 1846 | 4619
Percent of each mode's trips by period (row percent)
SOV | 55.9% | 44.1% | 100.0%
HOV | 66.5% | 33.5% | 100.0%
Transit | 40.6% | 59.4% | 100.0%
Non-motorized | 70.0% | 30.0% | 100.0%
Total | 60.0% | 40.0% | 100.0%
Percent of each period's trips by mode (column percent)
SOV | 39.9% | 47.2% | 42.8%
HOV | 17.0% | 12.9% | 15.4%
Transit | 7.8% | 17.2% | 11.6%
Non-motorized | 35.2% | 22.6% | 30.2%
Total | 100.0% | 100.0% | 100.0%

For non-commuters, the distributions across time-of-day and mode choices differ only slightly. Trips by HOV (33.1%) and transit (33.4%) appear somewhat more likely to be scheduled in the peak period than the average peak-period share of 31.0%. Conversely, peak-period trips are more likely to use HOV (21.9%) and transit (13.1%) than the average mode shares of 20.5% for HOV and 12.2% for transit. For commuters, the differences are more pronounced. Trips by SOV (44.1%) and transit (59.4%) appear more likely to be scheduled in the peak period than the average peak-period share of 40.0%. Conversely, peak-period trips are more likely to use SOV (47.2%) and transit (17.2%) than the average mode shares of 42.8% for SOV and 11.6% for transit.
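Both the peak-period definition and the two percentage panels of Tables 4.23 and 4.24 are simple to reproduce. The sketch below uses hypothetical helper names. It classifies a departure time as peak or non-peak, assuming departure times are recorded to the minute so that 8:59 is the last peak minute. It then recomputes both percentage panels from the commuter counts in Table 4.24; passing the non-commuter counts from Table 4.23 works the same way.

```python
from datetime import time

# Peak windows as defined in the text (6:00-8:59 AM, 4:00-6:59 PM), written as
# half-open intervals so that, e.g., 8:59 counts as peak and 9:00 does not.
PEAK_WINDOWS = [(time(6, 0), time(9, 0)), (time(16, 0), time(19, 0))]


def is_peak(departure):
    """Return True if a departure time (datetime.time) falls in a peak window."""
    return any(start <= departure < end for start, end in PEAK_WINDOWS)


# Commuter trip counts from Table 4.24: mode -> (non-peak, peak)
COMMUTER_COUNTS = {
    "SOV": (1107, 872),
    "HOV": (472, 238),
    "Transit": (217, 318),
    "Non-motorized": (977, 418),
}


def shares(counts):
    """Return two dicts of percentages:
    by_mode[m]   = (non-peak %, peak %) of mode m's trips (sums to 100 across periods);
    by_period[m] = (% of non-peak trips, % of peak trips, % of all trips) made by mode m.
    """
    offpeak_total = sum(off for off, _ in counts.values())
    peak_total = sum(pk for _, pk in counts.values())
    grand_total = offpeak_total + peak_total
    by_mode, by_period = {}, {}
    for mode, (off, pk) in counts.items():
        mode_total = off + pk
        by_mode[mode] = (100 * off / mode_total, 100 * pk / mode_total)
        by_period[mode] = (100 * off / offpeak_total, 100 * pk / peak_total,
                           100 * mode_total / grand_total)
    return by_mode, by_period


by_mode, by_period = shares(COMMUTER_COUNTS)
for mode in COMMUTER_COUNTS:
    print(f"{mode:14s} peak share {by_mode[mode][1]:5.1f}%   "
          f"share of peak trips {by_period[mode][1]:5.1f}%   overall {by_period[mode][2]:5.1f}%")
```

Keeping the two percentage panels as separate dictionaries mirrors the two questions asked in the text. The first asks how a mode's trips split across periods. The second asks how each period's trips split across modes.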
4.3.3 Model Estimation Results

4.3.3.1 Estimation Results for Non-commuters

Table 4.25 defines and describes the variables used in the models. Table 4.26 gives the non-commuter estimation results under the causal structure in which multinomial mode choice affects binary time-of-day choice. The first block reports the recursive models, i.e., a multinomial logit model for mode choice and a binary probit model for peak-period departure choice. The second block reports the mixed binary-multinomial choice model in which the standard deviations g_i of the heterogeneity in the binary probit model are fixed at 1. The third block reports the mixed binary-multinomial choice model in which f_i equals g_i in absolute value and the sign of f_i·g_i is constrained to match the sign of f_i estimated in the second block. Level-of-service (LOS) variables, car ownership and transit season ticket subscription enter the multinomial mode choice model. In all three types of models, each mode's travel time has a significantly negative coefficient in the corresponding utility function, as expected. TERMTIME and PKLOT_SH take negative coefficients, consistent with the expectation that longer terminal times and a shortage of parking spaces at the destination reduce the probability of choosing an auto mode. Persons in households with no car can almost never drive and use the SOV mode, so the coefficient of CAR_0 is highly negative (around −3.5) in the SOV utility function. The same coefficient is only modestly negative (around −1.3) in the HOV utility function, since these persons may use the HOV mode as passengers even though they are unlikely to be drivers.
Persons in households with more than one car have easier access to a car and are therefore more likely to use the SOV and HOV modes, as shown by the positive coefficients of CAR_GE2 in both utility functions. For transit mode choice, riders appear equally sensitive to in-vehicle time and to waiting time at the initial station, as indicated by the almost identical negative coefficients of IVEH and OWT. The positive coefficient of FREQ indicates that higher service frequency increases the probability of choosing transit, as expected. People who subscribe to a transit season ticket are much more likely to use transit than those without one, as indicated by the highly positive coefficient of TRST_SUB (around 2.0) in the transit utility function.

In the binary probit model for peak-period departure, persons older than 60 years are less likely to make their non-work trips in the peak period. This is presumably because older people are more sensitive to traffic congestion and more inclined to avoid it than younger people. The positive coefficient of HHSIZE2 indicates that non-commuters living in two-member households are more likely to make peak-period non-work trips, possibly because they share a car with the other household member, who commutes in the peak period. Shopping trips are less likely than trips for other purposes to be scheduled in the peak period, as shown by the negative coefficient of SHOPPING. The positive coefficient of SERVICE indicates that non-commuters prefer to make service trips in the peak period. Trips to take children to school make up the bulk of service trips, and most of them take place in the morning peak period.

Table 4.25 Variable Description in Timing-mode Choice Model
Sample size (trips): Commuters Activity Sample 4619; Non-commuters Activity Sample 7984
Variable Name | Variable Description | Commuters: Mean | Commuters: Std. Dev. | Non-commuters: Mean | Non-commuters: Std. Dev.
CAR_TIME | Car in-vehicle time (100 mins) | 0.11 | 0.10 | 0.08 | 0.09
TERMTIME | Car terminal time (min) | 5.48 | 2.12 | 5.17 | 2.25
PKLOT_SH | Measurement of parking lot shortage | 2.64 | 5.53 | 2.27 | 4.91
CAR_0 | Household does not own a car | 0.12 | 0.32 | 0.21 | 0.41
CAR_GE2 | Household owns more than one car (≥ 2) | 0.38 | 0.49 | 0.28 | 0.45
IVEH | Transit in-vehicle time (100 mins) | 0.12 | 0.13 | 0.09 | 0.11
OWT | Waiting time at 1st transit station (min) | 0.14 | 0.16 | 0.11 | 0.13
FREQ | Transit frequency within 2 hours | 7.63 | 7.88 | 7.71 | 8.10
TRST_SUB | Transit season ticket is subscribed | 0.21 | 0.41 | 0.22 | 0.42
NM_TIME | Average travel time by bicycle and on foot (100 mins) | 0.67 | 0.84 | 0.47 | 0.70
OLD | Person is over 60 years old | 0.04 | 0.21 | 0.35 | 0.48
HHSIZE2 | Household has two members | 0.32 | 0.47 | 0.28 | 0.45
NSWISS | Household is not located at a permanent address in Switzerland | 0.01 | 0.12 | 0.00 | 0.07
SHOPPING | Trip purpose is shopping | 0.13 | 0.33 | 0.20 | 0.40
SERVICE | Trip purpose is service | 0.03 | 0.18 | 0.04 | 0.19
SOV | Trip mode is SOV | 0.43 | 0.49 | 0.20 | 0.40
HOV | Trip mode is HOV | 0.15 | 0.36 | 0.21 | 0.40
TRANSIT | Trip mode is transit | 0.12 | 0.32 | 0.12 | 0.33
NMOTOR | Trip mode is bicycle or walk | 0.30 | 0.46 | 0.48 | 0.50
PEAK | Trip departs in peak period | 0.40 | 0.49 | 0.31 | 0.46

In a binary probit model, the dependent variable is an unobservable latent variable. Its coefficients are therefore identified only relative to the standard deviation of the normal random error term in the utility function. Scaling up both the coefficients and the standard deviation of the normal random term by the same factor rescales the latent variable without changing the probabilities of the observed binary choices. The standard binary probit model therefore normalizes the standard deviation of the normal error term to 1, so that a unique set of coefficients can be estimated. In the mixed binary-multinomial choice model, the normal heterogeneity terms with standard deviations g_i enter the random component of the binary probit model, so the standard deviation of that random component must exceed 1.
As a result, the coefficients in the mixed model are enlarged in proportion to the increase in the standard deviation of the random component. The ratio between the coefficients in the mixed model and those in the recursive model can be calculated as

$$\sqrt{1 + g_1^2 + g_2^2 + g_3^2 + g_4^2}$$

In the mixed model where g_i is fixed at 1, this ratio is a constant ($\sqrt{5} \approx 2.236$). In the second block, almost all the coefficients of the other exogenous variables are scaled up by a factor of 2 to 3. The coefficient of the endogenous dummy variable TRANSIT, however, is scaled up by a factor of about 7 (0.133 vs. 0.900), far more than 2.236. The cause is the correlation between the error terms in the transit utility function and the peak-period departure utility function. In the mixed model where g_i is fixed at 1, this correlation can be estimated by

$$\operatorname{Corr}(u_{qi}, v_q) = \frac{f_i}{\sqrt{\left(f_i^2 + \pi^2/6\right)(I + 1)}}$$

Here u_qi and v_q are the random components of the utility of mode i and of the peak-period departure utility for trip q. The term π²/6 is the variance of the Gumbel error in the mode choice model, and I = 4 is the number of modes. Accordingly, the correlation between the error terms in the transit utility function and the peak-period departure utility function is calculated as 0.305. This correlation explains why the coefficient of TRANSIT is much greater than the value implied by scaling alone (0.900 > 0.133 × 2.236 ≈ 0.297). The correlations of the SOV, HOV and non-motorized utility functions with the peak-period departure utility are 0.147, 0.092 and 0.017, respectively. SOV and HOV are insignificant in all the models.

The third block gives the mixed model with the restriction f_i = g_i, which is expected to accommodate the correlations better within the simultaneous equations model. In this model, the correlations of the SOV, HOV, transit and non-motorized utility functions with the peak-period departure utility are 0.051, 0.013, 0.442 and 0.091, respectively. Compared with the mixed model where g_i is fixed at 1, the absolute correlations are smaller for SOV and HOV and larger for transit and non-motorized modes.
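These quantities can be checked numerically. The sketch computes the probit scale ratio sqrt(1 + Σ g_j²) and the error correlation f_i·g_i / sqrt((f_i² + π²/6)(1 + Σ g_j²)) for both mixed specifications. The correlation expression reduces to the g_i = 1 formula and also covers the f_i = g_i case discussed next. The sketch assumes the error structure these formulas imply: a mode-utility error f_i·η_i plus a Gumbel term, and a peak-departure error Σ g_j·η_j plus a standard normal term, with all η_j independent standard normals. It uses only the magnitudes of f_i and g_i reported in Table 4.26. The function names are hypothetical.

```python
import math

PI2_OVER_6 = math.pi ** 2 / 6  # variance of a standard Gumbel (extreme value type I) error


def probit_scale(g):
    """Std. dev. of the probit random component: sqrt(1 + sum of g_j^2)."""
    return math.sqrt(1.0 + sum(gj * gj for gj in g))


def error_corr(i, f, g):
    """Corr(u_qi, v_q), assuming u_qi = f_i*eta_i + Gumbel error and
    v_q = sum_j g_j*eta_j + N(0, 1) error, with all eta_j i.i.d. N(0, 1)."""
    cov = f[i] * g[i]
    return cov / math.sqrt((f[i] ** 2 + PI2_OVER_6) * probit_scale(g) ** 2)


def corr_cap_fixed_g(n_modes):
    """Limit of error_corr as f_i grows when every g_j = 1: 1 / sqrt(I + 1)."""
    return 1.0 / math.sqrt(n_modes + 1)


MODES = ["SOV", "HOV", "Transit", "Non-motorized"]
# Magnitudes from Table 4.26.
F_BLOCK2, G_BLOCK2 = [0.448, 0.269, 1.061, 0.049], [1.0] * 4  # g_i fixed at 1
F_BLOCK3 = [0.327, 0.168, 1.083, 0.442]                         # f_i = g_i

print(f"scale ratio, g_i = 1  : {probit_scale(G_BLOCK2):.3f}")
print(f"scale ratio, f_i = g_i: {probit_scale(F_BLOCK3):.3f}")
print(f"correlation cap, g_i = 1, I = 4: {corr_cap_fixed_g(4):.3f}")
print(f"TRANSIT via scaling only: {0.133 * probit_scale(G_BLOCK2):.3f} (estimated: 0.900)")
for i, mode in enumerate(MODES):
    print(f"{mode:14s} g_i=1: {error_corr(i, F_BLOCK2, G_BLOCK2):.3f}   "
          f"f_i=g_i: {error_corr(i, F_BLOCK3, F_BLOCK3):.3f}")
```

The sketch reproduces the following reported values to three decimals:

- the scale ratios 2.236 and 1.582;
- the cap of 0.447;
- the second-block correlations for SOV, HOV and non-motorized modes (0.147, 0.092, 0.017);
- the third-block correlations for SOV, transit and non-motorized modes (0.051, 0.442, 0.091).

Two values differ. For transit in the second block (f_3 = 1.061), the same expression gives about 0.285 rather than 0.305. For HOV in the third block, it gives about 0.014 rather than 0.013.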
The most useful contribution of the mixed model with the restriction fi = gi is that it can accommodate error correlations that approach or exceed $1/\sqrt{I + 1} = 0.447$ (I = 4) in absolute value, the bound on the correlations attainable when gi is fixed at 1. The correlation of 0.442 between the transit utility and the peak-period departure utility could hardly be accommodated in the mixed model where gi is fixed at 1, because it lies very close to this bound. As before, the coefficients of the variables in the binary choice model are scaled up by the additional heterogeneity terms. The ratio is calculated as

$$\sqrt{1 + 0.327^2 + 0.168^2 + 1.083^2 + 0.442^2} = 1.582$$

Correspondingly, the estimated coefficients in the current mixed model are scaled up by a factor of about 1 to 2 compared with those in the recursive model. The two types of mixed models yield rather similar estimates for the endogenous dummy variables indicating mode choices. TRANSIT takes a positive coefficient of 0.789, slightly less than the 0.900 in the second block. Both types of models support the hypothesis that transit trips are more likely to be scheduled in the peak period.

Table 4.26 Non-commuter Model (Mode → Time-of-Day)

              Recursive Models  Mixed Models      Mixed Models
                                (gi fixed at 1)   (fi = gi)
Variable        Coeff.  t-test    Coeff.  t-test    Coeff.  t-test
Mode Choice Model
SOV Mode Choice Model
Constant         1.616   16.96     1.777   15.72     1.785   15.27
CAR_TIME         5.063    6.27     8.345    7.62     8.781    7.77
TERMTIME         0.037    2.03     0.009    0.48     0.008    0.42
PKLOT_SH         0.028    3.15     0.027    2.89     0.027    2.87
CAR_0            3.495   12.76     3.571   12.85     3.578   12.88
CAR_GE2          0.587    8.41     0.609    8.32     0.618    8.34
f1                   -       -     0.448    2.35     0.327    1.04
HOV Mode Choice Model
Constant         1.456   15.67     1.585   15.11     1.609   15.08
CAR_TIME         3.929    4.95     7.222    6.68     7.662    6.89
TERMTIME         0.088    4.93     0.061    3.21     0.059    3.07
PKLOT_SH         0.023    2.76     0.022    2.55     0.022    2.50
CAR_0            1.260   11.75     1.302   11.61     1.326   11.48
CAR_GE2          0.397    5.53     0.413    5.59     0.426    5.64
f2                   -       -     0.269    1.24     0.168    0.56
Transit Mode Choice Model
Constant         3.914   36.66     4.426   19.73     4.448   23.63
IVEH             1.577    2.41     2.898    3.73     3.103    3.93
OWT              1.627    3.37     2.889    4.73     3.126    4.96
FREQ             0.044   10.13     0.051    8.70     0.050    9.26
TRST_SUB         1.884   22.58     2.148   16.27     2.148   18.77
f3                   -       -     1.061    5.70     1.083    7.90
Non-motorized Mode Choice Model
NM_TIME          4.448   26.58     5.031   23.17     5.192   21.04
f4                   -       -     0.049    0.30     0.442    2.87
Time-of-Day (Peak-Period Departure Choice) Model
Constant         0.482   19.27     1.031   13.16     0.753   11.84
OLD              0.116    3.48     0.358    4.72     0.259    4.26
HHSIZE2          0.072    2.06     0.183    2.34     0.132    2.33
NSWISS           0.607    2.31     1.257    2.16     0.934    2.23
SHOPPING         0.147    3.86     0.329    3.88     0.233    3.69
SERVICE          0.240    3.22     0.521    3.15     0.365    3.05
SOV              0.036    0.90     0.252    1.39     0.128    0.83
HOV              0.069    1.76     0.045    0.23     0.034    0.25
TRANSIT          0.133    2.84     0.900    5.89     0.789    4.84
g1                   -       -     1.000       -     0.327    1.04
g2                   -       -     1.000       -     0.168    0.56
g3                   -       -     1.000       -     1.083    7.90
g4                   -       -     1.000       -     0.442    2.87

Table 4.27 provides the non-commuter model estimation results under the causal structure where the binary time-of-day choice affects the multinomial mode choice. All the coefficients of the exogenous variables take reasonable signs with good behavioral interpretations, and their magnitudes vary across the model types in a reasonable manner. In the recursive model, the endogenous dummy variable PEAK appears positively significant in the HOV utility function but insignificant in the SOV and transit utility functions.
PEAK appears insignificant in all the utility functions in the mixed model where gi is fixed at 1, in which the correlations between the utility functions of SOV, HOV, transit and non-motorized modes and that of peak-period departure are 0.146, 0.070, 0.196 and 0.124, respectively. In the mixed model where fi is equal to gi, PEAK appears insignificant in the SOV and HOV utility functions and positively significant in the transit utility function; the corresponding correlations are 0.000, 0.005, 0.160 and 0.082, respectively. Surprisingly, the absolute values of these correlations are smaller than those in the mixed model with gi fixed at 1. This indicates that the restrictions imposed on fi and gi have a great impact on the estimates of the endogenous dummy variables.

Table 4.27 Non-commuter Model (Time-of-Day → Mode)

              Recursive Models  Mixed Models      Mixed Models
                                (gi fixed at 1)   (fi = gi)
Variable        Coeff.  t-test    Coeff.  t-test    Coeff.  t-test
Mode Choice Model
SOV Mode Choice Model
Constant         1.662   16.76     1.551   12.80     1.646   16.69
CAR_TIME         5.604    6.58     5.960    6.40     5.835    6.51
TERMTIME         0.032    1.76     0.033    1.68     0.032    1.72
PKLOT_SH         0.028    3.12     0.029    3.10     0.028    3.09
CAR_0            3.492   12.75     3.566   12.59     3.532   12.86
CAR_GE2          0.586    8.40     0.615    7.89     0.600    8.42
PEAK             0.103    1.38     0.455    1.20     0.062    0.51
f1                   -       -     0.442    1.64     0.023    0.10
HOV Mode Choice Model
Constant         1.521   15.66     1.537    9.43     1.494   15.38
CAR_TIME         4.497    5.37     4.896    5.23     4.713    5.34
TERMTIME         0.083    4.59     0.082    4.33     0.083    4.51
PKLOT_SH         0.023    2.72     0.023    2.70     0.023    2.71
CAR_0            1.258   11.73     1.302   11.02     1.301   11.84
CAR_GE2          0.395    5.50     0.414    5.38     0.410    5.59
PEAK             0.158    2.16     0.062    0.15     0.018    0.15
f2                   -       -     0.203    0.60     0.091    0.53
Transit Mode Choice Model
Constant         3.891   34.14     4.172   14.13     4.132   31.44
IVEH             1.898    2.81     1.960    2.77     1.925    2.77
OWT              1.786    3.56     1.879    3.57     1.831    3.54
FREQ             0.044   10.01     0.047    7.97     0.047   10.04
TRST_SUB         1.885   22.57     1.988   14.85     1.969   21.42
PEAK             0.034    0.35     0.110    0.34     0.317    2.04
f3                   -       -     0.585    1.83     0.511    5.16
Non-motorized Mode Choice Model
NM_TIME          4.527   26.26     4.698   19.19     4.649   24.51
f4                   -       -     0.370    1.02     0.360    4.15
Time-of-Day (Peak-Period Departure Choice) Model
Constant         0.446   21.98     0.987   18.39     0.532   14.76
OLD              0.114    3.44     0.308    3.98     0.151    3.73
HHSIZE2          0.075    2.15     0.130    1.55     0.081    1.97
NSWISS           0.595    2.27     1.103    1.93     0.688    2.20
SHOPPING         0.145    3.79     0.340    3.74     0.171    3.64
SERVICE          0.253    3.44     0.513    2.43     0.256    2.92
g1                   -       -     1.000       -     0.023    0.10
g2                   -       -     1.000       -     0.091    0.53
g3                   -       -     1.000       -     0.511    5.16
g4                   -       -     1.000       -     0.360    4.15

4.3.3.2 Estimation Results for Commuters
Table 4.28 offers the commuter model estimation results under the causal structure where the multinomial mode choice affects the binary time-of-day choice. All the exogenous variables take reasonable coefficients in the mode choice model. In the binary peak-period departure model, commuters with children in the household are more likely to schedule their non-work trips in the peak period, as evidenced by the positive coefficient of WITH_KID. A plausible reason is that these persons have to take their children to school or kindergarten during their commute. Similar to non-commuters, commuters living in two-member households are more inclined to schedule non-work trips in the peak period, possibly because they need to serve the other household member during their commute. For a similar reason, commuters are more likely to schedule service trips in the peak period, as evidenced by the positive coefficient of SERVICE.
Unlike non-commuters, commuters tend to schedule shopping trips in the peak period, probably to pursue shopping activities during their commute, as indicated by the positive coefficient of SHOPPING. Since most commuters return home from the workplace in the PM peak period, HOME takes a positive coefficient in the model. In all types of models, the endogenous dummy variables SOV and TRANSIT take significantly positive coefficients in the binary departure time choice model, but HOV appears insignificant in all the models. In the mixed model where gi is fixed at 1, the error correlations between the utility functions of SOV, HOV, transit and non-motorized modes and that of peak-period departure are 0.093, 0.304, 0.309 and 0.283, respectively. In the mixed model where fi is equal to gi, the corresponding error correlations are 0.001, 0.385, 0.498 and 0.095, respectively. The high error correlation (0.498) cannot be attained in the mixed model where gi is fixed at 1: as mentioned before, 0.447 in absolute value is the bound on the correlation that can be accommodated in that type of model. Under the influence of the error correlations, the coefficients of SOV and TRANSIT are much more positive than those in the recursive model. However, no considerable differences are found in the coefficients of SOV and TRANSIT between the two types of mixed models.

Table 4.28 Commuter Model (Mode → Time-of-Day)

              Recursive Models  Mixed Models      Mixed Models
                                (gi fixed at 1)   (fi = gi)
Variable        Coeff.  t-test    Coeff.  t-test    Coeff.  t-test
Mode Choice Model
SOV Mode Choice Model
Constant         0.618    4.81     0.747    4.74     0.831    5.74
CAR_TIME         5.130    5.64     7.245    5.85     9.350    6.47
TERMTIME         0.092    4.09     0.091    3.11     0.057    1.97
PKLOT_SH         0.036    4.19     0.043    4.24     0.041    4.32
CAR_0            2.850   11.54     3.234   10.90     3.126   11.08
CAR_GE2          0.683    8.23     0.805    7.07     0.734    7.63
f1                   -       -     0.274    1.37     0.057    0.29
HOV Mode Choice Model
Constant         1.490    9.55     1.956    7.24     2.090    8.09
CAR_TIME         6.718    6.97     9.189    7.04    11.536    7.84
TERMTIME         0.076    2.76     0.074    2.03     0.034    0.94
PKLOT_SH         0.021    1.96     0.028    2.11     0.025    1.96
CAR_0            1.693    7.37     2.033    7.09     1.892    6.87
CAR_GE2          0.499    4.86     0.631    4.62     0.584    4.56
f2                   -       -     1.192    3.36     1.242    3.67
Transit Mode Choice Model
Constant         3.960   24.14     4.730   14.98     4.822   17.55
IVEH             2.514    2.85     2.950    2.73     3.891    3.50
OWT              1.766    2.80     3.365    3.97     4.488    4.53
FREQ             0.014    2.21     0.019    2.40     0.019    2.38
TRST_SUB         2.560   20.72     2.982   14.11     3.050   16.59
f3                   -       -     1.223    5.03     1.477    7.24
Non-motorized Mode Choice Model
NM_TIME          5.017   21.48     6.066   13.66     6.052   19.37
f4                   -       -     1.051    3.28     0.545    1.85
Time-of-Day (Peak-Period Departure Choice) Model
Constant         0.726   15.58     1.940   14.94     1.749    7.69
WITH_KID         0.148    2.11     0.285    1.85     0.304    1.97
HHSIZE2          0.102    2.47     0.217    2.40     0.234    2.55
SHOPPING         0.303    4.76     0.694    5.00     0.668    4.40
SERVICE          0.439    4.05     0.999    4.21     0.959    3.84
HOME             0.245    5.52     0.554    5.66     0.557    5.12
SOV              0.336    7.30     1.268    7.01     1.062    3.90
HOV              0.080    1.29     0.056    0.21     0.753    1.59
TRANSIT          0.719   10.88     2.759   13.92     2.789    7.20
g1                   -       -     1.000       -     0.057    0.29
g2                   -       -     1.000       -     1.242    3.67
g3                   -       -     1.000       -     1.477    7.24
g4                   -       -     1.000       -     0.545    1.85

Table 4.29 Commuter Model (Time-of-Day → Mode)

              Recursive Models  Mixed Models      Mixed Models
                                (gi fixed at 1)   (fi = gi)
Variable        Coeff.  t-test    Coeff.  t-test    Coeff.  t-test
Mode Choice Model
SOV Mode Choice Model
Constant         0.748    5.55     0.535    3.71     0.547    3.87
CAR_TIME         4.729    4.78     4.931    4.57     5.087    4.75
TERMTIME         0.099    4.31     0.110    4.27     0.106    4.30
PKLOT_SH         0.035    3.98     0.037    3.97     0.036    3.99
CAR_0            2.860   11.58     2.949   11.29     2.955   11.50
CAR_GE2          0.688    8.28     0.739    7.64     0.728    8.05
PEAK             0.495    5.41     0.039    0.26     0.047    0.32
f1                   -       -     0.061    0.35     0.111    0.45
HOV Mode Choice Model
Constant         1.479    9.13     1.388    7.84     1.432    8.14
CAR_TIME         5.943    5.68     6.156    5.43     6.281    5.56
TERMTIME         0.088    3.15     0.099    3.23     0.096    3.23
PKLOT_SH         0.020    1.87     0.021    1.93     0.021    1.92
CAR_0            1.708    7.42     1.810    7.36     1.812    7.52
CAR_GE2          0.492    4.78     0.547    4.78     0.535    4.86
PEAK             0.098    0.88     0.397    2.13     0.384    2.09
f2                   -       -     0.292    1.20     0.434    2.91
Transit Mode Choice Model
Constant         4.366   23.88     4.296   18.17     4.343   20.28
IVEH             2.621    2.87     2.738    2.86     2.771    2.89
OWT              0.949    1.50     1.061    1.61     1.139    1.72
FREQ             1.422    2.21     0.015    2.23     0.016    2.31
TRST_SUB         2.600   20.78     2.649   17.67     2.680   18.69
PEAK             0.872    6.32     0.735    3.57     0.760    3.79
f3                   -       -     0.298    0.98     0.449    2.48
Non-motorized Mode Choice Model
NM_TIME          4.888   20.42     5.243   14.98     5.207   18.21
f4                   -       -     0.784    3.08     0.680    5.63
Time-of-Day (Peak-Period Departure Choice) Model
Constant         0.606   11.07     1.335   10.84     0.818    7.01
WITH_KID         0.135    1.96     0.276    1.79     0.171    1.77
HHSIZE2          0.092    2.25     0.225    2.48     0.137    2.36
SHOPPING         0.356    5.68     0.759    5.37     0.464    4.74
SERVICE          0.444    4.17     0.911    3.73     0.558    3.45
HOME             0.349    8.10     0.752    7.64     0.457    5.93
LN_DISW          0.033    1.72     0.068    1.59     0.043    1.63
g1                   -       -     1.000       -     0.111    0.45
g2                   -       -     1.000       -     0.434    2.91
g3                   -       -     1.000       -     0.449    2.48
g4                   -       -     1.000       -     0.680    5.63

Table 4.29 offers the commuter model estimation results under the causal structure where the binary departure time choice affects the multinomial mode choice. All the exogenous variables take reasonable coefficients. In the recursive model, the endogenous dummy variable PEAK appears significantly positive in both the SOV and transit utility functions.
In the mixed model where gi is fixed at 1, the error correlations between the utility functions of SOV, HOV, transit and non-motorized modes and that of peak-period departure are 0.021, 0.099, 0.101 and 0.233, respectively. In the mixed model where fi is equal to gi, the corresponding error correlations are 0.007, 0.102, 0.109 and 0.233, respectively. In both types of mixed models, the coefficient of PEAK in the SOV utility function becomes insignificant, the coefficient in the HOV utility function becomes significantly negative, and the coefficient in the transit utility function remains significantly positive but somewhat smaller than in the recursive model. No considerable differences are found in the coefficients of PEAK in the SOV and transit utility functions between the two types of mixed models.

4.3.4 Model Performance Comparisons Based on Non-nested Test
Table 4.30 compares the goodness-of-fit measures across the various model types and causal structures. A non-nested test is employed to identify the dominant causal structure between mode choice and time-of-day choice. The causal structure in which time-of-day choice affects mode choice is rejected by the non-nested test in all types of models. Thus, it is relatively safe to conclude that both commuters and non-commuters are more likely to decide on the mode first and then select the trip departure time conditional on the predetermined mode. This finding is consistent with that of Tringides et al. (2004), where a recursive bivariate probit model is adopted. In addition, the mixed models generally fit the data better than the recursive models, and the mixed models in which fi = gi fit the data better than the mixed models in which gi is fixed at 1.
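The goodness-of-fit measures in Table 4.30 can be reproduced from the log-likelihood values. The sketch below assumes ρ² = 1 − LL/LL_ref and adjusted ρ̄² = 1 − (LL − K)/LL_ref, with LL_ref equal to LL(0) or LL(c) and K the full number of parameters in both cases; these definitions reproduce the tabulated values. For the non-nested test it assumes the bound stated by Horowitz (1983) and Ben-Akiva and Lerman (1985), Pr(ρ̄²₂ − ρ̄²₁ > z) ≤ Φ(−√(−2 z LL(0) + (K₂ − K₁))), where z is the difference in adjusted ρ² at zero between the two causal structures. The function names are hypothetical.

```python
import math


def rho_squared(ll, ll_ref, k=0):
    """McFadden rho-squared against a reference log-likelihood; k > 0 gives the adjusted version."""
    # ll and ll_ref are (negative) log-likelihoods; k is the number of estimated parameters.
    return 1.0 - (ll - k) / ll_ref


def nonnested_bound(z, ll_zero, k_diff=0):
    """Upper bound on Pr(adjusted rho^2 difference > z) if the worse-fitting structure were true."""
    # Assumed Horowitz (1983) form: Phi(-sqrt(-2 z LL(0) + (K2 - K1))).
    arg = max(0.0, -2.0 * z * ll_zero + k_diff)
    x = math.sqrt(arg)
    return 0.5 * math.erfc(x / math.sqrt(2.0))  # standard normal CDF evaluated at -x


# Non-commuter recursive model, Mode -> Time-of-Day column of Table 4.30.
LL0, LLC, LL, K = -16602.3, -15539.8, -12526.0, 27

rho0 = rho_squared(LL, LL0)
adj_rho0 = rho_squared(LL, LL0, K)
rhoc = rho_squared(LL, LLC)
adj_rhoc = rho_squared(LL, LLC, K)
print(f"{rho0:.4f} {adj_rho0:.4f} {rhoc:.4f} {adj_rhoc:.4f}")  # -> 0.2455 0.2439 0.1939 0.1922

# Non-nested test between the two non-commuter recursive structures (both have K = 27).
p = nonnested_bound(z=0.0001, ll_zero=LL0, k_diff=0)
print(f"{p:.3f}")  # -> 0.034
```

Using the rounded z = 0.0001 reported in Table 4.30 reproduces the tabulated probability of 0.034; the larger differences in the other columns drive the bound to essentially zero.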
Table 4.30 Comparison of Goodness-of-Fit of Timing-Mode Choice Models

                             Non-commuter Models      Commuter Models
                             Mode → Time Time → Mode  Mode → Time Time → Mode
Sample size                  7984                     4619
LL at zero, LL(0)            -16602.3                 -9604.94
LL at constant, LL(c)        -15539.8                 -9032.11
Recursive Model
Number of parameters         27          27           27          28
LL at convergence            -12526.0    -12527.7     -7293.66    -7329.65
ρ² at zero                   0.2455      0.2454       0.2406      0.2369
Adj. ρ² at zero              0.2439      0.2438       0.2378      0.2340
ρ² at constant               0.1939      0.1938       0.1925      0.1885
Adj. ρ² at constant          0.1922      0.1921       0.1895      0.1854
Non-nested test (prob.)      0.0001 (0.034)           0.0039 (0.000)
Mixed Models (gi is fixed at 1)
Number of parameters         31          31           31          32
LL at convergence            -12486.1    -12504.2     -7264.35    -7321.25
ρ² at zero                   0.2479      0.2468       0.2437      0.2378
Adj. ρ² at zero              0.2461      0.2450       0.2405      0.2344
ρ² at constant               0.1965      0.1953       0.1957      0.1894
Adj. ρ² at constant          0.1945      0.1933       0.1923      0.1859
Non-nested test (prob.)      0.0011 (0.000)           0.0060 (0.000)
Mixed Models (fi = gi)
Number of parameters         31          31           31          32
LL at convergence            -12480.5    -12498.4     -7254.83    -7317.93
ρ² at zero                   0.2483      0.2472       0.2447      0.2381
Adj. ρ² at zero              0.2464      0.2453       0.2414      0.2348
ρ² at constant               0.1969      0.1957       0.1968      0.1898
Adj. ρ² at constant          0.1949      0.1937       0.1933      0.1862
Non-nested test (prob.)      0.0011 (0.000)           0.0067 (0.000)

4.3.5 Discussions and Conclusions
Figures 4.3 and 4.4 summarize and illustrate the causal relationships between binary time-of-day choices and multinomial mode choices according to the mixed binary-multinomial choice model. A causal relationship rejected by the non-nested test is not dominant in the population but may still exist for part of it; thus both the dominant and the non-dominant causal relationships are illustrated in the figures for comparison. For non-commuters, transit riders are more likely to undertake trips in the peak period than users of other modes. One plausible explanation is that transit riders are not as sensitive to peak-period congestion as travelers using other modes.
Switzerland provides an excellent peak-period transit service, which enables transit riders to pursue more non-work trips in the peak period than auto and non-motorized travelers. On the other hand, non-commuters' peak-period non-work trips are more dependent on the transit mode. As expected, travel time by auto or non-motorized mode is highly sensitive to traffic congestion, so travelers prefer to use public transit for their non-work trips in the peak period. Similar to non-commuters, commuters' non-work trips by SOV and transit are more likely to be scheduled in the peak period. Commuters may like to schedule their non-work activities while driving on their commute, which reasonably explains why their non-work SOV trips are more likely to occur in the peak period. Similar to non-commuter transit riders, commuter transit riders are less sensitive to peak-period traffic congestion than commuters using alternative modes. On the other hand, commuters' peak-period non-work trips are more dependent on the transit mode but less dependent on the HOV mode. Like non-commuters, commuters prefer to use public transit for their non-work trips in the peak period so as to avoid traffic congestion. In addition, commuters rarely carry passengers when they undertake non-work trips during their commute, which explains why commuters' peak-period non-work trips are less likely to depend on HOV. Using a recursive bivariate probit model, Tringides et al. (2004) found that SOV mode choice negatively affects peak-period trip departure and that peak-period trip departure negatively affects SOV mode choice. The mixed binary-multinomial choice model explores this problem further and finds that such negative effects are attributable to the positive dependency between transit usage and peak-period departure.
[Figure 4.3 Diagram of Causal Relationship of Mixed Binary-multinomial Choice Models for Non-commuters. Nodes: SOV Mode Choice, HOV Mode Choice, Transit Mode Choice, Peak Period Choice. Note: Solid arrow represents positive impact.]

[Figure 4.4 Diagram of Causal Relationship of Mixed Binary-multinomial Choice Models for Commuters. Nodes: SOV Mode Choice, HOV Mode Choice, Transit Mode Choice, Peak Period Choice. Note: Solid arrow represents positive impact and dashed arrow represents negative impact.]

In summary, this section points to a possible behavioral mechanism in which people first make the choices that are subject to constraints and then make the choices that are less constrained. For both commuters and non-commuters, mode choice is determined first because of possible modal availability constraints and greater departure time flexibility. People first decide on the mode and then determine the most suitable time for pursuing the non-work activity. These conclusions are reasonable and consistent with previous findings (Bhat, 1997).

New micro-simulation models of travel and activity behavior attempt to predict travel and activity patterns at the level of the individual decision-maker or traveler. The development of such models calls for a deeper understanding of the causal decision mechanisms that govern travel and activity participation decisions. Departure time choice and mode choice are two major elements of travel and activity behavior, as planners would undoubtedly expect such advanced model systems to offer information about travel demand by mode and time of day. This study attempts to shed light on the relationship between these two elements of behavior by considering alternative formulations of joint model systems of departure time choice and mode choice for non-work trips.
As departure time choice for work trips tends to be governed largely by work schedules and constraints, studies of work trip departure time choice have mostly examined the issue with respect to traveler sensitivity to congestion, travel time reliability, and arrival/departure time window sizes. Less attention has been paid to departure time choice for non-work trips, a growing segment of trip making that accounts for a larger share of trips at all times of day. This section considers two alternative formulations of joint model systems, representing two possible causal relationships between departure time choice and mode choice for non-work trips. The analysis employs the Swiss household travel survey data collected in 2000. Model estimation was conducted separately for commuters and non-commuters because of the different scheduling and time constraints under which these demographic groups make activity and travel decisions. Mode choice was treated as a multinomial choice among SOV, HOV, transit and non-motorized modes, and time-of-day choice as a binary choice between the peak and non-peak periods. Under this scheme, the mixed binary-multinomial choice modeling framework was applied to estimate the model systems and clarify the direction of the causal relationships between these dimensions of behavior. It is believed that people generally make decisions on the more constrained choice variables first. For both the commuter and non-commuter samples, the data were found to better support the causal relationship in which mode choice precedes departure time choice. These findings are consistent with the notion that choices on constrained dimensions are made first.
Swiss people may be more mode-constrained than time-of-day-constrained because of modal availability, the need to engage in non-work activities that serve household members, and other household obligations (leading to more shared-ride trips). Models of activity and travel behavior should incorporate relationships such as those identified in this section to portray more accurately the decision mechanisms that may be driving travel patterns.

As with most research efforts of this type, limitations apply to this study and additional research is warranted. First and foremost, it must be recognized that identifying true causal relationships from a statistical analysis of revealed behavior data is extremely difficult. This study provides a framework by which alternative hypotheses regarding causal relationships can be tested, but true causal relationships may be best identified by collecting and analyzing behavioral process data that capture the thought process behind a given decision or behavioral choice. Also, despite the best efforts of the authors, the results may be sensitive to the model specification and the choice of explanatory variables. Finally, additional research should examine whether the relationships found to be more suitable in this section extend to other data sets and geographical contexts.

Chapter Five: Conclusions and Discussions
5.1 Contributions to the Field
5.1.1 Methodological Contribution
On the methodological side, this dissertation proposes a simultaneous equations modeling methodology that integrates unordered discrete choices into the framework of the structural equations model system, allowing causal analysis between an unordered discrete variable and a continuous variable or between two unordered discrete variables.
Such a modeling methodology is highly desirable in travel behavior studies, where many dependent variables of interest are unordered discrete in nature. The non-nested statistical test, previously applied to discrete choice models, has been extended to joint discrete-continuous model systems, so that alternative causal structures in discrete-continuous models can be compared and selected in a rigorous way.

In addition, this dissertation contributes to addressing the endogeneity problem in discrete choice models for the travel behavior research community. The IIA problem has been of much concern for many years, but the endogeneity problem has not received as much attention from the transportation profession as it deserves. Econometricians have invested great effort and made considerable advances on this topic, exemplified by semi-parametric methods for robust estimation (Lewbel, 2000). Compared with these approaches, the modeling methodology proposed in this dissertation is more practical but less robust.

5.1.2 Behavioral Contribution
The plausible causal relationships among activity and travel variables, whether continuous variables or discrete choices, can be quantified in the proposed econometric modeling framework. By comparing the goodness-of-fit measures of competing causal models, travel behavior researchers can identify the dominant causal relationship between activity timing and duration, between trip chaining pattern and mode choice, and between trip departure time and mode choice. This modeling methodology helps travel behavior researchers better understand the mechanism of travelers' decision-making processes solely by analyzing revealed data, which are available in most cases. In addition, the endogenous nature of activity and travel variables is fully recognized in this dissertation, which reflects the comprehensive correlations among travelers' behaviors.
5.1.3 Practical Contribution
The causal relationships identified by the proposed models can aid in the development of activity-based travel demand model systems. They can guide modelers in specifying activity-based sub-models and in deciding the application sequence of these sub-models, such as the activity timing model, activity duration model and mode choice model. Section 4.1 identifies the dominant causal relationship between trip chaining and mode choice as Tour Type → Mode Choice; Section 4.3 identifies the dominant causal relationship between mode choice and time-of-day choice as Mode Choice → Time-of-Day Choice; and the mixed model in Section 4.2 identifies the dominant causal relationship between timing and duration as Time-of-Day Choice → Duration. It is interesting and surprising that a uniquely dominant decision process followed by most Swiss people in pursuing non-work activities can be depicted: Activity Sequence → Mode Choice → Timing → Duration. This sequence of model application is recommended on the basis of the analytical results of this dissertation.

5.1.4 Empirical Contribution
With the endogeneity problem taken into account, the coefficients of endogenous variables can be estimated more accurately in the proposed simultaneous equations modeling system. In the proposed model, endogenous variables of interest, whether continuous or discrete, may be specified in a model equation, which can be a linear regression model for a continuous dependent variable or a latent utility function for discrete choices. For example, a variable pricing policy may change the time-of-day choices of freeway users or transit riders, and policy makers are concerned with how these people change travel mode in response to the change in time-of-day choices. In that case, endogenous variables indicating time-of-day choices must be specified in the mode choice model.
The impact of such endogenous variables can be estimated accurately with the proposed modeling methodology.

5.2 Future Research Direction
A robust and practical modeling methodology that completely solves the endogeneity problem in discrete choice models is still far from reach. As found in this dissertation, the estimates of endogenous variables are highly sensitive to the assumed error structure. In the future, more robust modeling methodologies (semi-parametric or limited-information methods) need to be introduced into the travel behavior context, where their performance requires further exploration. In addition, various distributional assumptions on the error structures can be further explored in the mixed model framework. For example, the normal distribution assumption can be replaced with a lognormal distribution. The heterogeneity can also be nonparametrically and discretely distributed, similar to the assumption adopted by Mroz (1999). There remains wide scope for future research effort.

We also remain greatly interested in seeking a modeling methodology that allows two unidirectional causal relationships to coexist in a single model. Populations are not homogeneous, and people follow different decision-making processes. The dominant causal structure identified by the proposed model must be unidirectional, so it cannot describe the behavior of everyone in the population. Ye et al. (2006) attempted to use a modeling approach called the simultaneous logit model (Schmidt and Strauss, 1975) to accommodate such a bidirectional causal relationship. However, this methodology assumes a simultaneous causal relationship at the macroscopic level and cannot allow alternative unidirectional causal structures to coexist in one model.
A desirable modeling methodology is expected not only to allow alternative unidirectional causal structures to coexist in a single model but also to identify the latent market segments that follow each causal structure. This remains an interesting but challenging topic for future research.

It must be noted that causal relationships are extracted and examined in this dissertation from statistical relationships estimated on revealed outcome data. While such data provide insights into what people have done, they do not provide true insights into the decision mechanisms and behavioral processes underlying the revealed outcomes. One must exercise care when drawing inferences regarding behavioral causality from statistical indicators. To truly understand and identify causal relationships, data on the underlying behavioral processes and decision mechanisms are needed. Activity-travel scheduling surveys that collect data on underlying behavioral processes make it possible to study travel decisions in a robust framework. Such data would greatly help to further explore the causal linkages among activity-travel variables, as well as the decision processes that govern activity-travel engagement patterns. Future research into the development of micro-simulation models of activity and travel behavior should include attempts to collect and analyze such data.

References
Abkowitz, M. D. (1981). An analysis of the commuter departure time decision. Transportation 10, 283-297.
Aptech (2005). GAUSS 7.0. Aptech Systems, Maple Valley, Washington.
Arentze, T. and Timmermans, H. (2000). Albatross: A Learning-Based Transportation Oriented Simulation System. EIRASS, Eindhoven University of Technology, The Netherlands.
Axhausen, K. and Garling, T. (1992). Activity-based approaches to travel analysis: conceptual frameworks, models and research problems. Transport Reviews 12, 493-517.
Ben-Akiva, M. E. and Swait, J. D. (1984). The Akaike likelihood ratio index. Transportation Science 20(2), 133-136.
Ben-Akiva, M. and Lerman, S. R. (1985). Discrete Choice Analysis: Theory and Application to Travel Demand. The MIT Press, Cambridge.
Bhat, C. R. (1995). A heteroscedastic extreme value model of intercity mode choice. Transportation Research B 29(6), 471-483.
Bhat, C. R. (1996). A hazard-based duration model of shopping activity with nonparametric baseline specification and nonparametric control for unobserved heterogeneity. Transportation Research B 30(1), 189-207.
Bhat, C. R. (1997). Work travel mode choice and number of non-work commute stops. Transportation Research B 31(1), 41-54.
Bhat, C. R. (1998a). Analysis of travel mode and departure time choice for urban shopping trips. Transportation Research B 32(6), 361-371.
Bhat, C. R. (1998b). A model of post-home arrival activity participation behavior. Transportation Research B 32(6), 387-400.
Bhat, C. R. (1998c). Accommodating flexible substitution patterns in multidimensional choice modeling: formulation and application to travel mode and departure time choice. Transportation Research B 32(7), 455-466.
Bhat, C. R. and Koppelman, F. S. (1999). A retrospective and prospective survey of time-use research. Transportation 26(2), 119-139.
Bhat, C. R. and Misra, R. (1999). Discretionary activity time allocation of individuals between in-home and out-of-home and between weekdays and weekends. Transportation 26(2), 193-229.
Bhat, C. R. and Singh, S. K. (2000). A comprehensive daily activity-travel generation model system for workers. Transportation Research A 34(1), 1-22.
Bhat, C. R. (2001a). Modeling the commute activity-travel pattern of workers: formulation and empirical analysis. Transportation Science 35(1), 61-79.
Bhat, C. R. (2001b). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research B 35, 677-693.
Bhat, C. R. (2004). A comprehensive econometric micro-simulator for daily activity-travel patterns (CEMDAP). CD-ROM of the 83rd Annual Meeting of the Transportation Research Board, National Research Council, Washington, D.C.
Bhat, C. R. and Sardesai, R. (2006). The impact of stop-making and travel time reliability on commute mode choice. Transportation Research B 40(9), 709-730.
Cambridge Systematics, Inc. (1997). Time-of-day modeling procedures: state-of-the-practice, state-of-the-art. Final Report, Travel Model Improvement Program, U.S. Department of Transportation, Washington, D.C.
Cox, D. (1961). Tests of separate families of hypotheses. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1. University of California Press, Berkeley.
Cox, D. (1962). Further results on tests of separate families of hypotheses. Journal of the Royal Statistical Society B 24, 406-424.
Dubin, J. and McFadden, D. (1984). An econometric analysis of residential electric appliance holdings and consumption. Econometrica 52(2), 345-362.
Eluru, N. and Bhat, C. R. (2005). A joint econometric analysis of seat belt use and crash-related injury severity. Technical paper, Department of Civil Engineering, The University of Texas at Austin.
Fujii, S. and Kitamura, R. (2000). Evaluation of trip-inducing effects of new freeways using a structural equations model system of commuters' time use and travel. Transportation Research B 34(5), 339-354.
Goulias, K. G. (1997). Activity-based travel forecasting: what are some issues? In Texas Transportation Institute (ed.), Activity-Based Travel Forecasting Conference, June 2-5, 1996: Summary, Recommendations, and Compendium of Papers, Travel Model Improvement Program. US Department of Transportation, Washington, DC, 37-49.
Golob, T. F. (2000). A simultaneous model of household activity participation and trip chain generation. Transportation Research B 34(5), 355-376.
Golob, T. F. (2003). Structural equation modeling for travel behavior research. Transportation Research B 37(1), 1-25.
Greene, W. H. (1998). Gender economics courses in liberal arts colleges: further results. Journal of Economic Education 29(4), 291-300.
Greene, W. H. (2002). LIMDEP Version 8.0: User's Manual. Econometric Software, Inc., Plainview, NY.
Greene, W. H. (2003). Econometric Analysis, Fifth Edition. Pearson Education, Inc., NJ.
Hamed, M. and Mannering, F. L. (1993). Modeling travelers' postwork activity involvement: toward a new methodology. Transportation Science 27(4), 381-394.
Harvey, A. S. and Taylor, M. E. (2000). Activity settings and travel behavior: a social contact perspective. Transportation 32(1), 53-73.
Hensher, D. A. and Reyes, A. J. (2000). Trip chaining as a barrier to the propensity to use public transport. Transportation 27(4), 341-361.
Horowitz, J. L. (1983). Statistical comparison of non-nested probabilistic discrete choice models. Transportation Science 17(3), 319-350.
Hunt, J. D. and Patterson, D. M. (1996). A stated preference examination of time of travel choice for a recreational trip. Journal of Advanced Transportation 30(3), 17-44.
Kasturirangan, K., Pendyala, R. M. and Koppelman, F. S. (2002). Role of history in modeling daily activity frequency and duration for commuters. Transportation Research Record 1807, Journal of the Transportation Research Board, National Research Council, Washington, D.C., 129-136.
Kitamura, R., Yamamoto, T., Fujii, S. and Sampath, S. (1996). A discrete-continuous analysis of time allocation to two types of discretionary activities which accounts for unobserved heterogeneity. In Lesort, J.B., Transportation and Traffic Theory, Elsevier, Oxford, 431-453.
Kitamura, R., Chen, C., Pendyala, R. M. and Narayanan, R. (2000). Micro-simulation of activity-travel patterns for travel demand forecasting. Transportation 27(1), 25-51.
Kumar, A. and Levinson, D. (1994). Temporal variations on allocation of time.
Transportation Research Record 1439, Transportation Research Board National Research Council, Washington, D.C. 118127. Lee, L. F. (1983). Generalized econo metric models with selectivity. Econometrica 51(2) 507512. Levinson, D. and Kumar, A. (1995). Activit y, travel, and the allocation of time. Journal of the American Planning Association 61(4) 458470. Lewbel, A. (2000). Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables. Journal of Econometrics 97 145177. Lockwood, P. B. and Demetsky, M. J. (1994). Nonwork travel a study of changing behavior. Presented at the 73rd Annual Meeting of the Transportation Research Board January 913, Washington, D.C. McFadden, D. L. (1973). Conditional logit an alysis of qualitative choice behavior, in P. Zarembka (ed.), Frontiers in Econometrics Academic Press. McFadden, D. and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics 15(5) 447470. Maddala, G. S. (1983). Limited dependent and qualitative variables in econometrics. Cambridge University Press, Cambridge. Mahmassani, H. and Chang, G. L. (1985) Dynamic aspects of departuretime choice behavior in a commuting system: theoretic al framework and experimental analysis. Transportation Research Record 1037 National Research Council, Washington D.C. 88101. Mahmassani, H. and Stephan, D. (1988). Experimental investigation of route and departure time dynamics of urban commuters. Transportation Research Record 1203. National Research Council, Washington D.C. 6984. Mannering, F. L., Murakami, E. and Kim, S. G. (1994). Models of travelers' activity choice and homestay duration: analysis of functional form and temporal stability. Transportation 21(4) 371392. 176 PAGE 188 McGuckin, N. and Murakami, E. (1999) Examining tripchaining behavior: a comparison of travel by men and women. Transportation Research Record 1693. National Research Council, Washington D.C. 7985. Mroz, T. A. (1987). 
The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions. Econometrica 55 765799. Mroz, T. A. (1999). Discrete factor approximations in simultaneous equation models: estimating the impact of a dummy e ndogenous variable on a continuous outcome. Journal of Econometrics 92 233274. Muthen, B. (1979). A structural probit model with latent variables. Journal of the American Statistical Association 74(368) 807811. Noland, R. B. and Small, K. A. (1994). Traveltime uncertainty, departure time choice, and the cost of morning commutes. Transportation Research Record 1439. National Research Council, Washington D.C. 150158. Pas, E. I. and Harvey, A. S. (1997) Time use research and travel demand analysis and modeling. In Stopher, P. R. and LeeGosselin, M. Understanding Travel Behavior in an Era of Change Elsevier, Oxford 315338. Pendyala, R. M, Kitamura, R, Chen, C. and Pas, E. I. (1997). An activitybased microsimulation analysis of transportation control measures. Transport Policy 4(3) 183192. Pendyala, R. M, Kitamura, R. and Reddy, D. V. G. P. (1998). Application of an activitybased travel demand model incorporating a rulebased algorithm. Environment and Planning B: Planning and Design 25 753772. Pendyala, R. M, Yamamoto, T. and Kitamura, R. (2002). On the formulation of time space prisms to model constraints on personal activitytravel engagement. Transportation 29(1) 7394. Pendyala, R. M. and Bhat, C. R. (2004). An exploration of the relationship between timing and duration of maintenance activities. Transportation 31(4) 429456. Pendyala, R. M., Kitamura R., Kikuchi, A., Yamamoto, T., Fujii, S. (2005). FAMOS: the florida activity mobility simulator. CDROM of the 84th Annual Meeting of the Transportation Research Board. National Research Council, Washington D.C. Rhine, S. L.W., Greene, W. H. and ToussaintComeau M. (2006). The importance of checkcashing businesses to the unbanked: racial/ethnic differences. 
The Review of Economics and Statistics 88(1) 146157. 177 PAGE 189 Schmertmann, C. P. (1994). Selectivity bi as correction methods in polychotomous sample selection models. Journal of Econometrics 60 101132. Schmidt, P. and Strauss, R. P. (1975). Estimation of models with jointly dependent qualitative variables: a simultaneous logit approach. Econometrica 43(4) 745755. Shiftan, Y. (1998). Practical appr oach to model trip chaining. Transportation Research Record 1645 National Research Council, Washington D.C. 1723. Steed, J. L. and Bhat, C. R. (2000). On modeling departure time choice for homebased social/recreational and shopping trips. Transportation Research Record 1706 National Research Council, Washington D.C. 152159. Stopher, P. R. (1993). Defi ciencies of travel fore casting methods relative to mobile emissions. ASCE Journal of Transportation Engineering 119(5) 723741. Strathman, J. G. and Dueker, K. J. (1 994). Effect of household structure and selected characteristics on trip chaining. Transportation 21 2345. Strathman, J. G. and Dueker, K. J. (1 995). Understanding trip chaining, special reports on trip and vehicle attributes. 1990 NPTS Reports Series Publication No. FHWAPL95033. U.S. Department of Transportation 11 ~ 127. Train, K. (2002). Discrete choice methods with simulation Cambridge University Press. Tringides, C. A., Ye, X. and Pendyala, R. M. (2004). Departuretime choice and mode choice for nonwork trips: alternativ e formulations of joint model systems. Transportation Research Record 1898 National Research Council, Washington D.C. 19. Walker, J. (2002). The mixed logit (or logit kernel) model: dispelling misconceptions of identification. Transportation Research Record 1805. National Research Council, Washington D.C. 8698. Wang, J. J. (1996). Timing utility of daily activities and its impact on travel. Transportation Research A 30 189206. Weiner, E. and Ducca, F. (1996). 
U pgrading travel demand forecasting capabilities: USDOT Travel Model Improvement Program. TR News 186. Transportation Research Board, National Research Council, Washington D.C. 26. Wen, CH. and Koppelman, F. S. ( 2000). A conceptual and methodological framework for the generation of activity travel patterns. Transportation Research B 32(1) 523. 178 PAGE 190 Vovsha, P. (1995). Application of crossnest ed logit model to mode chocie to in Tel Aviv, Israel, Metropolitan Area. Transportation Research Record 1607 National Research Council, Washington D.C. 615. Yamamoto, T. and Kitamura, R. (1999). An analysis of time allocation to inhome and outofhome discretionary activities across working days and nonworking days. Transportation 26(2) 211230. Ye, X. and Pendyala, R.M. (2003). Descri ption of the Switzerland Microcensus 2000 travel survey sample. Research Re port prepared for Jenni + Gottardi AG, Department of Civil and Environmental Engine ering, University of South Florida, Tampa, FL. Ye, X., Pendyala, R. M. and Gottardi, G. (2004). An exploration of the relationship between auto mode choice and complexity of trip chaining patterns. Transportation Research B 41(1) 96113. 179 PAGE 191 Bibliography Barnard, P. O. and Hensher, D. A. ( 1992). Joint estimation of a polychotomous discretecontinuous choice system : an analysis of the spatial distribution of retail expenditures. Journal of Transport Economics and Policy XXVI(3) 299312. Bhat, C. R. (2003). Simulation estimation of mixed discrete choice models using randomized and scrambled halton sequences. Transportation Research B 37(9) 837855. Dissanayake, D. and Morikawa, T. ( 2002). Household travel behavior in developing countries: nested logit model of vehicle ownership, mode choice, and trip chaining. Transportation Research Record 1805, J ournal of the Transportation Research Board 4552. National Research Council, Washington, D.C. Doherty, S. T. and Miller, E. J. 
( 2000) A computerized household activity scheduling survey. Transportation 27(1) 7597. Feller, W. (1971). An introduction to probability theory and its applications Wiley, New York. Geweke, J., Keane, M. and Runkle, D. (1994). Alternat ive computational approaches to inference in the multinomial probit model. Review of Economics and Statistics 76 609632. Hajivassiliou, V, McFadden, D., and Paul R. (1996). Simulation of multivariate normal orthant probabilities: methods and programs. Journal of Econometrics 72 85134. Hanemann, W. M. (1984). Discrete/con tinuous models of consumer demand. Econometrica 52 541561. Keane, M. (1992). A note on identification in the multinomial probit model. Journal of Business and Economics Statistics 10 193200. Keane, M. (1994). A computa tionally practical simulation estimator for panel data. Econometrica 62(1) 95116. Koppelman, F. S. and Pas, E. I. (198 4). Estimation of disaggregate regression models of person trip generation with multiday data. Proceedings of the Ninth International Symposium on Transportation and Traffic Theory Delft, the Netherlands. 180 PAGE 192 Mannering, F. L. and Hensher, D. A. (1987). Discretecontinuous econometric models and their applicati on to transport analysis. Transport Reviews 7(3) 227244. Ouyang, Y., Shankar, V. and Yamamoto, T. (2002). Modeling the simultaneity in injury causation in multivehicle collisions. Transportation Research Record 1784 National Research Council, Washington D.C. 143152. Pendyala, R. M. and Ye, X. (2005). C ontributions to understanding joint relationships among activity and travel variables, In H. Timmermans (ed.) Progress in ActivityBased Analysis Pergamon, Elsevier Sc ience Ltd., Oxford, UK 124. 
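The appendices that follow reproduce the GAUSS estimation code. As a reading aid for Appendix A, here is a minimal Python sketch, not the author's code, of the same Halton construction: the radical inverse of the indices 1 to n in a prime base (the appendix uses bases 2, 3, 5 and 7, one per random term), followed by the inverse standard-normal CDF, which stands in for GAUSS cdfni. All function names are hypothetical. The skip argument only loosely mirrors the seekr(fin,1000) calls in Appendices B, D and E, which start reading past the first stored rows, and draws_by_observation assumes that GAUSS reshape fills its result row by row, so each observation receives a consecutive block of s_n draws.

```python
from statistics import NormalDist


def radical_inverse(index: int, base: int) -> float:
    """Mirror the base-`base` digits of `index` about the radix point,
    e.g. 6 = 110 in base 2 becomes 0.011 in base 2, i.e. 0.375."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def halton(n: int, base: int) -> list[float]:
    """First n Halton points in `base`, starting at index 1 so the leading 0 is dropped."""
    return [radical_inverse(i, base) for i in range(1, n + 1)]


def halton_normal(n: int, base: int, skip: int = 0) -> list[float]:
    """n standard-normal draws: drop the first `skip` Halton points, then apply
    the inverse normal CDF (the role cdfni plays in the GAUSS listing)."""
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf(u) for u in halton(n + skip, base)[skip:]]


def draws_by_observation(draws: list[float], n_obs: int, n_draws: int) -> list[list[float]]:
    """Row-wise reshape: observation i gets draws i*n_draws through (i+1)*n_draws - 1."""
    if len(draws) < n_obs * n_draws:
        raise ValueError("not enough draws for the requested shape")
    return [draws[i * n_draws:(i + 1) * n_draws] for i in range(n_obs)]


if __name__ == "__main__":
    # One column of normal draws per random term, using the prime bases of Appendix A.
    for base in (2, 3, 5, 7):
        z = halton_normal(6, base, skip=10)
        print(base, draws_by_observation(z, n_obs=2, n_draws=3))
```

Generating the points in index order gives the same sequence that the GAUSS procedure builds by repeated vertical concatenation: 1/2, 1/4, 3/4, 1/8, and so on for base 2.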
Appendices

Appendix A: Gauss Code for Generating and Storing Halton Sequences

proc halton(n,s);
    local phi,i,j,y,x,k;
    k = floor(ln(n+1) ./ ln(s));
    phi = {0};
    i = 1;
    do while i .le k;
        x = phi;
        j = 1;
        do while j .lt s;
            y = phi+(j/s^i);
            x = x|y;
            j = j+1;
        endo;
        phi = x;
        i = i+1;
    endo;
    x = phi;
    j = 1;
    do while j .lt s .and rows(x) .lt (n+1);
        y = phi+(j/s^i);
        x = x|y;
        j = j+1;
    endo;
    phi = x[2:(n+1),1];    /* drop the leading 0 and keep n points */
    retp(phi);
endp;
/* The procedure is extracted from the code for the mixed logit model by Kenneth Train, Professor in the Department of Economics at the University of California, Berkeley. */

n = 3.3e7;    /* number of Halton points generated per base */

h1 = halton(n,2);
n1 = cdfni(h1);    /* transform uniform points into standard-normal draws */
outhalt = "c:\\gauss\\data\\n1";
let vnames = h1;
create f1 = ^outhalt with ^vnames, 0, 8;
if writer(f1, n1) /= n;
    print "Disk Full";
    end;
endif;
closeall f1;

h2 = halton(n,3);
n2 = cdfni(h2);
outhalt = "c:\\gauss\\data\\n2";
let vnames = h2;
create f1 = ^outhalt with ^vnames, 0, 8;
if writer(f1, n2) /= n;
    print "Disk Full";
    end;
endif;
closeall f1;

h3 = halton(n,5);
n3 = cdfni(h3);
outhalt = "c:\\gauss\\data\\n3";
let vnames = h3;
create f1 = ^outhalt with ^vnames, 0, 8;
if writer(f1, n3) /= n;
    print "Disk Full";
    end;
endif;
closeall f1;

h4 = halton(n,7);
n4 = cdfni(h4);
outhalt = "c:\\gauss\\data\\n4";
let vnames = h4;
create f1 = ^outhalt with ^vnames, 0, 8;
if writer(f1, n4) /= n;
    print "Disk Full";
    end;
endif;
closeall f1;

Appendix B: Gauss Code of Mixed Discrete-continuous Model (Exemplified by Non-commuter Model Where Time-of-day Choice Affects Activity Duration)

library maxlik;

N = 14970;
load data[N,119] = "C:\\PHD Dissertation\\Swiss\\timing_duration\\main_act_file_gauss.dat";
commuter = data[., 100];
filter_x = (commuter .== 0);    /* keep non-commuters only */
filter_x = miss(filter_x,0);
data = packr(data~filter_x);
N = rows(data);
one = ones(rows(data),1);

/* define variables */
intnr = data[., 1];
hhnr = data[., 2];
tripnum = data[., 3];
/* Definition of variables from the dataset is tedious and excluded.
*/
ln_dur = data[., 115];
ampeak = data[., 116];
pmpeak = data[., 117];
midday = data[., 118];
offpeak = data[., 119];

age = age/100;
pmale = (sex .== 1);
age_sq = age.*age;
car_0 = (n_auto .== 0);
car_ge2 = (n_auto .>= 2);
hhsize1 = (hhsize .== 1);
low_inc = (hhincome .< 3 .and hhincome .> 0);
high_inc = (hhincome .>= 6);

y1 = ampeak;
y2 = pmpeak;
y3 = midday;
y4 = offpeak;
y = ln_dur;

xx1 = one~age~hhsize~low_inc~car_0;
xx2 = one~age~pmale~high_inc~car_0;
xx3 = one~age~pmale~car_0;
xx = one~age~age_sq~pmale~hhsize~high_inc~car_ge2~y1~y2~y3;
data = xx1~xx2~xx3~xx~y1~y2~y3~y4~y;

s_n = 100;    /* number of Halton draws per observation */

/* read s_n*N stored normal draws per base, starting at row 1000 of each file */
outhalt = "c:\\gauss\\data\\n1";
open fin = ^outhalt for read;
call seekr(fin,1000);
as1 = readr(fin,s_n*N);
fin = close(fin);
outhalt = "c:\\gauss\\data\\n2";
open fin = ^outhalt for read;
call seekr(fin,1000);
as2 = readr(fin,s_n*N);
fin = close(fin);
outhalt = "c:\\gauss\\data\\n3";
open fin = ^outhalt for read;
call seekr(fin,1000);
as3 = readr(fin,s_n*N);
fin = close(fin);
outhalt = "c:\\gauss\\data\\n4";
open fin = ^outhalt for read;
call seekr(fin,1000);
as4 = readr(fin,s_n*N);
fin = close(fin);
as1 = reshape(as1,N,s_n);
as2 = reshape(as2,N,s_n);
as3 = reshape(as3,N,s_n);
as4 = reshape(as4,N,s_n);

/* simulated log-likelihood: average over s_n draws of P(time of day) times f(ln duration) */
proc lpr(b,z);
    local xxx1,xxx2,xxx3,xxx4,xxx,u1,u2,u3,u4,p,p1,p2,p3,p4,ln_p,i,pd_sum,sigma,pd,d,u;
    pd_sum = 0;
    i = 1;
    do while (i<=s_n);
        xxx1 = xx1~as1[.,i];
        xxx2 = xx2~as2[.,i];
        xxx3 = xx3~as3[.,i];
        xxx4 = as4[.,i];
        xxx = xx~as1[.,i]~as2[.,i]~as3[.,i]~as4[.,i];
        u1 = xxx1*b[1:cols(xxx1)];
        u2 = xxx2*b[cols(xxx1)+1:cols(xxx1~xxx2)];
        u3 = xxx3*b[cols(xxx1~xxx2)+1:cols(xxx1~xxx2~xxx3)];
        u4 = xxx4*b[cols(xxx1~xxx2~xxx3)+1:cols(xxx1~xxx2~xxx3~xxx4)];
        p1 = exp(u1)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p2 = exp(u2)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p3 = exp(u3)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p4 = exp(u4)./(exp(u1) + exp(u2) + exp(u3) +
            exp(u4));
        p = (p1.^y1).*(p2.^y2).*(p3.^y3).*(p4.^y4);    /* probability of the chosen period */
        u = y - xxx*b[cols(xxx1~xxx2~xxx3~xxx4)+1:cols(xxx1~xxx2~xxx3~xxx4~xxx)];
        sigma = b[cols(xxx1~xxx2~xxx3~xxx4~xxx)+1];
        d = (1/sigma)*pdfn(u/sigma);    /* normal density of the duration residual */
        pd = p.*d;
        pd_sum = pd_sum + pd;
        i = i + 1;
    endo;
    retp(ln(pd_sum/s_n));
endp;

/* analytic gradient of lpr */
proc lgd(b,z);
    local xxx1,xxx2,xxx3,xxx4,xxx,u1,u2,u3,u4,p1,p2,p3,p4,g1,g2,g3,g4,p,g,i,p_sum,g_sum,
        gp_sum,pd_sum,gds_sum,gp,d,gds,pd,gd,d_gamma,d_sigma,gd_sum,sigma,u;
    p_sum = 0;
    g_sum = 0;
    gd_sum = 0;
    pd_sum = 0;
    gds_sum = 0;
    i = 1;
    do while (i<=s_n);
        xxx1 = xx1~as1[.,i];
        xxx2 = xx2~as2[.,i];
        xxx3 = xx3~as3[.,i];
        xxx4 = as4[.,i];
        xxx = xx~as1[.,i]~as2[.,i]~as3[.,i]~as4[.,i];
        u1 = xxx1*b[1:cols(xxx1)];
        u2 = xxx2*b[cols(xxx1)+1:cols(xxx1~xxx2)];
        u3 = xxx3*b[cols(xxx1~xxx2)+1:cols(xxx1~xxx2~xxx3)];
        u4 = xxx4*b[cols(xxx1~xxx2~xxx3)+1:cols(xxx1~xxx2~xxx3~xxx4)];
        p1 = exp(u1)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p2 = exp(u2)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p3 = exp(u3)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p4 = exp(u4)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p = (p1.^y1).*(p2.^y2).*(p3.^y3).*(p4.^y4);
        /* derivatives of p with respect to the time-of-day coefficients */
        g1 = (y1.*p1.*(1-p1) + y2.*p2.*(-p1) + y3.*p3.*(-p1) + y4.*p4.*(-p1)).*xxx1;
        g2 = (y1.*p1.*(-p2) + y2.*p2.*(1-p2) + y3.*p3.*(-p2) + y4.*p4.*(-p2)).*xxx2;
        g3 = (y1.*p1.*(-p3) + y2.*p2.*(-p3) + y3.*p3.*(1-p3) + y4.*p4.*(-p3)).*xxx3;
        g4 = (y1.*p1.*(-p4) + y2.*p2.*(-p4) + y3.*p3.*(-p4) + y4.*p4.*(1-p4)).*xxx4;
        g = g1~g2~g3~g4;
        u = y - xxx*b[cols(xxx1~xxx2~xxx3~xxx4)+1:cols(xxx1~xxx2~xxx3~xxx4~xxx)];
        sigma = b[cols(xxx1~xxx2~xxx3~xxx4~xxx)+1];
        d = (1/sigma)*pdfn(u/sigma);
        pd = p.*d;
        pd_sum = pd_sum + pd;
        gd = g.*d;
        gd_sum = gd_sum + gd;
        /* derivatives of d with respect to the duration coefficients and sigma */
        d_gamma = u/(sigma^2).*d.*xxx;
        d_sigma = d.*(u.^2/sigma^3 - 1/sigma);
        gds = (d_gamma~d_sigma).*p;
        gds_sum = gds_sum + gds;
        i = i + 1;
    endo;
    gp = gd_sum./pd_sum;
    gds = gds_sum./pd_sum;
    retp(gp~gds);
endp;

_max_GradProc = &lgd;
_max_CovPar = 1;
b0 = zeros(cols(xx1~xx2~xx3~xx)+8,1)|1;    /* coefficients start at 0, sigma at 1 */
_max_parnames = "one"|"age"|"hhsize"|"low_inc"|"car_0"|"f1"|
    "one"|"age"|"pmale"|"high_inc"|"car_0"|"f2"|
    "one"|"age"|"pmale"|"car_0"|"f3"|"f4"|
    "one"|"age"|"age_sq"|"pmale"|"hhsize"|"high_inc"|"car_ge2"|"ampeak"|"pmpeak"|"midday"|
    "g1"|"g2"|"g3"|"g4"|"sigma";
_max_Active = { 1,1,1,1,1,1,
                1,1,0,1,1,1,
                1,1,0,1,1,0,
                1,1,1,1,1,1,0,1,1,1,1,1,1,0,1 };    /* 0 = parameter fixed at its starting value */
{b,f,g,cov,ret} = maxlik(data,0,&lpr,b0);
call maxprt(b,f,g,cov,ret);

Appendix C: Gauss Code of Discrete-continuous Model Based on Lee Transformation (Exemplified by Non-commuter Model Where Time-of-day Choice Affects Activity Duration)

library maxlik;

N = 14970;
load data[N,119] = "C:\\PHD Dissertation\\Swiss\\timing_duration\\main_act_file_gauss.dat";
commuter = data[., 100];
filter_x = (commuter .== 0);
filter_x = miss(filter_x,0);
data = packr(data~filter_x);
N = rows(data);
one = ones(rows(data),1);

/* define variables */
intnr = data[., 1];
hhnr = data[., 2];
tripnum = data[., 3];
/* Definition of variables from the dataset is tedious and excluded. */
ln_dur = data[., 115];
ampeak = data[., 116];
pmpeak = data[., 117];
midday = data[., 118];
offpeak = data[., 119];

age = age/100;
pmale = (sex .== 1);
age_sq = age.*age;
car_0 = (n_auto .== 0);
car_ge2 = (n_auto .>= 2);
hhsize1 = (hhsize .== 1);
low_inc = (hhincome .< 3 .and hhincome .> 0);
high_inc = (hhincome .>= 6);

y1 = ampeak;
y2 = pmpeak;
y3 = midday;
y4 = offpeak;
y = ln_dur;

xx1 = one~age~hhsize~low_inc~car_0;
xx2 = one~age~pmale~high_inc~car_0;
xx3 = one~age~pmale~car_0;
xx = one~age~age_sq~pmale~hhsize~high_inc~car_ge2~y1~y2~y3;
data = xx1~xx2~xx3~xx~y1~y2~y3~y4~y;

proc lpr(b,z);
    local ut1,ut2,ut3,p1,p2,p3,p4,p,r1,r2,r3,r4,sigma,l,bb1,bb2,bb3,bb4,pp1,pp2,pp3,pp4;
    ut1 = xx1*b[1:cols(xx1)];
    ut2 = xx2*b[cols(xx1)+1:cols(xx1~xx2)];
    ut3 = xx3*b[cols(xx1~xx2)+1:cols(xx1~xx2~xx3)];
    p1 = exp(ut1)./(exp(ut1) + exp(ut2) + exp(ut3) + 1);
    p2 = exp(ut2)./(exp(ut1) + exp(ut2) + exp(ut3) + 1);
    p3 = exp(ut3)./(exp(ut1) + exp(ut2) + exp(ut3) + 1);
    p4 = 1-p1-p2-p3;
    r1 = b[cols(xx1~xx2~xx3~xx)+1];
    r2 =
        b[cols(xx1~xx2~xx3~xx)+2];
    r3 = b[cols(xx1~xx2~xx3~xx)+3];
    r4 = b[cols(xx1~xx2~xx3~xx)+4];
    sigma = b[cols(xx1~xx2~xx3~xx)+5];
    l = (y - xx*b[cols(xx1~xx2~xx3)+1:cols(xx1~xx2~xx3~xx)])/sigma;    /* standardized duration residual */
    /* Lee transformation: cdfni(p_j) is treated as a standard-normal index correlated with l through r_j */
    bb1 = (cdfni(p1) - r1*l)./sqrt(1-r1^2);
    bb2 = (cdfni(p2) - r2*l)./sqrt(1-r2^2);
    bb3 = (cdfni(p3) - r3*l)./sqrt(1-r3^2);
    bb4 = (cdfni(p4) - r4*l)./sqrt(1-r4^2);
    pp1 = (1/sigma/sqrt(2*pi))*exp(-l.^2/2).*cdfn(bb1);
    pp2 = (1/sigma/sqrt(2*pi))*exp(-l.^2/2).*cdfn(bb2);
    pp3 = (1/sigma/sqrt(2*pi))*exp(-l.^2/2).*cdfn(bb3);
    pp4 = (1/sigma/sqrt(2*pi))*exp(-l.^2/2).*cdfn(bb4);
    p = y1.*pp1 + y2.*pp2 + y3.*pp3 + y4.*pp4;
    retp(ln(p));
endp;

_max_CovPar = 1;
b0 = zeros(cols(xx1~xx2~xx3~xx)+4,1)|1;    /* coefficients and r_j start at 0, sigma at 1 */
_max_parnames = "one"|"age"|"hhsize"|"low_inc"|"car_0"|
    "one"|"age"|"pmale"|"high_inc"|"car_0"|
    "one"|"age"|"pmale"|"car_0"|
    "one"|"age"|"age_sq"|"pmale"|"hhsize"|"high_inc"|"car_ge2"|"ampeak"|"pmpeak"|"midday"|
    "r1"|"r2"|"r3"|"r4"|"sigma";
_max_Active = { 1,1,1,1,1,
                1,1,1,1,1,
                1,1,1,1,
                1,1,1,1,1,1,0,1,1,1,1,1,1,1,1 };
{b,f,g,cov,ret} = maxlik(data,0,&lpr,b0);
call maxprt(b,f,g,cov,ret);

Appendix D: Gauss Code of Mixed Binary-multinomial Choice Model (g_i Fixed at 1, Exemplified by Non-commuter Model Where Binary Time-of-day Choice Affects Multinomial Mode Choice)

library maxlik;

N = 12939;
load data[N,139] = "C:\\PHD Dissertation\\Swiss\\tod_mode\\mode_tod.dat";
commuter = data[., 105];
filter_x = (commuter .== 0);
filter_x = miss(filter_x,0);
data = packr(data~filter_x);
N = rows(data);
one = ones(rows(data),1);

/* define variables */
intnr = data[., 1];
hhnr = data[., 2];
tripnum = data[., 3];
/* Definition of variables from the dataset is tedious and excluded.
*/
ampeak = data[., 136];
pmpeak = data[., 137];
midday = data[., 138];
offpeak = data[., 139];
old = (age .> 60);
y1 = sov;
y2 = hov;
y3 = transit;
y4 = nmotor;
z1 = (ampeak .or pmpeak);     /* peak period */
z2 = (midday .or offpeak);    /* non-peak period */
peak = z1;
nm_time = (time_bic+time_wk)/2;
age = age/100;
pmale = (sex .== 1);
age_sq = age.*age;
car_0 = (n_auto .== 0);
car_ge2 = (n_auto .>= 2);
hhsize1 = (hhsize .== 1);
trst_sub = ((o_sub .== 1) .or (g_sub .== 1));
shopping = (purpose .== 3);
leisure = (purpose .== 4);
service = (purpose .== 5);
home = (purpose .== 7);
swiss = (add_swit .== 1);
hhsize2 = (hhsize .== 2);

s_n = 100;    /* number of Halton draws per observation */

outhalt = "c:\\gauss\\data\\n1";
open fin = ^outhalt for read;
call seekr(fin,1000);
as1 = readr(fin,s_n*N);
fin = close(fin);
outhalt = "c:\\gauss\\data\\n2";
open fin = ^outhalt for read;
call seekr(fin,1000);
as2 = readr(fin,s_n*N);
fin = close(fin);
outhalt = "c:\\gauss\\data\\n3";
open fin = ^outhalt for read;
call seekr(fin,1000);
as3 = readr(fin,s_n*N);
fin = close(fin);
outhalt = "c:\\gauss\\data\\n4";
open fin = ^outhalt for read;
call seekr(fin,1000);
as4 = readr(fin,s_n*N);
fin = close(fin);
as1 = reshape(as1,N,s_n);
as2 = reshape(as2,N,s_n);
as3 = reshape(as3,N,s_n);
as4 = reshape(as4,N,s_n);

xx1 = one~(car_time/100)~termtime~pklot_d~car_0~car_ge2~z1;
xx2 = one~(car_time/100)~termtime~pklot_d~car_0~car_ge2~z1;
xx3 = one~(iveh/100)~(owt/100)~freq~trst_sub~z1;
xx4 = (nm_time/100);
zz1 = one~old~hhsize2~swiss~shopping~service;

proc lpr(b,z);
    local u1,u2,u3,u4,v1,v2,v3,v4,xxx1,xxx2,xxx3,xxx4,zzz1,zzz2,zzz3,zzz4,
        p,p_u,p_v,pu1,pu2,pu3,pu4,pv1,pv2,pv3,pv4,ln_p,i,puv_sum,p_uv;
    puv_sum = 0;
    i = 1;
    do while (i<=s_n);
        xxx1 = xx1~as1[.,i];
        xxx2 = xx2~as2[.,i];
        xxx3 = xx3~as3[.,i];
        xxx4 = xx4~as4[.,i];
        zzz1 = zz1~as1[.,i]~as2[.,i]~as3[.,i]~as4[.,i];
        u1 = xxx1*b[1:cols(xxx1)];
        u2 = xxx2*b[cols(xxx1)+1:cols(xxx1~xxx2)];
        u3 =
            xxx3*b[cols(xxx1~xxx2)+1:cols(xxx1~xxx2~xxx3)];
        u4 = xxx4*b[cols(xxx1~xxx2~xxx3)+1:cols(xxx1~xxx2~xxx3~xxx4)];
        pu1 = exp(u1)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu2 = exp(u2)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu3 = exp(u3)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu4 = exp(u4)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p_u = (pu1.^y1).*(pu2.^y2).*(pu3.^y3).*(pu4.^y4);    /* MNL mode-choice probability */
        v1 = zzz1*b[cols(xxx1~xxx2~xxx3~xxx4)+1:cols(xxx1~xxx2~xxx3~xxx4~zzz1)];
        pv1 = cdfn(v1);
        pv2 = cdfn(-v1);
        p_v = (pv1.^z1).*(pv2.^z2);    /* binary probit time-of-day probability */
        p_uv = p_u.*p_v;
        puv_sum = puv_sum + p_uv;
        i = i + 1;
    endo;
    retp(ln(puv_sum/s_n));
endp;

/* analytic gradient of lpr */
proc lgd(b,z);
    local xxx1,xxx2,xxx3,xxx4,zzz1,zzz2,zzz3,zzz4,
        u1,u2,u3,u4,v1,v2,v3,v4,pu1,pu2,pu3,pu4,pv1,pv2,pv3,pv4,p,g,i,g_u,g_v,gu_sum,
        gv_sum,puv_sum,p_u,p_v,p_uv,gu_1,gu_2,gu_3,gu_4,gv_1,gv_2,gv_3,gv_4,gu,gv;
    gu_sum = 0;
    gv_sum = 0;
    puv_sum = 0;
    i = 1;
    do while (i<=s_n);
        xxx1 = xx1~as1[.,i];
        xxx2 = xx2~as2[.,i];
        xxx3 = xx3~as3[.,i];
        xxx4 = xx4~as4[.,i];
        u1 = xxx1*b[1:cols(xxx1)];
        u2 = xxx2*b[cols(xxx1)+1:cols(xxx1~xxx2)];
        u3 = xxx3*b[cols(xxx1~xxx2)+1:cols(xxx1~xxx2~xxx3)];
        u4 = xxx4*b[cols(xxx1~xxx2~xxx3)+1:cols(xxx1~xxx2~xxx3~xxx4)];
        pu1 = exp(u1)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu2 = exp(u2)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu3 = exp(u3)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu4 = exp(u4)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p_u = (pu1.^y1).*(pu2.^y2).*(pu3.^y3).*(pu4.^y4);
        zzz1 = zz1~as1[.,i]~as2[.,i]~as3[.,i]~as4[.,i];
        v1 = zzz1*b[cols(xxx1~xxx2~xxx3~xxx4)+1:cols(xxx1~xxx2~xxx3~xxx4~zzz1)];
        pv1 = cdfn(v1);
        pv2 = cdfn(-v1);
        p_v = (pv1.^z1).*(pv2.^z2);
        p_uv = p_u.*p_v;
        /* derivatives of p_u with respect to the mode-utility coefficients */
        gu_1 = (y1.*pu1.*(1-pu1) + y2.*pu2.*(-pu1) + y3.*pu3.*(-pu1) + y4.*pu4.*(-pu1)).*xxx1;
        gu_2 = (y1.*pu1.*(-pu2) + y2.*pu2.*(1-pu2) + y3.*pu3.*(-pu2) + y4.*pu4.*(-pu2)).*xxx2;
        gu_3 = (y1.*pu1.*(-pu3) + y2.*pu2.*(-pu3) + y3.*pu3.*(1-pu3) + y4.*pu4.*(-pu3)).*xxx3;
        gu_4 = (y1.*pu1.*(-pu4) + y2.*pu2.*(-pu4) + y3.*pu3.*(-pu4) + y4.*pu4.*(1-pu4)).*xxx4;
        g_u =
            (gu_1~gu_2~gu_3~gu_4).*p_v;
        gv_1 = (z1.*pdfn(v1) - z2.*pdfn(v1)).*zzz1;
        g_v = gv_1.*p_u;
        gu_sum = gu_sum + g_u;
        gv_sum = gv_sum + g_v;
        puv_sum = puv_sum + p_uv;
        i = i + 1;
    endo;
    gu = gu_sum./puv_sum;
    gv = gv_sum./puv_sum;
    retp(gu~gv);
endp;

b0 = zeros(cols(xx1~xx2~xx3~xx4~zz1)+4,1)|1|1|1|1;    /* g_i start at 1 */
_max_GradProc = &lgd;
_max_Active = { 1,1,1,1,1,1,1, 1,
                1,1,1,1,1,1,1, 0,
                1,1,1,1,1,1, 1,
                1, 1,
                1,1,0,1,1,1, 0,0,0,0 };    /* 0 = fixed at the starting value, so the g_i stay at 1 */
_max_CovPar = 1;
_max_parnames = "CONS_1"|"CARTIME1"|"TERMTIM1"|"PKLOT_D1"|"CAR_01"|"CAR_GE1"|"peak"|"f1"|
    "CONS_2"|"CARTIME2"|"TERMTIM2"|"PKLOT_D2"|"CAR_02"|"CAR_GE2"|"peak"|"f2"|
    "CONS_3"|"IVEH3"|"OWT3"|"FREQ3"|"TRST_SUB"|"peak"|"f3"|
    "nm_time"|"f4"|
    "CONS_1"|"old"|"hhsize2"|"swiss"|"shopping"|"service"|
    "g11"|"g12"|"g13"|"g14";
{b,f,g,cov,ret} = maxlik(data,0,&lpr,b0);
call maxprt(b,f,g,cov,ret);

Appendix E: Gauss Code of Mixed Binary-multinomial Choice Model (f_i = g_i, Exemplified by Non-commuter Model Where Binary Time-of-day Choice Affects Multinomial Mode Choice)

library maxlik;

N = 12939;
load data[N,139] = "C:\\PHD Dissertation\\Swiss\\tod_mode\\mode_tod.dat";
commuter = data[., 105];
filter_x = (commuter .== 0);
filter_x = miss(filter_x,0);
data = packr(data~filter_x);
N = rows(data);
one = ones(rows(data),1);

/* define variables */
intnr = data[., 1];
hhnr = data[., 2];
tripnum = data[., 3];
/* Definition of variables from the dataset is tedious and excluded.
*/
sov = data[., 132];
hov = data[., 133];
transit = data[., 134];
nmotor = data[., 135];
ampeak = data[., 136];
pmpeak = data[., 137];
midday = data[., 138];
offpeak = data[., 139];
old = (age .> 60);
y1 = sov;
y2 = hov;
y3 = transit;
y4 = nmotor;
z1 = (ampeak .or pmpeak);
z2 = (midday .or offpeak);
with_kid = ((hhsize-hh6plus) .> 0);
nm_time = (time_bic+time_wk)/2;
age = age/100;
pmale = (sex .== 1);
age_sq = age.*age;
car_0 = (n_auto .== 0);
car_ge2 = (n_auto .>= 2);
hhsize1 = (hhsize .== 1);
trst_sub = ((o_sub .== 1) .or (g_sub .== 1));
shopping = (purpose .== 3);
leisure = (purpose .== 4);
service = (purpose .== 5);
home = (purpose .== 7);
swiss = (add_swit .== 1);
hhsize2 = (hhsize .== 2);

s_n = 100;    /* number of Halton draws per observation */

outhalt = "c:\\gauss\\data\\n1";
open fin = ^outhalt for read;
call seekr(fin,1000);
as1 = readr(fin,s_n*N);
fin = close(fin);
outhalt = "c:\\gauss\\data\\n2";
open fin = ^outhalt for read;
call seekr(fin,1000);
as2 = readr(fin,s_n*N);
fin = close(fin);
outhalt = "c:\\gauss\\data\\n3";
open fin = ^outhalt for read;
call seekr(fin,1000);
as3 = readr(fin,s_n*N);
fin = close(fin);
outhalt = "c:\\gauss\\data\\n4";
open fin = ^outhalt for read;
call seekr(fin,1000);
as4 = readr(fin,s_n*N);
fin = close(fin);
as1 = reshape(as1,N,s_n);
as2 = reshape(as2,N,s_n);
as3 = reshape(as3,N,s_n);
as4 = reshape(as4,N,s_n);

xx1 = one~(car_time/100)~termtime~pklot_d~car_0~car_ge2~z1;
xx2 = one~(car_time/100)~termtime~pklot_d~car_0~car_ge2~z1;
xx3 = one~(iveh/100)~(owt/100)~freq~trst_sub~z1;
xx4 = (nm_time/100);
zz1 = one~old~hhsize2~swiss~shopping~service;

proc lpr(b,z);
    local u1,u2,u3,u4,v1,v2,v3,v4,xxx1,xxx2,xxx3,xxx4,zzz1,zzz2,zzz3,zzz4,
        p,p_u,p_v,pu1,pu2,pu3,pu4,pv1,pv2,pv3,pv4,ln_p,i,puv_sum,p_uv,
        b_eta1,b_eta2,b_eta3,b_eta4,b_eta;
    puv_sum = 0;
    i = 1;
    do while (i<=s_n);
        /* draw j enters mode utility j with loading f_j = b[cols(xx1~xx2~xx3~xx4~zz1)+j] */
        u1 = xx1*b[1:cols(xx1)] + as1[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+1];
        u2 = xx2*b[cols(xx1)+1:cols(xx1~xx2)] + as2[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+2];
        u3 = xx3*b[cols(xx1~xx2)+1:cols(xx1~xx2~xx3)] + as3[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+3];
        u4 = xx4*b[cols(xx1~xx2~xx3)+1:cols(xx1~xx2~xx3~xx4)] + as4[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+4];
        pu1 = exp(u1)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu2 = exp(u2)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu3 = exp(u3)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu4 = exp(u4)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p_u = (pu1.^y1).*(pu2.^y2).*(pu3.^y3).*(pu4.^y4);
        v1 = zz1*b[cols(xx1~xx2~xx3~xx4)+1:cols(xx1~xx2~xx3~xx4~zz1)]
            + as1[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+1]
            - as2[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+2]
            - as3[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+3]
            - as4[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+4];
        pv1 = cdfn(v1);
        pv2 = cdfn(-v1);
        p_v = (pv1.^z1).*(pv2.^z2);
        p_uv = p_u.*p_v;
        puv_sum = puv_sum + p_uv;
        i = i + 1;
    endo;
    retp(ln(puv_sum/s_n));
endp;

/* analytic gradient of lpr */
proc lgd(b,z);
    local xxx1,xxx2,xxx3,xxx4,zzz1,zzz2,zzz3,zzz4,
        u1,u2,u3,u4,v1,v2,v3,v4,pu1,pu2,pu3,pu4,pv1,pv2,pv3,pv4,p,g,i,g_u,g_v,
        gu_sum,gv_sum,puv_sum,p_u,p_v,p_uv,gu_1,gu_2,gu_3,gu_4,gv_1,gv_2,gv_3,gv_4,gu,
        gv,b_eta1,b_eta2,b_eta3,b_eta4,b_eta,g_cdf,g_et,guet,gvet,guet_1,guet_2,
        guet_3,guet_4,gvet_1,gvet_2,gvet_3,gvet_4,get_sum,get,g1,g2,g3,g4;
    gu_sum = 0;
    gv_sum = 0;
    puv_sum = 0;
    get_sum = 0;
    i = 1;
    do while (i<=s_n);
        u1 = xx1*b[1:cols(xx1)] + as1[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+1];
        u2 = xx2*b[cols(xx1)+1:cols(xx1~xx2)] + as2[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+2];
        u3 = xx3*b[cols(xx1~xx2)+1:cols(xx1~xx2~xx3)] + as3[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+3];
        u4 = xx4*b[cols(xx1~xx2~xx3)+1:cols(xx1~xx2~xx3~xx4)] + as4[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+4];
        pu1 = exp(u1)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu2 = exp(u2)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu3 = exp(u3)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        pu4 = exp(u4)./(exp(u1) + exp(u2) + exp(u3) + exp(u4));
        p_u =
(pu1.^y1).*(pu2.^y2).*(pu3.^y3).*(pu4.^y4); v1 = zz1*(b[cols(xx1~xx2~xx3~xx4) +1:cols(xx1~xx2~xx3~xx4~zz1)]) + as1[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+1] as2[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+2] as3[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+3] as4[.,i]*b[cols(xx1~xx2~xx3~xx4~zz1)+4]; pv1 = cdfn(v1); pv2 = cdfn(v1); p_v = (pv1.^z1).*(pv2.^z2); p_uv = p_u.*p_v; g_cdf = (z1.*pdfn(v1) z2.*pdfn(v1) ); gu_1=(y1.*pu1.*(1pu1)+y2.*pu2.*(pu1)+ y3.*pu3.*(pu1)+y4.*pu4.*(pu1)).*xx1; gu_2=(y1.*pu1.*(pu2)+y2.*pu2.*(1pu2)+y3.* pu3.*(pu2)+y4.*pu4.*(pu2)).*xx2 ; gu_3=(y1.*pu1.*(pu3)+y2.*pu2.*(pu3)+y3.*p u3.*(1pu3)+ y4.*pu4.*(pu3)).*xx3; gu_4=(y1.*pu1.*(pu4)+y2.*pu2.*(pu4)+y3.*p u3.*(pu4)+y4.*pu4.*(1pu4)).*xx4 ; 201 PAGE 213 Appendix E: (Continued) guet_1=(y1.*pu1.*(1pu1)+y2.*pu2.* (pu1)+y3.*pu3.*(pu1)+y4.*pu4.*(pu1)) .*as1[.,i]; guet_2=(y1.*pu1.*(pu2)+y2.*pu2.*(1pu2)+y3.*pu3.*(pu2)+y4.*pu4.*(pu2)) .*as2[.,i]; guet_3=(y1.*pu1.*(pu3)+y2.*pu2.*(pu3)+y3.*pu3.*(1pu3)+y4.*pu4.*(pu3)) .*as3[.,i]; guet_4=(y1.*pu1.*(pu4)+y2.*pu2.*(pu4)+y3.*pu3.*(pu4)+y4.*pu4.*(1pu4)) .*as4[.,i]; guet = guet_1~guet_2~guet_3~guet_4; gvet_1 = g_cdf.*as1[.,i]; gvet_2 = g_cdf.*as2[.,i]; gvet_3 = g_cdf.*as3[.,i]; gvet_4 = g_cdf.*as4[.,i] ; gvet = gvet_1~gvet_2~gvet_3~gvet_4; g_et = guet.*p_v + gvet.*p_u ; g_u = (gu_1~gu_2~gu_3~gu_4).*p_v; gv_1 = g_cdf.*zz1; g_v = (gv_1).*p_u; gu_sum = gu_sum + g_u; gv_sum = gv_sum + g_v; get_sum = get_sum + g_et; puv_sum = puv_sum + p_uv; i = i + 1; endo; gu = gu_sum./puv_sum; gv = gv_sum./puv_sum; get = get_sum./puv_sum; retp ( gu~gv~get); endp; b0= zeros(cols(xx1~ xx2~xx3~xx4~zz1)+4,1); _max_GradProc = &lgd; max_Active = 1111111 1111111 202 PAGE 214 Appendix E: (Continued) 1 11111 1 1111111111 ; _max_CovPar = 1; _max_parnames = "CONS_1""CARTIME1""TERMTIM1""PKLOT_D1""CAR_01""CAR_GE1" "peak" "CONS_2""CARTIME2""TERMTIM2""PKLOT_D2""CAR_02""CAR_GE2" "peak" "CONS_3""IVEH3""OWT3""FRE Q3""TRST_SUB""peak" "nm_time" "CONS_1""old""hhsize2""s wiss""shopping""service" "f1""f2""f3""f4" ; 
{b,f,g,cov,ret} = maxlik(data,0,&lpr,b0);
call maxprt(b,f,g,cov,ret);

About the Author

Xin Ye received a Bachelor's degree in Traffic Civil Engineering from Tongji University, Shanghai, China, in 2000 and an M.S. degree in Civil Engineering from the University of South Florida in 2004. In 2004 he continued his studies toward a Ph.D. degree in Transportation Systems, Civil Engineering, at the University of South Florida. While in the Ph.D. program, Mr. Ye was very active in transportation research. He has co-authored three journal publications, in Transportation Research, Transportation, and Transportation Research Record, respectively, and has made several paper presentations at major conferences such as the International Symposium on Transportation and Traffic Theory (ISTTT) and the Annual Meeting of the Transportation Research Board.