Control number: 001681050. OCLC: (OCoLC)62715932. Identifiers: E14SFE0001072; SFE0001072.
Call number: GE70 (Online).
Author: Hashimoto, Tomoyuki.
Title: Spatial analysis of pedestrian accidents [electronic resource] / by Tomoyuki Hashimoto.
Published: [Tampa, Fla.] : University of South Florida, 2005.
Notes: Thesis (M.S.)--University of South Florida, 2005. Includes bibliographical references. Text (Electronic thesis) in PDF format. System requirements: World Wide Web browser and PDF reader. Mode of access: World Wide Web. Title from PDF of title page. Document formatted into pages; contains 77 pages.
Adviser: Dr. Paul Zandbergen. Co-adviser: Dr. Donald Duke.
Keywords: GIS. Hillsborough. Land use. Negative binomial. Traffic safety.
Subject: Dissertations, Academic -- USF -- Environmental Science and Policy -- Masters.
Host item: USF Electronic Theses and Dissertations.
URL: http://digital.lib.usf.edu/?e14.1072

Spatial Analysis of Pedestrian Accidents

by

Tomoyuki Hashimoto

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science
Department of Environmental Science and Policy
College of Arts and Science
University of South Florida

Co-Major Professor: Paul Zandbergen, Ph.D.
Co-Major Professor: L. Donald Duke, Ph.D.
Jayajit Chakraborty, Ph.D.

Date of Approval: January 27, 2005

Keywords: GIS, Hillsborough, land use, negative binomial, traffic safety

Copyright 2005, Tomoyuki Hashimoto

DEDICATION

I dedicate this work to my parents, my idols, and to my family. Thank you for all the support you have given me throughout my life, for your care, and for the invaluable education I have acquired from you.

ACKNOWLEDGEMENTS

I would like to thank the ESP personnel for making my stay at USF memorable. I extend special gratitude to Dr. Paul Zandbergen for his helpful assistance in this research. Thank you to Dr. L. Donald Duke and Dr. Jayajit Chakraborty for their constant guidance and commitment to quality research.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1. Background
1.2. Objective
CHAPTER 2: LITERATURE REVIEW
2.1. Demographic Factors
2.2. Land Use Factors
2.3. Roadway Factors
2.4. Traffic Factors
2.5. Statistical Properties of Models
2.5.1. Multiple Linear Regression Model
2.5.2. Poisson Regression Model
2.5.3. Negative Binomial Regression Model
2.6. Spatial Analysis Techniques
2.7. Summary of the Literature Review
CHAPTER 3: METHODOLOGY
3.1. Data Collection
3.1.1. Accident Data
3.1.2. Roadway Data
3.1.3. Census Data
3.1.4. Traffic Data
3.1.5. Land Use Data
3.2. Data Processing
3.2.1. Software Used
3.2.2. Exclusion of the Areas
3.2.3. Create Accident Density Map
3.2.4. Calculate Accident Value for Census Block Groups
3.2.5. Development of Land Use Variables
3.2.6. Development of Roadway Variable
3.2.7. Develop Traffic Variable
3.2.8. Development of Demographic Variables
3.3. Statistical Analysis
3.3.1. Distribution Fitting
3.3.2. Selection of Statistical Model
CHAPTER 4: RESULTS AND DISCUSSION
4.1. Statistical Properties of The Variables
4.2. Result of Negative Binomial Regression Model
4.3. Discussion of Each Variable
4.3.1. Demographic Variables
4.3.2. Land Use Variables
4.3.3. Roadway Variable
4.3.4. Traffic Variable
CHAPTER 5: CONCLUSION
5.1. Findings
5.2. Limitations and Assumptions
5.3. Recommendation for future research
REFERENCES

LIST OF TABLES

Table 1 Statistical Properties of Variables
Table 2 Result of Negative Binomial Regression
Table 3 Coefficient of Correlations for Independent Variables
Table 4 Regression Result without Children Resident Rate
Table 5 Regression Result with Vulnerable Resident Rate
Table 6 Negative Binomial Model with Population Density
Table 7 Negative Binomial Model with Residential Land Use Percentage
Table 8 Regression Results for Block Groups with Traffic Count Data Inside

LIST OF FIGURES

Figure 1 Hillsborough County
Figure 2 Accident Point Data
Figure 3 State Roadways
Figure 4 Boundaries of Census Block Groups
Figure 5 Land Use Map
Figure 6 Excluded Census Block Groups
Figure 7 Accident Density Map
Figure 8 Four Land Use Types
Figure 9 AADT Points
Figure 10 Distribution of The Dependent Variable
Figure 11 Distribution with A Radius of 250m
Figure 12 Distribution with A Radius of 1000m
Figure 13 Census Block Groups with Traffic Count Data

SPATIAL ANALYSIS OF PEDESTRIAN ACCIDENTS

Tomoyuki Hashimoto

ABSTRACT

Improving traffic safety for pedestrians is an urgent task for Hillsborough County. The objective of this research is to understand the contributing factors that cause pedestrian accidents in Hillsborough County in the State of Florida. Specifically, this research attempts to determine the effects of demographic, land use, roadway and traffic volume factors on the number of pedestrian accidents. Five hypotheses were proposed to examine these factors. GIS was used to perform spatial analysis. Based on pedestrian accident records from 1999 to 2001, an accident density map was created using the kernel density method. The value of the accident density was assigned to each census block group, and this value formed the dependent variable of the model. Accident models were developed using negative binomial regression to model the relationship between contributing factors and pedestrian accidents.

The results showed that commercial and service land use, residential land use, and the density of state roadways increase the number of pedestrian accidents. It was also found that average household income and the percentage of elderly residents lower the number of pedestrian accidents.
The percentage of child residents did not show a statistically significant result. The effect of traffic volume was not determined because of data limitations; it should be explored in future research.

CHAPTER 1: INTRODUCTION

1.1. Background

In the United States, 4,995 people died in 2001 while walking down the street, up from 4,843 in 2000. An estimated 78,000 pedestrians were injured during each of those two years (Surface Transportation Policy Project, 2002). The State of Florida has been suffering an enormous socioeconomic loss due to pedestrian accidents. In Florida, 16.8% of all traffic deaths in 2000-2001 were pedestrians, compared to a national average of 12% (Surface Transportation Policy Project, 2002).

The study area of this research is Hillsborough County in the State of Florida (Figure 1). Hillsborough County is located midway along the west coast of Florida, with a population of 1,083,520 in 2003 (Metropolitan Planning Organization, 2004). The population is growing, and mass transit is not developed in this area, so severe demands are placed on the roadways. Also, Tampa-St. Petersburg-Clearwater was ranked as the second most dangerous metropolitan area in the nation for pedestrians, based on an index of pedestrian deaths and the percentage of commuters walking to work (Surface Transportation Policy Project, 2002). Improving safety for pedestrians is an urgent task for Hillsborough County.

Figure 1 Hillsborough County

1.2. Objective

The objective of this research is to understand the contributing factors that cause pedestrian accidents in Hillsborough County. Specifically, this research attempts to determine the effects of demographic, land use, roadway and traffic variables on the number of pedestrian accidents. Five hypotheses are proposed to examine these factors.
The first hypothesis is that residents in low income areas are more likely to experience pedestrian accidents because of their behavioral patterns (i.e., most of them do not own cars and walk in the neighborhood more often than residents of other areas). The second hypothesis is that land use, in particular concentrated residential and commercial land use, has a significant effect on the number of pedestrian accidents because a large number of people walk around these areas. The third hypothesis is that the age distribution in an area affects the number of accidents. It is reasonable to assume that children and elderly people are less careful when they face traffic and are more likely to be involved in accidents. The fourth hypothesis is that areas with high roadway density (the length of the state roadways divided by the total area) have more pedestrian accidents than other areas. The fifth hypothesis is that high traffic volume areas experience more accidents.

To examine these hypotheses, an accident model is developed using demographic, land use, roadway and traffic factors as independent variables. This research begins with a review of previous studies that have attempted to relate traffic accidents (including pedestrian accidents) to contributing factors (Chapter 2). On the basis of this review, an appropriate modeling approach is selected to study the relationship between the contributing factors and pedestrian accidents (Chapter 3). This is followed by model estimation and discussion of the model (Chapter 4). Finally, an overall summary of model findings and conclusions is provided (Chapter 5). If any of the independent variables has a statistically significant effect on pedestrian accidents, and residents in particular areas have a higher risk than those in others, this will give useful information for transport and land use planning policies.

CHAPTER 2: LITERATURE REVIEW

The purpose of this review is to investigate the literature on pedestrian accident analysis. Four groups of accident contributing factors are explored in this chapter, followed by a review of statistical models and spatial analysis techniques.

2.1. Demographic Factors

Several studies have analyzed the effects of demographic factors on pedestrian accidents.

Rivera et al. (1984) investigated the factors in a child's living environment and socioeconomic background that contribute to the risk of pedestrian injury. In 1982, 210 pedestrian injuries among children aged 0 to 14 years were reported in the city of Memphis, which had a population of around 650,000 at that time. For every census tract in the city, stepwise multiple regression analyses were performed to determine a combination of economic and environmental variables. Injuries per acre formed the dependent variable.

Census tracts with reported injuries had twice the percentage of nonwhite population, lower household incomes, more children living in female-headed households, more families living below the poverty level, and greater household crowding, compared with those without reported injuries.

Abdalla (1997) investigated the relationship between casualty frequencies and the distance of the accidents from the zones of residence. Abdalla's research was conducted for the Lothian region in Scotland. While more driver, rider or passenger casualties occurred at longer distances (greater than 2,500 meters) from the zone of residence, the number of pedestrian casualties decreased with the distance from the zone of residence, and there was a high concentration of pedestrian casualties within 500 meters of the residence. Abdalla compared the casualty rate in the most affluent areas (15%) with that in the most deprived areas (15%), that is, the top 15% and the bottom 15% of areas by average income.
The rate per 10,000 residents in the most deprived areas was above the entire region's rate, whereas in the most affluent areas it was below. Also, it was observed that for short distances (up to 500 meters) between the casualty's residence zone and the accident location (not only for pedestrian accidents), higher rates of casualties were seen in areas characterized as deprived rather than affluent.

Two important suggestions were provided by Abdalla's research. First, pedestrian accidents are likely to occur within 500 meters of the residence. Second, residents in lower income areas are likely to show higher rates of pedestrian accidents.

Harruff (1998) performed a retrospective analysis of 217 pedestrian traffic fatalities in Seattle. All deaths classified as pedestrian traffic fatalities during six years (1990-1995) were analyzed. Harruff found that elderly pedestrians were most vulnerable because they were more likely to be injured as a pedestrian and more likely to die of injuries that a younger person might survive.

Petch et al. (2000) examined child road safety in the urban environment of the city of Salford (United Kingdom). The spatial distribution of the casualties was investigated using both statistical and spatial analysis techniques. In their concluding remarks, the authors stated that the majority of child pedestrian and cycle accidents occurred within 500 meters of the casualty's place of residence.

LaScala (2000) explored some geographic correlates of pedestrian injury collisions through a spatial analysis of pedestrian accident data. This study examined rates of pedestrian injuries across 149 census tracts in the city of San Francisco, which is a highly developed area. The number of motor vehicle collisions in which a pedestrian was injured or killed was aggregated within census tracts. The results showed that pedestrian injury rates were related to population density, age composition of the local population, unemployment ratio, gender and education.
Availability of alcohol through bars was directly related to pedestrian injury collisions in which the pedestrian had been drinking alcohol.

Graham (2003a) attempted to understand in detail how the nature of the local environment affected pedestrian casualties in England. He chose wards as the spatial unit for this analysis. The average area of the wards was 14 square kilometers. Graham adopted the number of pedestrian casualties in each ward as the dependent variable. Poisson regression was used, with negative binomial regression in cases of overdispersion. It was stated that these models were appropriate because accident data were discrete in nature and had a preponderance of zeros and small values.

The results showed that wards with a large population had a higher incidence of pedestrian casualties than other wards. However, as population density increased, the incidence of accidents decreased. Also, casualties increased with urban density, but this effect diminished for the most extremely dense wards due to the effect of urban congestion. Socioeconomic status also had a powerful negative influence on pedestrian casualties.

Graham (2003b) analyzed child pedestrian casualties in England, focusing on the influence of socioeconomic deprivation. As in the aforementioned study by Graham, 8,414 wards were used as the spatial units of analysis. Using the same independent variables as the previous study, accident models were developed for adult pedestrian accidents and child pedestrian accidents. Negative binomial regression modeling was used on the grounds that the discrete nature of count data and the preponderance of zeros and small values meant that linear multiple regression could produce inefficient, inconsistent, and biased estimates. Also, overdispersion in the data motivated the use of a negative binomial model instead of a Poisson regression model.
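
The overdispersion argument that these studies use to choose between Poisson and negative binomial regression can be illustrated with a quick check of the variance-to-mean ratio of the counts. The sketch below is only an illustration under stated assumptions: the function name `dispersion_ratio` and the sample counts are hypothetical and are not taken from any of the cited studies, and a ratio well above 1 is a rough indication, not a formal test.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return the sample variance divided by the sample mean of the counts.

    Under a Poisson assumption the variance equals the mean, so the ratio
    should be close to 1. A ratio well above 1 suggests overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean of counts is zero; ratio is undefined")
    return variance(counts) / m


# Hypothetical accident counts per zone (illustrative values only):
# many zeros and small values, plus a few large counts.
zone_counts = [0, 0, 0, 1, 1, 2, 9, 15]
ratio = dispersion_ratio(zone_counts)
print(f"variance-to-mean ratio: {ratio:.2f}")
```

A ratio this far above 1 would point toward a model that allows the variance to exceed the mean, which is the reasoning the studies above give for the negative binomial form.
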

The results showed an association between increased deprivation and a higher number of pedestrian casualties across England. The deprivation effect was strong both for all child casualties and for children killed or seriously injured. Estimates for adult casualties also revealed a positive and significant association with increasing deprivation, but the magnitude of the effect was smaller than for child casualties.

Noland et al. (2003) explored statistical associations between road casualties and demographic variables for 8,414 wards in England. To do so, statistical models were created to explain the incidence of accidents. Wards were chosen as the spatial unit of analysis, as in Graham's research. Poisson and negative binomial regression were used as opposed to multiple regression.

It was concluded that densely populated areas tend to have few traffic casualties while commercial areas have many traffic casualties. Also, the areas with higher levels of social deprivation had relatively higher casualty rates, but this effect was less strong when only motorized casualties were considered.

Noland and Quddus (2004) presented an analysis of pedestrian and bicycle casualties using cross-sectional time-series data for 11 regions of Great Britain. Regional data on pedestrian and bicycle road accidents over 20 years (from 1979 to 1998) were collected for the analysis. Negative binomial regression was used on the grounds that the distribution of accidents had a Poisson distribution with overdispersion in the error term.

The findings included that increased expenditure on alcohol was positively associated with increased pedestrian and bicycle accidents. Also, the analyses did not show a clear relationship between population age characteristics and the likelihood of casualties.

2.2. Land Use Factors

Some of the studies showed that certain land use types have an effect on the number of accidents. However, in some cases actual land use maps were not available, so proxy variables were used to estimate the land use types.

Levine (1995a) examined spatial patterns in motor vehicle crashes on the Island of Oahu for 1990. Accidents were categorized as fatalities, serious injuries, alcohol-related, any injuries, and all accidents. Also, accidents were examined for each hour in the 24-hour day and for weekdays and weekends.

The results showed that most accidents were closer to employment centers than to residential areas. In the suburban and rural areas, however, accidents were more likely to involve fatalities or serious injuries, and they were related to nighttime driving and alcohol.

Levine's follow-up study (1995b) attempted to explain the spatial patterns by population, employment and road characteristic variables in order to show that activities which generate trips also indirectly predict crashes. This method focused on characteristics of neighborhoods. In Levine's research, the dependent variable was the number of accidents in 1990 in each census block group on the Island of Oahu in the State of Hawaii. Linear regression was used in the model.

The results showed that retail employment contributed the most to the number of accidents. It was suggested that two biases were produced by assigning accidents to zones. First, accidents were assigned to zones, rather than specific locations, producing a spatial error. Second, the use of zones assumes that the risk of accidents was uniform at all locations within the zone, a situation which was frequently not correct. However, it was claimed that by choosing small and relatively homogeneous zones, the advantages of grouping, namely the ability to associate characteristics of the zone with accidents, outweigh the biases produced by grouping.

Ng (2002) attempted to integrate mapping and statistical techniques to develop a systematic algorithm to assess accident risk in Hong Kong, which is a highly developed area with a dense population.

The number of accident events in each Traffic Analysis Zone (274 zones) in Hong Kong was used as the dependent variable. Land use factors (27 types) were used as the independent variables on the grounds that they can influence human activities and human behavior, which affect the number of accidents. Ng attempted to explain accidents by land use factors only. Negative binomial regression was adopted as the statistical method. Ng stated that negative binomial regression was found to be an appropriate form to mimic the relationship between the number of accidents and the land use factors.

There were important findings in Ng's research. High-accident rate zones and low-accident rate zones showed different results. In high-accident rate zones, the number of cinema seats, commercial area, and flatted factory area were found to have positive and significant effects on the number of pedestrian-related accidents. The results also showed that greenbelt area, specialized factory area, and the number of territory school places had significantly negative effects.

However, in low-accident rate zones, there were no land use factors that had significant effects on pedestrian-related accident occurrences. This result suggested that areas with high accident rates have different land use patterns compared to the areas with low rates. Also, it was stated that the algorithm was more efficient in the case of fatality and pedestrian-related accident analysis.

Kim and Yamashita (2002) examined the relationship between land use and accidents, including pedestrian accidents, in the City and County of Honolulu.
By linking police crash data (1986-1995) with land use data at the parcel level, each accident point was assigned to a land use code including Residential, Visitor Lodging or Other, Education, Religious & Social Institutions, Recreational and Cultural Activities, Military, Public Services, Manufacturing & Industrial, Commercial & Services, Utilities and Communication, Agriculture, and Vacant/Open land.

There were several findings in the research. First, Visitor Lodging and Commercial land use dominated in pedestrian accidents. Second, the number of pedestrian crashes produced per acre of residential land use was much lower than for other land uses.

The difficulty of understanding the relationships between land use and traffic accidents was also pointed out. First, accidents occur on roadways, generally not on the adjacent properties. Second, the land use classification may not provide the most appropriate categories for accident analysis. Third, the land use maps do not completely portray all uses of land. For instance, there might be residential uses occurring within commercial or office districts.

In the aforementioned research by Graham (2003a), land use data were not observed at the ward level, so it was proposed that the amount of employment and the amount of population per network node were indicative overall of the relative densities of economic and residential uses. This research indicated that land use factors should be included in the pedestrian casualty model, in particular residential land use and commercial land use factors.

Graham's other study (2003b) showed that children were more likely to encounter traffic accidents where residential populations were high, and that adults were more likely to be hit by cars in large employment centers.

Noland's aforementioned research (2003) also considered the land use effect on traffic accidents. Dummy variables for land use factors were created based upon the distribution of observed employment density and population density. Dummy variables for the level of urbanization were also included as a land use variable.

2.3. Roadway Factors

Many researchers have insisted that roadways have an effect on the number of accidents. However, most of this research used road segments as the unit of analysis. The focus of this research is the characteristics of accident-prone areas, not the roadways. Therefore, only the research related to zonal analysis is mentioned in this section.

Levine's research (1995b) included roadway factors as a contributing factor. The results showed that three of the roadway variables (the existence of a freeway link, miles of major arterial, and miles of freeway ramp/freeway access) produced positive and significant regression coefficients.

Harruff's pedestrian fatality analysis (1998) showed there was little correlation of the severity of injuries with the types of roadways.

Graham's research (2003a) showed that the length of main streets in a ward had a positive effect on the number of pedestrian casualties, but minor roads had a negative effect. His follow-up research (2003b) found that roadway characteristics had little effect on fatal accidents. Road length was, however, associated with increases in serious injuries. Also, the numbers of junctions and roundabouts in wards were not associated with any casualty types, except slight injuries.

Noland's aforementioned research (2004) showed that road infrastructure, expressed as the amount of each functional road class within a region, could affect the casualties. More minor roads in a region were associated with fewer casualties. He suggested that if minor roads tend to have lower speeds, limiting speeds might be effective at reducing casualties.

2.4. Traffic Factors

Some researchers have suggested that traffic factors, in particular traffic volume, are associated with the number of pedestrian accidents. However, traffic volume for a particular area is more difficult to obtain than traffic volume for individual roadways. Traffic volume data were not available in some of the research, so some of the studies applied proxy variables to represent traffic volume in the study areas.

Levine's aforementioned research (1995b) suggested that traffic volume data for the zones, which were not available for his research, would have improved his model.

In Petch's aforementioned research on child safety (2000), the model predicted a positive correlation between average traffic volume and the child pedestrian/cyclist casualty rate. In the case of main streets, a positive association was found between average traffic volume and the child pedestrian/cyclist casualty rate.

LaScala's aforementioned research (2000) also showed that pedestrian injury rates were positively related to high traffic flow.

To estimate traffic flows in each ward, the aforementioned research by Graham (2003a) constructed proxy variables for traffic flow. The levels of employment and resident population were treated as parameters of traffic volume in the ward and included in the model. The logic was that the relative location of people, jobs, and distances from other wards can provide proxy variables for traffic flow.

The results were mixed. As the employment parameter increased, the incidence of pedestrian casualties increased in most cases. However, in highly developed areas, the incidence fell, probably due to the effect of congestion on traffic speeds or to traffic calming measures. In contrast, proximate population had an increasing effect on pedestrian casualties.

Noland's aforementioned research (2003) adopted the same method as Graham's to represent traffic volume. He concluded that traffic proxy variables were positively related to the number of accidents in the wards.

2.5. Statistical Properties of Models

Regression analysis has been used in most of the existing studies to relate accidents to a set of independent variables. Regression analysis is used for this research because a regression model can estimate the statistical significance of multiple factors (variables) in one model. The effect of each factor on the dependent variable can be tested as an additional effect. If one of the variables shows a statistically significant value, it can be interpreted that the variable affects the dependent variable in addition to the rest of the independent variables.

Statistical modeling of the accident frequency of a given area can be carried out by multiple linear regression, Poisson regression or negative binomial regression. In this section, the characteristics of these three regression models are reviewed.

2.5.1. Multiple Linear Regression Model

Multiple linear regression models have the following general form:

$$\lambda = \beta x + \varepsilon$$

where
$\lambda$ = expected mean number of events;
$x$ = vector representing the independent variables;
$\beta$ = vector representing the parameters to be estimated;
$\varepsilon$ = error term, assumed to be normally distributed (Mostafa, 1998).

To identify the optimum model, the F-value, R-square and mean square error should be used. Individual parameters in the vector $\beta$ are tested using t-statistics to investigate the null hypothesis that a given parameter is zero.

However, multiple linear regression should be used with caution because accident frequency data are non-negative and non-normally distributed, and have error terms with unequal variance (Mostafa, 1998). Ng and Abdel-Aty (2000) stated that the multiple regression model was unsuitable for determining the number of accident events.

2.5.2. Poisson Regression Model

Poisson regression is based on the assumption that the dependent variable is Poisson-distributed.
Poisson regression models the probability of discrete events such as traffic accidents according to the Poisson process as follows:

$$\Pr(n_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{n_i}}{n_i!}$$

and

$$\lambda_i = \exp(\beta x_i)$$

where
$n_i$ is the target number of events on section $i$ over a period of time $t$;
$\lambda_i$ is the expected mean number of events;
$x_i$ is a vector representing the independent variables of section $i$; and
$\beta$ is a vector representing the parameters to be estimated.

In Poisson regression, the coefficient vector $\beta$ can be estimated by a standard maximum likelihood method with the likelihood function $L(\beta)$ being

$$L(\beta) = \prod_{i=1}^{N} \frac{\exp[-\exp(\beta x_i)]\,[\exp(\beta x_i)]^{n_i}}{n_i!}$$

where $N$ is the total number of analysis units (Mostafa, 1998).

2.5.3. Negative Binomial Regression Model

The negative binomial model arises from the Poisson model by specifying:

$$\lambda_i = \exp(\beta x_i + \varepsilon_i)$$

where
$\lambda_i$ is the expected mean number of events on section $i$;
$\beta$ is a vector representing the parameters to be estimated;
$x_i$ is a vector representing the independent variables on section $i$; and
$\varepsilon_i$ is an error term, where $\exp(\varepsilon_i)$ has a gamma distribution with mean 1 and variance $\alpha^2$.

The resulting probability distribution is as follows:

$$\Pr(n_i \mid \varepsilon_i) = \frac{\exp[-\lambda_i \exp(\varepsilon_i)]\,[\lambda_i \exp(\varepsilon_i)]^{n_i}}{n_i!}$$

Integrating $\varepsilon_i$ out of the expression produces the unconditional distribution of $n_i$. The formulation of this distribution is:

$$\Pr(n_i) = \frac{\Gamma(\theta + n_i)}{\Gamma(\theta)\, n_i!}\, u_i^{\theta}\,(1 - u_i)^{n_i}$$

where $u_i = \theta / (\theta + \lambda_i)$ and $\theta = 1/\alpha$.

The corresponding likelihood function is:

$$L(\lambda_i) = \prod_{i=1}^{N} \frac{\Gamma(\theta + n_i)}{\Gamma(\theta)\, n_i!}\, u_i^{\theta}\,(1 - u_i)^{n_i}$$

where $N$ is the total number of sections.
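As an illustration of the two probability functions in this section, the following minimal Python sketch evaluates the Poisson probability and the unconditional negative binomial probability, using theta = 1/alpha and u = theta / (theta + lambda). The numerical values of lambda and alpha are hypothetical and are not taken from this research. The sketch checks numerically that the negative binomial probabilities sum to one, that their mean equals lambda, and that their variance follows the relation Var[n_i] = E[n_i][1 + alpha E[n_i]] stated for this model.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability: exp(-lam) * lam**n / n!, computed in logs."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def negbin_pmf(n, lam, alpha):
    """Unconditional negative binomial probability with
    theta = 1 / alpha and u = theta / (theta + lam)."""
    theta = 1.0 / alpha
    u = theta / (theta + lam)
    log_p = (math.lgamma(theta + n) - math.lgamma(theta) - math.lgamma(n + 1)
             + theta * math.log(u) + n * math.log(1.0 - u))
    return math.exp(log_p)


def moments(pmf, upper=2000):
    """Total probability, mean, and variance over n = 0 .. upper - 1."""
    probs = [pmf(n) for n in range(upper)]
    total = sum(probs)
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return total, mean, var


# Hypothetical values chosen only for illustration.
lam, alpha = 2.5, 0.8
total, mean, var = moments(lambda n: negbin_pmf(n, lam, alpha))
print(f"total={total:.6f} mean={mean:.4f} var={var:.4f}")
print(f"lam * (1 + alpha * lam) = {lam * (1 + alpha * lam):.4f}")
```

When alpha approaches zero, `negbin_pmf` approaches `poisson_pmf`, which mirrors the statement that the negative binomial model reduces to the Poisson model in that case.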
The likelihood function is maximized to obtain coefficient estimates for $\beta$ and $\alpha$. Compared with the Poisson model, this model has an additional parameter $\alpha$ such that

$$\mathrm{Var}[n_i] = E[n_i]\,[1 + \alpha E[n_i]]$$

The choice between the negative binomial model and the Poisson model is determined by the statistical significance of the estimated coefficient $\alpha$. If $\alpha$ is not significantly different from zero (as measured by t-statistics), the negative binomial model simply reduces to a Poisson regression with $\mathrm{Var}[n_i] = E[n_i]$. If $\alpha$ is significantly different from zero, the negative binomial is the correct choice and the Poisson model is inappropriate (Mostafa, 1998).

Therefore, in most research on accident analysis, multiple linear regression was not used because of the characteristics of accident data. Poisson regression and negative binomial regression have been used for this type of research. Also, the choice between Poisson regression and negative binomial regression was made based on the value of the t-statistics.

2.6. Spatial Analysis Technique

Some of the studies used a Geographic Information System (GIS) to perform spatial analysis on traffic accidents. GIS is a computer-based system capable of capturing, storing, manipulating, and displaying geographically referenced data, and converting them into spatial information useful in solving complex spatial problems (Yeung, 2002). Using GIS, the relationship between accident occurrence and the characteristics of the areas can be analyzed. Specifically, the following techniques are used in this research.

Density is a function in ArcGIS Spatial Analyst that distributes the quantity or magnitude of point or line observations over a unit of area to create a continuous raster (GIS Dictionary, 2004). Based on accident point data, an accident density map is created by this function.

Zonal statistics is a function in ArcGIS Spatial Analyst that calculates a statistic for each zone of a zone data set based on values from another data set.
A single output value is computed for every zone in the input zone data set (GIS Dictionary, 2004). Accident density for each area (census block group) is calculated using Zonal statistics.

Intersect is the topological overlay of two layers that preserves the features falling within the spatial extent common to both layers (Yeung, 2002). Intersect makes it possible to apply the boundaries of census block groups to other datasets such as land use and roadway data.

Spatial Join is a type of table join operation in which fields from one layer's attribute table are appended to another layer's attribute table based on the relative locations of the features in the two layers (GIS Dictionary, 2004). The traffic volume of each area is estimated from AADT point data using this function.

2.7. Summary of The Literature Review

After careful review of the literature, this research found several important aspects that should be considered when performing pedestrian accident analysis. Low-income areas are positively related to the number of pedestrian accidents. Age distribution in the areas should also be taken into account. The effect of land use types, especially residential areas and commercial areas, needs to be explored. Roadway factors and traffic volume in the areas should be included as factors as well.

However, the results of this research might differ from those of previous studies. The degree of development, transportation methods, or other geographic conditions can affect the results, so the results for Hillsborough County might not be the same as the results of other research. For example, some of the previous studies were conducted in England, where railroad transportation is well developed and commuters do not necessarily use cars to go to their workplaces. In Hillsborough County, passenger rail is not available, and most people commute to work by car. Also, the scale of the research is different.
Some research used a much larger area as the analysis unit than this research does. These factors can affect the result of the model.

The multiple linear regression model, which is useful in many situations, is not an appropriate method when using the number of accidents as a dependent variable. Accident data are generally non-negative and non-normally distributed, and they have error terms with unequal variance. Most studies have used Poisson regression or negative binomial regression to perform accident analysis.

A large body of research exists on the influence of road geometry and demographic factors on traffic accidents, including pedestrian accidents. However, fewer studies have examined the effect of land use and traffic volume on the number of accidents, mainly because these data were difficult to obtain. Also, most studies analyzed large geographic areas such as wards (the average area of a ward in England is about 14 square kilometers). Given the limited funds for pedestrian safety, analysis of smaller geographic areas is important in order to implement effective policy for pedestrians. In addition, zonal analysis assumes that the risk of accident is uniform at all locations within the zone, which is frequently not true. Small-scale analysis is necessary to analyze accidents accurately. In this research, the census block group, which is a fairly small geographic area (the average area of census block groups in Hillsborough County is 4.125 square kilometers), is used as the spatial analysis unit.

CHAPTER 3: METHODOLOGY

3.1. Data Collection

3.1.1. Accident Data

The accident data used in this research were obtained from the Traffic Division of Hillsborough County for the years 1999 to 2001. The data consist of the accident records reported by the City of Tampa and the Hillsborough County Sheriff's Office. The data included 1,648 pedestrian accident records (Figure 2).
They are stored in shapefile format (shapefile is the file format for graphical data used by ArcGIS). Because Hillsborough County has several authorities which report accidents in different formats, it was difficult to collect all of the pedestrian accident data. This was the largest sample dataset available for the research.

Figure 2 Accident Point Data

3.1.2. Roadway Data

Roadway data were obtained from the Florida Geographic Data Library (Florida Geographic Data Library, 2004). The roadway data were created in 2001 by the Florida Department of Transportation and the Transportation Statistics Office. The data cover all of the roadways owned by the State of Florida. The data are shown in Figure 3. In addition, roadway data which include local roadways were obtained from the Traffic Division of Hillsborough County. This dataset was created in 2001, and it was stored in shapefile format as well.

Figure 3 State Roadways

3.1.3. Census Data

Census block groups were used as spatial units for the model. As described in Chapter 2, accident analysis on small geographic areas is lacking from previous research efforts. The smallest census unit is the census block, but demographic data were not available at the census block level. Also, it was expected that most of the census blocks would have no accident value (even at the census block group level, 59 out of 732 block groups had a value of zero). Accordingly, census block groups were used as the analysis unit.

In the year 2000, there were 795 census block groups in the county, and the average area of one census block group was 4.125 km². Also, the average number of residents in one census block group was 1,256. The census data were created by the U.S. Census Bureau, and stored in shapefile format by the Geoplan Center at the University of Florida (Geoplan Center, 2004).
The census data used in this research include average household income, population density, and population by age group in each block group, which are used as demographic variables in the model. The boundaries of census block groups are shown in Figure 4.

Figure 4 Boundaries of Census Block Groups

3.1.4. Traffic Volume Data

Traffic volume data were provided by the Metropolitan Planning Organization. The data include Annual Average Daily Traffic (AADT) measured at 551 stations from 1999 to 2001 in Hillsborough County (MPO, personal communication). This value is used as a parameter of the traffic volume for each census block group. The procedure for aggregating AADT into each block group is described in a later section.

3.1.5. Land Use Data

Land use data were collected to examine land use distribution in each census block group. The land use data were obtained from the South West Florida Water Management District (SWFWMD, 2004). The land use data were created in 1999, and stored in shapefile format. All of the areas in Hillsborough County were categorized into more than 40 land use types based on the Florida Department of Transportation Florida Land Use and Land Cover Classification System (FLUCCS). The land use map is shown in Figure 5. The zoomed area shows the neighborhood of the University of South Florida.

Among the land use types are Commercial and Service, Residential High Density, Residential Medium Density, and Residential Low Density. Because one of the hypotheses is that these concentrated residential and commercial land uses affect the number of pedestrian accidents, these four land use types are used as independent variables in the model.

Figure 5 Land Use Map

3.2. Data Processing

3.2.1. Software Used

The GIS software used was ArcGIS 8.3, developed by ESRI. The procedure of combining the accident points, census, land use, roadway, and traffic volume data is described in the following section.
As for statistical analysis, LIMDEP was used in this research. LIMDEP was developed by Econometric Software Corporation. Regression models in this research were examined using this software.

3.2.2. Exclusion of The Areas

Some of the census block groups were removed from analysis. There were eight census block groups with no residents. These zones were excluded because demographic analysis is not possible for those block groups. Also, 53 block groups within the jurisdictions of Plant City and Temple Terrace were removed. Although these authorities investigate traffic accidents, their reports were not available for this research, so these census block groups were left out of the analysis in order to avoid bias in the data. As a result, 732 out of 795 block groups were used for analysis. Excluded areas are shown in Figure 6.

Figure 6 Excluded Census Block Groups

3.2.3. Create Accident Density Map

Accident points have to be counted for each census block group. As seen in Figure 2 and Figure 4, accident data are points, and census block group data are areas in nature. The easiest way to count accidents is to overlay these two datasets and count the number of accident points for each census block group.

However, there was a problem with the accident point data. All of the accident points were geocoded based on intersections. Even though some of the accidents occurred somewhere between the intersections, they were geocoded on the closest intersection. In other words, coded accident points do not represent the exact location where the accidents occurred.

Also, another problem was identified which is related to the characteristics of point data. Each accident point is geocoded in only one of the census block groups, in spite of the fact that some of the accidents occur at the boundary of census block groups.
These accidents are counted for only one of the adjoining census block groups, although they occurred at the boundary between block groups rather than inside one of them. In order to take account of adjacent areas, accident points have to be spread out.

This research decided to create an accident density map to deal with these issues. All of the areas in Hillsborough County were divided into 30m by 30m cells. For each 30m by 30m cell, the number of accidents within a 500m radius was counted. The search radius was set at 500 meters because a previous study by Abdalla (1997) showed that most of the pedestrian accidents occurred within 500 meters of the casualty's home.

There are two options in ArcGIS 8.3 to calculate the density of the cells: the simple method and the Kernel method. For each cell, the simple method adds up the number of accidents which occurred within a circle of 500 meters radius around the cell. The distance from the cell is not considered in this method. For example, an accident which occurred at a distance of 499 meters (from the center of the cell) is counted with the same weight as an accident at a distance of five meters. This method is not appropriate to represent accident point density. A cell value can be very high even though no accident occurs near the cell. Also, a cell value can be low even if an accident point is located exactly at the cell.

On the other hand, the Kernel method calculates cell values using the distance of accident points from the center of the cell. The Kernel method creates a density map with a smooth surface in which the density at each location reflects the concentration of points in the surrounding area. Distances are measured from the center of the cell to each observation that falls within 500 meters. Each accident point contributes to the density value of the cell based on its distance from the center.
Nearby accidents are given more weight in the density calculation than those further away. For example, an accident which occurred at five meters from the center of the cell is counted with much more weight than an accident at 499 meters. Therefore, the Kernel method was used to create the accident density map. The resulting map is shown in Figure 7.

Figure 7 Accident Density Map

3.2.4. Calculate Accident Value for Census Block Groups

The average value of all of the cells in each census block group was calculated using zonal statistics. The output value is the number of accidents per km² for the census block group. This value ranged from 0 to 35.72 accidents per km². It should be mentioned that 391 of the 732 block groups had a value of less than 1, and 59 of the 732 block groups had a value of zero. These low values have to be considered when selecting an appropriate statistical model. This is discussed in the statistical analysis section of this chapter.

3.2.5. Development of Land Use Variables

As mentioned before, this research uses four types of independent variables contributing to pedestrian accidents. Land use variables were developed as follows.

Census block groups were intersected with land use data. Figure 8 is a map which represents the four land use types used as independent variables. The zoomed area shows the neighborhood of the University of South Florida.

Figure 8 Four Land Use Types

Next, for each block group, the total area was calculated, as well as the total areas of Commercial and Service, Residential High Density, Residential Medium Density, and Residential Low Density. The area of Commercial and Service was divided by the total area of the census block group and multiplied by 100, which represents the percentage of commercial and service land use in a census block group. This value formed one independent variable in the model.

As for residential land use, the area was calculated based on resident density.
It is a reasonable assumption that there are more pedestrians in neighborhoods of high density residential areas than in low density areas. If there are more pedestrians on the street, those areas are more likely to have higher accident rates. This should be taken into account.

According to the definition of the land use data, Residential High Density areas are composed of areas with six or more dwelling units per acre. Residential Medium Density areas have two to five dwelling units per acre. Residential Low Density areas have fewer than two dwelling units per acre (FDOT, 2004). Using this information, the residential land use variable was calculated as follows:

Residential land use = (area of Residential High Density * 3 + area of Residential Medium Density * 2 + area of Residential Low Density * 1) / total area of the block group * 100

This value was included in the model as an independent variable.

3.2.6. Development of Roadway Variable

In preliminary research, the length of all roadways, including local roads, was calculated as the roadway variable. However, it was found that the length of all roadways was highly correlated with the residential land use variable. It was also found that the length of state roadways was less correlated with any of the independent variables. In addition, preliminary research found that 68.3% of the accidents (1,126 out of 1,648 accidents) occurred along state roadways. Therefore, it was decided to use the length of state roadways as the roadway variable.

The roadway shapefile, which included every state roadway, was intersected with the census block group shapefile. The total length of the state roadways in each block group was calculated. Next, this value was divided by the total land area of each block group to obtain roadway density in meters per km². It is obvious that a census block group with a large area is more likely to have long roadways.
It does not mean, however, that the roadways are concentrated in those large block groups. Therefore, in order to explore the concentration of the roadways, the length of roadway was divided by land area. This value was used as the roadway variable in the model.

3.2.7. Development of Traffic Variable

Using traffic count data from the Metropolitan Planning Organization, traffic count values were assigned to census block groups. Figure 9 shows the locations of the points where AADT was measured.

Figure 9 AADT Points

Spatial join was used to assign AADT values to each census block group. If a census block group did not have any AADT point inside its area, the value of the closest AADT point was assigned to that census block group. The average distance from a census block group to an AADT point was 355 meters. If a census block group had one AADT point inside its area, the value of that AADT point was applied to the census block group. For census block groups with multiple AADT points inside, the average of the AADT values was calculated and used as the AADT value for the census block group. It is assumed that these values represent the traffic volume of the census block group.

3.2.8. Development of Demographic Variables

Each census block group has demographic data based on the 2000 census. Population, average household income, and population by age were used in the research. First, average household income was included in the model. Second, population density, children ratio, and elderly ratio were calculated as follows:

Population density = total population / total area (km²)
Children ratio = population aged 0 to 17 / total population
Elderly ratio = population aged 65 years or more / total population

These values were included in the model as well.

3.3. Statistical Analysis

3.3.1. Distribution Fitting

The dependent variable in the model is the average number of pedestrian accidents from 1999 to 2001 in each census block group.
Prior to the statistical modeling, the general shape of the distribution of the average number of pedestrian accidents was explored in order to provide a basis for understanding the data. Figure 10 shows the histogram of the average number of pedestrian accidents in census block groups. As mentioned before, the search radius of the density map was set at 500m. In addition, two other search radii (250m and 1000m) were tested to see whether the search radius changes the distribution of the dependent variable. For reference, the two additional histograms are shown in Figure 11 (search radius of 250m) and Figure 12 (search radius of 1000m).

Figure 10 Distribution of The Dependent Variable (histogram; vertical axis: number of census block groups; n = 732, mean = 2.78, max = 35.72, min = 0, standard deviation = 4.07)

Figure 11 Distribution with A Radius of 250m (histogram; vertical axis: number of census block groups; n = 732, mean = 2.75, max = 40.96, min = 0, standard deviation = 4.58)

Figure 12 Distribution with A Radius of 1000m (histogram; vertical axis: number of census block groups; n = 732, mean = 2.64, max = 23.69, min = 0, standard deviation = 3.59)

In Figure 10, it is apparent that a large number of census block groups have no or very low values of pedestrian accidents. The distribution seemed to follow the Poisson distribution. Also, as seen in Figure 11 and Figure 12, the distributions for the other two radii were similar to the one in Figure 10.

3.3.2. Selection of Statistical Model

As previous researchers suggested, accident frequency is in general non-negative and non-normally distributed, and it has error terms with unequal variance. These properties violate the statistical assumptions of multiple linear regression models, such as normally distributed and homoscedastic error terms. Violation of the assumptions can invalidate the hypothesis tests concerning the significance of the parameters (Jovanis & Chang, 1986).
In addition, the multiple linear regression model might predict a negative number of accidents, which never occurs in reality. Preliminary research using multiple linear regression showed that negative values were predicted for 94 out of 732 census block groups. Therefore, other types of models should be considered.

Due to the problems associated with the multiple linear regression model, the Poisson regression model should be considered. Poisson regression possesses most of the desirable statistical properties for describing accident frequency (Jovanis et al., 1986; Miau et al., 1993; Shanker et al., 1995). However, Poisson regression requires the variance of the data to be equal to the mean. When the mean and the variance of the data are not approximately equal, the model coefficients are biased (Peng, 2004). When the variance is larger than the mean, the data are termed overdispersed. This discrepancy can be overcome by an additional assumption on the expected mean, namely a gamma distribution assumption. It is assumed that the number of accident events has a Poisson distribution whose expected mean follows a gamma distribution. As such, the number of accident events can be regarded as having a negative binomial distribution (Cox, 1983; Land et al., 1996; McCullagh et al., 1989). The negative binomial regression model is particularly useful in accounting for overdispersion (Land et al., 1996).

The mean and variance of the dependent variable (number of accidents per km²) were calculated for this research. The mean and variance were 2.782 and 16.579, respectively (the standard deviation was 4.072). The variance is much larger than the mean, which suggests that the results of the modeling would be biased if this research used the Poisson regression model.
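The comparison of mean and variance can be reproduced from the summary statistics reported for the dependent variable (mean 2.781792 and standard deviation 4.071728 accidents per km²). The short Python sketch below is only a rough check: the moment-based value of alpha is an illustrative assumption derived from the relation Var = mean * (1 + alpha * mean), not the dispersion parameter estimated by the regression model, and the dependent variable here is a density rather than a raw count.

```python
# Summary statistics reported for the dependent variable (accidents per km^2).
mean = 2.781792
std_dev = 4.071728

variance = std_dev ** 2
dispersion_ratio = variance / mean          # equals 1 under a Poisson assumption

# Illustrative moment-based alpha from Var = mean * (1 + alpha * mean);
# this is an assumption for illustration, not the model's estimate.
alpha_mom = (variance - mean) / mean ** 2

print(f"variance = {variance:.3f}")
print(f"variance / mean = {dispersion_ratio:.2f}")
print(f"moment-based alpha = {alpha_mom:.2f}")
```

A ratio well above one points toward the negative binomial model rather than the Poisson model.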
Also, the choice between the negative binomial model and the Poisson model can largely be determined by the statistical significance of the estimated coefficient $\alpha$. If $\alpha$ is not significantly different from zero (as measured by t-statistics), the negative binomial model simply reduces to a Poisson regression. If $\alpha$ is significantly different from zero, the negative binomial is the correct choice and the Poisson model is inappropriate (Mostafa, 1998). The $\alpha$ value of the data, which is critical to select an appropriate model, is assessed in the next chapter.

CHAPTER 4: RESULTS AND DISCUSSION

4.1. Statistical Properties of The Variables

Before running regression models, the statistical properties of each variable were reviewed, as shown in Table 1. The mean, standard deviation, minimum value, and maximum value were examined.

Table 1 Statistical Properties of Variables

Variables                          Mean        Standard deviation   Min.    Max.
Accidents (number / km²)           2.781792    4.071728             0       35.72015
Children ratio                     0.244663    0.089762             0       0.617
Elderly ratio                      0.130598    0.123647             0       0.899
Average income (dollars)           50120.61    24821.19             0       200001
Commercial (%)                     10.53521    14.23568             0       88.80763
Residential land use (%)           142.7811    89.52254             0       300
Road density (m / km²)             1693.212    1667.707             0       13820.22
Traffic volume (vehicles / day)    31313.22    27996.13             1091    163000

4.2. Result of Negative Binomial Regression Model

Table 2 represents the accident model obtained by negative binomial regression. The dispersion parameter, $\alpha$, is highly significant at the 95% confidence level (its t-value is 19.909, which is much greater than 1.96). Therefore, the variance differs significantly from the mean, and this dataset is overdispersed. This value confirms the appropriateness of the negative binomial regression relative to the Poisson regression for modeling this dataset.

In order to measure the overall goodness-of-fit, the deviance value $2[LL(\beta) - LL(0)]$, which follows the Chi-square distribution, has been used as suggested by Agresti (1990).
This Chi-square test of the deviance value (310646.6 at 6 degrees of freedom), which is analogous to the F-test in linear regression modeling, strongly rejects the null hypothesis that the obtained model has explanatory power equal to that of the model with the constant term only. This value therefore shows a good overall statistical fit.

Also, the coefficients of correlation were estimated for each independent variable. The result is shown in Table 3.

Table 2 Result of Negative Binomial Regression

Independent variables                        Value          Standard Error   t value    p value
(Intercept)                                  3.9646         0.2293           17.292     0.00
Children resident (0 to 17 years) ratio      0.7890E-01     0.5385           0.147      0.88
Elderly resident (65 years or more) ratio    -0.9537        0.4114           -2.318     0.02
Average household income                     -0.1838E-04    0.1665E-05       -11.041    0.00
Commercial and service land use              0.4476E-01     0.4235E-02       10.570     0.00
Residential land use                         0.8409E-02     0.5316E-03       15.819     0.00
Road density                                 0.1598E-03     0.3250E-04       4.916      0.00
Traffic volume                               3.7834E-06     0.1739E-05       2.212      0.03

Chi squared: 310646.6; t value of the dispersion parameter: 19.909

Table 3 Coefficient of Correlations for Variables

Variables                          Accidents   Children   Elderly   Income   Commercial   Residential   Road density
Accidents                          1
Children ratio                     0.065       1
Elderly ratio                      0.002       0.509      1
Average household income           0.405       0.083      0.055     1
Commercial and service land use    0.498       0.131      0.026     0.328    1
Residential land use               0.204       0.013      0.076     0.037    0.104        1
Road density                       0.475       0.15       0.04      0.155    0.544                      1
Traffic volume                     0.230       0.113      0.099     0.063    0.185        0.012         0.119

4.3. Discussion of Each Variable

4.3.1. Demographic Variables

Average household income, children resident ratio, and elderly resident ratio were included in the model as demographic variables.

Table 2 shows that average household income has a strong negative effect on the number of pedestrian accidents in census block groups. The t-value of -11.041 is much lower than -1.96, which is the critical value at the 95% confidence level.
This value suggests that an increase in the average household income lowers pedestrian accidents in a census block group.

The elderly resident ratio showed a t-value of -2.318. This value suggests a negative relationship between the elderly resident ratio and pedestrian accidents. This result was against the hypothesis that an area with a high elderly resident ratio experiences more accidents than others. A possible interpretation could be that elderly people spend more time at home than the rest of the population and are therefore less likely to encounter accidents. It could also be suggested that elderly people are more careful when crossing the streets.

Harruff's research (1998) showed that the oldest age group has the highest pedestrian fatality rates, which differs from the result of this research. A probable explanation is that this research used all pedestrian accidents regardless of the severity of the injuries. If fatal accidents had been used as the dependent variable, the result would have been similar to the one in Harruff's research, because elderly people are likely to die of injuries which younger people might survive.

The children resident ratio did not indicate a statistically significant result. The children resident ratio was expected to have a positive relationship with pedestrian accidents based on the assumption that children are generally more careless than adults. It could be that children at a very young age are accompanied by their parents, and the parents watch out for cars. The result could have been different if the research had used children over 10 years old. Another factor could be that some of the children go to school using a school bus, and they do not walk around the neighborhood very often.

In Petch's research (2000), the model predicted a positive correlation between average traffic volume and the child pedestrian/cyclist casualty rate. Only child casualties were used as the dependent variable in Petch's research.
It could be interpreted that children might be more vulnerable in congested areas than the other age groups.

However, there was a correlation between the children resident ratio and the elderly resident ratio. Table 3 indicated the coefficient of correlation was 0.509. This may result in multicollinearity. Multicollinearity exists when two or more of the independent variables used in regression are moderately or highly correlated. High correlations among independent variables increase the likelihood of rounding errors in the calculations of the estimates and standard errors. Multicollinearity can also have an effect on the signs of the parameter estimates (Mendenhall and Sincich, 2003).

In order to avoid multicollinearity and improve the accuracy of the model, this research assessed two additional accident models. The first model used the same variables as the ones in Table 2 except the children resident ratio variable, because the t-value of the children ratio (0.147) was not statistically significant. The result is shown in Table 4. The second model used the children resident ratio and the elderly resident ratio as one variable. The value of the elderly resident ratio in each census block group was simply added to the value of the children resident ratio, and a new variable, vulnerable population ratio, was created. This research hypothesized that both elderly people and children were likely to encounter pedestrian accidents, and this model treated these two age groups together. The result is shown in Table 5.

Table 4 Regression Result without Children Resident Rate

Independent variables                         Value        Standard Error   t value   p value
(Intercept)                                   3.9900       0.1503           26.540    0.00
Elderly residents (65 years or more) ratio    0.9799       0.3707           2.644     0.01
Average household income                      1.8845E-04   0.1605E-05       11.497    0.00
Commercial and service land use               0.4470E-01   0.4211E-02       10.613    0.00
Residential land use                          0.8418E-02   0.5284E-03       15.931    0.00
Road density                                  0.1597E-03   0.3251E-04       4.913     0.00
Traffic volume                                0.3849E-05   0.1738E-05       2.215     0.02
Chi squared                                   310646.6

Table 5 Regression Result with Vulnerable Resident Rate

Independent variables                                              Value        Standard Error   t value   p value
(Intercept)                                                        4.1395       0.2155           19.213    0.00
Vulnerable population ratio (0~17 years old + 65 years or more)    0.6300       0.4101           1.536     0.12
Average household income                                           0.1891E-04   0.1636E-05       11.561    0.00
Commercial and service land use                                    0.4388E-01   0.4218E-02       10.403    0.00
Residential land use                                               0.8422E-02   0.5352E-03       15.735    0.00
Road density                                                       0.1547E-03   0.3244E-04       4.767     0.00
Traffic volume                                                     0.3912E-05   0.1725E-05       2.268     0.02
Chi squared                                                        310642.8

Table 4 showed the same chi-squared test of deviance value, 310646.6, as Table 2. This indicated that the children resident ratio did not improve the model. Therefore, this variable should not be included. The vulnerable population ratio, shown in Table 5, did not show a statistically significant result. Its absolute t value was 1.536, which is below 1.96, the critical value at the 95% confidence level. Therefore, only the elderly resident ratio should be used as an age factor.

It should be mentioned that there are a significant number of travelers or temporary residents in the State of Florida. They were not counted as residents in census block groups, but they could encounter accidents on the street just as residents do. The result might have been different if this research had included travelers or temporary residents.

4.3.2. Land Use Variables

Residential land use and commercial and service land use were included as land use variables. In Table 4, residential land use showed a t-value of 15.931. This result indicates that residential land use had an increasing effect on the number of pedestrian accidents.

As described in Chapter 3, this research adjusted the area of residential land use based on the density of the residential land use type. High density residential areas were multiplied by 3, and middle density residential areas were multiplied by 2, while low density areas were left intact. Because of this adjustment, this variable represented the density of residents in census block groups, as well as the percentage of residential areas. Stated differently, a similar result could be attained using population density instead of adjusted residential land use. A different result might also be obtained simply by adopting the percentage of residential land use without adjustment. Two additional regression models were developed using population density and the percentage of residential land use, respectively.
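The density adjustment of residential land use can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure documented in Chapter 3: the weights 3, 2 and 1 come from the text, while the function names, the dictionary keys, and the normalisation by block group area are assumptions made for the example.

```python
# Weights stated in the text: high density x3, middle density x2, low density unchanged.
DENSITY_WEIGHTS = {"high": 3.0, "medium": 2.0, "low": 1.0}


def adjusted_residential_percentage(areas_by_density, block_group_area):
    """Density-weighted residential area as a percentage of the block group area.

    areas_by_density maps a density class ("high", "medium", "low") to the
    residential area of that class inside the block group. Dividing by the
    block group area is an assumption of this sketch.
    """
    if block_group_area <= 0:
        raise ValueError("block_group_area must be positive")
    weighted_area = sum(
        DENSITY_WEIGHTS[density] * area for density, area in areas_by_density.items()
    )
    return 100.0 * weighted_area / block_group_area


def unadjusted_residential_percentage(areas_by_density, block_group_area):
    """Plain residential percentage, with every density class weighted equally."""
    if block_group_area <= 0:
        raise ValueError("block_group_area must be positive")
    return 100.0 * sum(areas_by_density.values()) / block_group_area
```

Because of the weights, the adjusted value can exceed 100 for a block group dominated by high density housing, which is why it behaves more like a proxy for resident density than like a land use share.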
Table 6 Negative Binomial Model with Population Density

Independent variables                         Value        Standard Error   t value   p value
(Intercept)                                   4.1971       0.1649           25.455    0.00
Elderly residents (65 years or more) ratio    0.2722       0.4441           0.613     0.54
Average income                                0.1525E-04   0.1732E-05       8.802     0.00
Commercial and service land use               0.3609E-01   0.4362E-02       8.275     0.00
Population density                            0.5613E-03   0.4962E-04       11.313    0.00
Road density                                  0.1835E-03   0.3329E-04       5.512     0.00
Traffic volume                                0.5613E-03   0.4962E-04       1.707     0.09
Chi squared                                   310592.6

Table 7 Negative Binomial Model with Residential Land Use Percentage

Independent variables                         Value        Standard Error   t value   p value
(Intercept)                                   3.8193       0.1614           23.656    0.00
Elderly residents (65 years or more) ratio    0.9005       0.3753           2.400     0.02
Average income                                0.1879E-04   0.1586E-05       11.849    0.00
Commercial and service land use               0.4640E-01   0.4360E-02       10.644    0.00
Residential land use (not adjusted)           0.2487E-01   0.1654E-02       15.034    0.00
Road density                                  0.1831E-03   0.3360E-04       5.538     0.00
Traffic volume                                0.4554E-05   0.1793E-05       2.540     0.01
Chi squared                                   310638.6

As expected, Table 6 showed a similar result to the one with adjusted residential land use (Table 4). However, elderly resident ratio and traffic volume did not show significant results at the 95% confidence level. Absolute values decreased for all of the variables, and the value of chi-squared fell from 310646.6 to 310592.6. Population density showed a t value of 11.313, which suggests that an increase in population density is positively related to the number of pedestrian accidents. The overall accuracy of the model was slightly decreased by replacing adjusted land use with population density.

Table 7 showed a very similar result to the one with adjusted residential land use (Table 4). All of the t values showed the same sign, and each of them was very close, though the value of the chi-squared test decreased slightly (310646.6 to 310638.6).
This can be interpreted to mean that pedestrians tend to encounter accidents near residential areas, and that accident-prone areas are not necessarily high density areas. It was found that the adjustment of land use percentage did not improve the model.

Levine (1995a) showed that most motor vehicle accidents (not only pedestrian accidents) were closer to employment centers than to residential areas, which does not agree with the result of this research. This research showed that both residential land use and commercial land use increase the number of accidents. This difference could be explained if car-on-car accidents and pedestrian accidents have different spatial distributions. Graham's study (2003b) showed that children were more likely to encounter pedestrian accidents where residential population is high, and adults were more likely to be hit in large employment centers. The land use effect might vary depending on the age group.

In addition, the result could have been different if different coefficients had been used to represent resident density. For instance, Residential High Density areas were multiplied by three. According to the definition of the land use data, these areas have six or more dwelling units per acre. Some multistory apartments have many more than six dwelling units per acre. In that case, high density areas should be multiplied by a coefficient higher than 3, because more pedestrians are expected in these areas. Detailed information on land use types was not available for the research. Such information will be helpful for future research.

4.3.3. Roadway Variable

In Table 4, the t-value of the roadway variable was 4.913. This value suggested that state roadway density is positively correlated with the number of pedestrian accidents. This was anticipated because preliminary research showed that 68.3% of accidents occurred along the state roadways.
Table 3 showed that commercial and service land use had some correlation with the roadway variable. The coefficient of correlation was 0.544. This value was the highest of all the correlations. Again, there was a concern that these two variables had multicollinearity. This correlation could be explained by the fact that, in general, commercial sites are developed along major roadways. This research used only state roadways, most of which are main streets. Consequently, this correlation might be unavoidable.

As described in Chapter 3, all types of roadways were used as the roadway variable in preliminary research. However, it was found that the density of all roadways was strongly correlated with the residential land use variable. The coefficient of correlation was approximately 0.70, which was higher than the value between state roadways and commercial and service land use. This correlation might be due to the fact that residential areas generally have a number of narrow and complicated roads inside. It is safe to say that state roadways and commercial and service land use were less correlated than all roadways and residential land use. Therefore, the density of state roadways was used in this research.

The effect of the roadway density agrees with the result of previous research efforts, though each study measured the roadway factor with its own method.

This research used the density of state roadways as a roadway variable. However, road geometric and environmental factors are also important to analyze pedestrian accidents. For example, the number of lanes, the median width, sidewalks, or the brightness of the roadways (at night) would have a significant effect on pedestrian accidents. These data were not available at a census block group level. It is recommended that future research address these factors.

4.3.4. Traffic Variable

According to Table 4, the traffic variable indicated a t-value of 2.215, which was higher than 1.96 (95% confidence level of significance). This value indicates that traffic volume in census block groups has a positive effect on the number of pedestrian accidents.

As explained in Chapter 3, point data of traffic counts were assigned to each census block group. 289 census block groups had count data inside them, but 443 census block groups did not have count data. For these block groups, the value of the traffic count closest to the block group was applied to represent traffic volume. It is uncertain that these values can be used as indicators of traffic volume for the whole block group. Therefore, an additional regression model was tested. This model used only the block groups that had traffic count data in their area. In this model, the dependent variable was formed from the 289 census block groups, and the selected census block groups are shown in Figure 13. The result of the negative binomial regression model is shown in Table 8.

Figure 13 Census Block Groups with Traffic Count Data

Table 8 Regression Results for Block Groups with Traffic Count Data inside

Independent variables                         Value        Standard Error   t value   p value
(Intercept)                                   3.7108       0.2114           17.584    0.00
Elderly residents (65 years or more) ratio    0.7646E01    0.4862           1.573     0.12
Average household income                      0.1499E-04   0.2426E-05       6.178     0.00
Commercial and service land use               0.4234E-01   0.4928E-02       8.591     0.00
Residential land use                          0.9803E-02   0.8207E-03       11.946    0.00
Road density                                  0.1622E-03   0.4482E-04       3.618     0.00
Traffic volume                                0.2777E-05   0.2096E-05       1.325     0.19
Chi squared                                   146801.2

Table 8 showed a different result in terms of the traffic variable. Compared to the model including all census block groups (Table 4), the t value dropped from 2.215 to 1.325. This value is smaller than 1.96, the critical value at the 95% confidence level.
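The significance check applied to the traffic variable can be reproduced with a short calculation. The sketch below assumes a two-sided test under a standard normal approximation, which is what the 1.96 critical value implies; the function names are illustrative and are not part of the original analysis.

```python
import math


def two_sided_p_value(t_value):
    """Two-sided p-value of a test statistic under a standard normal approximation."""
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the standard normal CDF Phi.
    return math.erfc(abs(t_value) / math.sqrt(2.0))


def is_significant(t_value, critical_value=1.96):
    """True when |t| exceeds the critical value (1.96 for the 95% confidence level)."""
    return abs(t_value) > critical_value


# Traffic volume t values reported in Table 4 (all block groups)
# and Table 8 (only block groups containing a traffic count point).
for label, t in (("Table 4", 2.215), ("Table 8", 1.325)):
    print(label, round(two_sided_p_value(t), 3), is_significant(t))
```

Under this approximation the Table 4 statistic gives a p-value of roughly 0.03 and passes the 1.96 threshold, while the Table 8 statistic gives roughly 0.19 and does not.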
The t value of 1.325 does not reject the null hypothesis that the traffic variable had no relationship with the number of pedestrian accidents.

This result could be interpreted in several ways. First, a census block group may be too small as an analysis unit. As seen in Figure 13, some of the census block groups had only one count point inside their area. It may not be appropriate to measure the traffic volume of the entire zone from the value of a single location. The result might have been different if the analysis units were set at larger areas, such as census tracts, because more data are obtained per analysis unit. LaScala (2000) found a positive relationship between traffic flow and pedestrian injury rates at the census tract level. However, it was mentioned that the data points for traffic counts were not randomly distributed, because those points included only accident-prone areas and high traffic volume areas.

It could also be suggested that traffic volume actually does not have a strong effect on pedestrian accidents. For example, as Graham (2003a) suggested, high traffic volume areas are not necessarily accident-prone areas. In congested streets, vehicle speed is generally low, so it is unlikely that pedestrians are involved in accidents. Consequently, census block groups with very high traffic volume might not have many accidents. In addition, traffic regulation such as traffic calming or traffic lights might have affected the result.

In any case, this research could not reach a conclusion about the effect of traffic volume on pedestrian accidents. Understanding the traffic factor will be an important theme for future research.

CHAPTER 5: CONCLUSIONS

5.1. Findings

The objective of the research was to determine the effects of contributing factors on the number of pedestrian accidents in Hillsborough County. Five hypotheses were proposed to examine these factors.
Based on accident records from 1999 to 2001, an accident density map was created, and the value of the accident density was assigned to each census block group. Accident models were developed using negative binomial regression to model the relationship between contributing factors and pedestrian accidents.

The result showed that commercial and service land use, residential land use, and the density of state roadways increase the number of pedestrian accidents. It was also found that average household income and elderly resident ratio lower the number of pedestrian accidents. Children resident ratio did not show a statistically significant result. Also, population density was used instead of the residential land use variable in an additional model, and the result was similar. In addition, the percentage of residential land use showed a similar result to the one with residential land use adjusted by the density of residents.

This research gives useful information on transport and land use planning policies. The study area was Hillsborough County, but the findings of the research will be applicable to other parts of the country, in particular developing metropolitan areas where public transit is not developed.

5.2. Limitations and Assumptions

It should be mentioned that there were some limitations in the research. First, although the accident data used for the research were the best dataset available, they did not include all of the pedestrian accidents that occurred from 1999 to 2001. In addition, the areas of Plant City and Temple Terrace were removed from the analysis because accident records were not available. Second, the accident data used to create the accident density map were geocoded on intersections, so the location of the accident points was not completely accurate. Third, it was assumed that traffic volume in an area can be measured by the AADT value of the points in the area.
The effect of traffic volume was not determined because it is not certain that this method can be used as a parameter for traffic volume for the area. Fourth, this research assumed that pedestrian accidents occur within 500 meters of the casualty's home.

5.3. Recommendation for Future Research

This research used all kinds of pedestrian accidents regardless of the severity of the accidents. It is recommended that future research address this factor. For example, research on fatal accidents will give useful information.

The density of state roadways was used as a roadway variable. However, road geometric and environmental factors, such as the number of lanes, the median width, sidewalks, or lights, are also important to analyze pedestrian accidents. This type of information should be included in future research.

Finally, it is recommended that future research analyze the effect of traffic volume in detail. In this research, some of the census block groups had only one point inside their area. It may not be appropriate to measure the traffic volume of the entire zone from the value of a single location. Using census tracts as analysis units might show a different result. Also, the effect of congestion and traffic regulation should be explored.

REFERENCES

Abdalla, M., Raeside, R., Barker, D., McGuigan, R.D. (1997). An investigation into the relationships between area social characteristics and road accident casualties. Accident Analysis and Prevention 29 (5), 583-593.

Abdel-Aty, M., Radwan, E. (2000). Modeling traffic accident occurrence and involvement. Accident Analysis and Prevention 32, 633-642.

Agresti, A. (1990). Categorical Data Analysis. New York: John Wiley, 71-96.

ArcGIS Desktop Help (2004). Retrieved on 08/30/04, from

As-Saidi, Ayman Hasan. (1995). Using geographic information systems to investigate demographic characteristics of drivers involved in traffic crashes. M.S. thesis, Department of Civil Engineering, University of Central Florida.

Chen, C. (1997). Statistical Analysis of the Effect of Demographic and Roadway Factors on Traffic Crash Involvement. M.S. thesis, Department of Civil Engineering, University of Central Florida.

Cox, D.R. (1983). Some remarks on overdispersion. Biometrika, 70, 269-274.

Florida Department of Transportation (2004). Retrieved on 09/15/2004, from http://www.dot.state.fl.us/surveyingandmapping/geographic.htm#fluccs

Florida Geographic Data Library (2004). Retrieved on 09/20/2004, from http://www.fgdl.org/

GeoPlan Center (2004). Census data. GeoPlan Center, University of Florida for the FGDL contributing organizations.

GIS Dictionary (2004). Retrieved on 11/29/2004, from http://support.esri.com/index.cfm?fa=knowledgebase.gisDictionary/

Graham, D., Glaister, S. (2003). Spatial Variation in Road Pedestrian Casualties: The Role of Urban Scale, Density and Land-use Mix. Urban Studies vol. 40 No. 8, July, 1591-1607.

Graham, D., Glaister, S., Anderson, R. (2003). Child pedestrian casualties in England: the Effect of Area Deprivation. A report to the Institute for Public Policy Research, 13th September.

Harruff, Richard C., Anne Avery and Amy S. Alter-Pandya (1997). Analysis of circumstances and injuries in 217 pedestrian traffic fatalities. Accident Analysis and Prevention vol. 30 No. 1, 11-20.

Jovanis, P., Chang, H. (1986). Modeling the relationship of accidents to miles traveled. Transportation Research Record 1068.

Kim, K., Yamashita, E. (2002). Motor vehicle crashes and land use: Empirical analysis from Hawaii. Transportation Research Record vol. 1784, 73-79.

Land, K.C., McCall, P.L., & Nagin, D.S. (1996). A comparison of Poisson, negative binomial, and semiparametric mixed Poisson regression models with empirical applications to criminal careers data. Sociological Methods and Research, 24(4), 387-442.

LaScala, Elizabeth A., Daniel Gerber, Paul J. Gruenewald (2000). Demographic and environmental correlates of pedestrian injury collisions: a spatial analysis. Accident Analysis and Prevention 32 (2000), 651-658.

Levine, Ned, Karl E. Kim and Lawrence H. Nitz (1995). Spatial Analysis of Honolulu Motor Vehicle Crashes: I. Spatial Patterns. Accident Analysis and Prevention vol. 27 No. 5, 663-674.

Levine, Ned, Karl E. Kim and Lawrence H. Nitz (1995). Spatial Analysis of Honolulu Motor Vehicle Crashes: Zonal generators. Accident Analysis and Prevention vol. 27 No. 5, 675-685.

McCullagh, P., & Nelder, J.A. (1989). Generalized linear models. London, New York: Chapman & Hall.

Mendenhall, William, Sincich, Terry (2003). A Second Course in Statistics: Regression Analysis. Sixth Edition. New Jersey: Pearson Education Inc.

Metropolitan Planning Organization (2002). Retrieved on 04/20/04, from http://www.hillsboroughmpo.org/about/purpose.htm

Miaou, S.P. (1994). The relationship between truck accidents and geometric design of road section: Poisson versus negative binomial regression. Accident Analysis and Prevention 26 (4), 471-482.

Mostofa, H. (1998). Modeling of Traffic accidents on Principal Arterial. M.S. thesis, Department of Civil Engineering, University of Central Florida.

Ng, Kwok-suen, Hung, Wing-tat, Wong, Wing-gun (2002). An algorithm for assessing the risk of traffic accident. Journal of Safety Research 33, 387-410.

Noland, R.B., Quddus, M.A. (2003). A spatially disaggregate analysis of road casualties in England. Accident Analysis and Prevention, November.

Peng, Haolei. (2004). Effects of Two-Lane Left-Turn Lane on Roadway Safety. M.S. thesis, Department of Civil Engineering, University of South Florida.

Petch, R.O., R.R. Henson (2000). Child safety in the urban environment. Journal of Transport Geography 8, 197-211.

Shankar, V., Mannering, F., and Barfield, W. (1995). Effect of roadway geometric and environment factors on rural freeway accident frequencies. Accident Analysis and Prevention 27 (30).

South West Florida Water Management District (2004). Retrieved on 08/20/04, from http://www.swfwmd.state.fl.us/

Surface Transportation Policy Project. (2002). Mean Streets 2002. Retrieved on 09/19/2004, from http://www.transact.org/

Lo, C.P., Yeung, Albert K.W. (2002). Concepts and Techniques of Geographic Information Systems. Prentice-Hall of India, New Delhi.