Transportation Systems Planning Methods and Applications 08 Transportation engineering and transportation planning are two sides of the same coin aiming at the design of an efficient infrastructure and service to meet the growing needs for accessibility and mobility. Many well-designed transport systems that meet these needs are based on a solid understanding of human behavior. Since transportation systems are the backbone connecting the vital parts of a city, in-depth understanding of human nature is essential to the planning, design, and operational analysis of transportation systems. With contributions by transportation experts from around the world, Transportation Systems Planning: Methods and Applications compiles engineering data and methods for solving problems in the planning, design, construction, and operation of various transportation modes into one source. It is the first methodological transportation planning reference that illustrates analytical simulation methods that depict human behavior in a realistic way, and many of its chapters emphasize newly developed and previously unpublished simulation methods. The handbook demonstrates how urban and regional planning, geography, demography, economics, sociology, ecology, psychology, business, operations management, and engineering come together to help us plan for better futures that are human-centered.
8 Statistical and Econometric Data Analysis
CONTENTS
8.1 Introduction
8.2 Models
8.3 Type of Data and Levels of Measurement
8.4 Categorical (Discrete) Data
Choice • Nonchoice • Counting Processes
8.5 Continuous Data
Other References and Journals • What Is Next?
References
8.1 Introduction
One can look at statistical analysis of data as a medium “for extracting information from observed data and dealing with uncertainty” (Rao, 1989, p. 98). Another way of saying the same thing is to consider statistics as a group of methods that are used to collect, analyze, present, and interpret data. From the myriad of methods available to us for data analysis (Snedecor and Cochran, 1980; Spanos, 1999), regression methods are one family that is comprehensive in its ability to address data issues, efficient in its ability to extract large amounts of information in a concise way, and widely available, because even spreadsheet software provides facilities to estimate simple regression models. Regression methods, particularly when one considers generalized regression models, can also be considered the general family of models that contains analysis of variance and a variety of other methods for the analysis of experiments as special cases. Regression methods and models are also the techniques dominating econometrics, the art and science of analyzing economic data, which, judging by the leading textbooks on the subject, is nothing but the study of regression models (see Amemiya, 1985; Greene, 2000; Johnston and DiNardo, 1997; Pindyck and Rubinfeld, 1998; among many others).
In the previous chapters of this book different authors pointed out the richness and variety of data available for understanding and predicting travel behavior. In this chapter we set out to accomplish a few humble goals:
1. Give a short introduction on statistical and econometric methods and introduce a road map of transportation data analysis, mentioning a few major milestones.
2. Provide an introduction to the next steps in data analysis methods and introduce the next three chapters.
3. Provide selective references to transportation planning books and articles where these methods were used successfully and where new information may be found in the future.
Konstadinos G. Goulias
Pennsylvania State University
8.2 Models
Statistical methods are used in a wide variety of instances in transportation planning to help us identify, study, and solve many complex practical problems. For example, in the public involvement arena these methods enable decision makers, planners, and managers to make informed decisions, consistent with legislation, about the elements of their policies, plans, and programs. This is accomplished by collecting qualitative and quantitative data from a variety of persons and groups, using a wide variety of techniques, and combining these data to yield answers to specific policy and planning questions. In another area, regional travel demand forecasting, regional models and statistics are developed to build large-scale simulation models of a region or even an entire state to help identify alternate urban and regional designs, economic activity locations, and new or improved infrastructure system components. After data are collected from individuals and their households, statistical models are estimated and their equations are used in a spreadsheet-like format or embedded into computer program code. Then other input data are provided to create predictions for each individual, household, or even geographical area, using these statistical models. In this way statistical models are at the heart of these simulation systems, and any errors, omissions, misrepresentations, or other approximations may be amplified and provide the wrong indications. This is the key reason we continuously look for better, more precise, and more accurate model-building methods. Figure 8.1 provides a pictorial representation of a linear sequential version of this process. The feedback in the figure can be used to improve the models within a given project or to provide recommendations for improvements in a sequence of projects (lagged feedback).
As expected, data analysis is needed to support actions and project development in these contexts. Before moving into the details of data analysis, a digression to define a few terms is required. Decision makers take action based on knowledge about an issue. In the path from data to knowledge one can envision a sequence of transformations that lead to an increase in power and confidence for the decisions to be made using the data. The sequence starts from data to information, which is the transformation of data to something that is relevant to a specific decision problem. Then the information becomes a group of facts, when statements can be supported by the data at hand. Facts, in turn, become knowledge when they are used to complete the decision process. Finally, knowledge aids actions when there is an implementation plan. Statistical or econometric analysis and estimated models are the enabling devices (or vehicles) to move from data to knowledge. Therefore the statistical or econometric models play a very important role in this example too, because they summarize in a concise way the myriad pieces of information in a database. In a way similar to that for prediction and simulation, any errors, omissions, misrepresentations, or other approximations may be amplified by the decision process and lead to the wrong actions, which in turn may cause serious damage to humans and their environment. For this reason data analysis methods, together with operations research methods, have been considered of paramount importance in the decision sciences. Both data analysis and operations research are quantitative methods, and they have a long history of development. There are, however, other data collection and transformation devices and tools that have received very little systematic attention in transportation planning (Goulias, 2001), but they are beyond the scope of this chapter.
FIGURE 8.1 A sequential version of a system in which regression models are estimated and used. [Flow diagram: Data Collection; Model Estimation; Simulation/Prediction using Equations; Verification, Validation, Interpretation, and Policy Definition; with feedback to earlier stages that is most often lagged.]
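The sequence in Figure 8.1 can be illustrated with a minimal sketch, assuming the simplest possible case: a single-equation linear model estimated by ordinary least squares and then applied to new inputs. The variables (household size, daily trips), the data values, and the function names are all hypothetical and chosen only for illustration; they do not come from any survey discussed in this chapter.

```python
# Sketch of the Figure 8.1 sequence: data -> estimation -> prediction -> validation.
# All data values below are made up for illustration.

def fit_simple_ols(x, y):
    """Estimate intercept and slope of y = a + b*x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Apply the estimated equation to new input data."""
    return [intercept + slope * xi for xi in x_new]


def mean_absolute_error(observed, predicted):
    """Simple validation measure comparing predictions with held-out observations."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Step 1: data collection (hypothetical household survey).
household_size = [1, 2, 3, 4, 5]
daily_trips = [2.0, 4.1, 5.9, 8.0, 10.0]

# Step 2: model estimation.
intercept, slope = fit_simple_ols(household_size, daily_trips)

# Step 3: prediction for households that were not in the estimation sample.
forecast = predict(intercept, slope, [2, 6])

# Step 4: validation against (hypothetical) held-out observations.
holdout_error = mean_absolute_error([4.0, 12.5], forecast)
```

A validation error judged too large would feed back into data collection or model specification, which is the feedback loop of the figure.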
8.3 Type of Data and Levels of Measurement
Classifications and taxonomies of data analysis methods abound (e.g., Judge et al., 1985; Jobson, 1991; Gelman et al., 1995), and they depend on the purpose of the reviews. Since we focus on regression methods and models, we will use one classification that is consistent with most textbooks and research in travel behavior.
In typical transportation surveys information is collected as qualitative or quantitative data. Qualitative data, such as the color of your car, cannot be manipulated by arithmetic operations. The color is a label that informs us about a category, a group, a region, or any other classification in which a person or artifact falls. These are named categorical variables. On the other side of this classification we find data that are measured on the real line and take any value on it (e.g., a ratio or proportion). There are very few examples in transportation where the data can be considered (completely) continuous, because we consider either finite countable and integer quantities, such as trips, cars, sites to visit, and so forth, or variables that may be characterized by a limited range of variation (e.g., a proportion can take values between 0 and 1).
The presentation of the models available and resources to study and apply them is divided into two major groups: categorical (discrete) data and continuous data. Each of these groups contains a variety of other models, depending on the more specific nature of the variable whose variation we are trying to explain (the dependent, or to-be-explained, variable). The variation of this variable is explained by explanatory variables and parameters that we need to estimate (the combination of which is named systematic variation) and a random variation that we cannot explain. It is also important to stress that the classification we use here is not based on the variables we use as explanatory (predictor) variables. They can be of any type, and there are ways to incorporate almost any type of explanatory variable in a regression model by converting it into some sort of numerical coding that can be handled by the software (Greene (2000) provides a discussion on this; Kennedy (1998) and Pindyck and Rubinfeld (1998) also provide a good discussion and examples).
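One common form of the numerical coding mentioned above is dummy (indicator) coding, in which a categorical explanatory variable with K levels becomes K - 1 columns of zeros and ones, with one level held out as the reference. The sketch below assumes this particular scheme; the function name `dummy_code` and the mode labels are hypothetical.

```python
def dummy_code(values, reference):
    """Convert a categorical variable into 0/1 indicator columns.

    The reference level gets no column, so it is represented by a row of
    zeros; this avoids perfect collinearity with the regression constant.
    """
    levels = sorted(set(values) - {reference})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


# Hypothetical answers to "usual mode to work"; "car" is the reference level.
usual_mode = ["bus", "car", "walk", "car"]
levels, rows = dummy_code(usual_mode, reference="car")
# levels names the indicator columns; rows holds one coded record per respondent.
```

The coefficient estimated for each indicator column is then read as the difference between that category and the reference category, other things equal.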
Emphasis in this chapter is placed on cross-sectional data (data collected for individuals and households at one time point). Longitudinal data analysis methods are also starting to emerge in transportation planning, and within each section a short mention is made of this type of data analysis, emphasizing panel surveys (time series are excluded from this presentation entirely). In addition, emphasis is given to the single equation because the issues are similar when one considers each equation of a system of equations (Pendyala in Chapter 2 of this book provides examples of multiple equations issues). Chapter 11 provides an overview of structural equations and models, which are the premier methods when one wants to consider multiple dependent variables jointly. Goulias also addresses the multiple equations issue, incorporating time and social levels, in Chapter 9, on multilevel models.
8.4 Categorical (Discrete) Data
For the sake of convenience, these can be further divided into choice models, nonchoice models, and models for counting processes. The models for counting processes can be further divided into event count and duration models. This is described in additional detail below, and key references are provided.
8.4.1 Choice
In transportation planning a typical example is mode choice, in which a person decides, for a trip, which mode to choose from among a finite set of modes. A typical model in discrete choice will be a model of the probability of choosing a mode as a nonlinear function of mode attributes, trip characteristics, and traveler demographics. The usual formulation takes the (indirect) utility of each mode as its point of departure and then, by making certain assumptions about the stochastic nature of that utility, derives the form of the choice probability. This enables us to use specialized algorithms for estimation of the parameters driving the function. Most transportation planning and modeling textbooks contain the basic theory and examples of mode choice (Ortuzar and Willumsen, 2001; Meyer and Miller, 2001). There is also a monograph dedicated to the theory, data collection, and experience with choice problems, edited by Gärling et al. (1998). Many milestones in the past literature and important developments are reviewed in these three books.
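As an illustration of how a probability is derived from utilities, the sketch below assumes the multinomial logit form, which results when the random parts of the utilities are taken to be independent and identically distributed Gumbel (extreme value) variables. This is only one of the possible stochastic assumptions. The utility specification, the coefficient values, and the mode attributes are hypothetical.

```python
import math


def systematic_utility(constant, beta_time, beta_cost, time_min, cost):
    """Linear-in-parameters systematic utility (an assumed specification)."""
    return constant + beta_time * time_min + beta_cost * cost


def logit_probabilities(utilities):
    """Multinomial logit: P(m) = exp(V_m) / sum over all modes j of exp(V_j)."""
    max_u = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {mode: math.exp(u - max_u) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: value / total for mode, value in exp_u.items()}


# Hypothetical coefficients and trip attributes (time in minutes, cost in dollars).
BETA_TIME, BETA_COST = -0.05, -0.4
utilities = {
    "car": systematic_utility(0.5, BETA_TIME, BETA_COST, time_min=20, cost=3.0),
    "bus": systematic_utility(0.0, BETA_TIME, BETA_COST, time_min=35, cost=1.5),
    "walk": systematic_utility(-1.0, BETA_TIME, BETA_COST, time_min=60, cost=0.0),
}
mode_shares = logit_probabilities(utilities)
```

In estimation the coefficients are not fixed in advance as they are here; they are the unknowns found by maximizing the likelihood of the observed choices.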
However, classic references to discrete choice models are: for the logit model and initial formulations, Domencich and McFadden (1975); for the probit model and very good detail on estimation, Daganzo (1979); and for a comprehensive review with clear examples, Ben-Akiva and Lerman (1985). Many subsequent developments have improved the original algorithms in these books, and widely available software exists for model estimation (see the websites at the end of this chapter).
There are, however, many very important and more recent developments in model formulation and estimation that are expanding the scope of discrete choice models, making them far more flexible and usable than in the past. In a recent handbook, Bhat (2000a) and Koppelman and Sethi (2000) review some of these developments. Bhat in Chapter 10 of this book provides the latest review of developments and identifies many important issues that have been resolved. The analysis of repeated choices, however, has not received wide attention in discrete choice. It is expected that with the increasing use of stated choice and preference data (Louviere et al., 2000), we may see new developments in the field.
8.4.2 Nonchoice
This area is also known as contingency table analysis and cross-classification categorical data analysis, and it comprises one of the richest groups of models, with immense flexibility and potential applications in transportation. Fienberg’s (1977) book is still one of the best presentations of the original methods. Agresti’s (1990) textbook is one of the most comprehensive surveys linking contingency table analysis methods to logit for binary and multicategory data. Two somewhat newer expositions are Powers and Xie (1999) and Le (1998).
One early application that follows Goodman’s way of looking at contingency data analysis is reported in Kitamura et al. (1990); the method was used extensively for the design of a microsimulator in Goulias (1991). More recently, repeated observations of the same individuals have been analyzed with contingency table methods that contain latent classes (Goulias, 1999), and a connection between latent class and discrete choice has inspired some very interesting model-building work that has become available only recently (see the discussion by Golledge and Gärling in Chapter 3 in this book). Kitamura (2000) also reviews some of these models from a model formulation viewpoint.
8.4.3 Counting Processes
This group of models targets event counts, that is, the number of times an event occurs. In probability, this is the realization of a nonnegative integer random variable. Counts and durations seem to be the two sides of the same coin:
An event may be thought of as the realization of a point process governed by some specified rate of occurrence of the event. The number of events may be characterized as the total number of such realizations over some unit of time. The dual of the event count is the interarrival time, defined as the length of the period between events. (Cameron and Trivedi, 1998, p. 4)
In travel behavior there are many examples for both the count and duration regression models. In the case of duration models the study of activity episode durations (see Pendyala in Chapter 2) has received considerable attention in the past few years. The earliest examples are cited in Kim and Mannering (1997), and a review can be found in Bhat (2000b). Counts using Poisson and negative binomial models also abound for the number of trips, activities per day, number of departures, and so forth. Two of the earlier examples are Mannering (1989) and Monzon et al. (1989). Arentze and Timmermans (2000) and Ma and Goulias (1999) provide updates on count data models and more recent examples. A comprehensive review can also be found in Andersen et al. (1992).
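The duality between counts and durations can be made concrete under the simplest assumption, a Poisson process with a constant rate: the count over a unit of time is then Poisson distributed and the interarrival time is exponentially distributed, so the probability of zero events in an interval equals the probability that the first interarrival time exceeds that interval. The sketch below also shows the log-linear mean that Poisson regression conventionally uses; the function names, coefficients, and covariates are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events per unit of time at the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def probability_no_event_by(t, rate):
    """Exponential survival function: P(interarrival time > t)."""
    return math.exp(-rate * t)


def expected_count(coefficients, covariates):
    """Poisson regression mean, exp(x'b), which is always positive."""
    return math.exp(sum(b * x for b, x in zip(coefficients, covariates)))


# Hypothetical model of daily trips: a constant plus the number of workers.
rate = expected_count([0.2, 0.3], [1.0, 2.0])  # exp(0.2 + 0.3 * 2)
p_zero_trips = poisson_pmf(0, rate)
p_first_trip_after_one_day = probability_no_event_by(1.0, rate)
# The two probabilities coincide: the count view and the duration view
# describe the same underlying process.
```

Negative binomial models relax the Poisson restriction that the variance equals the mean; they are not shown here.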
One particular class of models emerges when the count is ordered, for example, the number of vehicles a household owns. In this case having three cars is more than having two cars, and having two cars is more than having one, etc. These models are particularly attractive because they allow use of certain estimation tricks. Greene (2000) provides an extensive discussion on ordered models. An earlier example of ordered regression is reported in Kitamura and Bunch (1990), in which repeated observations are also used. These models can also be used in attitudinal responses and judgments (Kim et al., 2001).
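A minimal sketch of an ordered-response model follows, assuming the ordered probit form: a latent continuous propensity with a standard normal error is cut by increasing thresholds into the observed ordered categories. The threshold values and the linear index used here are hypothetical, not estimates from any study cited above.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probabilities(linear_index, thresholds):
    """Category probabilities P(y = j) = F(t_j - x'b) - F(t_{j-1} - x'b).

    With J thresholds there are J + 1 ordered categories; the outermost
    cumulative probabilities are fixed at 0 and 1.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    cumulative = [0.0]
    cumulative += [normal_cdf(t - linear_index) for t in thresholds]
    cumulative += [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


# Hypothetical car ownership model with categories 0, 1, 2, and 3+ vehicles.
thresholds = [-0.5, 0.8, 1.9]
low_income = ordered_probit_probabilities(0.0, thresholds)
high_income = ordered_probit_probabilities(1.2, thresholds)
```

Raising the linear index shifts probability mass toward the higher categories, which is how an explanatory variable such as income acts in this kind of model.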
8.5 Continuous Data
Every introduction to regression and econometrics departs from a model with a continuously varying dependent variable. The usual treatment follows the same sequence, starting with a discussion of the simple linear model and then removing, one at a time, a number of assumptions (very often referred to as the Gauss–Markov theorem assumptions). In this way, more and more complex and flexible models are built. Because the majority of time and effort in introductory econometrics courses and texts is dedicated to the linear regression model, and because its assumptions are consistently violated by transportation data, the references about this model are limited here to a few key texts and emphasize transportation applications of the limited dependent variable variety. For linear regression models, Greene’s textbook is one of the best and most comprehensive references. The textbook also contains a very nice section on nonlinear regression models (Greene, 2000, Chap. 10, pp. 416–453).
When the dependent variable is limited (e.g., cannot take values below or above a value), special attention needs to be paid in computing its mean, but also in estimating the regression coefficients. Again, the standard textbook is Greene (2000), but a very good reference is also Maddala (1983). The typical model for a limited dependent variable is the Tobit model (see Monzon et al., 1989). There are also simpler methods, as illustrated in the practical application in Goulias and Kitamura (1993).
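Why the mean needs special attention can be seen in a small sketch, assuming the standard Tobit setting: a latent variable that is normally distributed with mean mu and standard deviation sigma, observed only when positive and recorded as zero otherwise. Under these assumptions the mean of the observed variable is Phi(mu/sigma) * mu + sigma * phi(mu/sigma), not mu. The function names and numerical values are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_mean(mu, sigma):
    """Mean of y = max(0, y*) when the latent y* is Normal(mu, sigma^2)."""
    z = mu / sigma
    return normal_cdf(z) * mu + sigma * normal_pdf(z)


# Hypothetical latent mean of -1 (e.g., a propensity for infrequent trips):
# the observed mean is positive even though the latent mean is negative.
observed_mean = censored_mean(-1.0, 1.0)
```

Because the observed mean is a nonlinear function of mu, a regression coefficient in the latent equation is not the marginal effect on the observed variable, which is one reason ordinary least squares is inappropriate for such data.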
8.5.1 Other References and Journals
Consistently through the past 20 years transportation researchers have utilized many of the new regression methods almost immediately after they were developed, and very often transportation problems have offered motivation for statisticians and econometricians to develop new methods. A notable example is D. McFadden, who won the Nobel Prize in 2000. The methods of the other person who won the Nobel Prize for econometric contributions in 2000, J. Heckman, are also used very often in transportation data analysis. Transportation journals and conference proceedings always contain papers and chapters that either provide a review of new methods or apply a new method to a transportation problem. When seeking these new developments, one should examine the following:
Transportation Research Record: a journal of the Transportation Research Board
Transportation Research: a Pergamon international journal that is divided into parts, each dedicated to a specific focus
Transportation: a Kluwer international journal
The proceedings of the conferences mentioned in Chapter 1 of this book are also very good sources. There are also many websites with extensive treatment of statistical and econometric models. The two sites with the best and most up-to-date links for statistical and econometric software are:
http://www.feweb.vu.nl/econometriclinks/software.html
http://www.fas.harvard.edu/~stats/survey-soft/survey-soft.html
8.5.2 What Is Next?
Pendyala in Chapter 2 provided a state-of-the-art presentation of more sophisticated and informative models in travel behavior, with the stochastic frontier models, mixtures of discrete and continuous dependent variable models, and the duration models. In discrete choice, Bhat, in Chapter 10, discusses other directions focusing on microeconometric data. Goulias, in Chapter 9, illustrates extensions of the linear regression that incorporate multiple hierarchies in the data, multiple equations, and multiple ways to incorporate randomness. Golob’s review (Chapter 11) also provides another set of directions along which we will see new advances. Finally, another direction of data analysis that we are starting to see develop is in nonparametric data analysis methods, such as the example in Kharoufeh and Goulias (2002).
References
Agresti, A., Categorical Data Analysis, Wiley, New York, 1990.
Amemiya, T., Advanced Econometrics, Harvard University Press, Cambridge, MA, 1985.
Andersen, P.K. et al., Statistical Models Based on Counting Processes, Springer-Verlag, New York, 1992.
Arentze, T. and Timmermans, H., Albatross: A Learning Based Transportation Oriented Simulation System, European Institute of Retailing and Service Studies, Technical University of Eindhoven, Netherlands, 2000.
Ben-Akiva, M. and Lerman, S.R., Discrete Choice Analysis, MIT Press, Cambridge, MA, 1985.
Bhat, C.R., Flexible model structures for discrete choice analysis, in Handbook of Transport Modelling, Hensher, D.A. and Button, K.J., Eds., Pergamon, Amsterdam, 2000a, pp. 71–89.
Bhat, C.R., Duration modeling, in Handbook of Transport Modelling, Hensher, D.A. and Button, K.J., Eds., Pergamon, Amsterdam, 2000b, pp. 91–110.
Cameron, A.C. and Trivedi, P.K., Regression Analysis of Count Data, Cambridge University Press, U.K., 1998.
Daganzo, C., Multinomial Probit: The Theory and Its Application to Demand Forecasting, Academic Press, New York, 1979.
Domencich, T. and McFadden, D., Urban Travel Demand: A Behavioral Analysis, Elsevier/North Holland, Amsterdam, 1975.
Fienberg, S.E., The Analysis of Cross-Classified Categorical Data, MIT Press, Cambridge, MA, 1977.
Gärling, T., Laitila, T., and Westin, K., Theoretical Foundations of Travel Choice Modeling, Elsevier, Amsterdam, 1998.
Gelman, A et al., Bayesian Data Analysis, Chapman & Hall/CRC Press, Boca Raton, FL, 1995.
Goulias, K.G., Long-Term Forecasting with Dynamic Microsimulation, unpublished Ph.D. dissertation, University of California, Davis, 1991.
Goulias, K.G., Longitudinal analysis of activity and travel pattern dynamics using generalized mixed Markov latent class models, Transp. Res. B, 33, 535–557, 1999.
Goulias, K.G., On the role of qualitative methods in travel surveys, workshop report on qualitative methods Q-5, International Conference in Transport Survey Quality and Innovation, Kruger National Park, South Africa, August 5–10, CD-ROM, 2001.
Goulias, K.G. and Kitamura, R., Analysis of binary choice frequencies with limit cases: Comparison of alternative estimation methods and application to weekly household mode choice, Transp. Res. B Methodol., 27, 65–78, 1993.
Greene, W.H., Econometric Analysis, 4th ed., Prentice Hall, Upper Saddle River, NJ, 2000.
Jobson, J.D., Applied Multivariate Analysis, Vols 1 and 2, Springer, New York, 1991.
Johnston, J. and DiNardo, J., Econometric Methods, 4th ed., McGraw-Hill, New York, 1997.
Judge, G.G. et al., The Theory and Practice of Econometrics, 2nd ed., Wiley, New York, 1985.
Kennedy, P., A Guide to Econometrics, 4th ed., MIT Press, Cambridge, MA, 1998.
Kharoufeh, J.P. and Goulias, K.G., Nonparametric identification of daily activity durations using Kernel density estimators, Transp. Res. B Methodol., 36, 59–82, 2002.
Kim, T., Koza, S.A., and Goulias, K.G., Analysis of the resident component in PennPlan’s public involvement survey: Survey overview and item nonresponse selectivity issues, paper preprint 01-2772, Transp. Res. Rec., 1780, 145–154, 2001.
Kim, S. and Mannering, F., Panel data and activity duration models: Econometric alternatives and applications, in Panels for Transportation Planning: Methods and Applications, Golob, T., Kitamura, R., and Long, L., Eds., Kluwer, Boston, 1997, pp. 349–373.
Kitamura, R., Longitudinal methods, in Handbook of Transport Modelling, Hensher, D.A. and Button, K.J., Eds., Pergamon, Amsterdam, 2000, pp. 113–128.
Kitamura, R. and Bunch, D.S., Heterogeneity and state dependence in household car-ownership: A panel analysis using ordered-response Probit models with error components, in Transportation and Traffic Theory, Koshi, M., Ed., Elsevier/North Holland, Amsterdam, 1990, pp. 477–496.
Kitamura, R., Nishii, K., and Goulias, K.G., Trip chaining behavior by central city commuters: A causal analysis of time–space constraints, in Developments in Dynamic and Activity-Based Approaches to Travel Analysis, Jones, P., Ed., Avebury, Aldershot, U.K., 1990, pp. 145–170.
Koppelman, F.S. and Sethi, V., Closed-form discrete-choice models, in Handbook of Transport Modelling, Hensher, D.A. and Button, K.J., Eds., Pergamon, Amsterdam, 2000, pp. 211–225.
Le, C.T., Applied Categorical Data Analysis, Wiley, New York, 1998.
Louviere, J.J., Hensher, D.A., and Swait, J.D., Stated Choice Methods: Analysis and Applications, Cambridge University Press, Cambridge, U.K., 2000.
Ma, J. and Goulias, K.G., Application of Poisson regression models to activity frequency analysis and prediction, Transp. Res. Rec., 1676, 86–94, 1999.
Maddala, G.S., Limited Dependent and Qualitative Variables in Econometrics, Cambridge University Press, U.K., 1983.
Mannering, F., Poisson analysis of commuter flexibility in changing route and departure times, Transp. Res. B, 23, 53–60, 1989.
Meyer, M.D. and Miller, E.J., Urban Transportation Planning, 2nd ed., McGraw-Hill, Boston, 2001.
Monzon, J., Goulias, K.G., and Kitamura, R., Trip generation models for infrequent trips, Transp. Res. Rec., 1220, 40–46, 1989.
Ortuzar, J. de D. and Willumsen, L.G., Modelling Transport, 3rd ed., Wiley, Chichester, U.K., 2001.
Pindyck, R.S. and Rubinfeld, D.L., Econometric Models and Economic Forecasts, 4th ed., McGraw-Hill, Boston, 1998.
Powers, D.A. and Xie, Y., Statistical Methods for Categorical Data Analysis, Academic Press, New York, 1999.
Rao, C.R., Statistics and Truth: Putting Chance to Work, International Co-Operative Publishing House, Fairland, MD, 1989.
Snedecor, G.W. and Cochran, W.G., Statistical Methods, 7th ed., Iowa State University Press, Ames, 1980.
Spanos, A., Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, Cambridge University Press, U.K., 1999.