Transportation Systems Planning: Methods and Applications

Transportation engineering and transportation planning are two sides of the same coin, aiming at the design of an efficient infrastructure and service to meet the growing needs for accessibility and mobility. Many well-designed transport systems that meet these needs are based on a solid understanding of human behavior. Since transportation systems are the backbone connecting the vital parts of a city, in-depth understanding of human nature is essential to the planning, design, and operational analysis of transportation systems. With contributions by transportation experts from around the world, Transportation Systems Planning: Methods and Applications compiles engineering data and methods for solving problems in the planning, design, construction, and operation of various transportation modes into one source. It is the first methodological transportation planning reference that illustrates analytical simulation methods that depict human behavior in a realistic way, and many of its chapters emphasize newly developed and previously unpublished simulation methods. The handbook demonstrates how urban and regional planning, geography, demography, economics, sociology, ecology, psychology, business, operations management, and engineering come together to help us plan for better futures that are human-centered.
14 Demographic Microsimulation with DEMOS 2000: Design, Validation, and Forecasting
CONTENTS
14.1 Introduction
14.2 DEMOS 2000
DEMOS Process • Data Used as Input Population • Validation Method and Results • DEMOS Simulation Forecasts • DEMOS Information and Communication Technology Market Penetration Forecasts
14.3 Summary and Conclusions
References
14.1 Introduction
As illustrated earlier in this handbook, new concepts in travel demand modeling capture and predict travel behavior more realistically than ever before. At the heart of any forecasting model, however, are social, economic, and demographic data. In addition, the overwhelming majority of these new travel demand models need this type of data at the person or household level. Moreover, predictions from these forecasting models are very sensitive to the accuracy of the information provided as input. Although this has been recognized for more than 20 years, sociodemographic forecasting for travel demand systems is progressing at a much slower pace than needed to support new policy initiatives and associated model-building efforts (Goulias, 1997).

Two of the reasons for this gap are the detailed microinput required by these models and the complexity involved in designing model systems that produce them. With the advent of new techniques in survey methods like activity-based surveys and panel surveys, precise person and household level information is now available to build demographic simulators. Also, tremendous advancements have been made in the development of microanalytic simulation models and related programming languages, which can effectively handle the complexities involved in the human life cycle evolution. Miller, in Chapter 12, presents the theoretical background and history of these models. In addition, Sundararajan (2001), Kazimi (1995), and Goulias (1991) provide reviews of demographic microsimulation methods. This chapter describes an application in demographic microsimulation, using data from the United States, that can be extensively applied to travel forecasting. The chapter also provides validation examples that are useful for other applications.
14.2 DEMOS 2000
DEMOS is a microsimulator of social, economic, and demographic attributes describing an individual and a household. During the simulation an individual is born and progresses through different life cycle stages. While progressing through these stages the individual is exposed to different events in the form of death, giving birth to a child, leaving the nest and living elsewhere, marrying or divorcing, acquiring a license and a job, buying a new vehicle, and so on. All these changes are simulated probabilistically in DEMOS. Most of the transition probabilities are obtained by cross-classification of the variables between two successive years of the Puget Sound Transportation Panel (PSTP) data. In the process, different person and household attributes are internally generated in a conditional way. For example, the age of the mother affects the age and number of children living in the household, and the income group to which a household belongs affects the lifestyle and the number of vehicles owned by the household. DEMOS captures all these correlations by specifying the probability of change as a function of the household and person characteristics.
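The cross-classification step can be illustrated with a minimal sketch. It assumes each wave is available as a mapping from a person identifier to a categorical state; the function name `transition_probabilities` and the toy license-holding data are hypothetical and are not taken from DEMOS or PSTP.

```python
from collections import Counter, defaultdict


def transition_probabilities(wave_a, wave_b):
    """Estimate P(state in wave_b | state in wave_a) by cross-classification.

    wave_a and wave_b map a person id to a categorical state.
    Only persons observed in both waves contribute to the counts.
    """
    counts = defaultdict(Counter)
    for pid, state_a in wave_a.items():
        if pid in wave_b:
            counts[state_a][wave_b[pid]] += 1

    probs = {}
    for state_a, row in counts.items():
        total = sum(row.values())
        probs[state_a] = {state_b: n / total for state_b, n in row.items()}
    return probs


# Hypothetical toy data: license holding in two successive waves.
wave1 = {1: "no_license", 2: "no_license", 3: "license",
         4: "license", 5: "no_license", 6: "no_license"}
wave2 = {1: "license", 2: "no_license", 3: "license",
         4: "license", 5: "no_license", 6: "no_license"}

table = transition_probabilities(wave1, wave2)
```

In this toy sample one of the four persons without a license acquires one, so the estimated transition probability is 0.25. Conditioning on additional attributes (age group, gender, income) would amount to using a tuple of attributes as the starting state.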
The model system combines the unique concepts of microsimulation and object-oriented programming (OOP) to develop a highly modular simulation model in which the submodels can be added, altered, or replaced without affecting the other components of the model system. The concepts of OOP and microsimulation go hand in hand because OOP was designed to handle complex problems for large-scale microsimulation types of applications. The purpose of OOP is to model the behavior of real-world objects (Mandava, 1999). These objects can be persons, households, vehicles, firms, highways, intersections, and so on. The real world consists of these objects, which interact and evolve over time. Each object has its own state and behavior. There is a one-to-one mapping between the objects in the real world and those in the simulated or virtual world.

An object in a simulated world, however, is considered an abstraction of the real-world object. The object has a number of methods that explain the behavior of the object simulated. For example, in DEMOS each individual is simulated based on two classes, namely, PERSON and HOUSEHOLD. As the names suggest, the PERSON class has all the methods describing the individual and the HOUSEHOLD class contains all the methods required to explain the household. Every individual living in a household is then an object of the PERSON and HOUSEHOLD classes. Any changes in the HOUSEHOLD attributes are updated for all the members of the household, reflecting the changes taking place at the PERSON level. For example, if an individual dies, the household attribute and type for every person is changed appropriately. Part of this more basic research work and the computer code were developed in Mandava (1999). Due to difficulties in internal validation of that earlier work, it was decided that a new computer code should be developed to incorporate some of the original ideas, but not the computer code from the earlier work; the results using the new code (DEMOS 2000) are described in this chapter. Next, a description of the DEMOS simulation process and the structure of the software is given. Then the data used and the validation of select variables are presented. The chapter concludes with a few forecasting examples and a brief summary.
14.2.1 DEMOS Process

[...] and it is difficult to design interdependent processes between persons and households. A diagram of longitudinal simulation is shown in Figure 14.1.
Figure 14.2 shows the flowchart of the simulation process. First the input data for a particular individual are read, and then he or she is progressed through the first year. In DEMOS the individual’s household attributes are determined first. Then the person attributes are simulated, followed by simulation of the information and communication technologies (ICTs) owned and used, and then activity-travel duration models are applied to simulate activity and travel behavior. If a child is born during the simulation, the child is simulated after the mother is simulated. Also, if an eligible single person gets married during the simulation period, then the new person is simulated based on data about the member in the original database. A user can provide the number of years and the number of simulations (replications for each person to be simulated) as inputs.
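The overall control flow can be sketched as a loop over replications and years. This is a minimal illustration under stated assumptions, not the DEMOS code: persons are plain dictionaries, `run` and `toy_step` are hypothetical names, and the per-person step is reduced to aging plus one deterministic birth so that the flow is easy to follow.

```python
import copy
import random


def run(population, years, replications, step, seed=0):
    """Simulate `population` for `years` years, `replications` times.

    `step(person, rng)` advances one person by one year and returns a list
    of newborn persons (possibly empty).
    """
    rng = random.Random(seed)
    results = []
    for _ in range(replications):
        # Every replication restarts from the same input population.
        people = copy.deepcopy(population)
        for _ in range(years):
            newborns = []
            for person in people:
                if person["alive"]:
                    newborns.extend(step(person, rng))
            # Children join after their mothers have been processed and
            # are simulated from the following year onward.
            people.extend(newborns)
        results.append(people)
    return results


def toy_step(person, rng):
    """Hypothetical yearly step: age the person; a birth occurs at age 30."""
    person["age"] += 1
    if person["age"] == 30:
        return [{"age": 0, "alive": True}]
    return []


start = [{"age": 29, "alive": True}]
runs = run(start, years=2, replications=3, step=toy_step)
```

Copying the input population at the start of each replication keeps the replications independent, which is what makes averaging over them meaningful.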
The order in which an individual is exposed to different events is shown in Figure 14.3. First the individual is checked for the event “death.” If he or she dies, then the individual is removed from the simulation. Following death, based on gender, the individual is exposed to “birth.” The next event is “child leave nest.” If the person is below 25 years of age, then he or she is eligible to leave the parents’ household. Based on marital status, the individual is then exposed to either “divorce” or “marriage.” In all these cases changes are made to other members’ household attributes as required. Then the income group of the household is simulated, followed by the total number of vehicles in the household. After the household characteristics are simulated, the person characteristics are estimated. The chances of the individual holding a driver’s license are estimated, followed by the employment status and occupation type. Detailed descriptions of each of these events and the data used are provided in Sundararajan (2001).
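The event ordering described above can be sketched as follows. The probabilities in `PROB` are placeholders chosen only for illustration (DEMOS conditions its probabilities on person and household attributes), and the dictionary keys and function names are hypothetical.

```python
import random

# Placeholder annual probabilities, for illustration only.
PROB = {"death": 0.01, "birth": 0.05, "leave_nest": 0.10,
        "divorce": 0.02, "marriage": 0.04}


def happens(event, rng):
    """Monte Carlo draw: the event occurs if a uniform draw falls below its probability."""
    return rng.random() < PROB[event]


def simulate_year(person, rng):
    """Expose one person to the household-level events in the stated order."""
    if happens("death", rng):
        person["alive"] = False
        return ["death"]  # a dead person is removed from the simulation

    events = []
    if person["gender"] == "F" and happens("birth", rng):
        person["children"] += 1
        events.append("birth")
    if person["lives_with_parents"] and person["age"] < 25 and happens("leave_nest", rng):
        person["lives_with_parents"] = False
        events.append("leave_nest")
    if person["married"]:
        if happens("divorce", rng):
            person["married"] = False
            events.append("divorce")
    elif happens("marriage", rng):
        person["married"] = True
        events.append("marriage")

    # Household income and vehicles, then license holding, employment
    # status, and occupation type would be simulated here.
    person["age"] += 1
    return events


person = {"alive": True, "gender": "F", "age": 30, "children": 0,
          "lives_with_parents": False, "married": False}
events = simulate_year(person, random.Random(14))
```

Eligibility checks come before each draw, so a random number is consumed only for events the person can actually experience.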
14.2.2 Data Used as Input Population
The data used in the analysis are from waves 1 (1989), 2 (1990), 3 (1991), 4 (1992), 5 (1993), and 7 (1997) of the PSTP data. PSTP is the first general-purpose travel panel survey in an urban area in the United States. The survey was conducted in the Seattle metropolitan area by the Puget Sound Regional Council in partnership with the transit agencies in the region. It is a longitudinal survey in which similar measurements are made on the same sample at different times. Each measurement conducted at a time point is called a wave. The first survey was initiated in the fall of 1989. Murakami and Watterson (1990) provide more information regarding the origins of this panel survey.

FIGURE 14.1 Longitudinal simulation of individuals. (From Hain, W. and Helberger, C., Microanalytic Simulation Models to Support Social and Financial Policy, Orcutt, G.H., Merz, J., and Quinke, H., Eds., North Holland, Amsterdam, 1986. With permission.) [Diagram labels: Period 1; Period 2; Final Period; Events Occurring during Evolution; Sequence of Simulation; Update Attributes after Every Period.]
PSTP’s three components are household demographics, person socioeconomics, and travel behavior. Trip information was collected using a travel diary as an instrument. The travel diary recorded every trip a person made during two consecutive weekdays, which remained approximately the same during the panel years. Each trip was characterized by trip purpose, type, mode, start and end times, origin and destination, and distance. In DEMOS, the first wave serves as the input population. Transition probabilities were estimated from waves 1 and 2; they determine the probability that a particular event occurs or does not occur for an individual. Waves 2, 3, and 4 are used to validate the model predictions. In waves 1 through 4 there was a total of 1621 respondents (928 households), and in wave 5 there were 1383 respondents (801 households). Finally, wave 7 was also used to develop the information and communication technology ownership and use models (Sundararajan, 2001).
In addition to the PSTP, some additional information was used from the U.S. Census Bureau and the National Center for Health Statistics (NCHS). The U.S. Census Bureau provides detailed data about the people and economy of the United States. NCHS is the federal government’s principal vital and health statistics agency. NCHS data systems include data on vital events as well as information on health status, lifestyle, and health care.
In the simulation, the first wave of the PSTP data was used as the input population to DEMOS. The reasons for using the first wave as the input population are that (1) the short-term forecasting ability of the software can then be tested against the four remaining waves, thus allowing sufficient data for validation, and (2) Ma (1997) has done considerable research in developing the activity and travel indicators using the same data. These models can be directly embedded in DEMOS and can be used to study the activity and travel patterns of individuals in the future. In addition, Ma (1997) has also developed models for daily time allocation and models for daily activity and travel scheduling using the PSTP data. All these models can be incorporated in DEMOS, and then the microsimulator can be extended to predict the daily activity and travel budget of the individuals in the sample. Finally, the predicted activity and travel durations for different activities can be validated against the PSTP data.

FIGURE 14.2 Flowchart of simulation. [Flowchart boxes: Input data; Number of years (Y); Simulation of household attributes (HH type, death, birth, divorce, child leaves nest, marriage, income); Simulation of person attributes (aging, education level, license holding, employment status, occupation type); Y = Y + 1; Child simulation; Simulation of other member in case of marriage.]
Initially, the model was designed to simulate 1621 respondents. The PSTP data do not provide detailed information about the children in the household, but contain information on the total number of children in the age groups 1 to 5 and 6 to 17. Based on this information, the characteristics of the children were simulated (synthetically generated) separately. This resulted in a total of 2157 respondents, including the children. Finally, the model database was expanded to 8628 respondents by replicating the same characteristics of the individuals and households. The model can simulate a maximum of 25 years and 100 simulations. However, by changing the size of arrays at appropriate places, the simulation period can be expanded. During an average DEMOS run it takes about 10 min to simulate 10 years over 100 times for 1621 respondents, and about 60 min to simulate 20 years over 100 times for 8628 respondents, using a personal computer with a Pentium III processor and 384 MB of RAM.
The summaries of the socioeconomic and demographic characteristics of the persons and households for wave 1 are provided in Tables 14.1A and 14.1B. The sample has more women than men and is relatively older, considering the fact that the average age for both men and women is around 47 years. Most of the respondents have a driver’s license. About 75% of the men are employed, while only 57% of the women are employed. Out of those employed, about 44% of the men are employed as production workers or foremen, vehicle operators, service workers, and so forth. About 28% of the men are professionals and about 17% are managers. Among women, the majority are employed as secretaries or professionals. About 16% of the women are managers.

FIGURE 14.3 Order in which individual is exposed to different events during evolution. [Diagram label: Person attributes.]
At the household level, the majority of the households have a total household income between $30,000 and $70,000. Only 1.7% of the households do not have a vehicle. About 46% of the households have at least two vehicles. The sample has a total of 928 households, and the average household size is 2.74 persons. Other wave data are documented in Sundararajan (2001).
DEMOS was developed using Microsoft Visual C++ (VC++) Version 6.0. One of the main reasons for choosing VC++ was its visual capabilities and its OOP approach. VC++ allows the data to be read from a Microsoft Access database. DEMOS relies on the input population from the first wave of the PSTP data, which is stored as an MS Access file. All the variables are stored in two large tables.

The OOP approach allows the use of classes and objects. DEMOS is based on three important classes and a source file:
• CDATA: This class holds all the variables from the database. The variables are established automatically once the input file is specified, while creating a project file initially. It is important to note that the variables in this class should be exactly the same as the variables in the database. If any modifications such as adding or deleting a variable or changing a variable name are made to the database later, then a new class has to be created.
• PERSON: This class holds all the methods or functions that are relevant to the individual.
• HOUSEHOLD: This class has all the methods or functions that are related to each household in the data.
• CDEMOSVIEW: This is a source file that can be considered the heart of DEMOS. Objects are created from the PERSON, HOUSEHOLD, and CDATA classes, and the functions are called from this file in the specified order. Also, this file contains the relevant code used to aggregate the information and provide results.

TABLE 14.1A Summary of Household Characteristics (Wave 1)
[Table body not recovered. Column headings: Household (HH) Characteristics; Percent (Number of Respondents = 1621). Note: Average household size = 2.74.]

TABLE 14.1B Summary of Person Characteristics (Wave 1)
[Table body not recovered. Footnotes: a Minimum = 15; maximum = 89; standard deviation = 14.3. b Minimum = 15; maximum = 90; standard deviation = 14.3.]
An object of these classes is created for processing the information. For example, every individual is considered an object of the PERSON and HOUSEHOLD classes, identified by a row of characteristics explaining the individual.
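The relationship between person and household objects can be sketched in a few lines. This is an illustration in Python rather than the VC++ code of DEMOS; the class layout, the household type labels, and the adult age threshold of 18 are assumptions made for the example (the threshold follows the 6 to 17 child age group mentioned in the data description).

```python
from dataclasses import dataclass, field
from typing import List

ADULT_AGE = 18  # assumption: children are under 18


@dataclass
class Person:
    age: int
    gender: str
    alive: bool = True
    household_type: str = ""


@dataclass
class Household:
    members: List[Person] = field(default_factory=list)
    household_type: str = ""

    def refresh(self):
        """Recompute the household type and copy it to every living member."""
        living = [p for p in self.members if p.alive]
        adults = sum(p.age >= ADULT_AGE for p in living)
        children = len(living) - adults
        if adults == 0:
            self.household_type = "no_adult"            # hypothetical label
        elif adults == 1:
            self.household_type = "single_parent" if children else "single_adult"
        else:
            self.household_type = "adults_with_children" if children else "adults_only"
        for p in living:
            p.household_type = self.household_type

    def record_death(self, person):
        """A death changes the household attributes of all remaining members."""
        person.alive = False
        self.refresh()


mother, father, child = Person(40, "F"), Person(42, "M"), Person(10, "F")
home = Household([mother, father, child])
home.refresh()
before = home.household_type
home.record_death(father)
after = home.household_type
```

Here the household owns its members and pushes attribute changes down to them, which is one simple way to keep person-level and household-level state consistent after an event.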
The following is a complete list of input parameters fed into the model:
• Age and gender of the individual
• Employment status and occupation type of the individual
• License-holding status of the individual
• Total number of adults and the number of children aged 1 to 5 and 6 to 17 in the household
• Income category and the number of vehicles in the household
The following is a complete list of the output from DEMOS for every year:
• Number of people alive and dead by age groups
• Number of women giving birth to a child by age group and total number of children in the household before the current birth
• Number of married people divorcing in the year
• Number of children leaving the household
• Number of singles or single parents getting married
• Number of people in the respective household types
• Number of people employed and not employed, by gender
• Number of people having and not having a license, by gender
• Number of people in respective income groups and number of vehicles
• Number in respective occupation types
The events that can occur during the evolution of an individual are represented by member functions or methods in the software. The programming methodology adopted to build each method and the probabilities used are explained in Sundararajan (2001). In the majority of these methods a Monte Carlo experiment is performed. A random number is drawn from a uniform distribution, and the random number is compared with the probability of the event. If the random number is less than the probability, then the event occurs; otherwise, the event does not occur. Also, the events are designed to occur in discrete time. Any event can therefore occur to an individual during any time period, based on his or her eligibility for the particular event. Additional details about the probabilities of occurrence for each event and the source data are reported in Sundararajan (2001).
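A minimal sketch of this Monte Carlo experiment follows. The function name `event_occurs`, the seed, and the probability of 0.25 are arbitrary choices for illustration.

```python
import random


def event_occurs(probability, rng):
    """Draw u from Uniform[0, 1) and report whether u < probability."""
    return rng.random() < probability


rng = random.Random(2000)  # arbitrary seed, for reproducibility only
trials = 100_000
hits = sum(event_occurs(0.25, rng) for _ in range(trials))
share = hits / trials  # close to 0.25 for a large number of trials
```

Because the draw lies in [0, 1), an event with probability 0 never occurs and an event with probability 1 always occurs. Averaged over many replications, the share of draws in which the event occurs approaches the input probability, which is why results are summarized over repeated simulations.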
14.2.3 Validation Method and Results
Validation involves testing the model’s predictive capabilities by comparing model predictions and external data. In this section the forecasting ability of DEMOS is assessed by comparing the observed data in PSTP with the predicted results from DEMOS. Comparison data are from later waves, census data, and other external information. Usually, measures that check for forecasting accuracy are computed and inferences are drawn regarding the model’s predictions. The main objective of this exercise is to check how well synthetic evolution through DEMOS matches the real-world evolution. Validation also gives the opportunity to check whether the external probabilities taken from the U.S. Census and other sources are applicable to the sample from the Puget Sound region that has been used here.
In DEMOS two different sets of probabilities are used. First, the directly observable parameters in PSTP data, like license holding, employment status, number of vehicles, income groups, household types, and occupation types, are estimated using the transition probabilities from waves 1 and 2 (this is in contrast to other simulators, such as MIDAS, that estimate probability models instead). The second set of probabilities is for the events that bring significant changes in the household attributes. These are birth, death, divorce, marriage, and children leaving the nest. The probabilities for these parameters have been estimated from the U.S. Census, NCHS, and other panel surveys. Two different validation methods have therefore been used.
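In the first case, where sufficient data are available to validate at disaggregate levels, the predictions from DEMOS were compared with waves 2, 3, 4, and 5 of the PSTP data. Validation is made from the results obtained after simulating 1621 respondents 100 times. The following parameters are computed to test the forecasting accuracy, where error is the difference between observed and predicted values. For every year t:

• Absolute difference: $O_i - P_i$
• Percent error of the predicted average: $100 \times \dfrac{P_i - O_i}{O_i}$
• Mean absolute percent error: $\mathrm{MAPE} = \dfrac{100}{n} \sum_{i=1}^{n} \dfrac{|P_i - O_i|}{O_i}$
• Mean square error: $\mathrm{MSE} = \dfrac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2$
• Theil’s inequality coefficient (Theil, 1971): $U = \dfrac{\left[ \frac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2 \right]^{1/2}}{\left[ \frac{1}{n} \sum_{i=1}^{n} O_i^2 \right]^{1/2}}$

where $P_i = \frac{1}{k} \sum_{j=1}^{k} p_j$ is the predicted average over the $k$ simulations, with $p_j$ the predicted value in simulation $j$; $O_i = \frac{1}{l} \sum_{m=1}^{l} o_m$ is the observed average, with $l$ the number of respondents ($l$ = 1621 for waves 1, 2, 3, and 4; $l$ = 1383 for wave 5); $m = 1, 2, 3, \ldots, l$; and $o_m$ is the observed value for person $m$ in the PSTP sample.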
The mean absolute percent error (MAPE) has the observed value in the denominator. So when the observed values are large, MAPE gives a small value, even for a relatively large absolute difference. The mean square error (MSE) measures the average squared distance between the predicted and the observed values. It penalizes a large error more than a small error. Theil’s inequality coefficient, U, is another measure that is obtained by dividing the MSE by the sum of the squares of the observed mean. It can be seen immediately that U = 0 occurs only when there is a perfect match between predictions and observations, and that U = 1 corresponds to predictions no better than a no-change extrapolation, with larger values being worse. Also, U does not have any upper bound.
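These accuracy measures can be computed with a few helper functions. The sketch below assumes the common textbook forms of MAPE, MSE, and Theil's U (root of the summed squared errors divided by the root of the summed squared observations); the function names and the sample values are hypothetical, and the forms may differ in detail from those used in DEMOS.

```python
from math import sqrt


def percent_error(observed, predicted):
    """Signed percent error; negative values indicate underprediction."""
    return 100.0 * (predicted - observed) / observed


def mape(observed, predicted):
    """Mean absolute percent error, with observed values in the denominator."""
    terms = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return 100.0 * sum(terms) / len(terms)


def mse(observed, predicted):
    """Mean square error: average squared prediction error."""
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def theil_u(observed, predicted):
    """Theil's inequality coefficient (assumed form): 0 for a perfect match, no upper bound."""
    numerator = sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)))
    denominator = sqrt(sum(o ** 2 for o in observed))
    return numerator / denominator


# Hypothetical observed and predicted averages for three categories.
obs = [10.0, 20.0, 40.0]
pred = [12.0, 18.0, 40.0]

summary = {
    "mape": mape(obs, pred),
    "mse": mse(obs, pred),
    "theil_u": theil_u(obs, pred),
}
```

The same absolute error of 2 contributes 20% to MAPE in the first category but only 10% in the second, which illustrates why MAPE looks small when observed values are large.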
In the second case, where there are no disaggregate data to compare the predictions with observations, the predicted probabilities for the occurrence of the event are compared with the observed probabilities computed from the external data. In such cases, only the absolute difference and the percent error of the predicted average are calculated. Validation is made from the results obtained by simulating 8628 respondents 100 times to increase the sample size of predictions and allow the algorithms to produce “rare” events.

In the following discussion, a small selection from the validation results (which include birth, death, marriage, divorce, and child leaving nest) is provided. Death is based on the age and gender of the individual. Since the probabilities were estimated from the year 1998, DEMOS forecasts for the year 1998 were used to validate this event. Table 14.2 shows the validation results. The observed and predicted probabilities were converted to rates per 1000 persons, and the absolute error and percent prediction error were calculated. Deaths of male children aged 1 year or less are underpredicted by almost 52%, while the predictions for female children are almost perfect. For male children between the ages of 2 and 14 the predictions are less than the observed number of deaths, while for female children they are more than the observed number of deaths. The model predictions for both males and females between 25 and 34 years old differ from observed rates by more than 20%.
The predictions fit the original rates for both males and females between 15 and 24 years old and above 35 years old. The prediction errors in these cohorts range from 0.02 to 6.44%. It can be observed that the predictions are more precise for age cohorts above 35 years old than for age cohorts less than 35 years old. In order to determine whether the proportion of different age groups is the same across both census and PSTP data, a marginal chi-square test was conducted. The chi-square statistic was 1404.14 for men and 1338.84 for women, while the critical value with α = 0.05 and 7 degrees of freedom (df) is 14.07. Tables 14.3 and 14.4 show the chi-square calculations for men and women, respectively. The chi-square results provide more insights about the distribution of people in different age cohorts. There are about 90% fewer children in the age groups of 1 year or less and 2 to 14 than expected. Similarly, there are about 80% fewer men and women in the age group 25 to 34. Since there are not enough observations in these age groups, the prediction error for these cohorts is very high.

The probability of a woman giving birth to a child is associated with the total number of children in the household and the age of the potential mother. Also, we assumed that a mother can give birth to up to three children. Table 14.5 provides the comparisons between the observed and the predicted data. It can be observed that the model predictions are always less than the observed rates. In fact, the prediction error for all the cases varies from –45 to –99%. This results in significant underpredictions for the number of births occurring every year. In order to determine whether there is a significant difference between the population distributions of PSTP and the U.S. Census, a chi-square test was conducted. The chi-square value was 520.49 with df = 3, while the critical value was 7.81. The chi-square calculations are provided in Table 14.6. This again indicates that there is a significant difference between the PSTP and the U.S. Census. It can be observed that there are about 67% more women in the age group of 40 to 49 in the PSTP data. Since the probability of birth decreases as women age, this significantly affects the number of births in the simulation year. Also, because births are a means through which new members are added into the simulation, this underprediction may have major effects on other results. Additionally, it is an indication of potential major differences between PSTP and the overall Seattle population.
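The chi-square comparison of a sample distribution against reference shares can be sketched as follows. The counts and shares are hypothetical and only illustrate the calculation; the critical values are the standard table values at α = 0.05 for the two degrees of freedom mentioned in the text.

```python
def chi_square_statistic(observed_counts, expected_shares):
    """Pearson chi-square statistic for sample counts against reference shares."""
    total = sum(observed_counts)
    statistic = 0.0
    for count, share in zip(observed_counts, expected_shares):
        expected = total * share
        statistic += (count - expected) ** 2 / expected
    return statistic


# Standard chi-square critical values at alpha = 0.05.
CRITICAL_05 = {3: 7.81, 7: 14.07}

# Hypothetical example: four age groups, so df = 4 - 1 = 3.
sample_counts = [30, 50, 80, 40]              # persons per age group in the sample
reference_shares = [0.25, 0.25, 0.25, 0.25]   # shares in the reference population

stat = chi_square_statistic(sample_counts, reference_shares)
df = len(sample_counts) - 1
distributions_differ = stat > CRITICAL_05[df]
```

In this toy case the statistic is 28.0, which exceeds 7.81, so the hypothesis that the sample follows the reference distribution would be rejected at the 5% level.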
TABLE 14.2 Validation of Death: Year 1998
[Table body not recovered. Legible column headings: Predicted Probability of Death (C); Death Rate per 1000 People in PSTP (D = C × 1000); Absolute Error for Death Rate (B – D); Percent Error for Death Rate (D – B)/B.]

TABLE 14.3 Chi-Square Calculations for Validation of Death: Men
[Table body not recovered. Legible heading: Row Marginal Total Men.]

TABLE 14.4 Chi-Square Calculations for Validation of Death: Women
[Table body not recovered. Legible heading: Row Marginal Total Women.]

[Column headings of a further table whose caption was not recovered: For 1000 Persons (B = A × 1000); Predicted Probability of Birth (C); For 1000 Persons (D = …; Absolute Difference; Systematic Difference.]

© 2003 CRC Press LLC