
Transportation Systems Planning Methods and Applications 09


Transportation engineering and transportation planning are two sides of the same coin, aiming at the design of an efficient infrastructure and service to meet the growing needs for accessibility and mobility. Many well-designed transport systems that meet these needs are based on a solid understanding of human behavior. Since transportation systems are the backbone connecting the vital parts of a city, in-depth understanding of human nature is essential to the planning, design, and operational analysis of transportation systems. With contributions by transportation experts from around the world, Transportation Systems Planning: Methods and Applications compiles engineering data and methods for solving problems in the planning, design, construction, and operation of various transportation modes into one source. It is the first methodological transportation planning reference that illustrates analytical simulation methods that depict human behavior in a realistic way, and many of its chapters emphasize newly developed and previously unpublished simulation methods. The handbook demonstrates how urban and regional planning, geography, demography, economics, sociology, ecology, psychology, business, operations management, and engineering come together to help us plan for better futures that are human-centered.


9 Multilevel Statistical Models

CONTENTS

9.1 Introduction
9.2 The Basic Model
9.3 The Basic Multilevel Model
    Data Example 1: Time Allocation to Leisure Activities
9.4 Multivariate Multilevel Model
    Data Example 2: Time Allocation to Activities and Travel on Different Days
9.5 Summary
Further Reading
Acknowledgments
References

9.1 Introduction

Recent travel demand forecasting systems and related data analyses target individual and household variations in behavior not only as functions of individual and household characteristics, but also as functions of other variables to capture the effect of social and geographical context on individual and household behavior. As discussed in previous chapters, new conceptual and theoretical ideas in travel behavior are increasingly offered, providing analytical frameworks for human behavior in geographic space, social space, and time. To test hypotheses within these frameworks and to capture the relationships within and across different dimensions (levels), suitable data analytic techniques are needed.

Assuming we are able to identify and clearly define levels of social groupings, such as the family, the neighborhood, or the professional group, our interest centers on explaining individual behavior not only as a function of personal motivational factors, but also as a function of group influence, such as task allocation(s) and role assignments within a group (e.g., the household). At a more macro (aggregate) level we are also interested in the role personal factors play in shaping group behavior(s). Techniques to accomplish this must support behavioral theories that aim to explain behavior using factors that influence behavior at the same level of the behavioral unit of analysis (named micro-to-micro relationships), at one level higher (more aggregated) from processes taking place (named macro-to-micro effects), and at one level lower from processes taking place (named micro-to-macro effects).

Data analyses that include variables from different levels (e.g., a person, household, neighborhood, city, state) are inherently operating at multiple levels. These are called multilevel analyses because they examine the relationships among variables that are defined at different and multiple levels. Figure 9.1 provides an example of a hierarchy of this type. Each observation is a time point at which a person’s behavior has been recorded or reported (within the time dimension we can have another hierarchy of different temporal entities, indicated in Figure 9.1 as different bullet items within the box labeled “time”). All these observations are about each person, and each person belongs to a household for which we have recorded many characteristics. Then each household resides in a neighborhood (indicated as a census tract for convenience) for which we have also measured characteristics, such as number of locations where activities can be pursued, accessibility indicators for each mode among the centers of these tracts, and so forth. Studies that do not account for the simultaneous influence of variables at multiple levels may lead to ecological fallacy or atomistic fallacy (for a more extensive discussion on this, see the review in Hox (1995), who also provides the earliest known scientific papers on multilevel theories and related conceptual risks and fallacies). The consequence of these fallacies is wrong inference about the effects of policies and the relationships among observed behavioral variables. Multilevel analyses, however, require data that are informative enough to unravel some of these relationships.

Konstadinos G. Goulias
Pennsylvania State University

Multilevel analysis in travel behavior and transportation planning research is greatly facilitated by the availability of a specific type of travel behavior data. Data widely available to transportation planners are household survey data that contain a convenient natural hierarchy, allowing the study of the effect of context on behavior. Some household surveys contain travel or activity diaries of all the members in a household. Other surveys contain only a subset of the household members that satisfy some sample selection condition (e.g., in an American survey all adults and persons older than 15 years are included in the survey diary because 16 is the age at which individuals are allowed to start driving in the United States). We may also have repeated observations of each person’s behavior. For example, when the survey contains a travel diary for a single day, the repetition is on behavioral indicators by different times in a day (i.e., each activity episode and trip made by the observed person). When the survey contains diaries from multiple days, the repetitions are the survey days and, within each day, the multiple trips made by the individual. When the survey is a panel, the hierarchy becomes even more interesting because the temporal repetition contains different years at which persons are interviewed. Within each year it also contains different days, and within each day different episodes (the trips and activities with clear start and end times). In this way time may be considered to contain three subdimensions (minutes and hours within a day, each individual day, and each individual year) along which change (variation from minute to minute, day to day, year to year) takes place, allowing the study of the dynamics of behavior in more detail. In addition, if the households can be grouped into other categories (e.g., based on the sampling criteria or area of residence), we may have yet another group of hierarchical dimensions based on spatial organization.

FIGURE 9.1 Pictorial representation of one possible data hierarchy. [Figure not reproduced. Recoverable box contents: household (number of children in the household, number of cars in the household, household income, number of adults); person (gender, age, employment, education); time (wave/year in panel, month of year, day of the week, time of day).]

One of the objectives in analyzing data of this sort is to decompose variation of a given indicator of interest (behavioral variable) into multiple dimensions to study human behavior (as shown in matrix form in Figure 9.2). For example, we would like to know if the bulk of variation is due to reasons within a person that change over time (e.g., taste, mood, and so forth), personal characteristics that we can observe (e.g., age, gender, employment), or even more stable factors such as personality. We would also like to know if a portion of this variation is due to household influences (e.g., task allocation within a household) and area of residence characteristics (e.g., density of locations at which leisure activities can be pursued).
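The idea of decomposing variation can be illustrated with a small numerical sketch. The example below is not from the chapter: the data values, the dictionary layout, and the function name `decompose_variation` are hypothetical, and only two levels (persons and repeated days within persons) are used. It splits the total sum of squares of a daily leisure-time indicator into a between-person part and a within-person part.

```python
from statistics import fmean


def decompose_variation(obs_by_person):
    """Split the total sum of squares into between- and within-person parts.

    obs_by_person maps a person identifier to the list of repeated
    observations (e.g., minutes of leisure on different days).
    """
    all_values = [y for ys in obs_by_person.values() for y in ys]
    grand_mean = fmean(all_values)

    between = 0.0  # variation of person means around the grand mean
    within = 0.0   # variation of each observation around its person mean
    for ys in obs_by_person.values():
        person_mean = fmean(ys)
        between += len(ys) * (person_mean - grand_mean) ** 2
        within += sum((y - person_mean) ** 2 for y in ys)

    total = sum((y - grand_mean) ** 2 for y in all_values)
    return {"total": total, "between": between, "within": within}


# Hypothetical minutes of leisure per day, three observed days per person.
leisure_minutes = {
    "person_1": [60.0, 90.0, 75.0],
    "person_2": [120.0, 150.0, 135.0],
    "person_3": [30.0, 30.0, 60.0],
}

parts = decompose_variation(leisure_minutes)
share_between = parts["between"] / parts["total"]
print(parts, round(share_between, 3))
```

In this made-up sample most of the variation lies between persons rather than within them. With household or census tract identifiers, the same bookkeeping could be extended to further levels.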

Multilevel regression models are statistical techniques that (1) account for the data hierarchy, and (2) allow us to develop functions explaining the relationships among the different variables from the different levels in the hierarchy. These regression models consider one or more variables as the dependent variables, the variation of which we are trying to explain using independent (explanatory) variables. When dependent variables are depicting the behavior for each individual (the person) in a given sample, key explanatory variables are each individual’s known characteristics (e.g., age, gender, education, employment, race). The relationships, usually represented by regression coefficients, between the explanatory variables and the dependent variables may depict all three types of relationships discussed above (micro to micro, macro to micro, and micro to macro). In most regression equations we find dependent and independent variables defined at the same level, depicting micro-to-micro relationships. To reflect and capture the effect of a higher-level social unit, e.g., the household, on the individual’s behavior, we can also include explanatory variables that describe the household itself, such as number of children by age group, number of vehicles owned and available, and number of employed persons, among other variables. This relationship represents the macro-to-micro relationships (the effect of the household as a unit on each household member’s behavior). Information from units that are below (within) the individual could also be included using explanatory variables. One example is when we have data from the repetition of observation of this same individual (e.g., behavior at different days of the week) or the activity and travel episodes of the same person within a day. When behavior is explained by variables depicting this “within a person” behavior and we formulate models at the person level, we have an example of a model capturing a micro-to-macro relationship. These techniques provide a tool to quantify social context effects while at the same time capturing the relationships among factors within the same level. For this reason, the terms multilevel statistical model and multilevel regression model are used to label the techniques.

Multilevel regression techniques are superior to single-level regression models in four distinct ways. First, behavioral models can be improved if proper consideration of the contexts in which people act is reflected in the models. This is the regression analog (but not derived from it directly) to the activity theory approach in Chapter 1. For example, person-based models need to consider observed and unobserved within-household interactions. In fact, travel behavior researchers are developing theories and testing hypotheses about the interaction of persons within households. An early example and a comprehensive literature review about persons and their role in a household can be found in Townsend (1987). As argued and demonstrated by van Wissen (1989) and much later by Golob and McNally (1997) using structural equations models, the interaction in time use decisions within a household between two persons is of paramount importance in modeling travel behavior. Similarly, behavioral understanding can be improved when joint participation in activities is studied in more detail, as Gliebe and Koppelman (2002), focusing on a two-person time allocation example, demonstrate. In addition, as shown in Chandrasekharan (1999) and Chandrasekharan and Goulias (1999), consideration of joint activity participation and travel not only improves understanding of behavior, but also yields better estimates of some quantitative indicators (e.g., vehicle occupancy) that are used in the most popular regional forecasting models worldwide.

FIGURE 9.2 Joint distribution of y values.

Second, model misspecification (the bias introduced by excluding important explanatory factors of behavior) of these models can be attenuated when we incorporate observed and unobserved heterogeneity using models with more informative random structures. For example, in models of the number of trips a person makes in a day, including variables that describe the person’s household may capture the effects of the role each person plays within a household (for a comprehensive definition of roles, see Townsend (1987)), diminishing the negative effects of excluding significant explanatory factors that may have not been measured during the survey process.

Third, for forecasting model systems that use models in which behavioral dynamics are explicitly modeled, observed and unobserved longitudinal variation should be accounted for and explicitly represented because persons with the same characteristics may follow different paths of behavioral change (for an example using latent class models in transportation, see Goulias (1999a)). As demonstrated in another paper (Goulias, 2002), multilevel models applied to the repeated observation of the same persons over time (panel survey data) allow the building of trajectories of change that, in turn, can be used as building blocks of a forecasting model system.

Fourth, the usual single-level regression model assumption of independent random error terms implies that the observations used to estimate the model parameters are independent, given the explanatory variables in the regression model. When groups of observations are from the same household, and when we do not have access to all the variables that explain the behavior of each person, it is likely that the error terms in this model are correlated. This is similar to serial correlation (i.e., data points are correlated over subsequent time points) and spatial correlation (i.e., data points are correlated because they are from neighboring points). Neglecting this social correlation in regression estimation may lead to larger standard errors of the coefficient estimates (Kennedy, 1995), increasing the risk of excluding significant explanatory variables from our model. Intuitively, this inefficiency is due to our failure to consider the additional information contained in the data, namely the relationships within groups of observations.

In the remainder of the chapter the basic regression model and its variants are described. Then the basic multilevel model is provided with a numerical example. This is followed by a section presenting a multiequation (multivariate) multilevel model and another numerical example to illustrate interpretation and use of this approach. The chapter ends with a brief summary and a section on further reading material.

9.2 The Basic Model

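Suppose we have set out to study the amount of time ($y$) a person $j$ allocates in a day to some particular type of activity (e.g., leisure) as a function of a person’s characteristic ($x$). Also assume that we have observed each person at multiple time points and have stored this information in our database.

Before proceeding with a more detailed presentation of the multilevel models, it is worth pointing out a key idea that underlies research and empirical data analysis work using regression models. This is the idea of independent and identically distributed random variables in the context of linear regression. Let us focus on the random variables $y_1, y_2, \ldots, y_n$ with a joint distribution $f(y_1, y_2, \ldots, y_n; \theta)$, where $\theta$ contains all the usually unknown parameters in a regression model ($\mu$ and $\sigma$ values). Let us name the joint distribution above $F$ (if we would assume that it is normal, we would write $N$). In matrix format we can write

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\sim F\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix},
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}
\end{pmatrix}
\right)
\tag{9.1}
$$

Equation (9.1) contains $n + \tfrac{1}{2}n(n + 1)$ unknown parameters, and usually we have only $n$ observations from which to estimate these parameters. When we add the assumption that all the $n$ observations are independent (they do not vary jointly, but vary independently) we obtain

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\sim F\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix},
\begin{pmatrix}
\sigma_{11} & 0 & \cdots & 0 \\
0 & \sigma_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_{nn}
\end{pmatrix}
\right)
\tag{9.2}
$$

Equation (9.2) requires us to estimate the $n$ $\mu$ values and the $\sigma_{11}$ to $\sigma_{nn}$ variances. If the $n$ observations are persons from a random sample and they do not coordinate their activities in a day (or at least the day of the interview), the assumption of independent observations is reasonable; otherwise, we are neglecting a relationship by imposing zero covariances.

We can simplify Equation (9.2) even further if all $y$ values are also identically distributed with mean $\mu$ and variance $\sigma^2$:

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\sim F\left(
\begin{pmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{pmatrix},
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}
\right)
\tag{9.3}
$$

This time all we need to estimate is one $\mu$ and one $\sigma$. This spectacular reduction in unknown parameters to be estimated (moving from Equation (9.1) to Equation (9.3)) is also one of the practical advantages of the usual simple unilevel linear regression model. Equation (9.3), however, is too restrictive and does not contain the relationship we are interested in, which is the link between X and Y (capital letters are used here to indicate vectors and matrices).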
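The reduction in the number of unknowns from Equation (9.1) to Equation (9.3) can be made concrete with a few lines of arithmetic. The sketch below simply evaluates the counts stated in the text for a given number of observations; the function name `count_parameters` and the sample size of 100 are illustrative assumptions.

```python
def count_parameters(n):
    """Number of unknown parameters under the three sets of assumptions."""
    return {
        # Eq. (9.1): n means plus n(n + 1)/2 distinct variances and covariances
        "general": n + n * (n + 1) // 2,
        # Eq. (9.2): independence leaves n means and n variances
        "independent": 2 * n,
        # Eq. (9.3): identical distribution leaves one mean and one variance
        "iid": 2,
    }


counts = count_parameters(100)
print(counts)  # {'general': 5150, 'independent': 200, 'iid': 2}
```

With only 100 observations, neither the general nor the independent case can be estimated without further structure, which is why some form of restriction is unavoidable.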

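The relationship between Y and X using linear regression can be written as:

$$
y_j = \beta_0 + \beta_1 x_j + \varepsilon_j
\tag{9.4}
$$

For example, the variable $x_j$ represents the age of person $j$ and the variable $\varepsilon_j$ represents a random fluctuation with mean zero and a given amount of variance ($\sigma^2$). When a person’s age is zero, the intercept $\beta_0$ represents the amount of time allocated to leisure. When the person is 20 years old, his or her expected value of the amount of time allocated to leisure in a day will be $\beta_0 + 20\beta_1$. This can also be written in the following format:

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\sim F\left(
\begin{pmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{pmatrix},
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}
\right)
\tag{9.5}
$$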
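As a minimal sketch of the simple linear regression just described, the code below computes the ordinary least squares intercept and slope for one explanatory variable and then evaluates the expected leisure time at age 20. The ages and leisure times are made-up, noise-free values chosen so that the arithmetic is easy to follow; the function name `fit_simple_regression` is hypothetical.

```python
from statistics import fmean


def fit_simple_regression(x, y):
    """Ordinary least squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical data lying exactly on the line: leisure = 80 + 1 * age.
ages = [20.0, 30.0, 40.0, 50.0, 60.0]
leisure = [100.0, 110.0, 120.0, 130.0, 140.0]

beta0, beta1 = fit_simple_regression(ages, leisure)
expected_at_20 = beta0 + 20 * beta1  # the quantity beta0 + 20 * beta1 in the text
print(beta0, beta1, expected_at_20)
```

Real survey data would of course scatter around the line, and the residual variance would then be estimated as well.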

The model in Equation (9.5) is the same as the simple linear regression model based on which we built a series of other regression models. When one compares Equation (9.3) with Equation (9.5), the increase in the number of additional parameters to estimate is only one ($\beta_0$ and $\beta_1$, instead of just $\mu$). This, however, can make Equation (9.5) very flexible when additional $x$ values are added. In fact, most linear regression models we encounter in travel behavior analysis contain many more $x$ values as explanatory variables, and each additional $x$ increases the number of parameters to estimate by one unit, while at the same time it captures another piece of the variation in $y$.

A small digression is needed here to discuss centering because it is used in many multilevel models. We can also rewrite the linear regression model by transforming $x$ as a deviation from the mean:

$$
y_j = \beta_0 + \beta_1 (x_j - \bar{x}) + \varepsilon_j
\tag{9.6}
$$

Interpretation of the $\beta$ coefficients is somewhat different in Equation (9.6). If this person has an age equal to the mean, indicated by $\bar{x}$, then $\beta_0$ is the expected amount of time this person allocates to leisure in a day. $\beta_1$ represents the effect of a unit increase in age on leisure allocation (i.e., if age is measured in years, it represents the difference in time allocation between two persons of a year difference in age; this may not be the same as the effect of aging by 1 year). Note that Equations (9.4) and (9.6) are regression equations capturing the microlevel effects of age on the time allocated to leisure by a person, which is a microlevel dependent variable. According to these two equations, the effect of age on leisure is the same among persons because it does not change with a person’s index. Another variant often used in multilevel model building is one that allows regression coefficients to change among the observations at hand.
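The effect of centering on the two coefficients can be checked numerically. The sketch below fits the same hypothetical data twice, once with age as recorded and once with age expressed as a deviation from its mean; the data values and the helper name `ols` are assumptions made for illustration only.

```python
from statistics import fmean


def ols(x, y):
    """Return (intercept, slope) of a one-variable least squares fit."""
    x_bar, y_bar = fmean(x), fmean(y)
    slope = (
        sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
        / sum((a - x_bar) ** 2 for a in x)
    )
    return y_bar - slope * x_bar, slope


# Hypothetical ages (years) and daily leisure time (minutes).
ages = [20.0, 30.0, 40.0, 50.0, 60.0]
leisure = [105.0, 108.0, 123.0, 128.0, 141.0]

raw_b0, raw_b1 = ols(ages, leisure)

mean_age = fmean(ages)
centered_ages = [a - mean_age for a in ages]
cen_b0, cen_b1 = ols(centered_ages, leisure)

# The slope is unchanged; only the meaning of the intercept changes.
print(raw_b0, raw_b1)
print(cen_b0, cen_b1)
```

After centering, the intercept equals the fitted leisure time of a person of mean age (and, in a one-variable least squares fit, the sample mean of the dependent variable), while the slope is the same in both fits.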

In fact, one can increase the flexibility of this model by allowing the base time allocation to be different among persons. This can be written as:

$$
y_j = \beta_{0j} + \beta_1 (x_j - \bar{x}) + \varepsilon_j
\tag{9.7}
$$

This model is able to capture the differences among persons as differences among the $\beta_{0j}$ values that in essence shift the regression line up and down with each individual observation. Equation (9.7) is not very different from the classic linear regression model in econometrics. When data are available, consistent and efficient estimates of the regression coefficients in this equation can be obtained using ordinary least squares. However, a problem may arise in interpreting the intercepts as representations of the population when we do not include all the population units, as is the usual practice in travel behavior. In addition, we need to estimate as many coefficients as the individuals in the study, which means that we need to have more observations than the $j = 1, \ldots, n$ persons (the usual rule of thumb in regression models is that we should have at least ten observations per coefficient estimated). One way to resolve this is by assuming that the intercept is a randomly varying effect among the $n$ observations, resulting in the random effects model. The usual added assumption is for this random effect to have a variance that is the same among observations (in this way, both the random intercept and the random residual are assumed to have a variance that does not change with each observation, which is called a homoskedastic random error term). Multilevel models are able to relax this homoskedasticity assumption to yield richer and more informative specifications; random error terms that are not homoskedastic are called heteroskedastic.

Further, we can imagine the effect of age on time allocation to also vary with each individual. If we have no information about systematic ways in which this effect may vary, we can assume that the $\beta_1$ values are randomly varying. A typical way of expressing this variation is the following:

$$
\beta_{0j} = \gamma_0 + v_j
\tag{9.8}
$$

$$
\beta_{1j} = \gamma_1 + u_j
\tag{9.9}
$$

The $\gamma$ values in the above equations represent the mean effects around which each individual’s behavior differs according to a randomly distributed variable ($v$ for the intercept and $u$ for the slope). The time allocation equation can then be written as:

$$
y_j = \gamma_0 + \gamma_1 (x_j - \bar{x}) + \left[ v_j + u_j (x_j - \bar{x}) + \varepsilon_j \right]
\tag{9.10}
$$

Equation (9.10) shows the fixed and random parts of the model. The first two terms containing the coefficients $\gamma$ are the intercept and slope of the fixed part. The last three terms within the brackets contain the three random components of the random part. If we were to neglect the complex nature of the random part, assume that it was made of independent identically distributed random variables, and apply ordinary least squares to estimate the $\gamma$ values, we would obtain consistent parameter estimates but inconsistent standard errors of coefficient estimates, and most likely inefficient estimates.
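One way to see why the random part of Equation (9.10) is "complex" is to write out its variance. Assuming the residual is uncorrelated with the two random effects, and allowing a covariance between the random intercept and the random slope (the chapter introduces this covariance as σvu in Section 9.3), the variance of the bracketed term is the sum of the intercept variance, twice the covariance times the centered x, the slope variance times the squared centered x, and the residual variance. The sketch below evaluates this expression; the function name and all numerical values are hypothetical.

```python
def random_part_variance(x_centered, var_v, var_u, cov_vu, var_eps):
    """Variance of v + u * x_centered + eps.

    Assumes eps is uncorrelated with v and u, and Cov(v, u) = cov_vu.
    """
    return var_v + 2.0 * x_centered * cov_vu + x_centered ** 2 * var_u + var_eps


# Hypothetical variance components.
params = dict(var_v=4.0, var_u=0.25, cov_vu=0.5, var_eps=10.0)

at_mean_age = random_part_variance(0.0, **params)       # person of mean age
ten_years_older = random_part_variance(10.0, **params)  # ten years above the mean
print(at_mean_age, ten_years_older)  # 14.0 49.0
```

Because the result changes with the explanatory variable, the composite error is heteroskedastic whenever the slope varies randomly, which is one reason ordinary least squares standard errors cannot be taken at face value for this model.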

In econometrics, the study of this type of model has focused on the issues raised by Balestra and Nerlove (1966) in their demand for energy study among the American states, providing a first formulation of a model with random effects. In terms of the levels we discuss here, each state is observed at different time points (years), leading to a two-level data hierarchy. The number of observations in this case is the number of states in their study, N, times the calendar time points, T. The Balestra and Nerlove study also introduced a plethora of other models that go beyond the focus of this chapter. A key contribution, however, to the analysis of data with hierarchies was the demonstration that observations of this type contain information that may not be captured by the observed explanatory variables in a regression model, and that capturing it requires the use of information in their heterogeneous random error terms. In this way unobserved heterogeneity, in its heteroskedasticity form, is viewed as a source of additional information instead of a problem to eliminate.

Subsequently, in another fundamental contribution that Swamy offered 30 years ago, emphasis was given to random coefficients, as in Equations (9.8) and (9.9) (creating the random coefficient regression model). This type of model is discussed extensively with other random coefficient models in Swamy (1974). In addition, different versions of Equation (9.10) that are based on repeated observations of the same groups of persons (known as panel data) led to a populous group of methods known in econometrics as models of panel data (Greene, 1997), econometric analysis of panel data (Baltagi, 1995), and analysis of panel data (Hsiao, 1986). The emphasis in this type of analysis is given to the individual (a person, firm, or state) and discrete time points at which the behavioral unit is measured or surveyed. A review book on models and methods for panels with many transportation examples from around the world is the edited volume by Golob et al. (1997). In an earlier experiment using a database similar to the one used in this chapter, Liao (1994) identified, discussed, and illustrated some estimation issues for the random coefficient model and the need for data variation within groups (e.g., for each person across time) when estimating models of this type. Similar issues are key to the multilevel models as well, and we will discuss them later in the chapter. It should be noted, however, that instead of using the typical econometric approach to model building, the following section describes multilevel models using conventions and an exposition that have been used in applied statistics.

9.3 The Basic Multilevel Model

The multilevel models described here are more general than panel data models because they allow many more dimensions than the two dimensions, individual unit and time, of the panels. Unlike more traditional multilevel presentations, we will start with panel data models and then move to more complex multilevel models, but first let us define a few terms that are specific to multilevel models.

The models and the type of regression analysis used here are known by different names in different fields of research for different reasons. For example, they have been named random coefficient models (Longford, 1993; Greene, 1997, p. 669) because emphasis is given to the varying nature of the regression coefficients and their specific pattern of variation, as shown in Equations (9.8) and (9.9). They have also been named multilevel models (Goldstein, 1995) to emphasize the measurement of the dependent variable at different levels (e.g., income can be measured for each person, but also as a household or neighborhood average or median value). Another group of researchers name these models mixed models (Searle et al., 1992) to emphasize the presence of fixed and random coefficients in the same regression model. Bryk and Raudenbush (1992) use the name hierarchical models to indicate that the data structures are from hierarchies. Some of the labels in this family of models indicate subtle but important differences revealing the researchers’ modeling emphasis. All models share one element: the arrangement of data into groups and the exploitation of group membership to unveil hidden aspects of data variation. However, some of these labels are also confusing because some adjectives in the labels have also been used to indicate different classes of models or their properties. For example, Searle et al. (1992) use the term hierarchical model to indicate a model that is specified in a sequence of hierarchical stages. In addition, the term mixed model can be easily confused with the term mixture in statistics, indicating a different family of statistical models.

To avoid confusion and to be consistent with a few of the key references used here and the software employed to estimate the examples in this chapter, the term hierarchical data is used to indicate the nested nature of the data at hand and multilevel models to indicate:

1. Models containing an explicit recognition in their formulation of the hierarchical, multiple-level, and nested structure of the data to analyze
2. Model specification that uses three groups of regression components in the same regression model (fixed coefficients, random components of coefficients, and random error term residual)

The first group, fixed coefficients, assumes constant sensitivity to explanatory variables among the units of analysis, representing the mean effect of an explanatory variable on the dependent variable (we use the Greek letter $\gamma$ for these coefficients). The second group, random coefficients, assumes a random deviation around this mean as in Equations (9.8) and (9.9) (we use $u$, $v$, and $w$ to indicate these components). The third group is the usual random error term(s) of the regression equation (we use the Greek letter $\varepsilon$ for this component). If we want to examine many dependent variables in a system of equations, we will have as many random errors ($\varepsilon$ values) as the dependent variables.

To demonstrate the differences with other regression models, we rewrite the regression equation in a somewhat different way by introducing a second index and eliminating the centering (deviation from the mean) of the explanatory variable. Assume we have two levels: persons, for whom we use the index $j$, and time points. Each person was observed at a few time points, and for the time points we use the index $i$.

$$
y_{ij} = \beta_{0ij} x_{0ij} + \beta_{1j} x_{1ij} + \gamma_2 x_{2ij} + \gamma_3 x_{3ij} + \gamma_4 x_{4ij}
\tag{9.11}
$$

Equation (9.11) indicates that we have five explanatory variables. The variable $x_{0ij}$ is the equivalent of the intercept (constant) in regression models that takes the value of 1 for all observations when we consider the person level alone. As we will see below, it is its random coefficient that contains some interesting components. A second explanatory variable ($x_{1ij}$) also has a random coefficient that changes with the person index (randomly varying across persons). The other three explanatory variables have coefficients $\gamma$ that are neither functions of other variables nor randomly varying (i.e., each takes one single unknown value that is the same for all observations). In addition, the two random coefficients can be written as

$$
\beta_{0ij} = \gamma_0 + v_j + \varepsilon_{ij}
\tag{9.12}
$$

$$
\beta_{1j} = \gamma_1 + u_j
\tag{9.13}
$$

Equation (9.12) indicates that all observations have one common fixed intercept $\gamma_0$, a randomly varying intercept among persons (that we also assume has $E(v_j) = 0$ and $\mathrm{Var}(v_j) = \sigma_v^2$), and a randomly varying component with time and with persons (that we also assume has $E(\varepsilon_{ij}) = 0$ and $\mathrm{Var}(\varepsilon_{ij}) = \sigma_\varepsilon^2$), which is the usual regression residual. Therefore, $E(\beta_{0ij}) = \gamma_0$.

Equation (9.13) contains two components: the fixed slope $\gamma_1$, indicating that all observations have one common slope (multiplier) for variable $x_1$, and a random component $u_j$ according to which persons differ in their behavior (with $E(u_j) = 0$ and $\mathrm{Var}(u_j) = \sigma_u^2$). In addition, the random part of this slope and the random part of the intercept are assumed to be correlated with $\mathrm{Cov}(v_j, u_j) = \sigma_{vu}$. Note that in Equations (9.11) to (9.13) we have modeled the variation in behavior among persons, and the only entities varying with time (and within persons) are the $x$ values and the residual $\varepsilon$.
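To make the roles of the person-level and observation-level random terms more tangible, the following sketch generates artificial data from a two-level model of the form just described: a random intercept and a random slope that are drawn once per person, a residual drawn for every observation, and three fixed coefficients. Everything specific here is an assumption for illustration: the function name `simulate_two_level`, the normal distributions, the uniform explanatory variables, the numerical values, and the simplification that the two person-level effects are drawn independently (zero covariance between them).

```python
import random


def simulate_two_level(n_persons, n_times, gamma, sd_v, sd_u, sd_eps, seed=0):
    """Generate y_ij for persons j observed at time points i.

    gamma holds the five fixed coefficients (gamma_0 ... gamma_4).
    The person effects v_j and u_j are drawn independently here,
    which corresponds to assuming a zero covariance between them.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_persons):
        v_j = rng.gauss(0.0, sd_v)  # person-level shift of the intercept
        u_j = rng.gauss(0.0, sd_u)  # person-level shift of the slope of x1
        for i in range(n_times):
            # x0 is the constant 1; x1..x4 are hypothetical explanatory variables
            x = [1.0] + [rng.uniform(0.0, 1.0) for _ in range(4)]
            eps_ij = rng.gauss(0.0, sd_eps)        # observation-level residual
            beta0_ij = gamma[0] + v_j + eps_ij     # random intercept
            beta1_j = gamma[1] + u_j               # random slope
            y = (
                beta0_ij * x[0]
                + beta1_j * x[1]
                + sum(g * xk for g, xk in zip(gamma[2:], x[2:]))
            )
            rows.append({"person": j, "time": i, "x": x, "y": y})
    return rows


gamma = [1.0, 2.0, 0.5, -0.5, 0.25]
sample = simulate_two_level(n_persons=3, n_times=4, gamma=gamma,
                            sd_v=1.0, sd_u=0.5, sd_eps=0.2)
print(len(sample), sample[0]["y"])
```

Observations of the same person share the same draws of the two person-level effects, which is what induces correlation among them; setting all three standard deviations to zero reduces the generator to the fixed part alone.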

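In the example here the model defined by Equations (9.11) to (9.13) is called model C (for reasons that will become clear later). In Equation (9.13), we can define the random slope as fixed ($\beta_{1j} = \gamma_1$), eliminating its part that varies randomly across persons ($u_j$) and its correlation with the random component of the intercept. This is called model B. If we eliminate all explanatory variables ($x$ values), we obtain a third model (model A) that contains only an intercept defined by Equation (9.12). Following these definitions, the parameters to be estimated for each model are:

Model A: $\gamma_0$, $\sigma_v^2$, $\sigma_\varepsilon^2$
Model B: $\gamma_0$, $\gamma_1$, $\gamma_2$, $\gamma_3$, $\gamma_4$, $\sigma_v^2$, $\sigma_\varepsilon^2$
Model C: $\gamma_0$, $\gamma_1$, $\gamma_2$, $\gamma_3$, $\gamma_4$, $\sigma_v^2$, $\sigma_u^2$, $\sigma_{vu}$, $\sigma_\varepsilon^2$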

The estimates from model A can be used to compute a useful quantity called the intraclass correlation, $\rho$, using the following (Hox, 1995):

$$
\rho = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\varepsilon^2}
$$
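The intraclass correlation is simple to compute once the two variance components of the intercept-only model are available. The sketch below assumes the usual definition, the person-level variance divided by the sum of the person-level and residual variances; the function name and the numerical values are hypothetical.

```python
def intraclass_correlation(var_v, var_eps):
    """Share of total variance attributable to differences among persons."""
    total = var_v + var_eps
    if total <= 0.0:
        raise ValueError("the sum of the variance components must be positive")
    return var_v / total


# Hypothetical variance estimates from an intercept-only model.
rho = intraclass_correlation(var_v=2.0, var_eps=6.0)
print(rho)  # 0.25
```

A value of 0.25 would mean that one quarter of the variation in the dependent variable lies between persons and three quarters within persons over time.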


... overview and a discussion about software and Internet websites with additional information (see also the end of this chapter). van der Leeden (1998) also mentions the use of Bayesian techniques and one application of a data augmentation technique (see also Schafer, 1999) to the estimation of multilevel models.

In this chapter, Goldstein’s (1995) iterative generalized least squares (IGLS) approach is used; it separates estimation of the fixed from the random parameters into different steps that are repeated in sequence until no change is observed in the estimates in subsequent steps. Goldstein (1995) has also improved the IGLS algorithm, when based on full-information maximum likelihood (FIML), using a modified IGLS called restricted IGLS (RIGLS). In fact, this method provides standard errors of coefficient estimates that are conservative (larger), and therefore leads to more parsimonious models. In a series of experiments performed in a few studies using this same data set and reported elsewhere (Goulias, 2002), IGLS and RIGLS gave similar results and identical conclusions about the significance of variables.

For each estimate, standard errors can also be computed (e.g., as an output of maximum likelihood estimation) and hypothesis tests about significance performed. There seems to be general agreement in the multilevel literature that the significance of the fixed coefficients can be tested using the ratio between a coefficient estimate and the estimate of its standard error (also known as the Wald test, in honor of its first developer in the 1940s). Bryk and Raudenbush (1992) suggest the use of a t-test instead of a z-test. In practice, however, because the travel behavior examples have a large number of observations, the two tests yield very similar indications about significance.
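The ratio test for a fixed coefficient can be sketched in a few lines. The example assumes the large-sample (z) version of the test, with a standard normal reference distribution. The function name and the coefficient and standard error values are hypothetical and are not taken from the tables in this chapter.

```python
import math


def wald_z_test(estimate, std_error):
    """Return the Wald ratio and its two-sided p-value (standard normal reference)."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / std_error
    # Two-sided tail area of the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficient and standard error, for illustration only.
z, p = wald_z_test(estimate=-12.4, std_error=4.1)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A t-test would use the same ratio with a Student t reference distribution; with the sample sizes typical of travel surveys, the two p-values would be nearly identical.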

In contrast, testing the significance of the random parameters (variances) is not as straightforward, particularly for variances that are very small. As explained by Bryk and Raudenbush (1992) and Hox (1995), one solution is a test based on the likelihood ratio (the same ratio used in many other models, such as the discrete choice models in travel behavior, when models can be considered to have a nested specification structure).

Maximum likelihood estimation derives parameter estimates by finding the maximum of a function called the likelihood using an iterative method. Most maximum likelihood algorithms produce a series of iterations that stop according to a convergence rule, that is, when no further improvement in the parameters and in the value of the likelihood is observed (e.g., when the numerically computed first derivatives are very close to a computable zero). At the end of the iterations, the deviance is computed, defined as –2 times the logarithm of the likelihood evaluated at the maximum. If we estimate two models that have the same specification in terms of explanatory variables but differ in the number of variances (say one model has k variances to be estimated and the other has k – q), then each model yields a deviance, denoted $D_k$ and $D_{k-q}$, respectively. The difference $D_{k-q} - D_k$ is $\chi^2$ distributed with q degrees of freedom. If the inclusion of the q parameters leads to a significantly better goodness of fit (a deviance that is much smaller in a statistical sense), then we should prefer the model with the q additional parameters; otherwise, we should prefer its competitor with k – q parameters.
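The deviance comparison can be sketched as follows. To keep the example free of external libraries, the chi-square tail probability is computed with a standard recurrence for integer degrees of freedom. The function names and the two deviance values are hypothetical and are not results from this chapter.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if stat <= 0:
        return 1.0
    x = stat / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(x))  # exact tail for df = 1
    else:
        a, q = 1.0, math.exp(-x)             # exact tail for df = 2
    while a < df / 2.0:
        # Recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1).
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def deviance_test(deviance_restricted, deviance_full, q):
    """Compare a model with k - q variances against one with k variances."""
    diff = deviance_restricted - deviance_full
    return diff, chi2_sf(diff, q)


# Hypothetical deviances of two nested models that differ by q = 2 parameters.
diff, p = deviance_test(deviance_restricted=78250.0, deviance_full=78238.5, q=2)
print(f"deviance difference = {diff:.1f}, p = {p:.4f}")
```

A small p-value would favor the model with the q additional parameters; a large one would favor the more parsimonious model.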

9.3.1 Data Example 1: Time Allocation to Leisure Activities

In this chapter, data are used from the only current general-purpose panel survey specifically designed for transportation planning in the United States. This survey, called the Puget Sound Transportation Panel (PSTP) and described in Murakami and Watterson (1990), Goulias and Ma (1996), and Murakami and Ulberg (1997), is a unique source of data for regional travel demand forecasting. Unfortunately, its potential has not yet been put to good use in practical applications. The Puget Sound Regional Council has plans, however, to use models derived from this data set in its regional forecasting model system. In addition, the recent addition of questions about information technology and traveler information use creates unprecedented possibilities for studying traffic management strategies in Seattle and the surrounding region, as illustrated in a later example in this chapter.

A panel is a survey administered repeatedly to the same observations over time. Each survey, conducted at one point in time (in PSTP, a year of interview), is called a wave. PSTP contains three groups of data: household demographics, people’s social and economic information, and reported travel behavior in a 2-day travel diary (additional details are available in Goulias and Ma (1996) for the first four waves of PSTP). The data used in this chapter are from the first five waves of PSTP, conducted in 1989, 1990, 1992, 1993, and 1994. These travel diaries cover a period of 48 h. Each person was interviewed on the same 2 days in all waves, and the travel diary includes every trip a person made during these 2 days. For each trip reported we have the trip purpose, mode used, departure time, arrival time, travel duration in minutes, distance in miles, origin, and destination. Activity participation information can be derived for all out-of-home activity engagement events using the trip purposes, and for a portion of the in-home activities pursued between the first departure from home (e.g., in the morning) and the last arrival at home (e.g., in the evening). The duration of each activity episode (d) is computed as the difference between the start time of the next trip (t + d, departure from a given location) and the end time of the current trip (t, arrival at that location), giving the sojourn time at an activity location (d).
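The computation of activity episode durations from consecutive trips can be sketched as below. The diary records, field names, and times are hypothetical and do not reflect the actual PSTP file layout; times are assumed to fall within a single day.

```python
from datetime import datetime

# Hypothetical one-day diary for one person; each trip carries the purpose
# pursued at its destination.
trips = [
    {"purpose": "work", "depart": "07:30", "arrive": "08:05"},
    {"purpose": "shopping", "depart": "17:00", "arrive": "17:20"},
    {"purpose": "home", "depart": "18:10", "arrive": "18:40"},
]


def minutes(hhmm):
    """Convert an 'HH:MM' clock time to minutes after midnight."""
    t = datetime.strptime(hhmm, "%H:%M")
    return t.hour * 60 + t.minute


def activity_durations(trips):
    """Sojourn time d at each destination: next departure (t + d) minus arrival (t)."""
    episodes = []
    for current, following in zip(trips, trips[1:]):
        d = minutes(following["depart"]) - minutes(current["arrive"])
        episodes.append((current["purpose"], d))
    return episodes


episodes = activity_durations(trips)
print(episodes)  # [('work', 535), ('shopping', 50)]
```

The stay after the last arrival at home has no following trip, so it produces no episode, which mirrors the fact that only in-home activities between the first departure and the last arrival can be derived.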

In the first few waves of the PSTP database, trip purposes are classified into nine different types: work, school, college, shopping, personal business, appointments, visiting (other persons), free time, and home during the day. In past analyses by Ma (1997) using this same data set, activities were grouped into subsistence (work, school, college), maintenance (shopping, personal business, appointments), leisure (visiting, free time, home during the day), and travel. In this example we use data from five time points (the first day of each wave) for 1201 persons in 758 households, whose characteristics are provided in Table 9.1. For simplicity, only the stayers (persons who participated in all five waves) are used for model estimation in this example. However, the models presented in this chapter do not require an equal number of observations for each person.
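The grouping of trip purposes and the selection of stayers can be sketched as follows. The purpose-to-group mapping follows the text; the record layout, identifiers, and minute values are hypothetical. Travel, the fourth category, would come from trip durations rather than from trip purposes and is therefore not part of the mapping.

```python
from collections import defaultdict

# Mapping of the nine trip purposes to activity groups, as described in the text.
ACTIVITY_GROUP = {
    "work": "subsistence", "school": "subsistence", "college": "subsistence",
    "shopping": "maintenance", "personal business": "maintenance",
    "appointments": "maintenance",
    "visiting": "leisure", "free time": "leisure", "home during the day": "leisure",
}

# Hypothetical activity records: (person_id, wave, purpose, minutes).
records = [
    (1, 1, "work", 480), (1, 1, "free time", 60), (1, 1, "visiting", 30),
    (1, 2, "shopping", 45), (1, 3, "free time", 90), (1, 4, "work", 500),
    (1, 5, "home during the day", 120),
    (2, 1, "free time", 75), (2, 2, "school", 360), (2, 3, "visiting", 20),
]


def leisure_minutes(records):
    """Total leisure minutes for each (person, wave) pair with any leisure."""
    totals = defaultdict(int)
    for person, wave, purpose, mins in records:
        if ACTIVITY_GROUP[purpose] == "leisure":
            totals[(person, wave)] += mins
    return dict(totals)


def stayers(records, n_waves=5):
    """Persons observed in all n_waves waves."""
    waves_seen = defaultdict(set)
    for person, wave, _purpose, _mins in records:
        waves_seen[person].add(wave)
    return {person for person, waves in waves_seen.items() if len(waves) == n_waves}


leisure = leisure_minutes(records)
panel_stayers = stayers(records)
print(leisure)
print(panel_stayers)
```

Restricting estimation to stayers is a simplification for the example; as noted in the text, the models themselves do not need the same number of observations for each person.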

A first group of three two-level models (models A, B, and C) is estimated using the data above to illustrate a few aspects of multilevel modeling. Table 9.2 shows the estimates (fixed and random) for these three models. At each level (time and person), we have level-specific variance–covariance terms (the σ values for ε, u, and v in model A). The significance of the elements in each of the three matrices can be tested using goodness-of-fit measures based on the deviance, which is the difference in the –2 log-likelihood at convergence between two nested (in terms of specification) models. In addition, the γ values

TABLE 9.1 Average Sample Characteristics of the Data Used Here (Standard Deviation in Parentheses)

Characteristic                            Wave 1          Wave 2          Wave 3          Wave 4          Wave 5
Leisure (minutes/day) by a person         120.0 (159.2)   105.8 (155.9)   103.7 (155.5)   109.7 (158.0)   99.5 (157.8)
# of children ages 1 to 5 in household    0.213 (0.53)    0.200 (0.51)    0.158 (0.48)    0.147 (0.46)    0.133 (0.48)
# of children ages 6 to 17 in household   0.437 (0.80)    0.440 (0.80)    0.450 (0.82)    0.438 (0.80)    0.433 (0.80)

TABLE 9.2 Leisure Time in a Day: Models A, B, and C

Note: SE = standard error.
