Discrete choice and portfolio optimization under limited distributional information

OPTIMIZATION UNDER LIMITED DISTRIBUTIONAL INFORMATION

VINIT KUMAR MISHRA

NATIONAL UNIVERSITY OF SINGAPORE

2012

OPTIMIZATION UNDER LIMITED

DISTRIBUTIONAL INFORMATION

VINIT KUMAR MISHRA

(Bachelor of Technology, Indian Institute of Technology Bombay)

A THESIS SUBMITTED

FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF DECISION SCIENCES

NATIONAL UNIVERSITY OF SINGAPORE

2012

I extend my sincere gratitude towards my supervisor, Professor Teo Chung Piaw. Interacting with him over the past few years has been a blissful experience. I learnt several lessons from him in academic as well as non-academic domains. Once he said, “you can learn from everyone.” This is probably one of the most cherished lessons from him.

I would like to thank Assoc. Prof. Karthik Natarajan from Singapore University of Technology and Design, who has been a valued coauthor and who has always motivated me for research during tough times. I would also like to thank Assoc. Prof. Melvyn Sim, who has inspired me as a researcher.

I would like to thank my teacher and thesis committee member Professor Sun Jie for his teaching and comments. I would also like to thank my thesis committee member Assoc. Prof. Trichy V. Krishnan for his comments.

I would like to thank my colleagues in the Department of Decision Sciences, Mabel Chou, Wang Tong and Lucy Chen, whose research was taking shape while I was here and who have always inspired me.

I would like to thank the collaborators in industry, Joseph Wong, Yanshan and Manish Gupta from Agilent Technologies, Singapore, who gave me the opportunity to learn about their operations. I would also like to thank Dhanesh Padmanabhan from General Motors R&D, India Science Lab, who took care of me during my visit there.

I would like to thank the Ph.D. and Research office, especially Hamidah and Cheow Loo, who handled several important matters very smoothly over the past few years. I would also like to thank Dorothy, Chwee Ming and Siew Geok from the Department of Decision Sciences for the same reason.

Finally, I would like to thank my wife Parama Bal Mishra for being a wonderful partner. Without her support this thesis work would not be possible. I would also like to thank my parents Krishna Murari Mishra and Kamala Mishra for having faith in me.

Vinit Mishra

Singapore, May 2012

Contents

1 Introduction
  1.1 Classical Parametric Approach to Choice Modeling
  1.2 Choice Probabilities under Limited Distributional Information
  1.3 Problems in Finance under Limited Distributional Information
  1.4 Organization and Contributions

2 On Theoretical and Empirical Aspects of Marginal Distribution Choice Models
  2.1 Choice Prediction under MDM
  2.2 Estimation under MDM
    2.2.1 A Convexity Result under MDM
    2.2.2 Estimating the Asymptotic Variance of the Maximum Log-likelihood Estimators (MLE)
  2.3 Pricing Multiple Products under MDM
  2.4 Computational Experiments
    2.4.1 Data
    2.4.2 MNL Comparison
    2.4.3 Mixed logit Comparison
    2.4.4 Managerial Insights

3 Choice Prediction with Semidefinite Optimization When Utilities are Correlated
  3.1 The Cross Moment Model
    3.1.1 Choice model representation of CMM
  3.2 Examples
  3.3 Flexible Packaging Design Problem
    3.3.1 Data
    3.3.2 Computational Results

4 A Reduced Formulation For CMM and Applications in Finance
  4.1 Semidefinite Programming Formulation
    4.1.1 Reduced Formulation
    4.1.2 Multi-asset European call option pricing example
  4.2 Robust Portfolio Choice Under Regret Criterion
    4.2.1 Computational Experiments
  4.3 Extensions
    4.3.1 Reduced Formulation for the Probability Bound of Boyd, Comanor and Vandenberghe
    4.3.2 Reduced Formulation and Joint Chance-Constraints Approximation
    4.3.3 Reduced Formulation and Choice Probabilities

5 Conclusion

Appendix

A Appendix
  A.1 Estimation of asymptotic variance of MLE under the MMM

This thesis studies a few optimization problems with uncertain parameters in the context of discrete choice and financial portfolio allocation when limited distributional information of random parameters is available to the decision maker. The Marginal Distribution Model (MDM) proposed by Natarajan, Song and Teo [62] is studied in the context of discrete choice. MDM is based on the assumption that the marginal distributions of random parameters, as opposed to complete distributional information, are available. Several theoretical results relating the MDM to classical choice models such as Generalized Extreme Value (GEV) and Multinomial Logit (MNL) are provided. Theoretical properties of the MDM choice models are studied for a multi-product pricing problem, and further results are proposed for the parameter estimation problem using the log-likelihood with MDM. The use of MDM as a discrete choice model is exhibited using computational experiments on a safety features data set provided by General Motors.

Following the approach of the MDM, we build another choice model when mean and cross-moment information of random parameters is known. It is shown that this problem can be cast as a semidefinite program (SDP), giving choice probabilities under an extremal distribution as the optimal solution of some of the decision variables. We call this model the Cross Moment Model (CMM). We test this model using several examples from route choice, random walk, etc. We further embed this model in a flexible packaging design problem to compare the designs suggested by the CMM with MNL and Multinomial Probit. Although CMM is a parsimonious model that uses limited distributional information, in most examples we find its performance very close to sophisticated models such as cross-nested logit, probit, etc. Further, prediction is done using an easy-to-solve convex semidefinite program, leading to computational advantages.

Since CMM is an SDP and existing solvers cannot solve problems with a large number of parameters, we propose a reduced but exact formulation for CMM. The new formulation is $O(n^2)$ in variables, as opposed to the CMM, which is $O(n^3)$ in variables. This result is used to solve the problem of finding bounds on multi-asset European call option prices and portfolio allocation

List of Figures

2.1 A sample choice task
3.1 Comparison of choice probabilities in binary choice case
3.2 Route choice network with three paths
3.3 Comparison of CMM and MNP
3.4 Route choice network with four paths
3.5 Absence of IIA property in CMM
3.6 Comparison of Choice Probabilities under Arcsine Law and CMM with n = 80
3.7 An example of a box with low volume usage
3.8 A flexible box with 3 adjustable heights
3.9 Dimensions of various item-boxes
3.10 Destination-wise volume weight distribution for orders
3.11 A typical shipping cost curve for freight-forward services (dashed line) and express services (solid line)
3.12 A sample of packing using 3D loadpacker
3.13 View of packing generated in the sample of Figure 3.12
4.1 Computation times of Reduced & BL formulations in option pricing
4.2 Returns of asset 1 starting Jan 1999 to Dec 2009 (2767 data points)

List of Tables

2.1 Attribute and level codes
2.2 Estimation results for MMM and MNL-I
2.3 Estimation results for MMM and MNL-II
2.4 Fit and Prediction Statistics
2.5 Estimation results for mixed-MMM and mixed-logit model-I
2.6 Estimation results for mixed-MMM and mixed-logit model-II
3.1 Comparison of choice predictions
3.2 Laptop Choice Set
3.3 Overcoming the IPS property in CMM
3.4 Base sets selected by MNL, CMM and MNP
3.5 Simulated utilities and costs for MNL and CMM
3.6 Performance comparison of CMM and MNP
4.1 Realized returns, CVaR of regret and variance for sample-based and robust models for Jan 2009-Jun 2009 using past data of 2008 returns
4.2 $N_{SB}$ and $N_{Robust}$ for various measures
4.3 Aggregate results for SB and Robust approaches

1 Introduction

Consider the following zero-one optimization problem:

$$Z(U) = \max \Big\{ \sum_{j \in N} U_j y_j \;:\; \sum_{j \in N} y_j = 1,\ y_j \in \{0, 1\}\ \forall j \in N \Big\}. \tag{1.1}$$

When $U$ is deterministic, the solution to this problem is trivial: at optimality $y_j = 1$ for the index $j$ attaining the maximum of $U_j$, and the optimal value is $\max_{j \in N} U_j$. When $U$ is a random vector, however, the optimal solution and the optimal value are themselves random. Let us denote the random vector by $\tilde{U}$. When parameters are random, we are often interested in finding the expected optimal value $E_\theta(Z(\tilde{U}))$ and the probability $P_\theta(y_j^* = 1)$ for $j \in N$, under the joint distribution $\theta$ of the random vector $\tilde{U}$. This latter probability is sometimes referred to as the persistency value, as in [62], and is called the choice probability in the discrete choice literature.

In discrete choice, let $N = \{1, 2, \ldots, n\}$ be the finite set of alternatives. A customer facing these $n$ choices chooses the alternative with the highest utility, and would essentially solve problem (1.1). This discrete choice problem arises in areas including but not limited to operations management, marketing and transportation. Looking at past choices of decision-makers, the statistician (modeler) predicts their future choices. Often interest lies in the behavior of a population rather than an individual. For this reason, let us define the set of all customers as $I$. In the following, we review the classical parametric approach to choice modeling, where the distribution $\theta$ of random utilities is assumed a priori.
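As a rough illustration of the two quantities of interest, the following sketch estimates the expected optimal value and the persistency (choice) probabilities by simulation. The joint distribution used here (independent unit-variance normals with hypothetical means) is an assumption made only for the example; it is not a distribution taken from the thesis.

```python
import random


def simulate_persistency(sample_utilities, n_alternatives, n_draws=20000, seed=0):
    """Monte Carlo estimate of E[Z(U)] and P(y_j* = 1) for the zero-one problem."""
    rng = random.Random(seed)
    counts = [0] * n_alternatives
    total = 0.0
    for _ in range(n_draws):
        u = sample_utilities(rng)
        # The zero-one problem is solved by picking the largest utility.
        best = max(range(n_alternatives), key=lambda j: u[j])
        counts[best] += 1
        total += u[best]
    return total / n_draws, [c / n_draws for c in counts]


def independent_normal(rng, means=(1.0, 0.5, 0.0)):
    # Hypothetical joint distribution: independent normals with unit variance.
    return [rng.gauss(m, 1.0) for m in means]


expected_value, persistency = simulate_persistency(independent_normal, 3)
```

Here `persistency[j]` approximates the probability that alternative `j` is optimal, and `expected_value` approximates the expected maximum utility under the assumed distribution.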

1.1 Classical Parametric Approach to Choice Modeling

The classical approach to discrete choice modeling considers parametric models, where the distribution $\theta$ of utilities is assumed to be known. Since the early work of McFadden [55] on conditional logit analysis, several parametric models, such as multinomial probit, nested logit, etc., have been proposed. Since our interest lies in the business setting, we often refer to alternatives as products and decision makers as customers.

Under a parametric approach, random utility models (RUMs) are built as follows:

1. Let customer $i$'s utility from choosing alternative $j \in N$ be expressed in the additive form:

$$\tilde{U}_{ij} = V_{ij} + \tilde{\epsilon}_{ij}, \quad j \in N, \tag{1.2}$$

where $V_{ij}$ is the deterministic component of the random utility that captures the modeler's belief about the utility from the observed product and customer attributes. The linear form $V_{ij} = \beta' x_{ij}$ is most common in the literature, where $\beta$ is the vector of preference weights (part-worths) defined over the set of product and customer attributes embedded in the vector $x_{ij}$. $\tilde{\epsilon}_{ij}$ is the random component of the utility that captures the effects which are unobserved and not considered in the model.

2. Assume a joint distribution for the vector of error components $\tilde{\epsilon}_i = (\tilde{\epsilon}_{i1}, \tilde{\epsilon}_{i2}, \ldots, \tilde{\epsilon}_{in})$ with density $f(\epsilon_i)$.

3. Assume that customer $i$ has complete knowledge of $U_i = (U_{i1}, U_{i2}, \ldots, U_{in})$ while making the choice. Under utility-maximizing behavior, she solves (1.1), where $Z(U_i)$ is the maximum utility for the customer and is simply $\max_{j \in N} U_{ij}$. Since utilities are unknown to the modeler, $Z(\tilde{U}_i)$ and the optimal solution $y_i^*(\tilde{U}_i)$ of (1.1) for customer $i$ can be viewed as random variables. Prediction of customer $i$'s choice is done by evaluating the choice probability $P_{ij}$, $j \in N$:

$$P_{ij} = P\Big(\tilde{U}_{ij} \ge \max_{k \in N} \tilde{U}_{ik}\Big) = \int \mathbb{1}\{V_{ij} + \epsilon_{ij} \ge V_{ik} + \epsilon_{ik}\ \forall k \in N\}\, f(\epsilon_i)\, d\epsilon_i.$$

The integral involved in the evaluation of choice probabilities is a multidimensional integral over the density $f(\epsilon_i)$. Discrete choice models are derived based on choices of the density $f$, and only in certain cases does this integral have a closed form. These include the generalized extreme value (GEV) models, which are derived under the assumption that the error-term distribution is generalized extreme value. Multinomial logit (MNL) is a well-known special case of these models. The multidimensional integral does not have a closed form in most other cases. Examples include probit, where the error terms are assumed to have a multivariate normal distribution, and mixed logit, which assumes that the random component of utility has two parts: one part is distributed according to a distribution specified by the researcher, and the other part is i.i.d. extreme value. Evaluation of the integral in these cases relies on exhaustive simulation.

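In GEV models, customer $i$'s choice probability for product $j$ is given as

$$P_{ij} = \frac{\delta_{ij}\, G_{ij}(\delta_{i1}, \ldots, \delta_{in})}{G(\delta_{i1}, \ldots, \delta_{in})},$$

where $G_{ij} = \partial G(\delta_{i1}, \ldots, \delta_{in})/\partial \delta_{ij}$, $\delta_{ij} = e^{V_{ij}}$, and the function $G(\delta_{i1}, \ldots, \delta_{in})$ is a non-negative differentiable function which satisfies a set of properties listed in McFadden [56]. The joint distribution of the error terms is given as

$$P(\tilde{\epsilon}_{i1} \le \varepsilon_1, \ldots, \tilde{\epsilon}_{in} \le \varepsilon_n) = e^{-G(e^{-\varepsilon_1}, \ldots, e^{-\varepsilon_n})}.$$

For the special case of MNL (McFadden [55] and Luce [50]), the error terms are i.i.d. with the distribution

$$P(\tilde{\epsilon}_{ij} \le \varepsilon) = e^{-e^{-\varepsilon}},$$

and the choice probabilities are described using the following neat expression:

$$P_{ij} = \frac{e^{V_{ij}}}{\sum_{k \in N} e^{V_{ik}}}. \tag{1.4}$$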
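The closed form for MNL can be checked against the random utility definition by simulation. The sketch below compares the logit expression $e^{V_{ij}} / \sum_k e^{V_{ik}}$ with the empirical frequency with which each alternative has the highest utility when the error terms are i.i.d. draws from $P(\tilde{\epsilon} \le \varepsilon) = e^{-e^{-\varepsilon}}$. The utilities in `v` are hypothetical values chosen for illustration.

```python
import math
import random


def mnl_probabilities(v):
    """Closed-form MNL choice probabilities for deterministic utilities v."""
    shift = max(v)  # subtract the maximum for numerical stability
    weights = [math.exp(x - shift) for x in v]
    total = sum(weights)
    return [w / total for w in weights]


def gumbel_draw(rng):
    # Inverse-CDF sample from P(eps <= e) = exp(-exp(-e)).
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulated_probabilities(v, n_draws=50000, seed=1):
    """Frequency with which each alternative attains the maximum utility."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = [x + gumbel_draw(rng) for x in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]


v = [1.0, 0.5, 0.0]  # hypothetical deterministic utilities
closed_form = mnl_probabilities(v)
simulated = simulated_probabilities(v)
```

With enough draws, the two lists should agree up to sampling error, which is the sense in which the integral has a closed form for this error distribution.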

... a computationally challenging task. The mixed logit model (see for example Train [74]), also called random coefficient logit, considers the model parameters to have a random component apart from the product- and customer-specific random component $\tilde{\epsilon}_{ij}$. For example, in the most popular case of linear utilities, the mixed logit model considers the part-worths to have two components, a deterministic term ($\beta$) and a random term ($\tilde{\epsilon}_a$), with the utility described as $\tilde{U}_{ij} = (\beta + \tilde{\epsilon}_a)' x_{ij} + \tilde{\epsilon}_{ij}$. By considering the randomness in model parameters, this model captures consumer taste variation. When $\tilde{\epsilon}_{ij}$ are i.i.d. extreme value distributed, the MNL formula applies conditional on $\tilde{\epsilon}_a$:

$$P_{ij}(\epsilon_a) = \frac{e^{(\beta + \epsilon_a)' x_{ij}}}{\sum_{k \in N} e^{(\beta + \epsilon_a)' x_{ik}}},$$

and the choice probability is the expectation of this expression over $\tilde{\epsilon}_a$.

In semiparametric and nonparametric models, either the assumption of a known functional form of $V_{ij}$, or the distribution of the random utility component $\tilde{\epsilon}_{ij}$, or both, is relaxed. Semiparametric and nonparametric choice modeling is itself a well-researched area. Among these models are the maximum score method proposed by Manski [51] and [52], the smoothed maximum score method for binary choice of Horowitz [42], Cosslett's [20] distribution-free maximum likelihood estimator, Han's [39] maximum rank correlation estimator, and the recent nonparametric approach of Farias and Jagabathula [33]. A critical issue in semiparametric and nonparametric choice models is the efficiency of estimators. For example, to the best of our knowledge, little is known regarding the asymptotic distribution of the estimators proposed in Manski [51], Han [39], Cosslett [20], and Farias and Jagabathula [33]. This leads to difficulty in statistical inference, and often bootstrapping methods are used to gain some idea about the variability of the estimators. In several likelihood-based methods (parametric and semi/nonparametric) such as mixed logit, multinomial probit, etc., the asymptotic distribution can be found using the asymptotic normality property. For mixed-logit and probit this is done by finding the information matrix numerically.

1.2 Choice Probabilities under Limited Distributional Information

Motivated by the work of Meilijson and Nádas [58], Weiss [77], and Bertsimas, Natarajan and Teo [5], [6] and [7], who propose convex optimization formulations to find tight bounds on the expected optimal value of certain combinatorial optimization problems, Natarajan, Song and Teo [62] have recently proposed a semiparametric approach for choice modeling using limited information on the joint distribution of the random utilities. Under these models, the choice prediction is performed in the following manner:

1. A behavioral model such as (1.2) for random utility is specified.

2. Unlike the parametric approach to choice modeling, the distribution of the vector of error terms $\tilde{\epsilon}_i$ is not assumed a priori. It is, however, assumed that the modeler has some limited information regarding this distribution, such as the marginal distributions or marginal moments of the utilities.

3. The key difference from the conventional approach lies in the evaluation of choice probabilities. Rather than finding the values of choice probabilities for an assumed distribution by evaluating a potentially difficult integral, the choice probabilities are estimated at an extremal distribution $\theta^*$ that satisfies pre-specified conditions (for example marginal distribution or marginal moment information). This is done by maximizing the expectation of the customer's maximum utility over the set $\Theta$ of distributions satisfying these conditions:

$$\max_{\theta \in \Theta} E_\theta\big(Z(\tilde{U}_i)\big), \tag{1.5}$$

where the extremal distribution is

$$\theta^* = \arg\max_{\theta \in \Theta} E_\theta\big(Z(\tilde{U}_i)\big). \tag{1.6}$$

Since exactly one alternative is chosen, the expected maximum utility decomposes as

$$E_\theta\big(Z(\tilde{U}_i)\big) = \sum_{j \in N} E_\theta\big(\tilde{U}_{ij} \mid y_{ij}^*(\tilde{U}_i) = 1\big)\, P_\theta\big(y_{ij}^*(\tilde{U}_i) = 1\big),$$

where, at $\theta = \theta^*$, $P_{\theta^*}(y_{ij}^*(\tilde{U}_i) = 1)$ is the choice probability of the $j$th product under some extremal distribution. These choice probabilities are found by maximizing the right hand side of the last equation under distributional constraints. Examples of such models applied to choice modeling are the Marginal Distribution Model (MDM) and the Marginal Moment Model (MMM). MDM assumes that only the marginal distributions of the random utilities $\tilde{U}_{ij}$ are known, and MMM is built under the even more relaxed assumption that only the first two marginal moments of $\tilde{U}_{ij}$ are known for the choice prediction problem. For a detailed discussion of these models, readers are referred to [62], who derive these models and exhibit their application in discrete choice modeling. The key result of Natarajan, Song and Teo [62] for the MDM is as follows.

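Theorem 1 (Natarajan, Song and Teo [62]) For customer $i$, assume that the marginal distributions of the error terms are continuous and known as $\tilde{\epsilon}_{ij} \sim F_{ij}(\cdot)$ for $j \in N$. The following concave maximization problem solves (1.5), and the choice probabilities are obtained as the optimal solution $P_i^*$ under an extremal distribution $\theta^*$ of (1.6):

$$\max_{P_i} \Big\{ \sum_{j \in N} \Big( V_{ij} P_{ij} + \int_{1 - P_{ij}}^{1} F_{ij}^{-1}(t)\, dt \Big) \;:\; \sum_{j \in N} P_{ij} = 1,\ P_{ij} \ge 0\ \forall j \in N \Big\}.$$

Under the MDM, the optimality conditions yield the choice probabilities

$$P_{ij}^* = 1 - F_{ij}(\lambda_i - V_{ij}), \tag{1.8}$$

where the Lagrange multiplier $\lambda_i$ satisfies the normalization condition

$$\sum_{j \in N} \big(1 - F_{ij}(\lambda_i - V_{ij})\big) = 1. \tag{1.9}$$

Under the MMM, where only the mean $V_{ij}$ and the variance $\sigma_{ij}^2$ of the utility $\tilde{U}_{ij}$ are known, the corresponding concave maximization problem is

$$\max_{P_i} \Big\{ \sum_{j \in N} \Big( V_{ij} P_{ij} + \sigma_{ij} \sqrt{P_{ij}(1 - P_{ij})} \Big) \;:\; \sum_{j \in N} P_{ij} = 1,\ P_{ij} \ge 0\ \forall j \in N \Big\}. \tag{1.10}$$

In this case, optimality conditions generate the choice probabilities:

$$P_{ij} = \frac{1}{2}\left(1 + \frac{V_{ij} - \lambda_i}{\sqrt{(V_{ij} - \lambda_i)^2 + \sigma_{ij}^2}}\right), \tag{1.11}$$

where the Lagrange multiplier $\lambda_i$ satisfies

$$\sum_{j \in N} P_{ij} = 1. \tag{1.12}$$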

For a utility specification such as $\tilde{U}_{ij} = \beta' x_{ij} + \tilde{\epsilon}_{ij}$, one needs to know customer $i$'s preference weights (part-worths) $\beta$ on the product and customer attributes in order to evaluate the choice probability. To this end, a parameter estimation problem is solved, given the data on the actions taken by the decision-maker in similar choice situations. This data can be stated preference data, such as choice-based conjoint data, or revealed preference data, such as data on the choice history of the customer. The parameter estimation is performed by maximizing the likelihood function. Given choice data $z_{ij}$, $i \in I$, $j \in N$, where $z_{ij} = 1$ if customer $i$ chooses product $j$ from the choice set $N$ and zero otherwise, the parameters $\beta$ are estimated by solving the following maximum log-likelihood problem:

$$\max_{\beta} \sum_{i \in I} \sum_{j \in N} z_{ij} \ln P_{ij}(\beta). \tag{1.13}$$

The semiparametric approach of MDM or MMM has the advantage that choice probabilities can be found by solving easy-to-solve convex optimization problems. This avoids the evaluation of multidimensional integrals, as is done in several parametric models such as probit and mixed-logit.
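To make the computational claim concrete, the sketch below evaluates MDM choice probabilities of the form $P_{ij} = 1 - F_{ij}(\lambda_i - V_{ij})$, with $\lambda_i$ chosen so that the probabilities sum to one, using plain bisection on $\lambda_i$. This is an illustrative solver under stated assumptions (continuous, non-increasing survival functions and a hypothetical search bracket), not the procedure used in the thesis. Exponential marginals are used as the test case because they are expected to reproduce the MNL probabilities.

```python
import math


def mdm_probabilities(v, survival, lo=-50.0, hi=50.0):
    """Solve sum_j survival_j(lam - v_j) = 1 for lam by bisection.

    survival[j](x) is 1 - F_j(x), assumed continuous and non-increasing.
    The bracket [lo, hi] is an assumption and must contain the multiplier.
    """
    def excess(lam):
        return sum(s(lam - vj) for s, vj in zip(survival, v)) - 1.0

    if excess(lo) < 0 or excess(hi) > 0:
        raise ValueError("bracket does not contain the multiplier")
    for _ in range(200):  # interval shrinks to floating-point precision
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, [s(lam - vj) for s, vj in zip(survival, v)]


def exponential_survival(alpha=1.0):
    # 1 - F(x) for an exponential marginal with rate alpha; equals 1 for x <= 0.
    return lambda x: math.exp(-alpha * x) if x > 0 else 1.0


v = [1.0, 0.5, 0.0]  # hypothetical deterministic utilities
lam, probs = mdm_probabilities(v, [exponential_survival(1.0)] * len(v))
```

Only a one-dimensional root search is needed per customer, regardless of the number of alternatives, which is in contrast to the multidimensional integrals required by probit or mixed logit.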

1.3 Problems in Finance under Limited Distributional Information

Several problems in mathematical finance require computation of the expected value $E_\theta[f(x, \tilde{r})]$, where $x$ is a parameter or decision vector, $\tilde{r}$ is a random vector having a joint distribution $\theta$, and $f$ is a real-valued function. For instance, a derivative on assets is priced using a no-arbitrage argument, where $\tilde{r}$ is the price vector of the underlying assets at the termination of the contract. The function $f$ in these problems can take various forms; for example, $f(K, \tilde{r}) = \max\{\max\{\tilde{r}_1, \ldots, \tilde{r}_n\} - K, 0\}$ is the payoff of a multi-asset European max call option on $n$ underlying assets with strike price $K$. Another instance is the portfolio selection problem, where $x$ is the portfolio allocation in given assets and $\tilde{r}$ is the return vector of these assets. In such problems, given constraints on the decision vector $x$, one is interested in finding an optimal portfolio to minimize $E_\theta[f(x, \tilde{r})]$, where $f(x, \tilde{r})$ is a loss function. The function $f$ can take various forms and can be used in risk measures such as value at risk (VaR) and conditional value at risk (CVaR). Regret functions are an example, where one tries to minimize the expected regret with respect to benchmark portfolios using some sort of risk measure.

In certain simple cases, $E_\theta[f(x, \tilde{r})]$ can be computed in closed form. For example, using a no-arbitrage argument in the context of European call option pricing on a single asset, one obtains the well-known Black-Scholes formula under the assumption that the price follows a geometric Brownian motion. The function $f$ here takes the simple two-piece linear form $\max\{\tilde{r} - K, 0\}$. Another example is portfolio selection in the VaR framework under Gaussian distributions. It is well known that in portfolio selection using VaR, when the distribution of the return vector $\tilde{r}$ is Gaussian, one essentially solves a convex optimization problem of the form $\min_{x \in X} -\Phi^{-1}(\epsilon) \sqrt{x' \Gamma x} - \mu' x$, where $\Gamma$ is the covariance matrix of the random returns, $\mu$ is their mean vector, $\epsilon \in (0, 1]$ is a given parameter, $X$ is a convex set, and $\Phi(\cdot)$ is the cumulative normal distribution function. A fundamental assumption in the preceding examples is that the distribution $\theta$ is known. In practice, the distribution $\theta$ of returns, asset prices, etc. is often ambiguous to the investor (Natarajan et al. [61]). To avoid this restriction, a stream of research has focused on bounds on $E_\theta[f(x, \tilde{r})]$ over a set of distributions $\theta \in \Theta$. Rather than assuming a distribution itself, these papers look into the bounds on the expectation under a set of distributions that are consistent with the limited information about the distributions available from the data (see for example, Boyle and Lin [12], Bertsimas and Popescu [8]). The advantages of this approach are that we can use the limited information regarding distributions, typically moment information, to find useful bounds on the desired expectations, and that if we need to solve an optimization problem, as in portfolio allocation problems, we often deal with easily solvable instances of convex optimization (see for example, El Ghaoui et al. [31]). The solutions obtained using this approach are distributionally robust in the sense that, under the class of distributions satisfying the moment conditions, one makes decisions to protect against the worst case ([25], [63], [43]).
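For the Gaussian VaR example, the objective $-\Phi^{-1}(\epsilon)\sqrt{x'\Gamma x} - \mu'x$ is simple to evaluate for a candidate portfolio. The sketch below does so with the standard library only; the mean vector, covariance matrix and portfolio weights are hypothetical numbers chosen for illustration, and the function requires $0 < \epsilon < 1$ because the inverse normal CDF is undefined at the endpoints.

```python
import math
from statistics import NormalDist


def gaussian_var_objective(x, mu, gamma, eps):
    """Evaluate -Phi^{-1}(eps) * sqrt(x' Gamma x) - mu' x for 0 < eps < 1."""
    n = len(x)
    variance = sum(x[i] * gamma[i][j] * x[j] for i in range(n) for j in range(n))
    mean = sum(m * xi for m, xi in zip(mu, x))
    return -NormalDist().inv_cdf(eps) * math.sqrt(variance) - mean


# Hypothetical two-asset data: mean returns, covariance matrix, equal weights.
mu = [0.08, 0.05]
gamma = [[0.04, 0.006], [0.006, 0.01]]
x = [0.5, 0.5]

var_5_percent = gaussian_var_objective(x, mu, gamma, 0.05)
var_median = gaussian_var_objective(x, mu, gamma, 0.5)
```

At `eps = 0.5` the quantile term vanishes and the objective reduces to the negative expected return; for smaller `eps` the standard-deviation term penalizes risk more heavily.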

This thesis also presents some new theoretical results for the following two problems, and discusses implications in the areas of option pricing and portfolio optimization.

1. The upper bound problem:

$\sup_{\tilde{r} \sim (\mu, \Sigma)} E_\theta[\max \ldots$

where $b_k(x) : \mathbb{R}^m \to \mathbb{R}^n$ and $c_k(x) : \mathbb{R}^m \to \mathbb{R}$ are affine functions of the decision vector $x$.

1.4 Organization and Contributions

This thesis contains three essays contributing to the literature of optimization under uncertainty when limited distributional information is known, the theory of discrete choice, and portfolio optimization. The organization and key contributions of this work are as follows.

1. In Chapter 2, we discuss theoretical and empirical properties of the Marginal Distribution Model (MDM) in the context of discrete choice. More specifically, we show interesting connections of this approach to classical discrete choice models such as Multinomial Logit (MNL) and a more general class of choice models: Generalized Extreme Value (GEV) models. We further show that the Marginal Moment Model (MMM) can also be replicated by MDM. We also study the parameter estimation problem using the log-likelihood under the MDM. This estimation problem is known to be convex for only a few special cases, such as MNL and the Nested Logit model for specific parameter choices. We show that under a linear utility specification, the estimation problem is convex in the part-worths under appropriate conditions for special classes of exponential distribution. This includes the MNL and Nested Logit results as special cases. Further, using the asymptotic normality property of the log-likelihood, we present a method to find confidence intervals of estimated parameters under MDM. We provide an application of the choice probabilities from MDM in the seller's profit maximization problem. We show that the optimal prices for a set of differentiated products can be found under MDM by solving a concave maximization problem when the marginal probability density functions are log-concave. This provides a new class of choice models for which the multiple product pricing problem is tractable. Finally, a conjoint choice data set for vehicle features is used to conduct experiments using the MDM, MNL, and mixed logit.

2. Chapter 3 extends the theory of the persistency approach to a class of distributions where the mean and covariance matrix of the random utilities are assumed to be known. We refer to this model as the Cross Moment Model (CMM) and show that the choice probabilities can be found using a semidefinite program (SDP). We test CMM using a few examples in route choice and random walk, and compare the quality of choice prediction with other models such as multinomial probit (MNP), Nested logit, MNL, etc. Finally, we use the CMM to solve a packaging design problem using a data set provided by a local service provider in Singapore, and compare the solutions with those suggested by MNL and MNP.

3. The Cross Moment Model also yields upper bounds on the expected value of the maximum of finite random vectors. This model can be extended to solve problems in finance. We can use this approach to find bounds on the price of call options on the maximum of several asset returns (see [9], [12], [49]). The theory also extends easily to portfolio optimization under limited distributional information, yielding distributionally robust portfolio allocations (see [31] and [80]). The CMM formulation used in Chapter 3 is $O(n^3)$ in variables. In Chapter 4, we present a reduced formulation, which is $O(n^2)$ in variables. This formulation is exact, and can lead to potential benefits in finding bounds on option prices and in portfolio allocation problems in finance, which we study in Chapter 4.

4. Chapter 5 is reserved for conclusions and future work.

2 On Theoretical and Empirical Aspects of Marginal Distribution Choice Models

In this chapter, we study the Marginal Distribution Model (MDM) in the discrete choice context. Results for choice prediction as well as parameter estimation are presented. For this reason, our presentation of the MDM will be based on a linear form of the random utility $\tilde{U}$. We assume that the marginal distributions of the random error terms are prespecified.

Our main contributions for the choice prediction problem are presented in Section 2.1, where we find connections of MDM with classical choice models such as multinomial logit (MNL) and GEV. More specifically, we identify the marginal error-term distributions under which the MDM approach yields these classical choice probability formulas. Further, it is shown that the choice probabilities of the Marginal Moment Model (MMM) can be replicated by the MDM.

For the parameter estimation problem, in Section 2.2, we present a convexity result for the log-likelihood problem under MDM. Using the asymptotic normality property of the log-likelihood, we present the method to find confidence intervals under the MDM. This is important, since the earlier work of [62] finds parameter estimates but does not develop methods to find error estimates of these parameters.

As an application of MDM choice probabilities, we study the multi-product pricing problem in Section 2.3. From earlier literature, it is known that this problem has a convex optimization formulation for MNL. Recently, [48] proved convexity of this problem for the nested logit model under some assumptions. We show that for marginal distributions with log-concave density, this problem has a convex optimization formulation. Further results are provided for MMM as well.

In the last section of this chapter, we present our computational experiments on a conjoint choice data set of vehicle features. A comparison of MMM with MNL and mixed-logit is provided. We further present the managerial insights that our experiments entail.

2.1 Choice Prediction under MDM

The MDM choice probabilities as given by (1.8) and (1.9) are quite general in the sense that the choice of marginal distributions $F_{ij}$ of the error terms $\tilde{\epsilon}_{ij}$ leads to different choice models. This model can be related to the MNL model under a special case, as we show in the next theorem. Recall that MNL is derived from RUM (1.2) assuming that the error terms are i.i.d. and extreme value distributed. The MNL choice probability formula is provided in (1.4).

Theorem 2 Say customer $i \in I$ has random utility given by (1.2). Under the Marginal Distribution Model (MDM), when the error terms $\tilde{\epsilon}_{ij}$, $j \in N$, are identically distributed, the choice probabilities are multinomial logit choice probabilities if and only if $\tilde{\epsilon}_{ij}$, $j \in N$, is exponentially distributed.

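Proof: Let the error terms $\tilde{\epsilon}_{ij}$, $j \in N$, be exponentially distributed with parameter $\alpha > 0$. Then, from (1.8) under MDM, the choice probability of product $j$ is

$$P_{ij} = 1 - F_{ij}(\lambda_i - V_{ij}) = e^{-\alpha(\lambda_i - V_{ij})},$$

where $\lambda_i$ is the Lagrange multiplier of the normalization condition (1.9). This condition gives $e^{\alpha \lambda_i} = \sum_{k \in N} e^{\alpha V_{ik}}$, so that

$$P_{ij} = \frac{e^{\alpha V_{ij}}}{\sum_{k \in N} e^{\alpha V_{ik}}}, \tag{2.2}$$

which is the MNL choice probability with scale parameter $\alpha$.

Next, let the choice probabilities from the MDM be of the form (2.2), and let the error terms $\tilde{\epsilon}_{ij}$, $j \in N$, have the identical CDF $H(\cdot)$. Then the ratio of the choice probabilities of any two products $j, k \in N$ is

$$\frac{1 - H(\lambda_i - V_{ij})}{1 - H(\lambda_i - V_{ik})} = e^{\alpha(V_{ij} - V_{ik})}.$$

Since this equation holds true for any arbitrary $V_{ij}$ and $V_{ik}$, the following must be true for any $x, y$:

$$\frac{1 - H(x)}{1 - H(y)} = e^{-\alpha(x - y)},$$

so $1 - H(x)$ is proportional to $e^{-\alpha x}$, which characterizes an exponential distribution.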

The classical GEV choice probabilities are obtained under the MDM as a special case, with the distribution function defined as the generalized exponential distribution:

$$F_{ij}(x) = 1 - e^{-x}\, G_{ij}(e^{V_{i1}}, \ldots, e^{V_{ij}}, \ldots, e^{V_{in}}) \quad \text{for } x \ge \ln\big(G_{ij}(e^{V_{i1}}, \ldots, e^{V_{ij}}, \ldots, e^{V_{in}})\big). \tag{2.3}$$

Note that $F_{ij}(x)$ is a valid distribution function for $\tilde{\epsilon}_{ij}$ under the assumptions listed in McFadden [56]. In this case, the Lagrange multiplier satisfies the condition:

$$e^{\lambda_i} = \sum_{k \in N} e^{V_{ik}}\, G_{ik}(e^{V_{i1}}, \ldots, e^{V_{ik}}, \ldots, e^{V_{in}}),$$

and the choice probabilities under the MDM are the same as the ones given by the classical GEV model.

To guarantee that the choice probabilities are consistent with utility maximization in the GEV model, assumptions on the signs of higher order cross partial derivatives of the function $G(\cdot)$ need to be made (see McFadden [56]). On the other hand, this assumption need not be imposed in (2.3), thereby generalizing the class of functions $G(\cdot)$ for which the formula (2.4) is valid under MDM.

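As a third example of choice probabilities that can be replicated by MDM, consider the Marginal Moment Model (1.11) and (1.12), derived under the assumption that the first two marginal moments of the error terms $\tilde{\epsilon}_{ij}$ are known. To show that MDM can be used to obtain these choice probabilities, consider the three-parameter form of the t-distribution with density function given by:

$$f_{ij}(x) = \frac{\Gamma\big(\frac{\nu + 1}{2}\big)}{\Gamma\big(\frac{\nu}{2}\big)} \left(\frac{\lambda}{\pi \nu}\right)^{1/2} \left(1 + \frac{\lambda (x - \mu)^2}{\nu}\right)^{-\frac{\nu + 1}{2}},$$

where $\mu$ is a location parameter, $\lambda$ is an inverse scale parameter, and $\nu$ is the number of degrees of freedom. Setting $\mu = 0$, $\lambda = 2/\sigma_{ij}^2$, and $\nu = 2$, the density function takes the form:

$$f_{ij}(x) = \frac{\sigma_{ij}^2}{2\,(x^2 + \sigma_{ij}^2)^{3/2}},$$

with the distribution function

$$F_{ij}(x) = \frac{1}{2}\left(1 + \frac{x}{\sqrt{x^2 + \sigma_{ij}^2}}\right). \tag{2.5}$$

The preceding observation leads us to the following result.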

Theorem 4 Say customer $i \in I$ has random utility given by (1.2). Under the Marginal Distribution Model (MDM), when the error terms $\tilde{\epsilon}_{ij}$, $j \in N$, have the marginal distribution $F_{ij}(\cdot)$ as given by (2.5), the choice probabilities under the MDM are the same as the ones given by the MMM with the variance of $\tilde{\epsilon}_{ij}$, $j \in N$, equal to $\sigma_{ij}^2$ and mean zero.
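A small numerical check can make this correspondence tangible. The sketch below assumes the distribution function $F(x) = \frac{1}{2}\big(1 + x/\sqrt{x^2 + \sigma^2}\big)$ for the t-distribution with $\mu = 0$, $\lambda = 2/\sigma^2$ and $\nu = 2$, and computes the resulting MDM probabilities $1 - F_{ij}(\lambda_i - V_{ij})$ by bisection on the multiplier. The utilities and standard deviations are hypothetical, and the bracket used for the search is a convenience of this example that requires at least two alternatives.

```python
import math


def t_density(x, mu, lam, nu):
    """Three-parameter t density: location mu, inverse scale lam, nu d.o.f."""
    norm = (math.gamma((nu + 1) / 2) / math.gamma(nu / 2)
            * math.sqrt(lam / (math.pi * nu)))
    return norm * (1.0 + lam * (x - mu) ** 2 / nu) ** (-(nu + 1) / 2)


def mmm_cdf(x, sigma):
    # Assumed distribution function for mu = 0, lam = 2 / sigma^2, nu = 2.
    return 0.5 * (1.0 + x / math.sqrt(x * x + sigma * sigma))


def mmm_probabilities(v, sigma):
    """Solve sum_j [1 - F_j(lam - v_j)] = 1 by bisection; needs len(v) >= 2."""
    def total(lam):
        return sum(1.0 - mmm_cdf(lam - vj, sj) for vj, sj in zip(v, sigma))

    lo = min(v)                        # every term is at least 1/2 here
    hi = max(v) + len(v) * max(sigma)  # the total is below 1 here
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, [1.0 - mmm_cdf(lam - vj, sj) for vj, sj in zip(v, sigma)]


v = [1.0, 0.5, 0.0]        # hypothetical mean utilities
sigma = [1.0, 1.0, 2.0]    # hypothetical standard deviations
lam, probs = mmm_probabilities(v, sigma)
```

Each entry of `probs` equals $\frac{1}{2}\big(1 + (V_{ij} - \lambda_i)/\sqrt{(V_{ij} - \lambda_i)^2 + \sigma_{ij}^2}\big)$ by construction, so the sketch mainly illustrates that the multiplier is found by a one-dimensional search. The helper `t_density` is included so that the assumed distribution function can be compared against the general density by numerical differentiation.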

2.2 Estimation under MDM

In this section, we provide results on the parameter estimation problem under the MDM. We first identify examples where the maximum log-likelihood problem under MDM is a convex optimization problem. Next, we establish a method to evaluate standard-error estimates of maximum log-likelihood estimators using the asymptotic theory of maximum log-likelihood. It is known that the standard error of maximum log-likelihood estimators can be estimated by evaluating the information matrix, which requires evaluating second partial derivatives of the log-likelihood function. MNL has analytical expressions for these derivatives, which come in handy during the evaluation of standard errors. For models such as multinomial probit and mixed-logit this is done using the method of simulated log-likelihood. We describe a method to find almost analytical expressions of the partial derivatives of the log-likelihood function in the case of MDM.

2.2.1 A Convexity Result under MDM

Convexity of the maximum log-likelihood optimization problem (1.13) is a highly desired property. It implies, first, that the computational search for the maximum likelihood estimates (MLE) is easy, and second, that the global optimality of the MLE is guaranteed. Only a few results are known in this regard. McFadden [55] showed that the maximum log-likelihood problem under MNL is a convex optimization problem. For the Nested Logit Model, Daganzo and Kusnic [21] show that the problem is convex in the part-worth parameters $\beta$ for a given choice of scale parameters if the mean utility is linear in $\beta$. Note that in nested logit, scale parameters are also estimated, but the problem is not jointly convex in the part-worth parameters and scale parameters.
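The MNL case can be explored numerically. The sketch below evaluates the log-likelihood $\sum_i \sum_j z_{ij} \ln P_{ij}(\beta)$ for linear utilities $V_{ij} = \beta' x_{ij}$ with logit probabilities, and measures how far the value at the midpoint of two parameter vectors lies above the average of the endpoint values, which should be non-negative for a concave function. The attribute data `x` and choices `z` are made up for the example.

```python
import math


def mnl_log_likelihood(beta, x, z):
    """Log-likelihood for MNL with linear utilities V_ij = beta' x_ij."""
    total = 0.0
    for attributes, choices in zip(x, z):
        v = [sum(b * a for b, a in zip(beta, row)) for row in attributes]
        top = max(v)
        # log of sum_k exp(V_ik), computed stably
        log_denominator = top + math.log(sum(math.exp(u - top) for u in v))
        total += sum(c * (u - log_denominator) for c, u in zip(choices, v))
    return total


def midpoint_gap(b1, b2, x, z):
    """f(midpoint) minus the average of f(b1) and f(b2)."""
    mid = [0.5 * (p + q) for p, q in zip(b1, b2)]
    return (mnl_log_likelihood(mid, x, z)
            - 0.5 * (mnl_log_likelihood(b1, x, z) + mnl_log_likelihood(b2, x, z)))


# Hypothetical data: 2 customers, 3 products, 2 attributes per product.
x = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.2, 1.0], [1.0, 0.3], [0.0, 0.0]],
]
z = [[1, 0, 0], [0, 1, 0]]

pairs = [([0.0, 0.0], [2.0, -1.0]), ([1.5, 3.0], [-2.0, 0.5]), ([4.0, 4.0], [-4.0, 1.0])]
gaps = [midpoint_gap(b1, b2, x, z) for b1, b2 in pairs]
```

A midpoint check of this kind is only a sanity test on sampled pairs, not a proof of concavity.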

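The maximum log-likelihood problem under the MDM is:

$$\begin{aligned} \max_{\beta, \lambda}\ & \sum_{i \in I} \sum_{j \in N} z_{ij} \ln\big(1 - F_{ij}(\lambda_i - V_{ij}(\beta))\big) \\ \text{s.t.}\ & \sum_{j \in N} \big(1 - F_{ij}(\lambda_i - V_{ij}(\beta))\big) = 1, \quad \forall i \in I, \end{aligned} \tag{2.6}$$

where $V_{ij}(\beta)$ is the deterministic component of the utility $\tilde{U}_{ij}$. The maximization is performed over the estimated parameters $\beta$ and the Lagrange multipliers $\lambda = (\lambda_1, \ldots, \lambda_{|I|})$. The part-worths $\beta$ are contained in $V_{ij}$, and other estimated parameters, such as scale parameters, may be described in the form of the distribution $F_{ij}$. In the following theorem we present a convexity result for the maximum likelihood problem under MDM.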

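Theorem 5 Suppose the error components of the random utility have marginal distributions $F_{ij}(y) = 1 - e^{-a_{ij} y + \ln(G_{ij}(e^{V_{i1}}, \ldots, e^{V_{in}}))}$, $i \in I$, $j \in N$, $a_{ij} > 0$, and the deterministic components $V_{ij}$ of the utilities are linear in the estimated parameters. If the following two conditions are satisfied:

(a) $\ln(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)}))$ is concave in $\beta$,

(b) $e^{\ln(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})) + a_{ij}(V_{ij}(\beta) - \lambda_i)}$ is convex in $(\beta, \lambda_i)$,

then the maximum log-likelihood problem (2.6) under MDM is a convex optimization problem.

Proof: If $F_{ij}(y) = 1 - e^{-a_{ij} y + \ln(G_{ij}(e^{V_{i1}}, \ldots, e^{V_{in}}))}$, then the maximum log-likelihood problem (2.6) reduces to:

$$\begin{aligned} \max_{\beta, \lambda}\ & \sum_{i \in I} \sum_{j \in N} z_{ij} \Big( \ln\big(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})\big) - a_{ij}(\lambda_i - V_{ij}(\beta)) \Big) \\ \text{s.t.}\ & \sum_{j \in N} e^{\ln(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})) - a_{ij}(\lambda_i - V_{ij}(\beta))} = 1, \quad \forall i \in I. \end{aligned} \tag{2.7}$$

We consider the following relaxation of (2.7):

$$\begin{aligned} \max_{\beta, \lambda}\ & \sum_{i \in I} \sum_{j \in N} z_{ij} \Big( \ln\big(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})\big) - a_{ij}(\lambda_i - V_{ij}(\beta)) \Big) \\ \text{s.t.}\ & \sum_{j \in N} e^{\ln(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})) - a_{ij}(\lambda_i - V_{ij}(\beta))} \le 1, \quad \forall i \in I. \end{aligned} \tag{2.8}$$

For linear utilities of the form $V_{ij}(\beta) = \beta' x_{ij}$, (2.8) is clearly a convex optimization problem in the decision variables $(\beta, \lambda)$ if conditions (a) and (b) of Theorem 5 are satisfied. To see that it is equivalent to (2.7), we note that the objective function and the (unique) constraint involving $\lambda_i$ are both decreasing in $\lambda_i$. So in the optimal solution, $\lambda_i$ is chosen to be the minimum value such that

$$\sum_{j \in N} e^{\ln(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})) - a_{ij}(\lambda_i - V_{ij}(\beta))} = 1.$$

Therefore the formulations (2.7) and (2.8) are equivalent.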

When $a_{ij}$ is independent of $j \in N$ and is constant (possibly different) for each $i \in I$, the above result leads to a convexity result for the GEV model. Clearly, under identical exponential marginal distributions $F_{ij}$, $j \in N$, (2.6) is well-behaved, as we essentially get the MNL due to Theorem 2. The classical logit model corresponds to the simple case $G_i(y_1, \ldots, y_n) = \sum_k y_k$, so that $G_{ij}(y_1, \ldots, y_n) = 1$ for all $j$. In this case, the conditions in Theorem 5 hold, and hence the estimation for the classical logit reduces to the following convex formulation:

$$\begin{aligned} \max_{\beta, \lambda}\ & \sum_{i \in I} \sum_{j \in N} -z_{ij}\, a_{ij} (\lambda_i - V_{ij}(\beta)) \\ \text{s.t.}\ & \sum_{j \in N} e^{-a_{ij}(\lambda_i - V_{ij}(\beta))} \le 1, \quad \forall i \in I. \end{aligned}$$

Theorem 6 The estimation problem (2.10) is a convex optimization problem if $V_{ij}(\beta)$ are affine in $\beta$ and $\theta_l \in (0, 1]$.
