OPTIMIZATION UNDER LIMITED DISTRIBUTIONAL INFORMATION
VINIT KUMAR MISHRA
NATIONAL UNIVERSITY OF SINGAPORE
2012
OPTIMIZATION UNDER LIMITED
DISTRIBUTIONAL INFORMATION
VINIT KUMAR MISHRA
(Bachelor of Technology, Indian Institute of Technology Bombay)
A THESIS SUBMITTED
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF DECISION SCIENCES
NATIONAL UNIVERSITY OF SINGAPORE
2012
I extend my sincere gratitude towards my supervisor Professor Teo Chung Piaw. Interacting with him over the past few years has been a blissful experience. I learnt several lessons from him in academic as well as non-academic domains. Once he said, “you can learn from everyone.” This is probably one of the most-cherished lessons from him.

I would like to thank Assoc. Prof. Karthik Natarajan from Singapore University of Technology and Design, who has been a valued coauthor and who has always motivated me for research during tough times. I would also like to thank Assoc. Prof. Melvyn Sim, who has inspired me as a researcher.

I would like to thank my teacher and thesis committee member Professor Sun Jie for his teaching and comments. I would also like to thank my thesis committee member Assoc. Prof. Trichy V. Krishnan for his comments.

I would like to thank my colleagues in the Department of Decision Sciences, Mabel Chou, Wang Tong and Lucy Chen, whose research was taking shape while I was here and who have always inspired me.

I would like to thank the collaborators in industry, Joseph Wong, Yanshan and Manish Gupta from Agilent Technologies, Singapore, who gave me the opportunity to learn about their operations. I would also like to thank Dhanesh Padmanabhan from General Motors R & D, India Science Lab, who took care of me during my visit there.

I would like to thank the Ph.D. and Research office, especially Hamidah and Cheow Loo, who handled several important matters very smoothly over the past few years. I would also like to thank Dorothy, Chwee Ming and Siew Geok from the Department of Decision Sciences for the same reason.

Finally, I would like to thank my wife Parama Bal Mishra for being a wonderful partner. Without her support this thesis work would not be possible. I would also like to thank my parents Krishna Murari Mishra and Kamala Mishra for having faith in me.
Vinit Mishra
Singapore, May 2012
1 Introduction
1.1 Classical Parametric Approach to Choice Modeling
1.2 Choice Probabilities under Limited Distributional Information
1.3 Problems in Finance under Limited Distributional Information
1.4 Organization and Contributions
2 On Theoretical and Empirical Aspects of Marginal Distribution Choice Models
2.1 Choice Prediction under MDM
2.2 Estimation under MDM
2.2.1 A Convexity Result under MDM
2.2.2 Estimating the Asymptotic Variance of the Maximum Log-likelihood Estimators (MLE)
2.3 Pricing Multiple Products under MDM
2.4 Computational Experiments
2.4.1 Data
2.4.2 MNL Comparison
2.4.3 Mixed logit Comparison
2.4.4 Managerial Insights
3 Choice Prediction with Semidefinite Optimization When Utilities are Correlated
3.1 The Cross Moment Model
3.1.1 Choice model representation of CMM
3.2 Examples
3.3 Flexible Packaging Design Problem
3.3.1 Data
3.3.2 Computational Results
4 A Reduced Formulation For CMM and Applications in Finance
4.1 Semidefinite Programming Formulation
4.1.1 Reduced Formulation
4.1.2 Multi-asset European call option pricing example
4.2 Robust Portfolio Choice Under Regret Criterion
4.2.1 Computational Experiments
4.3 Extensions
4.3.1 Reduced Formulation for the Probability Bound of Boyd, Comanor and Vandenberghe
4.3.2 Reduced Formulation and Joint Chance-Constraints Approximation
4.3.3 Reduced Formulation and Choice Probabilities
5 Conclusion
Appendix
A Appendix
A.1 Estimation of asymptotic variance of MLE under the MMM
This thesis studies a few optimization problems with uncertain parameters in the context of discrete choice and financial portfolio allocation when limited distributional information of random parameters is available to the decision maker. The Marginal Distribution Model (MDM) proposed by Natarajan, Song and Teo [62] is studied in the context of discrete choice. MDM is based on the assumption that the marginal distributions of random parameters, as opposed to complete distributional information, are available. Several theoretical results relating the MDM to classical choice models such as Generalized Extreme Value (GEV) and Multinomial Logit (MNL) are provided. Theoretical properties of the MDM choice models are studied for a multi-product pricing problem, and further results are proposed for the parameter estimation problem using loglikelihood with MDM. The use of MDM as a discrete choice model is exhibited using computational experiments on a safety features data set provided by General Motors.

Following the approach of the MDM, we build another choice model when mean and cross-moment information of random parameters is known. It is shown that this problem can be cast as a semidefinite program (SDP), giving choice probabilities under an extremal distribution as the optimal solution of some of the decision variables. We call this model the Cross Moment Model (CMM). We test this model using several examples from route choice, random walk etc. We further embed this model in a flexible packaging design problem to compare the designs suggested by the CMM with MNL and Multinomial Probit. Although CMM is a parsimonious model that uses limited distributional information, in most examples we find its performance very close to sophisticated models such as cross-nested logit, probit etc. Further, prediction is done using an easy-to-solve convex semidefinite program, leading to computational advantages.

Since CMM is an SDP and existing solvers can't solve problems with a large number of parameters, we propose a reduced but exact formulation for CMM. The new formulation is $O(n^2)$ in variables as opposed to the CMM, which is $O(n^3)$ in variables. This result is used to solve the problem of finding bounds on multi-asset European call option prices and portfolio allocation
2.1 A sample choice task
3.1 Comparison of choice probabilities in binary choice case
3.2 Route choice network with three paths
3.3 Comparison of CMM and MNP
3.4 Route choice network with four paths
3.5 Absence of IIA property in CMM
3.6 Comparison of Choice Probabilities under Arcsine Law and CMM with n = 80
3.7 An example of a box with low volume usage
3.8 A flexible box with 3 adjustable heights
3.9 Dimensions of various item-boxes
3.10 Destination-wise volume weight distribution for orders
3.11 A typical shipping cost curve for freight-forward services (dashed line) and express services (solid line)
3.12 A sample of packing using 3D loadpacker
3.13 View of packing generated in the sample of Figure 3.12
4.1 Computation times of Reduced & BL formulations in option pricing
4.2 Returns of asset 1 starting Jan 1999 to Dec 2009 (2767 data points)
2.1 Attribute and level codes
2.2 Estimation results for MMM and MNL-I
2.3 Estimation results for MMM and MNL-II
2.4 Fit and Prediction Statistics
2.5 Estimation results for mixed-MMM and mixed-logit model-I
2.6 Estimation results for mixed-MMM and mixed-logit model-II
3.1 Comparison of choice predictions
3.2 Laptop Choice Set
3.3 Overcoming the IPS property in CMM
3.4 Base sets selected by MNL, CMM and MNP
3.5 Simulated utilities and costs for MNL and CMM
3.6 Performance comparison of CMM and MNP
4.1 Realized returns, CVaR of regret and variance for sample-based and robust models for Jan 2009-Jun 2009 using past data of 2008 returns
4.2 $N_{SB}$ and $N_{Robust}$ for various measures
4.3 Aggregate results for SB and Robust approaches
1 INTRODUCTION

Consider the following zero-one optimization problem:

$$Z(U) = \max \Big\{ \sum_{j \in N} U_j y_j \;:\; \sum_{j \in N} y_j = 1,\ y_j \in \{0, 1\}\ \forall j \in N \Big\}. \tag{1.1}$$

When $U$ is deterministic, the solution to this problem is trivial: at optimality $y_j = 1$ for the $j$ corresponding to the maximum of $U_j$, and the optimal value is $\max_{j \in N} U_j$. When $U$ is a random vector, however, the optimal solution as well as the optimal value are themselves random. Let us denote the random vector by $\tilde{U}$. When parameters are random, we are often interested in finding the expected optimal value $E_\theta(Z(\tilde{U}))$ and the probability $P_\theta(y_j^* = 1)$ for $j \in N$, under the joint distribution $\theta$ of the random vector $\tilde{U}$. This latter probability is sometimes referred to as the persistency value, as in [62], and is called the choice probability in the discrete choice literature.
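As a rough illustration of the expected optimal value and the persistency values, the following Python sketch estimates both by simulation for one fully specified joint distribution. The independent Gaussian noise, the mean utilities and the function name `simulate_persistency` are assumptions made only for this example; they are not taken from the thesis.

```python
import random

def simulate_persistency(means, sigma=1.0, n_samples=20000, seed=0):
    """Estimate E[Z(U)] and P(y_j* = 1) when U_j = means[j] + Gaussian noise.

    The Gaussian error model is an illustrative assumption.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    total = 0.0
    for _ in range(n_samples):
        u = [m + rng.gauss(0.0, sigma) for m in means]
        best = max(range(n), key=lambda j: u[j])  # y_best = 1 in the zero-one problem
        counts[best] += 1
        total += u[best]                          # Z(U) = max_j U_j
    return total / n_samples, [c / n_samples for c in counts]

# Hypothetical mean utilities for three alternatives.
expected_value, persistency = simulate_persistency([1.0, 0.5, 0.0])
```

The persistency values are the frequencies with which each alternative attains the maximum, so they sum to one by construction.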
In discrete choice, let $N = \{1, 2, \ldots, n\}$ be the finite set of alternatives. A customer facing these $n$ choices chooses the alternative with the highest utility, and would essentially solve problem (1.1). This discrete choice problem arises in areas including but not limited to operations management, marketing and transportation. Looking at past choices of decision-makers, the statistician (modeler) predicts their future choices. Often interest lies in the behavior of a population rather than an individual. For this reason, let us define the set of all customers as $I$. In the following, we review the classical parametric approach to choice modeling, where the distribution $\theta$ of random utilities is assumed a priori.
1.1 Classical Parametric Approach to Choice Modeling
The classical approach to discrete choice modeling considers parametric models, where the distribution $\theta$ of utilities is assumed to be known. Since the early work of McFadden [55] on conditional logit analysis, several parametric models, such as multinomial probit, nested logit etc., have been proposed. Since our interest lies in the business setting, we often refer to alternatives as products and decision makers as customers.
Under a parametric approach, random utility models (RUMs) are built as follows:

1. Let customer $i$'s utility from choosing alternative $j \in N$ be expressed in the additive form:

$$\tilde{U}_{ij} = V_{ij} + \tilde{\epsilon}_{ij}, \quad j \in N, \tag{1.2}$$

where $V_{ij}$ is the deterministic component of random utility that captures the modeler's belief about the utility from the observed product and customer attributes. The linear form $V_{ij} = \beta' x_{ij}$ is most common in the literature, where $\beta$ is the vector of preference weights (part-worths) defined over the set of product and customer attributes embedded in the vector $x_{ij}$. $\tilde{\epsilon}_{ij}$ is the random component of the utility that captures the effects which are unobserved and not considered in the model.

2. Assume a joint distribution for the vector of error components $\tilde{\epsilon}_i = (\tilde{\epsilon}_{i1}, \tilde{\epsilon}_{i2}, \ldots, \tilde{\epsilon}_{in})$ with density $f(\epsilon_i)$.

3. Assume that customer $i$ has complete knowledge of $U_i = (U_{i1}, U_{i2}, \ldots, U_{in})$ while making the choice. Under utility-maximizing behavior, she solves (1.1), where $Z(U_i)$ is the maximum utility for the customer and is simply $\max_{j \in N} U_{ij}$. Since utilities are unknown to the modeler, $Z(\tilde{U}_i)$ and the optimal solution $y_i^*(\tilde{U}_i)$ of (1.1) for customer $i$ can be viewed as random variables. Prediction of customer $i$'s choice is done by evaluating the choice probability $P_{ij}$, $j \in N$:

$$P_{ij} = P\big(y_{ij}^*(\tilde{U}_i) = 1\big) = P\Big(V_{ij} + \tilde{\epsilon}_{ij} \ge \max_{k \in N} \big(V_{ik} + \tilde{\epsilon}_{ik}\big)\Big) = \int \mathbf{1}\Big\{V_{ij} + \epsilon_{ij} \ge \max_{k \in N} \big(V_{ik} + \epsilon_{ik}\big)\Big\}\, f(\epsilon_i)\, d\epsilon_i.$$
The integral involved in the evaluation of choice probabilities is a multidimensional integral over the density $f(\epsilon_i)$. Discrete choice models are derived based on choices of the density $f$, and only in certain cases does this integral have a closed form. These include the generalized extreme value (GEV) models, which are derived under the assumption that the error-term distribution is generalized extreme value. Multinomial logit (MNL) is a well-known special case of these models. The multidimensional integral does not have a closed form in most other cases. Examples include probit, where the error terms are assumed to have a multivariate normal distribution, and mixed logit, which assumes that the random component of utility has two parts: one part is distributed according to a distribution specified by the researcher, and the other part is i.i.d. extreme value. Evaluation of the integral in these cases relies on exhaustive simulation.
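In GEV models, customer $i$'s choice probability for product $j$ is given as

$$P_{ij} = \frac{\delta_{ij}\, G_{ij}(\delta_{i1}, \ldots, \delta_{in})}{G(\delta_{i1}, \ldots, \delta_{in})},$$

where $G_{ij} = \partial G(\delta_{i1}, \ldots, \delta_{in}) / \partial \delta_{ij}$, $\delta_{ij} = e^{V_{ij}}$ and the function $G(\delta_{i1}, \ldots, \delta_{in})$ is a non-negative differentiable function which satisfies a set of properties listed in McFadden [56]. The joint distribution of the error terms is given as

$$P(\tilde{\epsilon}_{i1} \le \varepsilon_1, \ldots, \tilde{\epsilon}_{in} \le \varepsilon_n) = e^{-G(e^{-\varepsilon_1}, \ldots, e^{-\varepsilon_n})}.$$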
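For the special case of MNL (McFadden [55] and Luce [50]), the error terms are i.i.d. with the distribution

$$P(\tilde{\epsilon}_{ij} \le \varepsilon) = e^{-e^{-\varepsilon}},$$

and choice probabilities are described using the following neat expression:

$$P_{ij} = \frac{e^{V_{ij}}}{\sum_{k \in N} e^{V_{ik}}}. \tag{1.4}$$

For models without a closed form, such as multinomial probit, the choice probabilities have to be evaluated by simulation, a computationally challenging task. The mixed logit model (see for example Train [74]), also called random coefficient logit, considers the model parameters to have a random component apart from the product and customer specific random component $\tilde{\epsilon}_{ij}$. For example, in the most popular case of linear utilities, the mixed logit model considers the part-worth to have two components, a deterministic term ($\beta$) and a random term ($\tilde{\epsilon}_a$), with the utility described as $\tilde{U}_{ij} = (\beta + \tilde{\epsilon}_a)' x_{ij} + \tilde{\epsilon}_{ij}$. By considering the randomness in model parameters, this model captures consumer taste variation. When $\tilde{\epsilon}_{ij}$ are i.i.d. extreme value distributed, the MNL formula applies conditional on $\tilde{\epsilon}_a$:

$$P_{ij} = E_{\tilde{\epsilon}_a}\!\left[ \frac{e^{(\beta + \tilde{\epsilon}_a)' x_{ij}}}{\sum_{k \in N} e^{(\beta + \tilde{\epsilon}_a)' x_{ik}}} \right].$$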
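To make the contrast between the closed-form MNL and the simulation-based mixed logit concrete, here is a minimal Python sketch. It assumes linear utilities and, purely for illustration, independent Gaussian taste variation around $\beta$; the attribute vectors, the standard deviations and both function names are hypothetical.

```python
import math
import random

def mnl_probabilities(v):
    """MNL probabilities e^{V_j} / sum_k e^{V_k}, shifted by max(v) for numerical stability."""
    m = max(v)
    w = [math.exp(t - m) for t in v]
    s = sum(w)
    return [t / s for t in w]

def mixed_logit_probabilities(beta, x, taste_sd, n_draws=5000, seed=0):
    """Average the MNL formula over simulated draws of the random part-worth term.

    Gaussian taste variation is an assumption of this sketch.
    """
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(n_draws):
        b = [bk + rng.gauss(0.0, sd) for bk, sd in zip(beta, taste_sd)]
        v = [sum(bk * xk for bk, xk in zip(b, xj)) for xj in x]
        for j, p in enumerate(mnl_probabilities(v)):
            acc[j] += p
    return [a / n_draws for a in acc]

# Hypothetical attribute vectors x_ij for three products and part-worths beta.
x = [[1.0, 0.2], [0.5, 0.9], [0.0, 0.0]]
beta = [1.0, -0.5]
p_mnl = mnl_probabilities([sum(b * a for b, a in zip(beta, xj)) for xj in x])
p_mixed = mixed_logit_probabilities(beta, x, taste_sd=[0.5, 0.5])
```

With zero taste variation the simulated mixed logit collapses to the MNL probabilities, which is a convenient sanity check.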
In semiparametric and nonparametric models, either the assumption of a known functional form of $V_{ij}$ or the distribution of the random utility component $\tilde{\epsilon}_{ij}$, or both, is relaxed. Semiparametric and nonparametric choice modeling is itself a well-researched area. Among these models are the maximum score method proposed by Manski [51] and [52], the smoothed maximum score method for binary choice of Horowitz [42], Cosslett's [20] distribution-free maximum likelihood estimator, Han's [39] maximum rank correlation estimator, and the recent nonparametric approach of Farias and Jagabathula [33]. A critical issue in semiparametric and nonparametric choice models is the efficiency of estimators. For example, to the best of our knowledge, little is known regarding the asymptotic distribution of the estimators proposed in Manski [51], Han [39], Cosslett [20], and Farias and Jagabathula [33]. This leads to difficulty in statistical inference, and often bootstrapping methods are used to gain some idea about the variability of the estimators. In several likelihood-based methods (parametric and semi/nonparametric) such as mixed logit, multinomial probit, etc., the asymptotic distribution can be found using the asymptotic normality property. For mixed logit and probit this is done by finding the information matrix numerically.
1.2 Choice Probabilities under Limited Distributional Information
Motivated by the work of Meilijson and Nádas [58], Weiss [77], and Bertsimas, Natarajan and Teo [5], [6] and [7], who propose convex optimization formulations to find tight bounds on the expected optimal value of certain combinatorial optimization problems, Natarajan, Song and Teo [62] have recently proposed a semiparametric approach for choice modeling using limited information on the joint distribution of the random utilities. Under these models, the choice prediction is performed in the following manner:

1. A behavioral model such as (1.2) for random utility is specified.

2. Unlike the parametric approach to choice modeling, the distribution of the vector of error terms $\tilde{\epsilon}_i$ is not assumed a priori. It is, however, assumed that the modeler has some limited information regarding this distribution, such as the marginal distributions or marginal moments of the error terms. Let $\Theta$ denote the set of distributions consistent with this information.

3. Customers are again assumed to be utility maximizers. The key difference from the conventional approach lies in the evaluation of choice probabilities. Rather than finding the values of choice probabilities for an assumed distribution by evaluating a potentially difficult integral, the choice probabilities are estimated at an extremal distribution $\theta^*$ that satisfies pre-specified conditions (for example marginal distribution or marginal moment information). This is done by maximizing the expectation of the customer's maximum utility over the set $\Theta$:

$$\max_{\theta \in \Theta} E_\theta\big(Z(\tilde{U}_i)\big), \tag{1.5}$$

where the extremal distribution is

$$\theta^* = \arg\max_{\theta \in \Theta} E_\theta\big(Z(\tilde{U}_i)\big). \tag{1.6}$$

The expected maximum utility can be written as

$$E_\theta\big(Z(\tilde{U}_i)\big) = \sum_{j \in N} E_\theta\big(\tilde{U}_{ij} \mid y_{ij}^*(\tilde{U}_i) = 1\big)\, P_\theta\big(y_{ij}^*(\tilde{U}_i) = 1\big),$$

and $P_{\theta^*}\big(y_{ij}^*(\tilde{U}_i) = 1\big)$ is the choice probability of the $j$th product under some extremal distribution. These choice probabilities are found by maximizing the right-hand side of the last equation under distributional constraints. Examples of such models applied to choice modeling are the Marginal Distribution Model (MDM) and the Marginal Moment Model (MMM). MDM assumes that only the marginal distributions of the random utilities $\tilde{U}_{ij}$ are known, and MMM is built under an even more relaxed assumption that only the first two marginal moments of $\tilde{U}_{ij}$ are known for the choice prediction problem. For a detailed discussion of these models, readers are referred to [62], which derives these models and exhibits their application in discrete choice modeling. The key result of Natarajan, Song and Teo [62] for the MDM is as follows.
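Theorem 1 (Natarajan, Song and Teo [62]) For customer $i$, assume that the marginal distributions of the error terms are continuous and known as $\tilde{\epsilon}_{ij} \sim F_{ij}(\cdot)$ for $j \in N$. The following concave maximization problem solves (1.5) and the choice probabilities are obtained as the optimal solution $P_i^*$ under an extremal distribution $\theta^*$ of (1.6):

$$\max_{P_i} \Big\{ \sum_{j \in N} \Big( V_{ij} P_{ij} + \int_{1 - P_{ij}}^{1} F_{ij}^{-1}(t)\, dt \Big) \;:\; \sum_{j \in N} P_{ij} = 1,\ P_{ij} \ge 0\ \forall j \in N \Big\}.$$

Under the optimality conditions, the choice probabilities are

$$P_{ij}^* = 1 - F_{ij}(\lambda_i - V_{ij}), \tag{1.8}$$

where the Lagrange multiplier $\lambda_i$ satisfies the normalization condition

$$\sum_{j \in N} \big( 1 - F_{ij}(\lambda_i - V_{ij}) \big) = 1. \tag{1.9}$$

Under the MMM, where only the mean $V_{ij}$ and the variance $\sigma_{ij}^2$ of $\tilde{U}_{ij}$ are known, the corresponding problem is

$$\max_{P_i} \Big\{ \sum_{j \in N} \Big( V_{ij} P_{ij} + \sigma_{ij} \sqrt{P_{ij}(1 - P_{ij})} \Big) \;:\; \sum_{j \in N} P_{ij} = 1,\ P_{ij} \ge 0\ \forall j \in N \Big\}. \tag{1.10}$$

In this case, optimality conditions generate the choice probabilities:

$$P_{ij} = \frac{1}{2} \left( 1 + \frac{V_{ij} - \lambda_i}{\sqrt{(V_{ij} - \lambda_i)^2 + \sigma_{ij}^2}} \right), \tag{1.11}$$

where $\lambda_i$ solves

$$\sum_{j \in N} \frac{1}{2} \left( 1 + \frac{V_{ij} - \lambda_i}{\sqrt{(V_{ij} - \lambda_i)^2 + \sigma_{ij}^2}} \right) = 1. \tag{1.12}$$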
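A minimal numerical sketch of how MDM choice probabilities might be computed, assuming the form $P_{ij} = 1 - F_{ij}(\lambda_i - V_{ij})$ with $\lambda_i$ chosen so that the probabilities sum to one (the conditions the thesis refers to as (1.8) and (1.9)). The bisection bracket, the tolerance, the utilities and all function names are assumptions of this example.

```python
import math

def mdm_probabilities(v, cdfs, lo=-50.0, hi=50.0, tol=1e-12):
    """Find lam with sum_j (1 - F_j(lam - V_j)) = 1 by bisection and return the P_j.

    Assumes the root lies in [lo, hi]; the left-hand side is non-increasing in lam.
    """
    def excess(lam):
        return sum(1.0 - F(lam - vj) for F, vj in zip(cdfs, v)) - 1.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid   # probabilities still sum to more than one: increase lam
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [1.0 - F(lam - vj) for F, vj in zip(cdfs, v)]

def exponential_cdf(alpha):
    """CDF of an exponential distribution with rate alpha."""
    return lambda x: 1.0 - math.exp(-alpha * x) if x > 0.0 else 0.0

def mmm_cdf(sigma):
    """CDF F(x) = (1 + x / sqrt(x^2 + sigma^2)) / 2, which yields the MMM probabilities."""
    return lambda x: 0.5 * (1.0 + x / math.sqrt(x * x + sigma * sigma))

v = [1.0, 0.5, 0.0]                                       # hypothetical utilities V_ij
p_exp = mdm_probabilities(v, [exponential_cdf(1.0)] * 3)  # should match MNL
p_mmm = mdm_probabilities(v, [mmm_cdf(1.0)] * 3)
```

With identical exponential marginals the output coincides with the MNL probabilities $e^{V_{ij}} / \sum_k e^{V_{ik}}$, which gives a quick check of the root-finding step.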
For a utility specification such as $\tilde{U}_{ij} = \beta' x_{ij} + \tilde{\epsilon}_{ij}$, one needs to know customer $i$'s preference weights (part-worths) $\beta$ on the product and customer attributes in order to evaluate the choice probability. To this end, a parameter estimation problem is solved, given data on the actions taken by the decision-maker in similar choice situations. This data can be stated preference data, such as choice-based conjoint data, or revealed preference data, such as data on the choice history of the customer. The parameter estimation is performed by maximizing the likelihood function. Given choice data $z_{ij}$, $i \in I$, $j \in N$, where $z_{ij} = 1$ if customer $i$ chooses product $j$ from the choice set $N$ and zero otherwise, the parameters $\beta$ are estimated by solving the following maximum log-likelihood problem:

$$\max_{\beta} \sum_{i \in I} \sum_{j \in N} z_{ij} \ln P_{ij}(\beta). \tag{1.13}$$

The semiparametric approach of MDM or MMM has the advantage that choice probabilities can be found by solving easy-to-solve convex optimization problems. This avoids the evaluation of multidimensional integrals as is done in several parametric models such as probit and mixed logit.
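The maximum log-likelihood step can be sketched for the simplest case, MNL with linear utilities, where the log-likelihood is concave in $\beta$. The gradient-ascent routine below is an illustrative choice rather than the thesis's estimation procedure; the data, step size and function names are hypothetical.

```python
import math

def mnl_probs(beta, xs):
    """MNL probabilities for one choice task with attribute vectors xs."""
    v = [sum(b * a for b, a in zip(beta, xj)) for xj in xs]
    m = max(v)
    w = [math.exp(t - m) for t in v]
    s = sum(w)
    return [t / s for t in w]

def log_likelihood(beta, data):
    """data holds (xs, chosen) pairs; chosen is the index j with z_ij = 1."""
    return sum(math.log(mnl_probs(beta, xs)[chosen]) for xs, chosen in data)

def fit_mnl(data, dim, step=0.1, iters=2000):
    """Plain gradient ascent on the average log-likelihood."""
    beta = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim
        for xs, chosen in data:
            p = mnl_probs(beta, xs)
            for d in range(dim):
                # d/d beta_d of ln P_chosen = x_chosen[d] - sum_j P_j x_j[d]
                grad[d] += xs[chosen][d] - sum(pj * xj[d] for pj, xj in zip(p, xs))
        beta = [b + step * g / len(data) for b, g in zip(beta, grad)]
    return beta

# Hypothetical data: two products with one attribute; product 0 is chosen 7 times out of 10.
task = [[1.0], [0.0]]
data = [(task, 0)] * 7 + [(task, 1)] * 3
beta_hat = fit_mnl(data, dim=1)
```

In this toy data set the estimate satisfies $e^{\beta} / (1 + e^{\beta}) = 0.7$, so `beta_hat[0]` should be close to $\ln(7/3)$.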
1.3 Problems in Finance under Limited Distributional Information
Several problems in mathematical finance require computation of the expected value $E_\theta[f(x, \tilde{r})]$, where $x$ is a parameter or decision vector, $\tilde{r}$ is a random vector having a joint distribution $\theta$ and $f$ is a real-valued function. For instance, a derivative on assets is priced using a no-arbitrage argument, where $\tilde{r}$ is the price vector of the underlying assets at the termination of the contract. The function $f$ in these problems can take various forms; for example, $f(K, \tilde{r}) = \max\{\max\{\tilde{r}_1, \ldots, \tilde{r}_n\} - K, 0\}$ is the payoff of a multi-asset European max call option on $n$ underlying assets with strike price $K$. Another instance is the portfolio selection problem, where $x$ is the portfolio allocation in given assets and $\tilde{r}$ is the return vector of these assets. In such problems, given constraints on the decision vector $x$, one is interested in finding an optimal portfolio to minimize $E_\theta[f(x, \tilde{r})]$, where $f(x, \tilde{r})$ is a loss function. The function $f$ can take various forms and can be used in risk measures such as value at risk (VaR) and conditional value at risk (CVaR). Regret functions are an example, where one tries to minimize the expected regret with respect to benchmark portfolios using some sort of risk measure.
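For concreteness, the max-call payoff and a sample-average estimate of its expectation can be sketched as follows. The lognormal scenario generator is only an assumed stand-in for a fully specified distribution $\theta$, and the result is an undiscounted sample mean rather than an option price; the function names are hypothetical.

```python
import random

def max_call_payoff(strike, prices):
    """Payoff max{max{r_1, ..., r_n} - K, 0} of a European max call."""
    return max(max(prices) - strike, 0.0)

def sample_average_payoff(strike, scenarios):
    """Sample-average estimate of E[f(K, r)] over given price scenarios."""
    return sum(max_call_payoff(strike, s) for s in scenarios) / len(scenarios)

rng = random.Random(0)
# Hypothetical terminal prices of 3 assets, drawn independently from a lognormal law.
scenarios = [[rng.lognormvariate(0.0, 0.2) for _ in range(3)] for _ in range(10000)]
estimate = sample_average_payoff(1.0, scenarios)
```

Any such estimate depends on the assumed scenario distribution, which is the dependence that bounds over a set of distributions are meant to remove.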
In certain simple cases, $E_\theta[f(x, \tilde{r})]$ can be computed in closed form. For example, using a no-arbitrage argument in the context of European call option pricing on a single asset, one obtains the well-known Black-Scholes formula under the assumption that the price follows a geometric Brownian motion. The function $f$ here takes a simple two-piece linear form $\max\{\tilde{r} - K, 0\}$. Another example is portfolio selection in the VaR framework under Gaussian distributions. It is well known that in portfolio selection using VaR, when the distribution of the return vector $\tilde{r}$ is Gaussian, one essentially solves a convex optimization problem of the form $\min_{x \in X} -\Phi^{-1}(\epsilon) \sqrt{x' \Gamma x} - \mu' x$, where $\Gamma$ is the covariance matrix of random returns, $\epsilon \in (0, 1]$ a given parameter, $X$ a convex set, and $\Phi(\cdot)$ is the cumulative normal distribution function.

A fundamental assumption in the preceding examples is that the distribution $\theta$ is known. In practice, the distribution $\theta$ of returns or asset prices etc. is often ambiguous to the investor (Natarajan et al. [61]). To avoid this restriction, a stream of research has focused on bounds on $E_\theta[f(x, \tilde{r})]$ over a set of distributions $\theta \in \Theta$. Rather than assuming a distribution itself, these papers look into bounds on the expectation under a set of distributions that are consistent with the limited information about the distributions available from the data (see for example Boyle and Lin [12], Bertsimas and Popescu [8]). Advantages of this approach are that we can use the limited information regarding distributions, typically moment information, to find useful bounds on the desired expectations, and that, if we need to solve an optimization problem as in portfolio allocation, we often deal with easily solvable instances of convex optimization (see for example El Ghaoui et al. [31]). The solutions obtained using this approach are distributionally robust in the sense that, under the class of distributions satisfying moment conditions, one makes decisions to protect against the worst case ([25], [63], [43]).
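The Gaussian VaR objective mentioned above, $-\Phi^{-1}(\epsilon) \sqrt{x' \Gamma x} - \mu' x$, can be evaluated directly for a candidate portfolio. The sketch below only evaluates the objective and does not optimize over $x$; the two-asset data and the function name are hypothetical, and $\epsilon$ is taken strictly between 0 and 1 because `inv_cdf` requires it.

```python
import math
from statistics import NormalDist

def gaussian_var(x, mu, gamma, eps):
    """Evaluate -Phi^{-1}(eps) * sqrt(x' Gamma x) - mu' x for portfolio weights x."""
    n = len(x)
    variance = sum(x[i] * gamma[i][j] * x[j] for i in range(n) for j in range(n))
    mean = sum(m * w for m, w in zip(mu, x))
    return -NormalDist().inv_cdf(eps) * math.sqrt(variance) - mean

# Hypothetical mean returns and covariance matrix for two uncorrelated assets.
mu = [0.05, 0.02]
gamma = [[0.04, 0.0], [0.0, 0.01]]
var_first = gaussian_var([1.0, 0.0], mu, gamma, eps=0.05)
var_second = gaussian_var([0.0, 1.0], mu, gamma, eps=0.05)
var_mixed = gaussian_var([0.5, 0.5], mu, gamma, eps=0.05)
```

For $\epsilon < 1/2$ the coefficient $-\Phi^{-1}(\epsilon)$ is positive, so the objective is convex in $x$ and the equally weighted portfolio cannot have a larger value than the average of the two single-asset values.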
This thesis also presents some new theoretical results for the following two problems, and discusses implications in the areas of option pricing and portfolio optimization.
1 The upper bound problem:
sup
˜∼(µ,Σ) E θ
[max
where $b_k(x) : \mathbb{R}^m \to \mathbb{R}^n$ and $c_k(x) : \mathbb{R}^m \to \mathbb{R}$ are affine functions of the decision vector $x$.
1.4 Organization and Contributions
This thesis contains three essays contributing to the literature on optimization under uncertainty when limited distributional information is known, the theory of discrete choice, and portfolio optimization. The organization and key contributions of this work are as follows.

1. In Chapter 2, we discuss theoretical and empirical properties of the Marginal Distribution Model (MDM) in the context of discrete choice. More specifically, we show interesting connections of this approach to classical discrete choice models such as Multinomial Logit (MNL) and a more general class of choice models: Generalized Extreme Value (GEV) models. We further show that the Marginal Moment Model (MMM) can also be replicated by MDM. We also study the parameter estimation problem using loglikelihood under the MDM. This estimation problem is known to be convex for only a few special cases, such as MNL and the Nested Logit model for specific parameter choices. We show that under a linear utility specification, the estimation problem is convex in the part-worths under appropriate conditions for special classes of exponential distributions. This includes the MNL and Nested Logit results as special cases. Further, using the asymptotic normality property of loglikelihood, we present a method to find confidence intervals of estimated parameters under MDM. We provide an application of the choice probabilities from MDM in the seller's profit maximization problem. We show that the optimal prices for a set of differentiated products can be found under MDM by solving a concave maximization problem when the marginal probability density functions are log-concave. This provides a new class of choice models for which the multiple product pricing problem is tractable. Finally, a conjoint choice data set for vehicle features is used to conduct experiments using the MDM, MNL, and mixed logit.

2. Chapter 3 extends the theory of the persistency approach to a class of distributions where the mean and covariance matrix of the random utilities are assumed to be known. We refer to this model as the Cross Moment Model (CMM) and show that the choice probabilities can be found using a semidefinite program (SDP). We test CMM using a few examples in route choice and random walk, and compare the quality of choice prediction with other models such as multinomial probit (MNP), Nested logit, MNL etc. Finally, we use the CMM to solve a packaging design problem using a data set provided by a local service provider in Singapore and compare the solutions with those suggested by MNL and MNP.

3. The Cross Moment Model also yields upper bounds on the expected value of the maximum of finite random vectors. This model can be extended to solve problems in finance. We can use this approach to find bounds on the price of call options on the maximum of several asset returns (see [9], [12], [49]). The theory also extends easily to portfolio optimization under limited distributional information, yielding distributionally robust portfolio allocations (see [31] and [80]).

The CMM formulation used in Chapter 3 is $O(n^3)$ in variables. In Chapter 4, we present a reduced formulation, which is $O(n^2)$ in variables. This formulation is exact, and can lead to potential benefits in finding bounds on option prices and in portfolio allocation problems in finance, which we study in Chapter 4.

4. Chapter 5 is reserved for conclusion and future work.
2 ON THEORETICAL AND EMPIRICAL ASPECTS OF MARGINAL DISTRIBUTION CHOICE MODELS
In this chapter, we study the Marginal Distribution Model (MDM) in the discrete choice context. Results for choice prediction as well as parameter estimation are presented. For this reason, our presentation of the MDM will be based on a linear form of random utility $\tilde{U}$. We assume that the marginal distributions of random error terms are prespecified.

Our main contributions for the choice prediction problem are presented in Section 2.1, where we find connections of MDM with classical choice models such as multinomial logit (MNL) and GEV. More specifically, we identify the marginal error-term distributions under which the MDM approach yields these classical choice probability formulas. Further, it is shown that the choice probabilities of the Marginal Moment Model (MMM) can be replicated by the MDM.

For the parameter estimation problem, in Section 2.2, we present a convexity result for the loglikelihood problem under MDM. Using the asymptotic normality property of loglikelihood, we present a method to find confidence intervals under the MDM. This is important, since the earlier work of [62] finds parameter estimates but doesn't develop methods to find error estimates of these parameters.

As an application of MDM choice probabilities, we study the multi-product pricing problem in Section 2.3. From earlier literature, it is known that this problem has a convex optimization formulation for MNL. Recently, [48] proved convexity of this problem for the nested logit model under some assumptions. We show that for marginal distributions with log-concave density, this problem has a convex optimization formulation. Further results are provided for MMM as well.

In the last section of this chapter, we present our computational experiments on a conjoint choice data set of vehicle features. A comparison of MMM with MNL and mixed logit is provided. We further present managerial insights that our experiments entail.
2.1 Choice Prediction under MDM

The MDM choice probabilities as given by (1.8) and (1.9) are quite general in the sense that the choice of marginal distributions $F_{ij}$ of the error terms $\tilde{\epsilon}_{ij}$ leads to different choice models. This model can be related to the MNL model under a special case, as we show in the next theorem. Recall that MNL is derived from the RUM (1.2) assuming that the error terms are i.i.d. and extreme value distributed. The MNL choice probability formula is provided in (1.4).
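Theorem 2 Say customer $i \in I$ has random utility given by (1.2). Under the Marginal Distribution Model (MDM), when the error terms $\tilde{\epsilon}_{ij}$, $j \in N$, are identically distributed, choice probabilities are multinomial logit choice probabilities if and only if $\tilde{\epsilon}_{ij}$, $j \in N$, is exponentially distributed.

Proof: Let the error terms $\tilde{\epsilon}_{ij}$, $j \in N$, be exponentially distributed with parameter $\alpha > 0$. Then, from (1.8) under MDM, the choice probability of product $j$ is

$$P_{ij} = e^{-\alpha(\lambda_i - V_{ij})},$$

and the normalization condition (1.9) gives $e^{\alpha \lambda_i} = \sum_{k \in N} e^{\alpha V_{ik}}$, so that

$$P_{ij} = \frac{e^{\alpha V_{ij}}}{\sum_{k \in N} e^{\alpha V_{ik}}}, \tag{2.2}$$

which is the MNL choice probability with scale parameter $\alpha$.

Next, let the choice probabilities from the MDM be of the form (2.2), and let the error terms $\tilde{\epsilon}_{ij}$, $j \in N$, have the identical CDF $H(\cdot)$. Then the ratio of the choice probabilities of any two products $j, k \in N$ is

$$\frac{P_{ij}}{P_{ik}} = \frac{1 - H(\lambda_i - V_{ij})}{1 - H(\lambda_i - V_{ik})} = e^{\alpha(V_{ij} - V_{ik})}.$$

Since this equation holds true for any arbitrary $V_{ij}$ and $V_{ik}$, the following must be true for any $x, y$:

$$\frac{1 - H(x)}{1 - H(y)} = e^{-\alpha(x - y)}.$$

Hence $1 - H(x)$ is proportional to $e^{-\alpha x}$, that is, $H$ is an exponential distribution function.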
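We next consider the GEV case, with the distribution function defined as the generalized exponential distribution:

$$F_{ij}(x) = 1 - e^{-x}\, G_{ij}(e^{V_{i1}}, \ldots, e^{V_{ij}}, \ldots, e^{V_{in}}) \quad \text{for } x \ge \ln\big(G_{ij}(e^{V_{i1}}, \ldots, e^{V_{ij}}, \ldots, e^{V_{in}})\big). \tag{2.3}$$

Note that $F_{ij}(x)$ is a valid distribution function for $\tilde{\epsilon}_{ij}$ under the assumptions listed in McFadden [56]. In this case, the Lagrange multiplier satisfies the condition

$$\sum_{k \in N} e^{V_{ik} - \lambda_i}\, G_{ik}(e^{V_{i1}}, \ldots, e^{V_{ik}}, \ldots, e^{V_{in}}) = 1,$$

and the choice probabilities from (1.8) are

$$P_{ij} = \frac{e^{V_{ij}}\, G_{ij}(e^{V_{i1}}, \ldots, e^{V_{ij}}, \ldots, e^{V_{in}})}{\sum_{k \in N} e^{V_{ik}}\, G_{ik}(e^{V_{i1}}, \ldots, e^{V_{ik}}, \ldots, e^{V_{in}})}. \tag{2.4}$$

Thus, choice probabilities under the MDM are the same as the ones given by the classical GEV model.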
To guarantee that the choice probabilities are consistent with utility maximization in the GEV model, assumptions on the signs of higher-order cross partial derivatives of the function $G(\cdot)$ need to be made (see McFadden [56]). On the other hand, this assumption need not be imposed in (2.3), thereby generalizing the class of functions $G(\cdot)$ for which the formula (2.4) is valid under MDM.
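As a third example of choice probabilities that can be replicated by MDM, consider the Marginal Moment Model (1.11) and (1.12), derived under the assumption that the first two marginal moments of the error terms $\tilde{\epsilon}_{ij}$ are known. To show that MDM can be used to obtain these choice probabilities, consider the three-parameter form of the t-distribution with density function given by:

$$f_{ij}(x) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)} \left( \frac{\lambda}{\pi \nu} \right)^{1/2} \left( 1 + \frac{\lambda (x - \mu)^2}{\nu} \right)^{-\frac{\nu + 1}{2}},$$

where $\mu$ is a location parameter, $\lambda$ is an inverse scale parameter, and $\nu$ is the number of degrees of freedom. Setting $\mu = 0$, $\lambda = 2/\sigma_{ij}^2$, and $\nu = 2$, the density function takes the form:

$$f_{ij}(x) = \frac{\sigma_{ij}^2}{2 \left( x^2 + \sigma_{ij}^2 \right)^{3/2}},$$

with distribution function

$$F_{ij}(x) = \frac{1}{2} \left( 1 + \frac{x}{\sqrt{x^2 + \sigma_{ij}^2}} \right). \tag{2.5}$$

Substituting (2.5) into (1.8) recovers (1.11). The preceding observation leads us to the following result.

Theorem 4 Say customer $i \in I$ has random utility given by (1.2). Under the Marginal Distribution Model (MDM), when the error terms $\tilde{\epsilon}_{ij}$, $j \in N$, have the marginal distribution $F_{ij}(\cdot)$ as given by (2.5), choice probabilities under the MDM are the same as the ones given by the MMM with the variance of $\tilde{\epsilon}_{ij}$, $j \in N$, equal to $\sigma_{ij}^2$ and mean zero.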
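As a hedged numerical check of the t-distribution construction, the sketch below integrates the three-parameter t density with $\mu = 0$, $\lambda = 2/\sigma^2$, $\nu = 2$ and compares the result with the closed form $F(x) = \frac{1}{2}\big(1 + x/\sqrt{x^2 + \sigma^2}\big)$, which is the distribution function that reproduces the MMM probabilities. The trapezoid rule, the step count and the function names are choices made only for this example.

```python
import math

def t_density(x, mu, lam, nu):
    """Three-parameter t density with location mu, inverse scale lam and nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / math.gamma(nu / 2) * math.sqrt(lam / (math.pi * nu))
    return c * (1.0 + lam * (x - mu) ** 2 / nu) ** (-(nu + 1) / 2)

def closed_form_cdf(x, sigma):
    """Candidate closed form for the CDF when mu = 0, lam = 2 / sigma^2, nu = 2."""
    return 0.5 * (1.0 + x / math.sqrt(x * x + sigma * sigma))

def numeric_cdf(x, sigma, steps=20000):
    """Trapezoid-rule integral of the density from 0 to x, plus F(0) = 1/2 by symmetry."""
    lam = 2.0 / sigma ** 2
    h = x / steps
    total = 0.5 * (t_density(0.0, 0.0, lam, 2.0) + t_density(x, 0.0, lam, 2.0))
    total += sum(t_density(k * h, 0.0, lam, 2.0) for k in range(1, steps))
    return 0.5 + h * total

cdf_numeric = numeric_cdf(2.0, sigma=1.5)
cdf_closed = closed_form_cdf(2.0, sigma=1.5)
```

The two values agree up to the discretization error of the quadrature, which supports the closed form without replacing an analytical proof.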
2.2 Estimation under MDM

In this section, we provide results on the parameter estimation problem under the MDM. We first identify examples where the maximum log-likelihood problem under MDM is a convex optimization problem. Next, we establish a method to evaluate standard-error estimates of maximum log-likelihood estimators using the asymptotic theory of maximum log-likelihood. It is known that the standard error of maximum log-likelihood estimators can be estimated by evaluating the information matrix, which requires the second partial derivatives of the log-likelihood function. MNL has analytical expressions for these derivatives, which come in handy during the evaluation of standard errors. For models such as multinomial probit and mixed logit this is done using the method of simulated log-likelihood. We describe a method to find almost analytical expressions of partial derivatives of the log-likelihood function in the case of MDM.
2.2.1 A Convexity Result under MDM
Convexity of the maximum log-likelihood optimization problem (1.13) is a highly desired property. It implies, first, that the computational search for the maximum likelihood estimates (MLE) is easy, and second, that the global optimality of the MLE is guaranteed. Only a few results are known in this regard. McFadden [55] showed that the maximum log-likelihood problem under MNL is a convex optimization problem. For the Nested Logit Model, Daganzo and Kusnic [21] show that the problem is convex in the part-worth parameters $\beta$ for a given choice of scale parameters if the mean utility is linear in $\beta$. Note that in nested logit, scale parameters are also estimated, but the problem is not jointly convex in part-worth parameters and scale parameters.
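The maximum log-likelihood problem under the MDM is:

$$\max_{\beta, \lambda}\ \sum_{i \in I} \sum_{j \in N} z_{ij} \ln\big( 1 - F_{ij}(\lambda_i - V_{ij}(\beta)) \big) \quad \text{s.t.} \quad \sum_{j \in N} \big( 1 - F_{ij}(\lambda_i - V_{ij}(\beta)) \big) = 1 \quad \forall i \in I, \tag{2.6}$$

where $F_{ij}$ is the marginal distribution of the error term in the random utility $\tilde{U}_{ij}$. The maximization is performed over the estimated parameters $\beta$ and Lagrange multipliers $\lambda = (\lambda_1, \ldots, \lambda_{|I|})$. The part-worths $\beta$ are contained in $V_{ij}$, and other estimated parameters such as scale parameters etc. may be described in the form of the distribution $F_{ij}$. In the following theorem we present a convexity result for the maximum likelihood problem under MDM.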
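Theorem 5 Suppose the error components of the random utility have marginal distributions $F_{ij}(y) = 1 - e^{-a_{ij} y + \ln(G_{ij}(e^{V_{i1}}, \ldots, e^{V_{in}}))}$, $i \in I$, $j \in N$, $a_{ij} > 0$, and the deterministic components $V_{ij}$ of utilities are linear in the estimated parameters. If the following two conditions are satisfied:

(a) $\ln(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)}))$ is concave in $\beta$,

(b) $e^{\ln(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})) + a_{ij}(V_{ij}(\beta) - \lambda_i)}$ is convex in $(\beta, \lambda_i)$,

then the maximum log-likelihood problem (2.6) under MDM is a convex optimization problem.

Proof: If $F_{ij}(y) = 1 - e^{-a_{ij} y + \ln(G_{ij}(e^{V_{i1}}, \ldots, e^{V_{in}}))}$, then the maximum log-likelihood problem (2.6) reduces to:

$$\max_{\beta, \lambda}\ \sum_{i \in I} \sum_{j \in N} z_{ij} \Big( \ln\big(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})\big) - a_{ij}(\lambda_i - V_{ij}(\beta)) \Big) \quad \text{s.t.} \quad \sum_{j \in N} e^{\ln(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})) - a_{ij}(\lambda_i - V_{ij}(\beta))} = 1 \quad \forall i \in I. \tag{2.7}$$

We consider the following relaxation of (2.7):

$$\max_{\beta, \lambda}\ \sum_{i \in I} \sum_{j \in N} z_{ij} \Big( \ln\big(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})\big) - a_{ij}(\lambda_i - V_{ij}(\beta)) \Big) \quad \text{s.t.} \quad \sum_{j \in N} e^{\ln(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})) - a_{ij}(\lambda_i - V_{ij}(\beta))} \le 1 \quad \forall i \in I. \tag{2.8}$$

For linear utilities of the form $V_{ij}(\beta) = \beta' x_{ij}$, (2.8) is clearly a convex optimization problem in the decision variables $(\beta, \lambda)$ if conditions (a) and (b) of Theorem 5 are satisfied. To see that it is equivalent to (2.7), we note that the objective function and the (unique) constraint involving $\lambda_i$ are both decreasing in $\lambda_i$. So in the optimal solution, $\lambda_i$ is chosen to be the minimum value such that

$$\sum_{j \in N} e^{\ln(G_{ij}(e^{V_{i1}(\beta)}, \ldots, e^{V_{in}(\beta)})) - a_{ij}(\lambda_i - V_{ij}(\beta))} = 1.$$

Therefore the formulations (2.7) and (2.8) are equivalent.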
When $a_{ij}$ is independent of $j \in N$ and is constant (possibly different) for each $i \in I$, the above result leads to a convexity result for the GEV model. Clearly, under identical exponential marginal distributions $F_{ij}$, $j \in N$, (2.6) is well-behaved, as we essentially get the MNL due to Theorem 2. The classical logit model reduces to the simple case when $G_i(y_1, \ldots, y_n) = \sum_k y_k$, so we have $G_{ij}(y_1, \ldots, y_n) = 1$ for all $j$. In this case, the conditions in Theorem 5 hold and hence the estimation for the classical logit reduces to the following convex formulation:
Theorem 6 The estimation problem (2.10) is a convex optimization problem if $V_{ij}(\beta)$ are affine in $\beta$ and $\theta_l \in (0, 1]$.