the queue at times 6:40, 6:50, ..., 8:00 with probability 0.0117 and stay out at intermediate times 6:45, 6:55, ..., 8:05. This periodicity is due to a combination of the discretization of the strategy space, fixed service time, and fixed opening (T_o) and closing times (T_e).
3.3 Results
Observed Arrival Time Distributions: Aggregate Results. Using several different statistics, RSPS reported no significant differences among the four groups in Condition 1. In particular, although the "sophisticated" subjects in Group 4 were paid twice as much as the other subjects (and took about twice as much time to complete the session), their results did not differ from those of the other three groups. Therefore, the results of all four groups were combined (4 × 20 × 75 = 6000 observations). Fig. 4 displays the observed and predicted (equilibrium) cumulative probability distributions of arrival time (staying out decisions are treated as arrivals at time 18:00).

Figure 4. Observed and predicted distribution of arrival time and staying out decisions in Condition 1.

The statistical comparison of observed and predicted arrival time distributions is problematic because of the dependencies between and within players. Strictly speaking, the group is the unit of analysis, resulting in only four degrees of freedom for the statistical comparison. The one-sample two-tailed Kolmogorov-Smirnov (K-S) test (df = 4) could not reject the null hypothesis of no difference between the observed and predicted distributions of arrival time. Assuming independence between (but not within) subjects yielded df = 80. But even with this considerably more conservative test, the same null hypothesis could not be rejected (p > 0.05). RSPS detected three minor discrepancies between observed and predicted probabilities of arrival time in all four groups (see Fig. 4): 1) the observed proportion of arriving at exactly 8:00 was smaller (by 0.02) than predicted; 2) the observed proportion of arriving between 8:01 and 9:03 was 0.031 compared to the theoretical value of zero; 3) the proportion of staying out was smaller than predicted. A more detailed analysis that broke the 75 trials into three blocks of 25 trials each shows that the first two discrepancies decreased across blocks in the direction of equilibrium play.
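The mechanics of this comparison are simple to reproduce. The following sketch (Python, using placeholder arrays rather than the actual experimental data or the equilibrium distribution reported by RSPS) computes the one-sample K-S statistic D, the largest absolute gap between the observed and predicted cumulative distributions over the discrete grid of possible arrival times, with staying out coded as an arrival at 18:00.

```python
import numpy as np

# Placeholder inputs: arrival times coded as minutes after 6:00 on a
# 5-minute grid, with staying out coded as 720 (i.e., 18:00).
rng = np.random.default_rng(0)
times = np.arange(0, 721, 5)                    # 6:00, 6:05, ..., 18:00
predicted_cdf = np.minimum(times / 720.0, 1.0)  # placeholder equilibrium CDF
observed = rng.choice(times, size=6000)         # placeholder pooled decisions

# Empirical CDF of the observed decisions evaluated on the same grid.
observed_cdf = np.array([(observed <= t).mean() for t in times])

# One-sample K-S statistic: largest absolute gap between the two CDFs.
D = np.max(np.abs(observed_cdf - predicted_cdf))
print(f"K-S statistic D = {D:.3f}")
```

The resulting D is then compared with the K-S critical value for the chosen unit of analysis (groups versus individual subjects), which is where the df = 4 versus df = 80 distinction discussed above enters.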
SPSR similarly reported no significant differences between the two groups in Condition 2G. Of the four tests used in this comparison, two yielded statistical differences between the two groups in Condition 2P. Nevertheless, the results were also combined across these two groups. Using the same format as Fig. 4, Fig. 5 exhibits the observed and predicted cumulative distributions of arrival time for Condition 2P (upper panel) and Condition 2G (lower panel). Similarly to Condition 1, the K-S test could not reject the null hypothesis of no difference between the observed and predicted distributions of arrival time (D = 0.059 for Condition 2G, and D = 0.069 for Condition 2P; n = 40 and p > 0.05 in each case) even under the conservative assumption of independence between subjects. Notwithstanding these results, Fig. 5 shows two minor but systematic discrepancies between observed and predicted distributions of arrival time: 1) the observed proportion of entry before 7:35 was smaller than predicted; 2) approximately 4% of all the decisions were to stay out compared to 0% under equilibrium play. A more detailed analysis that breaks the 75 trials into three blocks shows that the former discrepancy decreased across trials but the latter did not. Analyses of individual data show that a few subjects stayed out on 6 or more (out of 75) trials, either in an attempt to take time to consider their future decisions or to increase their cumulative payoff (by g) after a sequence of losses.

Figure 5. Observed and predicted distribution of arrival time and staying out decisions in Condition 2.
Turning next to Condition 3, SPSR also reported no significant differences between the two groups in Condition 3G and no significant differences between the two groups in Condition 3P. The two groups in each of these two conditions were separately combined to compute the aggregate distributions of arrival times. Fig. 6 portrays the observed and predicted cumulative distributions of arrival time for Condition 3P (upper panel) and 3G (lower panel). The K-S test once again could not reject the null hypothesis of no differences between the two distributions (D = 0.061 and D = 0.121 for Conditions 3G and 3P, respectively; n = 40 and p > 0.05 in each case). Nevertheless, the upper panel shows that subjects in Condition 3P did not stay out as frequently as predicted. A further analysis that focuses on the staying out decisions shows that the percentage of staying out decisions in Condition 3G steadily increased from 30% in trials 1–25 through 35.5% in trials 26–50 to 40.5% in trials 51–75. Compare the latter percentage to the equilibrium percentage of 40.96%.

Figure 6. Observed and predicted distribution of arrival time and staying out decisions in Condition 3.

In contrast, there was no evidence for learning across blocks of trials in Condition 3P. As the subjects in Condition 3P received no information on the number of subjects staying out on any given trial, they had no way of determining whether their payoff for the trial – which was typically negative – was due to a poor choice of entry time or an insufficient number of staying out decisions. This was not the case in Condition 3G, where Group Outcome Information was provided. Subjects in Condition 3G, who often lost money on the early trials, used this information to slowly recover their losses by having more (but not necessarily the same) subjects staying out on each trial. In contrast, most of the subjects in Condition 3P entered the queue more frequently than predicted and consequently almost never recovered their losses.
Observed Arrival Time Distributions: Individual Results. In contrast to the aggregate distributions of arrival time that show remarkable consistency across groups and are accounted for quite well by the equilibrium solution, the individual distributions of arrival time differ considerably from one another, show no support for mixed-strategy equilibrium play, and defy a simple classification. One representative group – Group 1 of Condition 1 – was selected to illustrate the contrast between the consistent patterns of arrival on the aggregate level and heterogeneous patterns of arrival on the individual level. Fig. 7 exhibits the individual arrival times of all the 20 subjects in Group 1 of Condition 1. We have opted to display the arrival times by trial rather than combine them into frequency distributions. Thus, the horizontal axis in each individual display counts the trial number from 1 through 75, and the vertical axis shows the arrival time on a scale from 6:00 (bottom) to 18:00 (top). A short vertical line that extends below the horizontal axis (i.e., below 0) indicates no entry. We observe that Subject 5 (first from left on row 2), after switching her entry time, entered at 8:00 on all trials after trial 25. In contrast, Subject 13 (first from left on row 4) never entered the queue at 8:00. Subject 9 (first from left on row 3) stayed out on 10 of the 75 trials, whereas Subjects 1, 2, 5, 6, 7, 8, 11, 13, 14, 17, and 18 never stayed out. Most of the staying out decisions are due to Subjects 9 and 15.

Figure 7. Individual decisions of all twenty subjects in Group 1 of Condition 1.
4 QUEUING LEARNING MODEL: DESCRIPTION AND PARAMETER ESTIMATION
Alternative approaches have been proposed to account for learning in games (see, e.g., Camerer, 2003, for an excellent review). They include evolutionary dynamics, various forms of reinforcement learning (McAllister, 1991; Roth & Erev, 1995; Sarin & Vahid, 2001), belief learning (Cheung & Friedman, 1997; Fudenberg & Levine, 1998), learning direction theory (Selten & Stoecker, 1986), Bayesian learning
(Jordan, 1991), experience-weighted attraction (EWA) learning (Camerer & Ho, 1999), and rule learning (Stahl, 1996). Without making additional assumptions, these models are not directly applicable to our data.1 We report below a simple learning model, which was constructed to account for the individual and aggregate patterns of our data reported above. This is clearly an ad-hoc model that does not have the generality of the approaches to learning mentioned above.
Basic Assumptions. The learning model uses a simple reinforcement learning mechanism to update arrival times based on historical play. It is derived from two primitive assumptions:

• Decisions to enter the queue are based on previous payoffs: as the agent's payoff on trial t − 1 decreases, the agent is less likely to enter the queue.

• Once an agent has decided to enter the queue on trial t, its entry time is based on its entry times and payoffs on previous trials.

Both of these assumptions are consistent with the experimental data. Next, we describe a formal model that is derived from these assumptions.
Sketch of the Learning Model. The intuition underlying our learning algorithm is quite simple. On each trial t, the agent makes a decision either to enter the queue or not. If her payoff on trial t − 1 is high, then the agent enters with a higher probability than if the payoff was low. Put differently, the agents are more likely to stay out of the queue on a given trial if they did poorly on the previous trial. The agent's decision regarding when to enter the queue (conditional on her decision to enter) is based on her past decisions and the payoffs associated with those decisions. If an agent enters the queue on trial t − 1 and receives a good payoff, then she is likely to enter around that time again on trial t; on the other hand, if the agent receives a poor payoff for that entry time, then she is likely to change her entry time by quite a bit. Furthermore, if an increase (decrease) in arrival time consistently yields higher payoffs, then the agent is going to consistently increase (decrease) her arrival time. Increases (decreases) in arrival time that lead to poorer payoffs will cause the agent to decrease (increase) her arrival time. These learning mechanisms are formally specified in the following section.
Formal Specification of the Learning Model. Denote the entry time and payoff of agent i on trial t by A_t^i and π_t^i, respectively. If the queue is entered, then with probability 1 − ε entry times on the next trial are based on the following motion equations:

[…]

(… individual subject results quite inconsistent with the individual subject experimental results.) The parameter β_i (0 < β_i < 1) denotes the agent's learning rate, T_min is the earliest time the agent can enter the queue, τ_i is the agent's payoff sensitivity, and r is the payoff for completing service.
As for trial 1, by assumption A_1^i is sampled from a uniform discrete probability distribution defined on the interval [T_o − T_min, T_e − d], the δ_t^i (t = 1, 2) are sampled independently and with equal probability from the set {−1, +1}, and the initial payoff π^i is sampled with uniform probability from [0, r]. This initialization is conducted independently for each agent i. If the queue is not entered on trial t, then A_t^i = A_{t−1}^i. Thus, queue arrival time updates are always based on the most recently updated arrival time; arrival times are not updated during periods in which the agent does not enter the queue.
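Because the motion equations themselves are not reproduced above, the following Python sketch should be read only as one possible implementation consistent with the verbal description: the step size shrinks as the previous payoff approaches r, the direction δ is kept after an improvement and reversed after a deterioration, and trial-1 values are drawn as just described. The time grid, the function names (init_agent, update_arrival_time, record_outcome), and the exact functional form of the step are our assumptions, not the authors' specification.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed constants (minutes after 6:00): earliest entry time, closing
# time, service time d, and the payoff r for completing service.
T_MIN, T_E, D_SERVICE, R = 0, 720, 30, 12.0

def init_agent(a=2.0, b=2.0):
    """Trial-1 initialization: arrival time from a discrete uniform grid,
    direction from {-1, +1}, payoff from [0, r], and beta_i from B(a, b)."""
    return {
        "A": float(rng.choice(np.arange(T_MIN, T_E - D_SERVICE + 1, 5))),
        "delta": int(rng.choice([-1, 1])),
        "pi": rng.uniform(0.0, R),
        "beta": rng.beta(a, b),
    }

def update_arrival_time(agent, tau=1.0):
    """Assumed motion equation: move in direction delta by a step that is
    proportional to beta_i and grows with the shortfall of the previous
    payoff relative to r (raised to the payoff sensitivity tau)."""
    shortfall = max(R - agent["pi"], 0.0) / R      # 0 when the payoff was r
    step = agent["beta"] * (shortfall ** tau) * (T_E - T_MIN)
    agent["A"] = float(np.clip(agent["A"] + agent["delta"] * step,
                               T_MIN, T_E - D_SERVICE))
    return agent["A"]

def record_outcome(agent, new_pi):
    """Keep the direction after an improvement, reverse it after a decline."""
    if new_pi < agent["pi"]:
        agent["delta"] = -agent["delta"]
    agent["pi"] = new_pi
```

If the agent does not enter the queue on a trial, agent["A"] is simply left unchanged, matching the updating rule stated above.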
Decisions to enter the queue are made probabilistically; specifically, in the absence of group information (Conditions 1, 2P, and 3P), the probability of agent i entering the queue on trial t is given by

p_t^i = exp[λ_i(π_{t−1}^i − r)].   (4)

The parameter λ_i > 0 is the agent's entry propensity. Note that as λ_i approaches 0, the agent's entry probability goes to 1; and as λ_i goes to infinity, the entry probability goes to 0 (when, of course, π^i < r). The probability expressed in Eq. 4 is transformed in the Group Information Conditions (2G and 3G) as follows:

[…]   (5)

where n_cap denotes the queue capacity. In Conditions 2 and 3, n_cap = 20 and n_cap = 13, respectively. The actual number of agents entering the queue on trial t is denoted by n_t. According to Eq. 5, entry probabilities are increased if the queue has too few entrants on the previous trial and are decreased if it has too many. The magnitude of the adjustment is determined by the parameter 0 < α_i < 1.
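A minimal sketch of the entry decision follows. The private-information form mirrors Eq. 4 as described above (an exponential in λ_i times the gap between the previous payoff and r); the cap at one and the whole group-information adjustment are our assumptions, since Eq. 5 is not reproduced above. The assumed adjustment raises the probability when fewer than n_cap agents entered on the previous trial and lowers it when more did, with α_i scaling the change.

```python
import math

def entry_probability(pi_prev, r, lam):
    """Private-information conditions (1, 2P, 3P): Eq. 4-style probability,
    approaching 1 as lam -> 0 and 0 as lam -> infinity when pi_prev < r."""
    return min(1.0, math.exp(lam * (pi_prev - r)))

def entry_probability_group(pi_prev, r, lam, n_prev, n_cap, alpha):
    """Group-information conditions (2G, 3G): an assumed Eq. 5-style
    transformation of the private-information probability."""
    p = entry_probability(pi_prev, r, lam)
    p += alpha * (n_cap - n_prev) / n_cap
    return min(1.0, max(0.0, p))
```

With illustrative values r = 12, λ = 0.25, and a previous payoff of 4, the private-information entry probability is exp(0.25 × (4 − 12)) ≈ 0.14; in Condition 3G (n_cap = 13), if only 10 agents entered on the previous trial and α = 0.3, the assumed adjustment raises it to roughly 0.20.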
Model Parameter Estimation. To test the model's ability to capture the important properties of the experimental data, we first found best fitting parameters for the model using a grid search (brute force) algorithm. Goodness of fit was estimated by comparing the model's arrival time distributions to those from the experimental subjects.
Let C_T denote the proportion of arrival times less than or equal to T. Model fit was measured as the root-mean-square deviation (RMSD) of the model cumulative arrival time distribution from the subjects' cumulative arrival time distribution.
Optimal fitting involves finding the vector V = (a, b, τ, λ, α) such that RMSD is minimized, where a and b are the parameters of the beta distribution B(a, b) from which the β_i are independently sampled for each simulated subject i. In the results reported here, for all simulated agents we assume that τ_i = τ (i = 1, ..., N), and likewise for λ_i and α_i. (A study of the model output suggested that allowing β_i to be a random variable, while making all other model parameters constant, was necessary to capture important properties of the experimental results. Allowing all of the parameters to be random variables simply introduces too many parameters, as the distribution of each random variable must be parameterized, which, in the case of, say, a beta distribution, introduces two distribution parameters for a single model parameter. It is our contention that the model results support this approach.) Since the agents only receive private information in Conditions 1, 2P, and 3P,
α is constrained to equal 1 in these conditions. We fixed ε when we estimated all other parameters (the objective function was relatively flat with respect to ε, making it very difficult to estimate ε using Monte Carlo methods). Thus, we must estimate four parameters in Conditions 1, 2P, and 3P; all five parameters must be estimated in Conditions 2G and 3G. For each experimental condition, C_T^M was estimated for each V by aggregating the arrival times from 100 independent simulations of 75 trials of play of the queuing game with 20 agents. Since our objective function can only be estimated through simulation, one concern is that we might obtain inconsistent estimates of V; however, multiple replications of the grid search algorithm produced highly consistent results.
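The estimation loop can be sketched as follows. Here simulate_condition stands in for a full implementation of the queuing game with learning agents, each of whom draws its β_i from B(a, b); the function names, the parameter grids, and the equal weighting of time points in the RMSD are illustrative assumptions rather than the authors' actual code.

```python
import itertools
import numpy as np

def rmsd(model_cdf, subject_cdf):
    """Root-mean-square deviation between two cumulative arrival time distributions."""
    model_cdf = np.asarray(model_cdf, dtype=float)
    subject_cdf = np.asarray(subject_cdf, dtype=float)
    return float(np.sqrt(np.mean((model_cdf - subject_cdf) ** 2)))

def grid_search(subject_cdf, time_grid, simulate_condition,
                a_grid, b_grid, tau_grid, lam_grid, alpha_grid,
                n_sims=100, n_trials=75, n_agents=20):
    """Brute-force search over V = (a, b, tau, lam, alpha).
    simulate_condition(V, n_trials, n_agents) is assumed to return the
    pooled arrival times (in minutes) of one simulated group."""
    best_v, best_fit = None, np.inf
    for v in itertools.product(a_grid, b_grid, tau_grid, lam_grid, alpha_grid):
        arrivals = np.concatenate(
            [simulate_condition(v, n_trials, n_agents) for _ in range(n_sims)]
        )
        model_cdf = [(arrivals <= t).mean() for t in time_grid]
        fit = rmsd(model_cdf, subject_cdf)
        if fit < best_fit:
            best_v, best_fit = v, fit
    return best_v, best_fit
```

In the private-information conditions the α grid collapses to its single fixed value (and ε is held fixed throughout), so only four parameters are effectively searched; all five are searched in Conditions 2G and 3G.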
5 TESTING THE LEARNING MODEL
5.1 Condition 1
Aggregate Arrival Time Distributions. The cumulative arrival time distributions for the experimental subjects and the simulated learning agents, as well as the equilibrium cumulative arrival time distribution, are displayed in Fig. 4. With the exception of the aggregate arrival time at 8:00 (where the model under-predicts the probability of arrival), the model results closely agree with those of the human subjects.
Individual Arrival Times. Fig. 7 exhibits the individual arrival times of the 20 subjects in Group 1 of Condition 1. The decisions to stay out are represented by the downward ticks on the horizontal axis. Individual arrival time distributions for 20 simulated agents in Condition 1 are shown in Fig. 8. Observe that both the human subjects and simulated agents display heterogeneous arrival time behavior. Some subjects switch their arrival times quite often and quite dramatically, while others make less frequent and less dramatic switches. There is no simple way of telling which figure displays the individual arrival times of the genuine subjects and which of the simulated agents.
Switching Behavior. Fig. 9 shows the mean switching probabilities and mean switch magnitudes across trials for the human subjects in Condition 1. Here, a switch obtains on trial t when the subject (or simulated agent) enters the queue on both trials t − 1 and t but at different times; the magnitude of the switch is the absolute difference between the two entry times.
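As an illustration, the sketch below computes both statistics from a hypothetical matrix of arrival times (one row per subject, one column per trial, with staying out coded as NaN); the coding conventions are our own, but the definitions follow the ones just given.

```python
import numpy as np

def switching_stats(arrivals):
    """arrivals: (n_subjects, n_trials) array of entry times in minutes,
    with np.nan marking trials on which the subject stayed out."""
    prev, curr = arrivals[:, :-1], arrivals[:, 1:]
    entered_both = ~np.isnan(prev) & ~np.isnan(curr)
    switched = entered_both & (prev != curr)
    switch_prob = switched.sum() / entered_both.sum()   # P(switch | entered on both trials)
    mean_magnitude = float(np.abs(curr[switched] - prev[switched]).mean())
    return float(switch_prob), mean_magnitude
```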
Figure 10. Switch probabilities and mean switch magnitudes across trials for four simulated groups in Condition 1.
5.2 Conditions 2 and 3
Arrival Time Distributions. Figs. 5 and 6 display the cumulative arrival time distributions for Conditions 2 and 3, respectively. The distributions for the private outcome information conditions (2P and 3P) are displayed on the upper panels, and those for the group outcome information conditions (2G and 3G) on the bottom panels. The learning model results and the experimental data are in close agreement. In fact, the learning model accounts better for the results of Conditions 2 and 3 than Condition 1. The only notable discrepancy is in Condition 3P, where the model entry probability is about 0.05 greater than that of the human subjects. As the results for individual arrival time distributions, mean probability of switching, and mean magnitude of switching are similar to those in Condition 1, they are not exhibited here. Again, we observe a higher probability of switching and smaller mean switch magnitude in the simulated agents.
6 DISCUSSION AND CONCLUSION
RSPS and SPSR have studied experimentally how delay-averse subjects, who patronize the same service facility and choose their arrival times from a discrete set of time intervals simultaneously, seek service. Taking into account the actions of others, whose number is assumed to be commonly known, each self-interested subject attempts to maximize her net utility by arriving with as few other subjects as possible. She is also given the option of staying out of the queue on any particular trial. Using a repeated game design and several variants of the queueing game, RSPS and subsequently SPSR reported consistent patterns of behavior (arrival times and staying out decisions) that are accounted for successfully by the symmetric mixed-strategy equilibria for these variants, substantial individual differences in behavior, and learning trends across iterations of the stage game. Our major purpose has been to account for the major results of several different conditions by the same reinforcement-based learning model formulated at the individual level.
Our “bottom-to-top” approach to explaining the dynamics of this repeated interaction calls for starting the analysis with a simple model that has as few parameters as possible, modifying it, if necessary, in light of the discrepancies between theoretical and observed results, and then applying it to other sets of data. The focus of the present analysis has been on the distributions of arrival time on both the aggregate and individual levels. Although our learning model has been tailored for a class of queueing games with endogenous arrivals, it has some generality as it is designed to account for the results in five different conditions (1, 2P, 3P, 2G, 3G) that vary from one another on several dimensions.
The performance of the model is mixed. It accounts quite well for the aggregate distributions of arrival time in four of the five conditions. (The main exception is the aggregate arrival time at 8:00 in Condition 1.) For many learning models, this is the major criterion for assessing the model performance. The model also produces heterogeneous patterns of individual arrival times that are quite consistent with those of the human subjects. […] in which a decision on trial t only depends on past decisions and outcomes, could be accounted for by increasing the complexity of the model. Although we only focus on testing a single learning model, our position is that in a final analysis the predictive power, utility, and generalizability of a learning model could better be assessed by comparing it to alternative models.
ACKNOWLEDGMENT
We gratefully acknowledge financial support by NSF Grant No. SES-0135811 to D. A. Seale and A. Rapoport and by a contract F49620-03-1-0377 from the AFOSR/MURI to the Department of Industrial Engineering and the Department of Management and Policy at the University of Arizona.
NOTE
1. We verified this for a Roth-Erev-type reinforcement-based learning model. With our implementation, we have been unable to reproduce most of the regularities we observe in the experimental data.
REFERENCES

Cheung, Y.-W., and Friedman, D. (1997). “Individual learning in normal form games: Some laboratory results.” Games and Economic Behavior, 25, 34–78.
Fudenberg, D. and Levine, D. (1998). The Theory of Learning in Games. Cambridge, Mass.: MIT Press.
Hassin, R. and Haviv, M. (2003). To Queue or Not to Queue: Equilibrium Behavior in Queueing Systems. Boston: Kluwer Academic Press.
Jordan, J. S. (1991). “Bayesian learning in normal form games.” Games and Economic Behavior, 3.
Rapoport, A., Stein, W. E., Parco, J. E., and Seale, D. A. (in press). “Strategic play in single-server queues with endogenously determined arrival times.” Journal of Economic Behavior and Organization.
Roth, A. E. and Erev, I. (1995). “Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term.” Games and Economic Behavior, 8, 164–212.
Sarin, R. and Vahid, F. (2001). “Predicting how people play games: A simple dynamic model of choice.” Games and Economic Behavior, 34, 104–122.
Seale, D. A., Parco, J. E., Stein, W. E., and Rapoport, A. (2003). Joining a queue or staying out: Effects of information structure and service time on arrival and exit decisions. Unpublished manuscript, Department of Management and Policy, University of Arizona.
Selten, R. and Stoecker, R. (1986). “End behavior in sequences of finite Prisoner’s Dilemma supergames: A learning theory approach.” Journal of Economic Behavior and Organization, 7, 47–70.
Stahl, D. O. (1996). “Boundedly rational rule learning in a guessing game.” Games and Economic Behavior, 16, 303–330.