MULTI-ISSUE NEGOTIATION PROTOCOLS FOR INDIVISIBLE RESOURCES
Existing Negotiation Protocols
In this section, we present existing negotiation protocols for the allocation of indivisible resources among agents:
Strict alternation: In this protocol, agents take alternate turns, and in each turn an agent selects one resource from the set of resources not yet allocated. After an agent selects a resource, it is removed from the set [9]. The advantages of this protocol are its simplicity and the short time required to reach an agreement. The agreements reached, however, are often very inefficient.
Balanced alternation: This protocol is used to improve fairness. Here it is assumed that the agent who chooses a resource first has an advantage over the agent who chooses second. Hence, the second agent gets the opportunity to choose third, and so on [9]. Therefore, one agent gets to choose in turns 1, 4, 5, 8, ..., and the other agent in turns 2, 3, 6, 7, .... This protocol has advantages and disadvantages similar to those of the strict alternation protocol.
Exchange auctions: This protocol is an exchange-based extension of the Contract-Net protocol [35]. It assumes an initial allocation of the resources to the agents and allows the agents to exchange resources to improve the allocation. An agent announces one or more resources for exchange, and other agents bid one or more resources that they are ready to exchange in return. This protocol, however, does not guarantee Pareto optimality unless repeated infinitely.
Protocol to Reach Optimal Agreement in Negotiation over Multiple Indivisible Resources
In this section we present our proposed protocol: Protocol to reach Optimal agreement in Negotiation Over Multiple Indivisible Resources (PONOMIR). We first define some concepts required to describe our protocol.
Negotiation tree: We assume that the issues or resources are ordered in some way, e.g., lexicographically. Here, we conceptualize the allocations of the resources as a tree, known as the negotiation tree. For a negotiation over H resources, the maximum possible depth of the negotiation tree is H, the root being at level 0. The root node represents a null allocation to the agents, and each successive level represents the allocation of the next resource in the chosen order. For a bilateral negotiation, the negotiation tree is a binary tree (if n agents are negotiating, the tree will be n-ary). The left and right branches at the l-th level imply that the l-th resource is allocated to agent 1 and agent 2, respectively. Each leaf node at level H represents one possible allocation of the resources, and the path from the root to that leaf node specifies the allocation of all the resources. Such a negotiation tree is shown in Figure 4.1 for the negotiation scenario presented in Table 4.1. A negotiation tree is created by the negotiating agents in a distributed manner. It starts with a root with id = 0 (the id of any other node in the tree is 2 x id(parent) + 1 if it is a left child and 2 x id(parent) + 2 if it is a right child). The tree is then created in a top-down process, where at any level agent 1 can only create the right children of nodes in the previous level of the tree. Similarly, agent 2 can only create the left child nodes. Each agent, however, may choose not to create a node at any level. If an agent does not create a node, that node is marked as a black node and no further subtree is created from it. A black node therefore prunes that node and its descendants. The distributed formation of a negotiation tree can be implemented using message passing. At any level, each agent knows the nodes created at the previous level and the nodes that either of the agents can create at this level. Hence, each agent sends a message to the other agent stating the nodes it has created at this level before proceeding to the next level.
Best possible agreement (BPA): At each node of the negotiation tree, each agent has a best possible agreement, which is the allocation in which the resources up to the current level are allocated according to the path from the tree root to this node and all the remaining resources are allocated to that agent. If at any node an agent finds that the utility of its BPA is less than the utility it can receive otherwise, no allocation in the subtree rooted at that node will be acceptable to that agent, if it is individually rational.
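The BPA check is easy to state computationally. Below is a minimal sketch, assuming each agent's preferences are available as a bundle-utility function; the helper name bpa_utility is ours, not the thesis's notation.

```python
def bpa_utility(agent_utility, path_alloc, remaining, agent):
    """Utility of the agent's best possible agreement at a tree node.

    path_alloc:    dict mapping each already-negotiated resource to the agent
                   (1 or 2) that received it along the path from the root.
    remaining:     resources not yet negotiated below this node.
    agent_utility: function from a frozenset of resources to this agent's utility.
    """
    own = {r for r, a in path_alloc.items() if a == agent}
    # Best case: every remaining resource also goes to this agent.
    return agent_utility(frozenset(own) | frozenset(remaining))
```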
Now we present our three-phase protocol, PONOMIR. The first phase consists of a primary allocation procedure using either the strict alternation or the balanced alternation protocol; here, we use strict alternation, but any other existing protocol can be used in the first phase. The second phase consists of the distributed formation of the negotiation tree by the negotiating agents. This phase discards every agreement in which at least one agent's utility is strictly less than in the first-phase allocation, since that agent has no incentive to accept such an agreement. After the second phase, the agents are left with a few probable agreements. In the third phase, agents reach the final Pareto optimal solution by exchanging offers.
First phase of PONOMIR: This phase produces an initial allocation using the strict alternation protocol.

Step 1: A random device chooses one of the two agents to start; mark this agent as S. Denote the set of resources yet to be negotiated by G. Initially, G = R.
Step 2: Now, S chooses one of the remaining resources, say C ∈ G. C is allocated to S.
Step 3: Mark the other agent as S and update G to G − {C}. If |G| ≥ 1, return to Step 2; otherwise stop.
After the first phase, there is an initial allocation, L, of the resources as suggested by the strict alternation protocol. For this allocation L, the agents have corresponding utilities U_i^L, i = 1, 2. If no mutual improvement is possible in the subsequent phases, the agents will have to agree on this allocation (we assume that both agents prefer to agree on this allocation over disagreement). This phase ensures that the agreement reached will be at least as good as L.
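As a concrete illustration, a minimal sketch of this first phase is given below. It assumes each agent object exposes a choose(remaining, owned) method implementing its selection strategy (for example, the greedy choice used in the illustration of Section 4.5.1); the function and method names are illustrative, not part of the protocol definition.

```python
import random

def strict_alternation(resources, agents):
    """Phase 1 of PONOMIR: agents alternately pick one unallocated resource."""
    remaining = set(resources)
    allocation = {a: set() for a in agents}          # agent -> owned resources
    current = random.choice(agents)                  # the random device picks the starter
    other = agents[0] if current is agents[1] else agents[1]
    while remaining:
        pick = current.choose(remaining, allocation[current])
        allocation[current].add(pick)
        remaining.discard(pick)
        current, other = other, current              # swap turns
    return allocation
```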
Second phase of PONOMIR: This phase involves the distributed generation of the negotiation tree by the negotiating agents.
Step 1: Let l denote the level of the negotiation tree. Set l = 0. The root node is created with id(root) = 0.
Step 2: Agents 1 and 2, respectively, can create the right and left child nodes of each node at level l of the tree, and each sends a message to the other agent stating the nodes it has created.
Step 3: Increase l by one. If l < H and no node was created at level l, the negotiation terminates and the final allocation will be L. If l < H and there is at least one node at this level of the tree, go to Step 2. If l = H, stop the second phase, collect all the allocations corresponding to the nodes at level H, and proceed to the third phase. We refer to this set of leaf nodes (or allocations) as Q.
After this phase, the agents are left with a small number of probable agreements in Q. The final agreement will be chosen in the third phase. Note that, at each level, agent 1 can create only right child nodes and agent 2 can create only left child nodes.
A right child implies that the resource representing that level is allocated to agent 2.
Since we assume a monotonic scenario, the BPA of agent 2 does not decrease when a resource is allocated to agent 2, but the BPA of agent 1 may decrease. So, it is sufficient if only agent 1 checks whether it is interested in extending the subtree below a right child; similarly, agent 2 checks for the left child. At any level, an agent may choose not to create a child node if it is not interested in the agreements under the subtree starting at that node. If a node is not created at any level by the corresponding agent, no further subtree is generated from it, which implies that all the allocations under that subtree are discarded. If the utility of the BPA at a node is less than the agent's utility under the initial allocation L, then every allocation in the subtree rooted at that node produces less utility than the initial allocation L, and an individually rational agent has no incentive to create that node. If agents are individually rational, then for each allocation in Q the utilities of both agents are at least as much as their corresponding utilities in L, because every allocation that produces less utility to either agent is discarded by that agent.
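A minimal centralized simulation of this phase is sketched below; in the actual protocol the two agents grow the tree by message passing, but the pruning rule is the same. Here u1 and u2 are the agents' bundle-utility functions (each known only to its own agent in practice), and U1_L, U2_L are their utilities under the phase-1 allocation L.

```python
def bpa_utility(agent_utility, path_alloc, remaining, agent):
    """Utility of the agent's best possible agreement at a node (same helper as above)."""
    own = {r for r, a in path_alloc.items() if a == agent}
    return agent_utility(frozenset(own) | frozenset(remaining))

def build_negotiation_tree(resources, u1, u2, U1_L, U2_L):
    """Phase 2 of PONOMIR: expand the tree level by level, pruning via the BPA test."""
    order = sorted(resources)                  # fixed, e.g. lexicographic, ordering
    frontier = [dict()]                        # partial allocations: resource -> agent
    for level, res in enumerate(order):
        remaining = order[level + 1:]
        next_frontier = []
        for node in frontier:
            # Left child gives `res` to agent 1; agent 2 decides whether to create it.
            left = {**node, res: 1}
            if bpa_utility(u2, left, remaining, agent=2) >= U2_L:
                next_frontier.append(left)
            # Right child gives `res` to agent 2; agent 1 decides whether to create it.
            right = {**node, res: 2}
            if bpa_utility(u1, right, remaining, agent=1) >= U1_L:
                next_frontier.append(right)
        if not next_frontier:                  # no node created at this level:
            return []                          # fall back to the phase-1 allocation L
        frontier = next_frontier
    return frontier                            # the set Q of surviving full allocations
```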
Third phase of PONOMIR: In this phase, agents select the final agreement from the allocations in the set Q. Let us define a set F as the set of final agreements. Initially, it contains only the initial allocation L, i.e., F = {L}.
Step 1: One agent is chosen randomly to start. Mark this agent as S and the other agent as S'. Now, S proposes an allocation q from Q.
Step 2: S' can remove any other allocation O from the sets Q and F if U_{S'}^O ≤ U_{S'}^q. The proposed allocation q is moved from Q to F, i.e., F is updated to F ∪ {q}.
Step 3: If Q is not empty, swap the agents S and S' and go to Step 2. Otherwise, the set F contains the candidate final agreements. If only one element remains in F, it is selected as the final agreement; otherwise, one of them is chosen randomly as the final agreement.
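The alternating propose-and-prune exchange of the third phase can be sketched as follows, again as a centralized simulation for clarity; proposals follow the best-offer-first behavior established later in Lemma 4.2, and all names are illustrative.

```python
import random

def third_phase(Q, u, L):
    """Phase 3 of PONOMIR: alternately propose allocations and prune dominated ones.

    Q: list of candidate allocations surviving phase 2.
    u: dict {1: u1, 2: u2} of the agents' utility functions over allocations.
    L: the phase-1 allocation, always kept as a fallback agreement.
    """
    F = [L]
    Q = list(Q)
    proposer = random.choice((1, 2))
    while Q:
        responder = 3 - proposer
        q = max(Q, key=u[proposer])                  # propose own best remaining allocation
        Q.remove(q)
        F.append(q)
        # Responder discards every other allocation that is no better for it than q.
        Q = [o for o in Q if u[responder](o) > u[responder](q)]
        F = [o for o in F if o is q or u[responder](o) > u[responder](q)]
        proposer = responder                         # swap roles
    return F[0] if len(F) == 1 else random.choice(F)
```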
Properties of the PONOMIR Protocol
We designed PONOMIR with the goal that the agreements reached be Pareto optimal and that fairness be increased as much as possible. As a measure of fairness, we use egalitarian social welfare.
Our proposed negotiation protocol, PONOMIR, is not strategy-proof, and it does not guarantee a Pareto optimal agreement if the agents are arbitrarily risk seeking. However, we consider a completely uncertain negotiation scenario, where participating agents do not have any idea about the preferences of the other agents and no agent wants to reveal its utilities for different allocations. Under such complete uncertainty, the rational behavior of the agents depends on their risk attitudes. Bossert introduced and argued for uncertainty aversion for decision making under such completely uncertain scenarios [7]. In this negotiation scenario, we assume that the rational agents are cooperative-individually rational. We define an agent as cooperative-individually rational if it satisfies two properties: (i) it does not take any risky action that can lead to an agreement which produces less utility than what it is already assured of, and (ii) if there exist two agreements which produce the same utility to it but different utilities to the opponent, it will accept either agreement when proposed by the opponent. PONOMIR guarantees a Pareto optimal agreement if the participating agents are cooperative-individually rational. The agreements reached are also guaranteed to produce at least as much egalitarian social welfare as the agreements reached by the other protocols. Since the agents are individually rational, both of them will discard all the agreements that produce utilities less than the utility produced by the initial allocation, L, obtained in the first phase, and hence the egalitarian social welfare will never decrease.
Proposition 4.1: The agreement reached by cooperative-individually rational agents using the PONOMIR protocol is Pareto optimal.
To prove this proposition, we first prove the following lemmas:
Lemma 4.1: An allocation O will not be discarded in the second phase if and only if U_i^O ≥ U_i^L, ∀i = 1, 2, where L is the initial allocation after the first phase.
Proof: There are two parts to this proof. In the first part we prove that any allocation O', for which there is at least one agent i such that U_i^{O'} < U_i^L, will be discarded in the second phase. In the second part we show that the remaining allocations will not be discarded. Since U_i^{O'} < U_i^L, at least one resource is not allocated to agent i under O', and agent i decides whether to create the tree nodes at the levels allocating those resources to the other agent. Therefore, for an allocation like O', there is at least one level, representing a resource not allocated to i, at which agent i finds that if it creates the tree node (which implies that the resource representing that level is allocated to the opponent), the utility of its BPA at that node is less than its utility from the initial allocation. Therefore, the agent has no incentive to create that node, and hence the allocation will be discarded. The second part of the lemma follows from the properties of the agents: since they are cooperative-individually rational, they do not want to discard any possibility that can lead to an allocation producing utility at least as much as the utility produced by the initial allocation.
Lemma 4.2: While proposing an allocation in the third phase, an agent always proposes the allocation that produces the highest utility to it among the set of remaining allocations Q.
Proof: In a completely uncertain environment, the likelihood of the opponent accepting any allocation is not known a priori. Hence, an allocation of higher utility to an agent should be proposed before an allocation with a lower utility. Therefore, a cooperative-individually rational agent proposes the allocation with the highest utility from the set of remaining allocations Q.
Lemma 4.3: After an agent proposes an allocation in the third phase, the other agent will remove all allocations which produce less utility to it from the sets Q and F.
Proof: An individually rational agent will remove all allocations that produce less utility to it than the offer proposed by the other agent, as it is guaranteed to get the allocation proposed by the other agent. If it does not remove them, it may end up with one of them, which would not be individually rational. It will also remove the allocations which produce equal utility to it. According to the previous lemma, a rational opponent proposes the offer which produces the highest utility to the opponent. This implies that an allocation which is not offered by the opponent produces less than or equal utility to the opponent. Therefore, even if it produces equal utility to this agent, it is either equivalent to or Pareto dominated by the allocation proposed by the opponent. So, the agent should remove it from the set.
Also, the agent will not remove other allocations with higher utility as this may eliminate possible agreements which have better utility for it.
Proof of Proposition 4.1: From Lemmas 4.1, 4.2, and 4.3, it is evident that no allocation in the final set F is Pareto dominated by any of the other possible allocations. Therefore, any one of these allocations is Pareto optimal.
Experiments
In this section, we present experimental results to show the effectiveness of PONOMIR in reducing the search requirements of the negotiating agents. Before that, we present an illustration of our proposed protocol.
4.5.1 An Illustration of the PONOMIR Protocol
In this subsection, we demonstrate the execution of PONOMIR on the example presented in Table 4.1. In the first phase, agents take alternate turns choosing one resource. Let us assume that an agent first chooses the resource which produces the highest utility to it among the resources not yet allocated; thereafter, it chooses, from the set of resources not yet allocated, the resource which produces the highest utility together with the resources already allocated to it. Agents with this strategy produce an initial allocation L = ({B, D}, {A, C}), i.e., resources B and D are allocated to agent 1 and resources A and C are allocated to agent 2. The corresponding utilities to the agents are 11 and 13, respectively.
Figure 4.1 shows the formation of the negotiation tree in the second phase. For example, consider the formation at level 1. While deciding about the left child of the root node, which implies that resource A is allocated to agent 1 (as shown on the corresponding edge of the tree), agent 2 observes that if resource A is allocated to agent 1, the best possible agreement for agent 2 will be ({A}, {B, C, D}), where all other resources are allocated to it, and the corresponding utility is 12, which is less than U_2^L = 13, its utility under the initial allocation L. So, it decides not to create the left child; the node is therefore marked as a black node, it is pruned, and no subtree is generated from it. Hence, all allocations under this subtree are discarded from the set of possible agreements. Note that in Figure 4.1, only one utility is given at each node: the BPA utility of the agent that is deciding whether to create that node. The corresponding utility for the other agent is left blank, as it is not known to the deciding agent. At the end of the second phase only two allocations survive for the third phase: ({B, D}, {A, C}) and ({C, D}, {A, B}). Observe that both agents need to find the utility of only 6 nodes, instead of 2^4 = 16 nodes. In the third phase, one agent is chosen to propose an allocation. The chosen agent proposes the second allocation, ({C, D}, {A, B}), as it produces the highest utility to it, and the other agent removes the first allocation from the final set of chosen agreements, F. So, the final agreement is ({C, D}, {A, B}), which is Pareto optimal and has an egalitarian social welfare of 12, the highest possible egalitarian social welfare in the entire space of allocations.

Figure 4.1: Negotiation tree formed for the example in Table 4.1.
We tested the PONOMIR protocol on a large number of scenarios, varying agents' preferences and the number of issues. We show the average reduction in the search effort required by each agent. We consider monotonic utility scenarios, which implies that for any agent the utility of a bundle of n resources is more than that of any of the n possible subsets of (n−1) resources. We vary the number of resources n. For each n, we generate 10,000 random examples of agent preferences and execute the negotiations using the PONOMIR protocol. We observed that the agents always reach Pareto optimal agreements and that a significant percentage of these agreements produce the maximum possible egalitarian social welfare. For the agreements where fairness is not maximal, it is close to the maximum value. This result, however, depends on the strategies used by the agents in the first phase of PONOMIR; it is only guaranteed that the agreement reached is Pareto optimal. We have also calculated the average percentage of allocations that each agent searched during each negotiation process, compared to all the allocations an agent needs to search to find a Pareto optimal outcome using protocols like the one-step monotonic concession protocol under complete information. Table 4.2 shows a significant reduction in the average search by each individual agent using PONOMIR.
# of resources | % of allocations searched by each agent
Table 4.2: Reduction in search by agents
LEARNING OPPONENT DECISION MODEL IN
Buyer Agent Behaviors
In this section, we describe four buyer agent strategies for bargaining with the seller. Each buyer agent knows a lower limit, low, and a higher limit, high (any range can be mapped into this range by a translation and scaling transformation), such that the probability that the seller will accept any offer greater than or equal to high is 1 and the probability that it will accept any offer less than or equal to low is 0, i.e., an offer greater than or equal to high will always be accepted and an offer less than or equal to low will always be rejected.
Chebychev buyer (CB): This strategy initially explores by making random offers in the interval [low, high]. After exploring for some interactions, it approximates the opponent's decision function by Chebychev polynomials whose coefficients are computed from the past observations. We discuss this computation in Appendix B. The buyer then makes the offer that maximizes its expected utility. We define the utility of an offer as high minus the offer.
Risk averse buyer (RA): This strategy has an aversion to the rejection of an offer. An agent with this strategy starts by offering low, but if the seller rejects its offer, it increases its offer by incr in the next interaction. We can interpret this as a safe strategy which tries to maximize the probability of acceptance.
Risk seeking buyer (RS): In this strategy, the agent starts bargaining with an offer of low and increases its offer by incr in the next interaction if its current offer has been rejected consecutively for the last k times. But if the seller accepts its offer even once, it never increases its offer subsequently. Here, we take k = 5.
Step buyer (SB): In this strategy, the buyer agent models the seller's probabilistic decision function by relative frequencies. It partitions the interval [low, high] into T equal intervals (x_{i-1}, x_i], where i = 1, ..., T, and initially makes random offers in the interval [low, high] for exploration, similar to the Chebychev buyer strategy. Then it approximates the opponent's decision function as the proportion of successes in each interval, i.e., if there are o_i offers in the i-th interval (x_{i-1}, x_i] and the seller accepted s_i of those offers, then the probability of acceptance for that interval is estimated to be p_i = s_i / o_i. The Step buyer then offers x_opt, where opt = argmax_i ((high − x_i) * p_i).
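A minimal sketch of the Step buyer's estimate-and-offer rule is given below. Offering the right endpoint of each interval and the exploration count are illustrative assumptions; the interval bookkeeping follows the description above.

```python
import random

class StepBuyer:
    def __init__(self, low, high, T=10, explore=50):
        self.low, self.high, self.T = low, high, T
        self.explore = explore                       # number of random exploratory offers
        self.offers = [0] * T                        # o_i: offers made in interval i
        self.accepts = [0] * T                       # s_i: accepted offers in interval i

    def _interval(self, offer):
        width = (self.high - self.low) / self.T
        return min(self.T - 1, int((offer - self.low) / width))

    def next_offer(self, n_interactions):
        if n_interactions < self.explore:            # exploration phase
            return random.uniform(self.low, self.high)
        width = (self.high - self.low) / self.T
        best, best_val = self.high, 0.0
        for i in range(self.T):
            x_i = self.low + (i + 1) * width          # right endpoint of interval i
            p_i = self.accepts[i] / self.offers[i] if self.offers[i] else 0.0
            val = (self.high - x_i) * p_i             # expected utility of offering x_i
            if val > best_val:
                best, best_val = x_i, val
        return best

    def observe(self, offer, accepted):
        i = self._interval(offer)
        self.offers[i] += 1
        self.accepts[i] += int(accepted)
```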
Chebychev Polynomials
Chebychev polynomials are a family of orthogonal polynomials [33]. Any function f(x) may be approximated by a weighted sum of these polynomial functions with an appropriate selection of the coefficients:

f(x) = a_0/2 + Σ_{i=1}^{∞} a_i T_i(x),

where T_i(x) denotes the i-th Chebychev polynomial.
Working with an infinite series is not feasible in practice. We can, however, truncate the above series and still obtain an approximation of the function [21].
The Chebychev expansion converges faster than the Taylor series for the same function. For a rapidly converging series, the error due to truncation, i.e., using only the first n terms of the series, is approximately given by the first term of the remainder, a_n T_n(x). We have chosen Chebychev polynomials for function approximation because truncation points can be chosen to provide approximate error bounds.
Function approximation in mathematics is usually obtained using orthogonal polynomials. This requires knowledge of the value of the function at certain input values. In our case, however, instead of exact values of the probability function, we observe only True or False decisions based on sampling from the probability function at different input values. This, then, constitutes a novel application of Chebychev polynomials.
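The algorithm in Appendix B updates the coefficients incrementally; as a rough stand-in for it, the sketch below fits a truncated Chebychev series by ordinary least squares (numpy's chebfit) to True/False samples drawn from a hypothetical acceptance probability. It only illustrates the idea of recovering a probability function from binary decisions, not the thesis's actual update rule or target function.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)

def true_prob(x):
    """Hypothetical underlying acceptance probability (a sigmoid on [-1, 1])."""
    return 1.0 / (1.0 + np.exp(-6.0 * x))

# Sample binary accept/reject decisions at random offers in [-1, 1].
k = 400
xs = rng.uniform(-1.0, 1.0, size=k)
decisions = (rng.uniform(size=k) < true_prob(xs)).astype(float)

# Fit a truncated Chebychev series (degree 4, i.e. 5 basis polynomials) to the samples.
coeffs = C.chebfit(xs, decisions, deg=4)
approx = C.chebval(np.linspace(-1, 1, 9), coeffs)
print(np.round(approx, 2))           # rough reconstruction of the probability curve
```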
A viable mechanism for model development based on Chebychev polynomials would require the development of an algorithm for calculating the polynomial coefficients. We would also need to prove that these algorithmic updates result in convergence of the learned function to the actual probability function underlying the sampled data. Such an algorithm and the associated convergence theorem are discussed in Appendix B.

Figure 5.1: Approximating the sigmoidal target function; the number of Chebychev polynomials used is 5.
Modeling Sample Probability Functions
In this section, we evaluate the performance of the algorithm presented in Appendix B when a limited number of decision samples is available. As a sample target function to test the approximation algorithm, we used a sigmoidal function of the form f(x) = 1 − 1/(1 + e^{cx}). The actual function and the polynomial approximations obtained with different numbers of samples are presented in Figure 5.1. We focus on the effect of the number of decision samples available on the accuracy of the model developed. We vary the sample size, k = 70, 150, and 400. Sample x values were first randomly selected from a set of points in the range [−1, 1]. The underlying probability function was used to calculate True or False decisions for these generated points. The corresponding input-decision pairs were then fed sequentially into the algorithm. It is clear from Figure 5.1 that increasing the number of samples improves the approximation. Though 70 samples produce only a rough approximation of the target function, a fairly close approximation is obtained with about 150 samples, and with 400 samples it almost coincides with the target function.
Figure 5.2: Approximating the second target function (a representative decision function); the number of Chebychev polynomials used is 5.
Next, we try to approximate a different function, a representative decision function which has probability 0 at the lowest offer and 1 at the highest offer. In Figure 5.2, we show the Chebychev approximations obtained with different sample sizes. For a sample size of 400, the approximation is very close to the original function. We also tested the algorithm on different types of functions and observed that, for reasonable sample sizes, it closely approximates all of them.
Another factor that plays an important role in approximating the function is the degree of the polynomial. In Figure 5.3, we show the approximation of the sigmoidal function using 400 samples while varying the number of Chebychev polynomials used. Even with the same sample size, the approximation with 2 Chebychev polynomials is much worse than that with 5 Chebychev polynomials. The reason is that, even with a reasonable sample size, it is not possible to capture the shape of the curve using only 2 Chebychev polynomials; using more polynomials makes this possible. An interesting observation, however, is that if the number of Chebychev polynomials is increased to 10, the approximation is worse than that with 5 Chebychev polynomials. The reason is that using more polynomials requires a larger number of samples to approximate the target function closely.
Figure 5.3: Approximating the sigmoidal function using different numbers of Chebychev polynomials; the sample size is 400.

In Figure 5.4, 1000 samples give a very close approximation of the function using 10 Chebychev polynomials, but with 400 samples the approximation is not close. So, there is a trade-off between the number of polynomials used and the number of samples available. We have observed that, for all the functions tested, a fair approximation can be achieved with 5 Chebychev polynomials using 300 to 400 samples.
Discussion
In this chapter, we discuss the problem of designing efficient protocols for negotiation over multiple continuously divisible issues. As discussed in Section 1.1, research in multi-issue negotiation received less attention from the research community than single-issue negotiation until recent years. The recent growth of systems using automated negotiation for conflict resolution in several application domains creates the need for efficient mechanisms for multi-issue negotiation. Designing a framework for multi-issue negotiation is an inherently more complex problem than for single-issue negotiation. Single-issue negotiation corresponds to a competitive situation where a gain for one participant implies a loss for the other. In multi-issue negotiations, however, there can be win-win situations, and a desirable property of an efficient framework is to guide rational agents to such outcomes. But when agents' preferences are not common knowledge, self-interested agents using existing protocols often fail to explore win-win possibilities and end up with inefficient agreements. Some existing multiagent negotiation frameworks require agents to reveal their complete preferences to a trusted mediator [31], but in real-life situations such a third party, trusted by both agents, may not be available.
We present protocols for bilateral multi-issue negotiation that lead rational agents to optimal or near-optimal agreements without any intervention from trusted third parties.
We use the concepts of envy-freeness, Pareto optimality, and social welfare to describe the efficiency and optimality of the negotiated solutions [8]. An outcome is Pareto-efficient if there exists no other outcome which is at least as good as this outcome for both agents and strictly better for at least one agent. For a Pareto-
ADAPTIVE NEGOTIATION IN AGENT SOCIETIES
Negotiation in the Marketplace
We consider a marketplace with N agents. These agents repeatedly negotiate shares of resources with other agents. All negotiations are bilateral, i.e., each negotiation instance involves two agents negotiating over a unit resource. In every time period, each agent is involved in bilateral negotiations with each of the remaining N − 1 agents. So, in each time period N(N − 1)/2 resources are negotiated in the market. Such agent interactions continue for a negotiation period consisting of T time periods. The value of T is not known to the agents. Later, we use two variations of the market structure in which an agent can interact with at most M agents, where M < N − 1, in each time period. In one variation, agents can choose their negotiation partners.
In this scenario, before each negotiation both agents must agree to participate. In the other variation, each agent is randomly matched with M other agents [71], and an agent cannot refuse to participate in any negotiation for which it is selected as a partner.
At any time period t, the valuation of an agent i for a resource j is denoted by d_{ij}. The utility obtained by agent i from the negotiation for resource j, U_{ij}, is defined as:

U_{ij} = d_{ij} * y_{ij},     (6.1)

where y_{ij} ∈ [0, 1] is the share of resource j obtained by agent i after negotiation in the t-th time period. Another interpretation of the valuation d_{ij} is that it is the utility agent i obtains if it receives the entire unit of resource j. We assume that d_{ij} ∈ {1, ..., H} for all i, j. We also assume that the resource valuations are independent, i.e., for any agent the valuation of any resource j does not depend on the valuation or possession of any other resource j'.
Negotiation ProtoCỌ Q.22 101
In this negotiation protocol, agents communicate before each bilateral negotiation to simultaneously reveal their valuations for the resource being negotiated. Thereafter, to maximize social welfare, the entire resource is allocated to the agent with the higher stated valuation. If the revealed valuations are equal, the agents share the resource equally. According to this protocol, if two agents i and i' are negotiating over a unit of resource j at time t, and d*_{ij} and d*_{i'j} are the corresponding stated valuations of agents i and i', respectively, agent i receives the share y_{ij} given in Equation 6.2, and y_{i'j} = 1 − y_{ij}, where

y_{ij} = 1 if d*_{ij} > d*_{i'j},  y_{ij} = 1/2 if d*_{ij} = d*_{i'j},  and  y_{ij} = 0 otherwise.     (6.2)
Therefore, if all agents truthfully reveal their valuations, this negotiation leads to maximum social welfare, which implies optimal system performance. Equally importantly, it improves the profit of each individual agent if its valuation is higher than its negotiation partner's a sufficient number of times. This is because an agent receives a larger share of the resources that have higher valuation to it by giving away shares of resources with lower valuation.
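A minimal sketch of this revelation-and-split rule and the resulting utilities (Equation 6.1) is shown below; the function names are illustrative.

```python
def split_share(stated_i, stated_j):
    """Share of the unit resource received by agent i under the revelation protocol."""
    if stated_i > stated_j:
        return 1.0
    if stated_i == stated_j:
        return 0.5
    return 0.0

def negotiate(true_i, true_j, stated_i, stated_j):
    """Utilities (Equation 6.1) of the two agents for one unit resource."""
    y_i = split_share(stated_i, stated_j)
    return true_i * y_i, true_j * (1.0 - y_i)

# Example: agent i values the resource at 4, agent j at 2, and both reveal truthfully.
print(negotiate(4, 2, 4, 2))   # (4.0, 0.0): the resource goes entirely to agent i
```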
Agents have an incentive to report high valuations and receive higher profits. When one agent always reports the highest valuation, a rational opponent has no incentive to report anything other than the highest valuation. If all agents always report the highest valuation H, each resource will be split equally, which is very inefficient. We therefore propose a probabilistic-reciprocity-based agent strategy that can utilize cooperation possibilities in the environment by sharing truthful information with other helpful agents and, at the same time, can avoid exploitation by exploitative agents who always want to receive shares from other agents by stating false high valuations.
An agent strategy in the above protocol corresponds to selecting the resource valuation the agent will reveal in a negotiation. If all agents use the strategy of always revealing the true valuation, social welfare will be maximized. If a particular negotiation is considered in isolation, however, there is an incentive to report an inflated valuation. In fact, revealing the highest valuation H is the dominant strategy, as it guarantees at least 50% of the resource and can, if the other agent reports any lesser valuation, procure the entire resource. But if all agents report the highest valuation H, all resources will be shared equally and neither social welfare nor individual utility will be optimized. To counter such myopic exploitative behavior we introduce a reciprocative scheme in which the likelihood of revealing the true valuation to another agent increases with the cumulative utility of resource shares received, over and above the utility that would have been received if resources were always split equally, in past negotiations with that agent. Such a reciprocative strategy serves two purposes: (a) if both agents use this strategy and preferences are typically asymmetric, with roughly equal likelihood that one or the other agent values the negotiated resource more, then the agents will always reveal their true valuations, thereby optimizing social welfare and significantly improving individual utilities over equal-split outcomes; and (b) exploitative strategies which always report high valuations will be discouraged, as reciprocative agents will adapt to also report high valuations, resulting in unattractive equal-split solutions.
In the following, we present in more detail the above-mentioned reciprocative strategy, a strategy that always reveals true valuations, and several exploitative strategies.
Naive-social agents (NS): Agents who always reveal their true valuation. This is similar to the pro-social agents discussed earlier, who always want to maximize social welfare.
Selfish agents (S): These agents always want to get help from the other agents but never want to relinquish any share to other agents. They always report the highest valuation, H, both to garner help and to avoid giving help. Selfish agents can benefit in the presence of naive-social agents by exploiting their benevolence.
Reciprocative agents (R): Agents that use probabilistic reciprocity to decide the valuation to report. If such an agent decides not to help, it states the highest valuation H; otherwise it states its true valuation and helps if a cooperation possibility is found (in this context, when an agent relinquishes its share to another agent with a higher valuation, we call it help given by the first agent to the second). When a reciprocative agent i is negotiating with another agent j, its decision mechanism depends on its previous interactions with agent j. If, in a negotiation at time t, i relinquishes a share z^t_{ik} of resource k when its stated valuation is lower than j's stated valuation, then h^t_{ij} = z^t_{ik} * d_{ik} is the amount of help agent i has given to agent j in that interaction. (According to this protocol, if agent i states a lower valuation than agent j, agent j gets the entire resource; agent i relinquishes the half it would have received under an equal split, so z^t_{ik} = 0.5.) The total help offered by agent i over all previous interactions with agent j is defined as C_{ij} = Σ_t h^t_{ij}. Similarly, the total help received by agent i over all previous interactions with agent j, denoted C'_{ij}, is defined as C'_{ij} = C_{ji}.
The difference (C'_{ij} − C_{ij}) is known as the balance, B_{ij}, of agent i with agent j.
The probability that agent i will state the true valuation to explore a cooperation possibility is given by

Pr(i, j) = 1,  if B_{ij} > β_i or C_{ij} = 0.     (6.3)
In Equation 6.3, γ is a constant greater than one, and β_i and α_i are private parameters of agent i that can respectively be interpreted as the upper and lower thresholds of the agent's trust. If its balance with the opponent agent is more than β_i, it considers the other agent completely trustworthy; similarly, if the balance is less than α_i, it considers the other agent exploitative. (A sketch of this decision rule, together with the other strategies below, is given after the list.)
Valuation Offset Selfish agents (VO): These exploitative agents do not always report the highest valuation H, but try to exploit other agents by reporting a higher-than-actual valuation obtained by adding a constant offset to their true valuation. The goal of such an agent is to create cooperation possibilities that are mostly in its favor by inflating its valuation. But unlike selfish agents, they can give the impression of being reciprocative agents, as they do provide some help.
So, other agents, who have no knowledge of the true preferences of the VO agents, may consider them to be reciprocative agents and reveal truthfully in some interactions.
Differential Selfish agents (DS): These agents are similar to the VO agents. But a DS agent i, when negotiating with another agent j, will always state its valuation to be H if the number of helps it has received from agent j is less than k times the number of helps it has provided to agent j. These agents are more exploitative in nature and try to exploit each agent by receiving at least k times the number of helps it has offered.
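The sketch below illustrates how each strategy's reported valuation could be computed. The reciprocative rule implements only the stated cases of Equation 6.3, with an assumed linear interpolation between the thresholds α_i and β_i standing in for the unshown part of the formula; all names, the offset, and the k value are illustrative.

```python
import random

H = 5  # highest possible valuation

def report_naive_social(true_val):
    return true_val                      # NS: always truthful

def report_selfish(true_val):
    return H                             # S: always the highest valuation

def report_valuation_offset(true_val, offset=1):
    return min(H, true_val + offset)     # VO: inflate by a constant offset

def report_differential_selfish(true_val, helps_received, helps_given, k=2):
    # DS: demand H until help received is at least k times the help given (assumption:
    # otherwise reveal truthfully).
    return H if helps_received < k * helps_given else true_val

def report_reciprocative(true_val, balance, help_given, alpha, beta):
    # R: cooperate with a probability derived from the balance B_ij (Eq. 6.3, sketched).
    if balance > beta or help_given == 0:
        p = 1.0
    elif balance < alpha:
        p = 0.0
    else:
        p = (balance - alpha) / (beta - alpha)   # assumed interpolation between thresholds
    return true_val if random.random() < p else H
```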
Experimental Results
In this section, we present the results of our simulations that evaluate the performance of agent groups under different environmental conditions. The performance of an agent in a group depends on the distribution of different agent behaviors in that group, the number of interactions, the frequency of cooperation possibilities, etc. We want to analyze the group effect on agents' performance and identify dominant agent behaviors under different group configurations and other environmental settings. We refer to the reciprocative agents as R, naive-social agents as NS, selfish agents as S, valuation offset selfish agents as VO, and differential selfish agents as DS.
In these experiments, all negotiations are bilateral and each negotiation involves one divisible unit of resource. In each negotiation, the participating agents' valuations of the resource are two numbers from the set {1, ..., H}. Here we take H = 5. The valuations for the two negotiating agents are generated before each negotiation using different probability distributions. Let P(v) be the probability that the resource being negotiated has a valuation v to an agent, i.e., Σ_v P(v) = 1. We assume that this probability distribution depends on the marketplace and does not change during interactions over time periods. We vary the probability distributions for different experiments to verify the effectiveness of different negotiation behaviors. The utility obtained by an agent i from a negotiation for a resource j is as defined in Equation 6.1. The total payoff to an agent is the sum of all the utilities it receives from all its negotiations. At the start of a run, each reciprocative (R) agent i chooses its trust bias β_i from a Gaussian distribution with standard deviation 0.5, and α_i is taken as −3 + H. For the DS agents, we have used k = 2.
We first consider the marketplace where each agent is involved in a bilateral negotiation over one resource with each of the other agents in each time period. Note that in this setting, an agent negotiates different resources in negotiations with different agents. Therefore, at any time period, each agent participates in parallel and independent bilateral negotiations with each of the remaining agents in the society. Unless mentioned otherwise, we discuss experiments with 90 agents interacting with each other over 200 time periods. In the first experiment, we use a mixed group of 45 reciprocative (R) and 45 selfish (S) agents with P(v) = 0.2, ∀v = 1, ..., 5.
We vary the length of the negotiation period, T, and observe its effect on the agents' performance. From the results of this experiment (see Figure 6.1), we find that when the value of T is less than 35 the performance of the selfish (S) agents is better than the performance of the reciprocative (R) agents, but when T is more than 38, R agents completely outperform the S agents. When the value of T is between 35 and 38, the difference in performance is not statistically significant. An R agent initially tries to cooperate in the negotiations by revealing truthfully and giving up its own share when it perceives the existence of cooperation possibilities. In this period it is exploited by the S agents before it can recognize them and stop helping them, and it cannot recoup these losses if the length of the negotiation period is small. But when agents negotiate for a longer period, R agents identify the S agents and stop helping them, and therefore the initial losses of the R agents can be compensated by the gains obtained from mutually beneficial negotiations with other R agents.
Figure 6.1: Average payoff earned by reciprocative and selfish agents for varying lengths of the negotiation period.

In the second experiment, we use the following probability distribution for generating resource valuations before negotiations: P(1) = 0.8 and P(v) = 0.05, ∀v = 2, ..., 5. So, in most cases, both agents have the lowest valuation for the resource and there are few cooperation possibilities. We vary the initial population configuration by changing the proportion of NS agents in the population. The remaining agents are of types S and R, present in equal numbers. From the results in Figure 6.2, we find that when the proportion of NS agents is low, reciprocative agents perform better than S and NS agents, but as the proportion of NS agents increases to 0.3, S agents outperform the other agents. When the number of NS agents was low, the R agents perform better as they recognize the non-cooperative attitude of the S agents and stop helping them. At the same time, they form cooperative relationships with other R and NS agents and utilize the cooperation possibilities existing in repeated negotiation situations. But with a large enough number of NS agents, the S agents continuously exploit their naive, cooperative attitude to receive the entire unit share in most of the corresponding negotiations. R agents, however, do not exploit NS agents as the S agents do. Therefore, in the presence of a significant number of NS agents, the selfish agents outperform the other agents.
Figure 6.2: Average payoffs of different agent types for varying proportions of NS agents in the population.

In the next experiment, we consider different probability distributions for generating agent valuations that correspond to different extents of cooperation possibilities in the environment. The agent population is similar to the last experiment: we vary the proportion of NS agents, with the rest of the population consisting of equal numbers of S and R agents, and compare the performances of the agent behaviors in the mixed group under different conditions. The first situation, ST1, is the situation described in the second experiment, i.e., P(1) = 0.8 and P(v) = 0.05, ∀v = 2, ..., 5. In this situation S agents dominate R agents when more than 30% of the agents in the population are NS agents. In the second situation, ST2, P(v) = 0.2, ∀v = 1, ..., 5. With a uniform probability distribution, there are frequent cooperation possibilities in the negotiations, as negotiating agents can often have very different resource valuations. From Figure 6.3, we find that for ST2, S agents can dominate R agents only when at least 50% of the agents are NS agents. So for ST2, S agents require many more NS agents to be present in the environment, compared to ST1, in order to dominate R agents. In ST2, the R agents utilize the cooperation possibilities much more than in the earlier situation. In the third situation, ST3, we take P(5) = 0.8 and P(v) = 0.05, ∀v = 1, ..., 4. Here, most of the time the valuation of a resource to an agent is 5 and hence cooperation possibilities are infrequent. But as the valuation is high, each agent earns a higher payoff compared to the previous conditions. The fourth situation is an extreme one, where P(5) = 1. Since each agent has the highest possible valuation for a resource in every negotiation, there are no cooperation possibilities, no help is given, and the performances of all agents are identical. Note that the average payoff to each agent is maximal here because an agent's valuation is always the highest in this case.
Figure 6.3: Average payoffs of different agent types in four different negotiation scenarios for varying proportions of NS agents in the population.

From the experiments described above, we have identified the situations in which reciprocative agents can dominate selfish agents and vice versa. Now, we investigate the effectiveness of the reciprocative agent strategy in the presence of several types of differential selfish agents. From the next experiment onwards, we use a uniform probability distribution, i.e., P(v) = 0.2, ∀v = 1, ..., 5. In the next experiment, we experiment with VO agents. We observe that reciprocative agents initially help VO agents and then stop truthful revelation after some time. A VO agent, however, continues to help the reciprocative agents whenever its stated valuation is less than the stated valuation of the reciprocative agent. As a result, after some number of helps obtained from such a VO agent, a reciprocative agent will again start cooperating with that VO agent, and this cycle repeats. Even though these VO agents do not always exploit the reciprocative agents, they always inflate their valuations to increase their chance of receiving help. As a result, against a reciprocative agent which truthfully reveals its valuation in the initial interactions, they receive help in a significantly higher number of negotiations. So, VO agents exploit reciprocative agents in the early negotiations, but after some time the reciprocative agents recognize this exploitation and start reporting the highest valuation, H, instead of revealing their true valuations. In this stage, whenever a VO agent reports any valuation less than H, it relinquishes its share even though the reciprocative agent's true valuation may have been less than the true valuation of the VO agent. So, in these cycles, both agents give away their shares to the opponent, but not necessarily when cooperation possibilities really exist. Social welfare is thus not maximized, and the cumulative profits of both agents are low compared to what they could be if both agents were reciprocative or if they had revealed their valuations truthfully. In Figure 6.4, the performances of VO and reciprocative agents do not show any significant difference until the length of the negotiation period reaches 140. In the long run, reciprocative agents perform better because they receive better payoffs from the negotiations with other reciprocative agents. The difference in performance, however, is slim compared to that when S agents were used in place of VO agents. In other experiments, we have observed that in homogeneous agent societies the payoffs of reciprocative agents are significantly higher than the payoffs of VO agents.

Figure 6.4: Average payoff earned by reciprocative and VO agents for varying lengths of the negotiation period.
Next, we experimented with a mixed group of reciprocative agents and differential selfish (DS) agents. From Figure 6.5, we observe that the DS agents are able to exploit the reciprocative agents quite effectively for shorter negotiation periods. The DS agents dominate the reciprocative agents until the length of the negotiation period is at least 60. In Figure 6.1, we have shown that for the selfish agents S, this number was about 35.

Figure 6.5: Average payoff earned by reciprocative and differential selfish agents for varying lengths of the negotiation period.
In another experiment, we use the settings of situation ST2 but replace the S agents by DS agents. From the corresponding results in Figure 6.6, we see that when the percentage of NS agents is more than 35%, the DS agents start dominating the reciprocative agents. For S agents, this percentage has to be around 50% under the same settings (see Figure 6.3). This shows that the differential selfish agents are able to exploit the reciprocative agents in more situations than the ordinary selfish agents. The DS agents pretend to behave like reciprocative agents by offering some help. Since such an agent offers help in some negotiations, the cooperative agents need more time to recognize its differential exploitation. Eventually, with more interaction experience, our robust reciprocative mechanism is able to recognize this behavior and outperform these agents.
In the next experiment, we vary the proportion of DS agents in a mixed group of R and DS agents. In Figure 6.7, we present the average payoff of different agent
Adaptation by Negotiating Agents
A restrictive assumption of the work that we have discussed so far has been that agents have fixed strategies, i.e., agents cannot change their strategies over time.
A more realistic scenario would be to allow agents to change their strategies based on experience. An agent can be inclined to adopt a strategy if agents using that strategy are observed to be performing better than others in the population. Such a strategy adoption method leads to an evolutionary process with a dynamically changing group composition of agent strategies [57, 94]. We believe that such an evolutionary process mimics real-life situations where users monitor the environment to identify and adopt successful behaviors. The following results help us understand and predict the corresponding market dynamics. The equilibrium reached via such a dynamic evolutionary process provides a better characterization of real-world environments composed of heterogeneous adaptive agents. We believe that a study of such evolutionary equilibria is useful for selecting a preferred long-term decision mechanism.
Here we assume that agents negotiate over generations. Each generation consists of a number, ε, of time periods, known as the length of the generation.
At each time period, as in the previous experiments, all the agents negotiate with a fixed number of agents. Agents' performances are tallied after all the negotiations of a generation take place. The strategies adopted by the agents in the next evaluation period are determined by a performance-proportionate selection scheme in which the probability with which an agent adopts a strategy increases with the average performance of the agents employing that strategy in the most recent evaluation period.
As a result, if a strategy produces better performance in the current generation compared to other strategies, more individuals are likely to adopt that strategy in the next generation. This generational scheme is semantically equivalent to every agent periodically selecting its strategy based on the current relative performance of the set of available strategies. This generational approach to representing market dynamics is akin to work on identifying an "evolutionarily stable strategy" [22], i.e., the strategy that is dominant in the long run.
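A minimal sketch of this generational update, using the tournament-selection rule described below in Section 6.5.1, might look as follows; the population is represented simply as a list of strategy labels with their payoffs from the previous generation.

```python
import random

def next_generation_strategies(strategies, payoffs):
    """Tournament selection: each agent adopts the strategy of the better of
    two randomly drawn agents from the previous generation."""
    n = len(strategies)
    new_strategies = []
    for _ in range(n):
        a, b = random.sample(range(n), 2)          # two candidates, without replacement
        winner = a if payoffs[a] >= payoffs[b] else b
        new_strategies.append(strategies[winner])
    return new_strategies

# Example: selfish agents (S) did better last generation, so S tends to spread.
strategies = ["R", "R", "S", "S", "NS", "NS"]
payoffs    = [10,   11,  15,  14,  6,    7]
print(next_generation_strategies(strategies, payoffs))
```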
Our goal is to identify the long-term dominant strategy under different environmental conditions. It is evident that if ε = 1, i.e., the number of interactions between two agents is at most one, selfish agents will outperform reciprocative agents. On the other hand, if the agents interact with each other infinitely often, the reciprocative strategy will dominate the selfish strategy, as after some time only reciprocative agents will receive help from each other. We want to find the long-term dominant agent behavior under different environmental scenarios, i.e., for different initial population distributions, different lengths of the generation ε, etc.
6.5.1 Experimental Results with Adapting Agents
In this section, we present a set of experimental results when agents can change their behavior to improve performance. We have used a set of 90 agents with different initial distributions of agent behaviors. We vary the length of the negotiation period and observe the dynamics of the agent population over generations for different initial agent population distributions. In these experiments, we consider P(v) = 0.2, ∀v = 1, ..., 5.

New agent strategy assignments are made as follows: for each agent i, two agents are selected randomly from the population without replacement. Then, of these two selected agents, the strategy of the one with the higher performance is adopted by agent i. Selection of the best from a set of randomly selected candidates is known as tournament selection in the genetic algorithms literature [23].

Figure 6.12: Proportion of agents of different types over generations; initially all agent types are present in equal numbers, ε = 20.
In the first experiment, we use a mixed group of reciprocative (R), naive-social (NS), and selfish (S) agents, where all types are initially present in equal numbers.
In every generation, each agent interacts with other agents for 20 time periods, i.e., ε = 20, and in each time period each agent negotiates with all other agents in the population. In Figure 6.12, we show the corresponding evolution of the agent population. Since agents interact for only a small time interval, S agents exploit both the R and NS agents to maximize their performance. Within a few generations, all the agents adopt the selfish behavior. Therefore, in this situation, selfish is the long-term dominant and evolutionarily stable strategy.
In the next experiment we increase ε to 50. From the corresponding results presented in Figure 6.13, we see that initially, in the first two generations, the percentage of S agents increases and the percentage of the other agent types decreases. In these generations, there are some NS agents, and the S agents exploit these agents and achieve a higher profit. But after the second generation, the NS agents become extinct as a result of their poor performance in the prior generations, and the agent population is left with only R and S types of agents. Now, with sufficient interactions, 50 in this case, R agents are able to recognize the S agents and stop helping them. At the same time, R agents build mutually beneficial relationships with other R agents and utilize the cooperation possibilities present in the environment. So, R agents outperform the S agents, and hence more agents adopt the R strategy over generations. We observe a snowballing effect, as more R agents lead to a further increase in the performance of R agents, and thereafter more agents switch to that strategy. We observed that after the seventh generation the entire population becomes reciprocative.

Figure 6.13: Proportion of agents of different types over generations; initially all agent types are present in equal numbers, ε = 50.
In the next two experiments, we start with a population of equal numbers of R and S agents only. In the first experiment, ε = 20, and from the results in Figure 6.14 we observe dynamics in the population similar to those shown in Figure 6.12, since with only 20 interactions the S agents exploit the R agents. When we increase ε to 50, the evolution of the population is as shown in Figure 6.15. Since there are no NS agents present in the population, the proportion of R agents continues to grow from the first generation.

Figure 6.14: Proportion of R and S agents over generations, ε = 20.
In all of the above experiments, we can see that reciprocative agents dominate the population when agents negotiate for a longer period of time in each generation, and selfish agents dominate when the length of the generation, ε, is small. The switch of dominance takes place at some intermediate value of ε. We are interested in finding, for each initial population configuration, the minimum value of ε for which reciprocative behavior becomes the long-term dominant strategy. To cover all initial population distributions, we vary the proportions of NS and S agents from 0.1 to 0.8 in steps of 0.1. The surface in Figure 6.16 shows the minimum value of ε required for the reciprocative agents to be dominant for given initial agent strategy distributions.
ARGUMENTATION-BASED NEGOTIATION
Argumentation Scenarios
In this section, we use the running example of a negotiation scenario between a travel agent and a customer. We describe the generation of arguments or counter-proposals from the travel agent's perspective. We first present a conversation underlining the necessity of argumentation in the negotiation process. Then we present three more conversations to clarify the importance of modeling the opponent's beliefs.
Consider the following conversation between our domain expert, a travel agent (TA), and a buyer agent (A) who has contacted TA for a ticket from Tulsa, USA to Calcutta, India on the 4th February, 2007.
Conversation 1: TA: Ticket Offer: O = ( price = $1400, # stops = 1, waiting hrs = 5, Date = 02/04/2007 ).
A: Reject because price is high.
TA: I can offer deals as cheap as $1200 but if you purchase the previous offer you will get a free round-trip within continental USA.
A: That's cool. I accept the previous offer.
In this conversation, TA has influenced the preference of A by rewarding him with a free round-trip offer which was not in the original negotiation context. This is an example of negotiation based on arguments. To produce convincing arguments, it is crucial to know the opponent's belief model, as the same argument may have different impacts on different opponents. Consider the following three conversations:
Conversation 2: In response to the request for a cheaper deal by another agent B for the same itinerary in Conversation 1, the travel agent responds
TA: Unfortunately no ticket below $1400 is available for February 4 and if you wait to confirm your booking, the price will go up to $1600.
B: OK, I will book this offer.
Conversation 3: The travel agent tries the same "threat" as in Conversation 2 for the same itinerary with another agent C, who responds

C: No, I will not book now; I believe better deals are available elsewhere.
Conversation 4: In response to the request for a cheaper deal by another agent D for the same itinerary in Conversation 1, the travel agent responds
TA: I am afraid I cannot give you any ticket below $1400 on February 4 but if you take this deal I can give you a bonus of 15,000 frequent flier miles.
D: OK, then I will purchase the ticket.
In Conversations 2 and 3, we find that the same argument can produce opposite results: the agent TA fails to clinch the deal in Conversation 3. The fear that the price of the ticket may increase dominates the decision of agent B, whereas agent C believes that better deals are available. For agent D, the reward offer clinches the sale. Notice that the travel agent needs to concede some utility in Conversation 4 to seal the deal. Which of these arguments the TA should use depends on the available offers, TA's utility for those deals, and the opponent models that can be used as predictors of offer/argument acceptance. In reality, even though it is unlikely that the TA will have exact knowledge of the user agent's belief model, such models can be approximated from domain knowledge, interactions with other agents, and previous interactions with this agent. We propose a Bayesian network based approach for modeling the opponent agent.
The above-mentioned negotiations are based on a set of issues or attributes, e.g., price, # stops, waiting time, departure date, destination city, departure city, etc. Some of these issues are negotiable and some of them are constraints, which can be determined from domain knowledge. In Conversation 4, though bonus miles were not part of the original set of issues being negotiated, the TA may have the model that they can be used as leverage on agent D. In other scenarios an agent may have an incorrect belief about some attributes. For example, a customer agent G may believe that airline E has a poor luggage handling record. When an offer is rejected on this premise, the TA will need to argue to correct this misconception. This may convince G to accept the proposed deal.
Definitions
In this section, we present formal definitions of the different arguments and offers. A is the entire set of attributes in the environment. We denote by name_i and state_ij the name and the jth possible value, respectively, of the ith attribute in the environment, j = 1, ..., n_i. We assume that each agent is aware of all possible values of name_i and state_ij, j = 1, ..., n_i. The state_ij values may be numbers or discrete values like high, low, good, etc. We now define the notation used in this work on argumentation-based negotiation:
- I ⊆ A is the set of current context attributes. I is a null set before the negotiation starts.
- E ⊆ (A − I) is the set of additional attributes which are not in the current negotiation context but can be included in I during the process of negotiation. That is, an attribute a ∈ E can be removed from the set E and included in the set I.
- P ⊆ (A − (I ∪ E)) is the set of non-negotiable, persuasive attributes which can be used for argumentation, e.g., bonus-miles, increase-future-price, etc.
- V is any set of attribute name and value pairs in an outgoing proposal, i.e., V = {(name_i, state_ij) : name_i ∈ A}. For each set V, we define a corresponding set V_1 which consists of the attributes name_i such that (name_i, state_ij) ∈ V.
We broadly categorize the locutions used in the conversation into the following types: request for proposal (or request), offer, argument, accept, reject, finalize, and terminate. Within each category, there are different types of locutions. Here we discuss only the important locutions.
Request(V): This is sent at the beginning of the conversation, where the asking agent requests the other agent to make an offer. In order to request an offer, an agent constructs a set V containing attributes from the set (I ∪ E) and their desired values. After an agent sends Request(V) to another agent, set E = (I ∪ E) − V_1, and then set I = V_1.
Add-request(V): This is a request to add one or more attributes and specify their values during the negotiation process. The attributes in V belong to the set E. After an agent sends this Add-request to the other agent, the attributes in V_1 are included in I and removed from E.
Offer(V): An offer, O, consists of a set V of attributes from the set (I ∪ E), with I ⊆ V_1 ⊆ (I ∪ E), i.e., it has to contain at least all the attributes of I.
Accept(O): This is used to accept any previous offer O from the opponent.
Reject(O, V): This is used to reject a proposal, O, from the opponent. V contains the name-state pairs of the attributes for which the offer O is rejected. V may contain attributes in A which are not in the set (I ∪ E).
Terminate(): This is used to terminate a conversation.
Finalize(O): If the other agent has accepted an offer O, this message notifies that agent that the negotiation is finalized with the offer O, and the negotiation ends.
Arguments: An argument also consists of a set of attributes. We define three different types of arguments:
Conflict-argument(O, V): This type of argument is presented to the opponent to express the conflicts in belief the agent has about the attributes in V. It states (name_i, state_ij) as its belief about the ith attribute in V. Conflict arguments are sent if the other agent rejects an offer and the agent has a different belief about the states of the attributes that are marked as the reason for the rejection. Suppose the other agent has earlier rejected this agent's offer O and sent Reject(O, V'), and this agent believes that some attributes in V' have states different from those mentioned in V'; then this agent will send Conflict-argument(O, V), where V_1 ⊆ V'_1.
Justification(O, V): This argument is used to justify the state of one or more attributes in the previous offer, O. Suppose that after this agent sends the offer O, the other agent rejects the offer and sends the name-state pairs of the attributes for which the offer is rejected. If this agent does not have any conflict with those name-state pairs but finds that one or more attributes in the set (A − (I ∪ E ∪ P)) are the reason for the occurrence of such name-state pairs, then it justifies its offer with those attributes. Here, V is such that V_1 ⊆ (A − (I ∪ E ∪ P)). For example, if the other agent rejects an offer because of a high price and this agent finds that the peak season is the obvious reason, then this agent can justify it by sending the argument Justification(O, (peak-season?: yes)).
Persuasive-argument(O, V): This argument is used to persuade the opponent. An agent can offer the other agent future rewards, or it can threaten to terminate the negotiation, etc. A persuasive argument consists of persuasive attributes. Here, V_1 ⊆ P.
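To make this vocabulary concrete, the following is a minimal Python sketch of how the attribute sets I, E, P and the locutions defined above could be represented. The class and field names are illustrative choices for this sketch, not part of the protocol definition in this chapter.

from dataclasses import dataclass, field
from typing import Dict, Optional

State = str  # a state_ij value; states may also be numeric in practice


@dataclass
class Proposal:
    """Base class for locutions; V maps attribute names to states."""
    V: Dict[str, State] = field(default_factory=dict)


@dataclass
class Request(Proposal):             # request for an offer over attributes in I ∪ E
    pass


@dataclass
class AddRequest(Proposal):          # move attributes from E into the context I
    pass


@dataclass
class Offer(Proposal):               # must contain at least all attributes of I
    pass


@dataclass
class Reject(Proposal):              # V gives the reasons for rejecting `offer`
    offer: Optional[Offer] = None


@dataclass
class ConflictArgument(Proposal):    # the agent's own beliefs about disputed attributes
    offer: Optional[Offer] = None


@dataclass
class Justification(Proposal):       # V_1 ⊆ A − (I ∪ E ∪ P)
    offer: Optional[Offer] = None


@dataclass
class PersuasiveArgument(Proposal):  # V_1 ⊆ P
    offer: Optional[Offer] = None


@dataclass
class Accept:
    offer: Offer


@dataclass
class Finalize:
    offer: Offer


class Terminate:
    pass


def after_request(I: set, E: set, V: Dict[str, State]):
    """Bookkeeping after Request(V): E becomes (I ∪ E) − V_1 and I becomes V_1."""
    V1 = set(V)                      # the attribute names in V
    return V1, (I | E) - V1          # new I, new E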
In Figure 7.1, we show an example of argumentation-based negotiation using the locutions defined above.
Request(From: Tulsa, To: Calcutta, Date: 02/04/2006)
Offer(O = Airline: United, From: Tulsa, To: Calcutta, Price: $1400, Date: 02/04/2006, #Stops: 2)
Conflict-argument(Luggage Handling: Good)
Persuasive-argument(bonus-miles: 15,000)
Accept(O)

Figure 7.1: An example of a negotiation using arguments.
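Using the illustrative classes sketched above, the exchange in Figure 7.1 could be written as the following message sequence; the attribute values are taken directly from the figure.

dialogue = [
    Request(V={"From": "Tulsa", "To": "Calcutta", "Date": "02/04/2006"}),
    Offer(V={"Airline": "United", "From": "Tulsa", "To": "Calcutta",
             "Price": "$1400", "Date": "02/04/2006", "#Stops": "2"}),
    ConflictArgument(V={"Luggage Handling": "Good"}),
    PersuasiveArgument(V={"bonus-miles": "15,000"}),
]
# The customer finally accepts the offer made earlier in the dialogue.
final = Accept(offer=dialogue[1])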
Architecture of Argumentation Based Agent
In this section, we present the architecture of our agent Ag (henceforth we refer to the agent presenting arguments as Ag and to the opponent agent as OpAg) for negotiation using arguments. Figure 7.2 shows the different components of the agent architecture. We now discuss these components.
Proposal Analysis: This component receives proposals sent by the opponent agent. The opponent agent's proposals can be of the following types: a request for an offer, a counter offer, a rejection of an offer, an argument, or a termination. This component identifies the opponent's proposal type and updates its own belief model and the opponent's model. If the opponent has requested an offer, the Request Processing component is informed. If the opponent has sent a counter-offer, the Counter Offer Processing component is informed. Similarly, if the opponent has sent a rejection, an argument, or a termination notice to end the negotiation, this component informs the Rejection Processing, Argument Processing, or Termination component, respectively.
Request Processing: If the opponent has sent a Request(V) message, this component is informed. It communicates with the Offer Generator, a subcomponent of the Offer/Argument Generator discussed later, to generate an offer according to the requirements mentioned in V.

Figure 7.2: Decision architecture of the arguing agent.
Rejection Processing: If OpAg rejects the previous offer, O, and sends a Reject(O, V) message to agent Ag, the Proposal Analysis component of Ag sends the set V, for which the other agent has rejected the offer O, to the Rejection Processing component. Note that V can be an empty set. This component sends V to the Conflict Argument Generator, a subcomponent of the Offer/Argument Generator.
Counter Offer Processing: This component communicates with the Offer Evaluator subcomponent to find the utility of the counter offer and then sends the offer along with its utility to the Acceptance/Rejection Selector component to decide whether to accept or reject the offer.
Argument Processing: If the opponent agent has sent an argument, this component determines whether that argument changes the agent's own belief model. If no change is found, which implies that no improved offer will be sent after the rejection of the previous offer, O, then this component sends a request to the Proposal Constructor to send a Reject(O) message to the opponent. But if there is a change in the belief model, it sends the previous offer, O, to the Acceptance/Rejection Selector component.
Termination: The Proposal Analysis component notifies this component if the other agent has sent either a Terminate(), Accept(O), or Finalize(O) message. If it is an Accept(O) message, this component finalizes the negotiation with the offer O and sends a request to the Proposal Constructor component to send the other agent a Finalize(O) message. If it receives a Finalize(O) message, it finalizes the negotiation with the offer O and ends the negotiation. On the other hand, if the received message is a Terminate() message, this component terminates the negotiation process immediately.
My Model: This is a collection of the agent's own beliefs about the domain attributes. For example, the agent may have knowledge of the following airline services: luggage handling, flight security, crew service quality, insurance facilities, etc. A belief about an attribute may be strong or weak. We assume that an agent changes its belief only if the belief is weak; strong beliefs cannot be changed during negotiation with arguments.
Opponent Model: The opponent model consists of constraint information and the opponent's belief model. We believe that, in any negotiation, it is important to recognize which attributes are strict constraints and which are weak. Using domain knowledge, we initially classify the constraints into strict and weak ones; this classification is updated based on the responses from the opponent. We represent the opponent's belief model in the form of a Bayesian network, which is discussed in the next section.
Offer/Argument Generator: This component has five subcomponents responsible for evaluating and generating offers and arguments: the Conflict Argument Generator, Justification Generator, Persuasive Argument Generator, Offer Evaluator, and Offer Generator. We discuss these subcomponents below.
Conflict Argument Generator: This subcomponent decides whether the opponent agent rejected the earlier offer because of a "wrong" or conflicting belief about some attribute. If that is the case, it uses the negotiation history to decide whether or not to argue about this conflict. If it decides to argue, it informs the Offer/Argument Selector, described later in this section, to generate a Conflict-argument with the conflicting attribute(s); otherwise, it relinquishes control to the Justification Generator. For example, OpAg may believe that airline E has a poor luggage handling reputation. So, when the travel agent Ag offers a ticket on airline E for a reasonable price, OpAg rejects it with an argument of (luggage handling: poor). The travel agent then needs to argue that E has earned a very good reputation for luggage handling in recent years. The opponent agent may then accept the deal if it is convinced by this argument. But if it still rejects the offer, this component updates the opponent model by adding this information as a constraint.
Justification Generator: If OpAg rejects the previous offer, sending the name-state pairs of a set of attributes, say V, as the reason, and the agent finds that there are one or more attributes v_i in the environment (v_i ∉ (I ∪ E ∪ P)) which influence some attributes in V, where v_i is not under the control of any agent, then the agent asks the Offer/Argument Selector to justify the previous proposal with v_i. For example, if an agent rejects an offer stating that the price is high, this agent can justify it as (peak-season?: yes), i.e., the price is high because the flight is in the peak season.
Persuasive Argument Generator: This subcomponent decides, with the help of the opponent model, whether there are any persuasive arguments which can influence the opponent to accept the previous offer. For example, suppose a reward of bonus frequent flier miles has a significant positive influence on the opponent; then if a reward of 10,000 bonus miles is offered, the opponent may accept the previous offer. This subcomponent finds all such persuasive arguments and determines which of them produces the maximum expected utility using the opponent model. It then sends that argument with the corresponding utility to the Offer/Argument Selector.
Offer Generator: This subcomponent generates offers. If OpAg has sent a request for an offer, this subcomponent finds an offer that matches the specifications provided by the other agent and produces the maximum expected utility for itself. It interacts with the opponent model to determine the offer. If there is no offer that matches the conditions given by the opponent, it searches among offers which satisfy the remaining constraints after some weak constraints are removed. It sends the best offer found to the Offer/Argument Selector.
Offer Evaluator: This is a subcomponent of the Offer/Argument Generator which finds the expected utility of offers or arguments based on the opponent's belief model and the agent's own evaluations of the corresponding offer or persuasion.
Bayes Net Model of Opponent’s Belief
In Section 7.3, we discussed the architecture of a negotiating agent and described how the decision mechanism largely depends on the agent's approximation of the opponent's model. We have seen that a given proposal may be quite profitable for one opponent but unacceptable to another, which makes it both necessary and desirable for the negotiating agent to model its opponent. In practice, an agent may have only approximate a priori estimates of the dependencies and influences of the different factors on the other agent's behavior. We propose the use of Bayesian networks to capture the causal dependencies of the different factors on the decision mechanism of the opponent. Bayesian networks are useful in this context as they can capture the inherent uncertainty about the opponent's preferences. We use an augmentation of the Bayesian network, the influence diagram [46], to evaluate the expected utility of the different actions of the modeling agent. This mechanism allows the modeling agent to choose the action that will produce the maximum expected utility.
We show an example of modeling the opponent's beliefs in Figure 7.3. We will discuss the details of the decision mechanism in the next section.
7.4.1 Bayesian Networks and Influence Diagram
A Bayesian network (BN) is a graphical method of representing causal relationships [45], i.e., dependencies among different variables that jointly define a real-world situation. A BN is a directed acyclic graph with domain variables represented as nodes and causal relationships between variables represented as directed edges between the corresponding nodes. In addition to its structure, a BN is also specified by a set of parameters θ. The quantitative aspect of the causal relationships is characterized by conditional probability tables (CPTs) associated with the corresponding nodes.
Consider a vector X of variables and an instantiation vector x. If the immediate parents of a variable X_i are the vector Pa_{X_i}, with instantiation pa_{x_i}, then

P(x) = ∏_i P(x_i | pa_{x_i}).

This defines the joint distribution of the variables in X, where each variable X_i is conditionally independent of its non-descendants given its parents. For a more detailed discussion of Bayesian networks, we refer to [38, 45].
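As a small illustrative example (not taken from this chapter), the factorization can be checked numerically for a three-node chain A → B → C, where each CPT row sums to one and the resulting joint distribution sums to one over all instantiations.

# Joint distribution of a chain A -> B -> C, built from its CPTs:
# P(a, b, c) = P(a) * P(b | a) * P(c | b)
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
P_C_given_B = {True: {True: 0.6, False: 0.4}, False: {True: 0.5, False: 0.5}}

def joint(a, b, c):
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# The joint must sum to 1 over all instantiations.
total = sum(joint(a, b, c) for a in (True, False)
                           for b in (True, False)
                           for c in (True, False))
assert abs(total - 1.0) < 1e-12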
We use Bayesian networks for representing belief structures for the following reasons:
- A Bayesian network can readily handle incomplete data sets.
- It allows one to learn about causal relationships, which is useful for gaining understanding of a problem domain; a BN can efficiently represent non-linear causal relationships among the variables.
- A BN can represent and reason about uncertainty in the domain.
- Bayesian networks, in conjunction with Bayesian statistical techniques, facilitate the combination of domain knowledge and data.
- A BN offers a method of updating the beliefs, i.e., the CPTs.
An influence diagram is a Bayesian network augmented with action variables and utility functions. There are three types of nodes in an influence diagram: chance nodes, value nodes, and action nodes [46]. The action variables represent the different actions of the decision maker, and utility functions are attached to the value nodes in the network. Influence diagrams can be used to calculate the utilities of different values of the decision variables. In the context of negotiation, we want to use such networks to find the conditional probability of accepting a proposal given the proposal contents, listed as an attribute-value vector. Such calculations enable the agent to select an offer or argument with the maximum expected utility.
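The selection rule this implies can be sketched as follows. Here p_accept stands in for the acceptance probability that an influence diagram (or any other opponent model) would return; it is a placeholder for illustration, not the inference procedure used in this chapter.

from typing import Callable, Dict, List, Tuple

OfferV = Dict[str, str]  # attribute name -> state, as in the locution definitions

def best_proposal(candidates: List[OfferV],
                  p_accept: Callable[[OfferV], float],
                  utility: Callable[[OfferV], float]) -> Tuple[OfferV, float]:
    """Pick the candidate offer/argument with the maximum expected utility,
    where expected utility = P(accept | proposal contents) * own utility."""
    scored = [(v, p_accept(v) * utility(v)) for v in candidates]
    return max(scored, key=lambda pair: pair[1])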
7.4.2 An Illustration of the Agent Belief Model
In our negotiation framework, we assume that the arguing agent has an approximate belief model of the opponent agent in the form of a Bayesian network. We assume that the arguing agent knows the exact structure of the network and has an approximate model of the conditional probability tables from domain knowledge and earlier interactions with the opponent agent. For the sake of simplicity, we assume all variables to be discrete. In Figure 7.3 we show an example of a model of a customer agent, OpAg, approximated by the travel agent Ag. The agent OpAg has asked for a round-trip airline ticket from Tulsa, USA to Calcutta, India on the 4th of February, 2007. He sends the request for proposal Request(V), where V is a collection of the attributes (From: Tulsa, To: Calcutta, Roundtrip: yes, # of stops: ≤ 2, Date: 02/04/2007).

Figure 7.3: Approximate belief model of the opponent.

In the network shown, the decision node generates the expected utility of an offer or an argument. The chance nodes, value nodes, and action nodes are represented as circles, rhombuses, and rectangles, respectively. An offer selects values for the nodes drawn with double circles. Double circles joined to their parents with solid lines are initially in the set I, and double circles joined to their parents with dotted lines belong to E. An arc above a double circle signifies that the attribute is negotiable. The action nodes represent the actions that can be taken by agent Ag.
Here the offer made by the agent Ag determines the values of nodes such as airline name, # of stops, date of journey, destination city, departure city, requirement for a transit visa, etc. Ag also has beliefs about the values of some chance nodes for OpAg, such as risk attitude and payment capacity, which influence the decision of OpAg.
We consider different offers as different possible values of the node Offer. For each offer available to Ag, the agent can find the conditional probability of acceptance by OpAg given the variable values in the offer. Based on the reply of OpAg to an offer or argument, the CPTs in the BN are updated by the sequential update rule for Bayesian networks [90].
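For illustration only, a simple count-based (Dirichlet-style) update of an acceptance CPT after each observed reply might look like the sketch below; this is a common simplification and is not the exact sequential update rule cited as [90].

from collections import defaultdict

class AcceptanceCPT:
    """Tracks P(accept | parent configuration) with pseudo-counts."""
    def __init__(self, prior_accept: float = 1.0, prior_reject: float = 1.0):
        self.counts = defaultdict(lambda: [prior_accept, prior_reject])

    def update(self, parent_config: tuple, accepted: bool) -> None:
        # Increment the count of the observed outcome under this configuration.
        self.counts[parent_config][0 if accepted else 1] += 1

    def p_accept(self, parent_config: tuple) -> float:
        accept, reject = self.counts[parent_config]
        return accept / (accept + reject)

cpt = AcceptanceCPT()
cpt.update(("price=$1400", "bonus-miles=15000"), accepted=True)
print(cpt.p_accept(("price=$1400", "bonus-miles=15000")))  # 2/3 with the default prior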
Offer or Argument Selection Procedure
In Section 7.3, we discussed the different components that influence the decisions of the agent Ag. In this section, we present an algorithm that precisely specifies the decision procedure for selecting the outgoing proposal. We also briefly discuss how the agent evaluates and chooses from different offers and arguments.
The decision procedure is presented in Algorithms 4 to 8. We now describe some functions used in the algorithms.
Algorithm 4 Decision(proposal(O', V')): decision algorithm of the agent
Update-opponent-model(V'); {The Proposal Analysis component updates the opponent model. In proposal(O', V'), the input variables O' and V' can be null.}
if proposal(O', V') is Request(V') then
  (O, null) = Find-best-offer(V'); {The Offer Generator produces the best matching offer.}
  Send-Proposal(O, null); {This function chooses the appropriate proposal based on its inputs and sends it to the Proposal Constructor component, which constructs the appropriate message and sends it to the opponent.}
else if proposal(O', V') is Reject(O', V') then
  (O, V) = Process-Rejection-Processing(O', V'); {The offer/argument (O, V) is selected by the Offer/Argument Selector.}
  Send-Proposal(O, V);
else if proposal(O', V') is Offer(O') then
  (O, V) = Counter-Offer-Processing(O');
  Send-Proposal(O, V);
else if proposal(O', V') is Argument(O', V') then
  (O, V) = Argument-Processing(O', V');
  Send-Proposal(O, V);
else if proposal(O', V') is Accept(O') then
  O = Finalize(O'); {Finalizes and directs the Proposal Constructor to send a Finalize message to the opponent.}
  Send-Proposal(O, null);
else if proposal(O', V') is Finalize(O') then
  End-negotiation(); {End the negotiation with offer O'.}
else if proposal(O', V') is Terminate() then
  End-negotiation();
end if
Algorithm 5 Process-Rejection-Processing(O', V')
finalarg = null; O = null;
finalarg = Conflict-belief(V'); {The Conflict Argument Generator executes this function.}
if finalarg == null then {there is no conflict in belief}
  finalarg = Justify-belief(V'); {The Justification Generator controls this function.}
end if
if finalarg == null then {no additional justification for the offer}
  (O, finalarg) = OfferOrPersuasive(O', V'); {This function generates the offer or argument with the maximum expected utility.}
end if
return (O, finalarg);
Algorithm 6 Counter-Offer-Processing(O')
finalarg = OfferOrPersuasive(O', V');
(O*, u) = Compare(O', finalarg);
if O* == O' then
  return (O', null);
else
  V* = Find-Reject-attributes(O'); {This function finds the name-state pairs of the attributes that could make the offer acceptable.}
  return (null, V*);
end if

- Find-best-offer: Each offer sets values for different attributes. An agent can calculate the utility of an offer based on the attribute values, and it can calculate the probability that the offer will be accepted by the other agent from the opponent agent's belief model represented by the Bayesian network. From these two pieces of information, the agent can find the expected utility of an offer. The Offer Generator chooses the offer with the maximum expected utility.
- Find-best-persuasive-argument: Each time the opponent agent rejects an offer, agent Ag tries to find, based on the opponent's belief model, whether some persuasive argument can increase the probability of the opponent accepting the offer. Each such argument has a corresponding utility; therefore, the agent can calculate the expected utility of each persuasive argument, and this function chooses the one with the maximum expected utility.
Algorithm 7 OfferOrPersuasive(O', V')
(O_1, V_1) = Find-best-persuasive-argument(O', V'); {This function finds the best persuasive argument, i.e., the one producing the maximum expected utility.}
(O_2, null) = Find-best-offer(V'); {Finds the best offer, i.e., the one producing the maximum expected utility.}
(O_3, null) = Best-counter-offer(); {Returns the best counter offer made by the opponent so far.}
(O*, V*, u) = Compare((O_1, V_1), (O_2, null), (O_3, null)); {This function compares the candidates and finds the offer or argument which produces the maximum expected utility; u is the corresponding expected utility.}
if u is positive then
  return (O*, V*);
else
  Terminate(); {Terminate the negotiation and send a Terminate() message to the opponent.}
end if
Algorithm 8 Send-Proposal(O, V)
if O is the opponent agent's counter offer then
  Accept(O);
else if O is an offer then
  Offer(O);
else if O is null and V is a (conflict/justification/persuasive) argument then
  (Conflict/Justification/Persuasive)-argument(V);
else if V was generated by the Find-Reject-attributes function then
  Reject(O', V); {Reject the opponent's last offer with reasons V.}
end if
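For readers who prefer running code, the dispatch in Algorithm 4 can be rendered roughly as the Python sketch below. The handler methods on `agent` are placeholders standing in for the components of Section 7.3, not exact implementations of them.

def decision(proposal, agent):
    """Dispatch an incoming proposal, roughly following Algorithm 4.
    `agent` is assumed to expose handlers for the components of Section 7.3."""
    agent.update_opponent_model(proposal)          # Proposal Analysis step

    kind = proposal.__class__.__name__
    if kind == "Request":
        offer = agent.find_best_offer(proposal.V)  # Offer Generator
        agent.send_proposal(offer, None)
    elif kind == "Reject":
        offer, arg = agent.process_rejection(proposal.offer, proposal.V)
        agent.send_proposal(offer, arg)
    elif kind == "Offer":
        offer, arg = agent.counter_offer_processing(proposal)
        agent.send_proposal(offer, arg)
    elif kind in ("ConflictArgument", "Justification", "PersuasiveArgument"):
        offer, arg = agent.argument_processing(proposal)
        agent.send_proposal(offer, arg)
    elif kind == "Accept":
        agent.send_proposal(agent.finalize(proposal.offer), None)
    elif kind in ("Finalize", "Terminate"):
        agent.end_negotiation()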
AUTOMATED CONTRACTING IN A SUPPLY CHAIN
Related Work
by mediating agents is investigated in [6]. Negotiation-based supply chain management has been proposed in [18], where a virtual supply chain is established by negotiating software agents when an order arrives. Using a blackboard architecture, a proven methodology for integrating multiple knowledge sources for problem solving, a mixed-initiative agent-based architecture for supply chain planning and scheduling is implemented in MASCOT [72].
Swaminathan et al. have provided a framework for efficient supply chain formation [91]. Labarthe et al. [53] have presented a heterogeneous agent-based simulation to model supply chains. Multiagent systems research has emphasized the emergence of the optimal configuration of the supply chain. Walsh et al. have shown optimal dynamic task allocation in a supply chain using combinatorial auctions [97]. They have shown that, given a task composed of a group of subtasks, the supply chain that produces the maximum profit can be formed dynamically.
Our work differs from these efforts. We build an automated supply chain management system as an application of automated contracting. In our proposed supply chain, contracts are split into subcontracts and awarded to the suppliers or manufacturers at the next level using auction-based automated contracting. We emphasize the usefulness of effective scheduling and pricing strategies in improving the performance of the supply chain, and we empirically evaluate the performance of different scheduling and pricing strategies.
Scheduling Strategies
Suppliers can use pricing schemes to exploit market niches if the arriving task mix presents such opportunities, and only if they can create a flexible local schedule to accommodate profitable tasks. Smart, predictive scheduling and bidding decisions (whether to bid) are key to exploiting market niche opportunities. Here we summarize the alternative scheduling schemes we have investigated in our prior work.
We evaluate four scheduling strategies for the suppliers. The goal is to allocate a task t of length l(t) and deadline dl(t). The following scheduling strategies have distinct motivations (a code sketch of the first three appears after this list):
First fit: This strategy searches forward from the current time and assigns t to the first empty interval of length l(t) on the calendar. This produces a compact, front-loaded schedule.
Best fit: This strategy searches the entire feasible part of the schedule (between the current time and the deadline dl(t)) and assigns t to one of the intervals such that there is a minimal number of empty slots around it. This strategy produces clumped schedules, but the clumps need not be at the front of the schedule.
Worst fit: This strategy searches the entire feasible part of the schedule (between the current time and the deadline dl(t)) and assigns t to one of the intervals such that there is a maximum number of empty slots around it. The worst-fit algorithm produces an evenly loaded schedule.
Expected Utility Based (EU): All the strategies discussed above schedule tasks greedily: they bid for a task whenever it can be scheduled. In reality, there are periodic patterns in the task arrival distribution. Hence, it might be prudent at times not to bid for a task, keeping the option open to bid for a more profitable task in the future. In the EU strategy, agents use knowledge of the task arrival pattern, and whenever a new task arrives a decision is made whether it is worthwhile to bid on it. If the task arrival pattern is not known a priori, it can be learned over time, provided the pattern does not change drastically in a short time. There is, however, a risk involved in such opportunistic scheduling, as some production slots may remain unused if the expected high-profit tasks never materialize. Under this strategy, an agent bids for a task if its expected utility is positive; the expected utility is negative if the expectation that the agent can make more profit later outweighs the risk of not bidding now, in which case the agent does not bid for this task. We discuss this expected utility based strategy in the next section.
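As referenced above, the following is a minimal sketch of the first-fit, best-fit, and worst-fit placements over a slot calendar. The calendar is modeled simply as a boolean list (True meaning a slot is occupied); this representation is an assumption made for illustration, not the data structure used in our simulations.

from typing import List, Optional

def free_runs(calendar: List[bool], start: int, end: int):
    """Yield (index, run_length) for each maximal run of free slots in [start, end)."""
    i = start
    while i < end:
        if not calendar[i]:
            j = i
            while j < end and not calendar[j]:
                j += 1
            yield i, j - i
            i = j
        else:
            i += 1

def place(calendar: List[bool], length: int, deadline: int, strategy: str) -> Optional[int]:
    """Return the start slot chosen for a task of the given length, or None if it cannot fit."""
    candidates = [(i, run) for i, run in free_runs(calendar, 0, deadline) if run >= length]
    if not candidates:
        return None
    if strategy == "first":            # earliest feasible gap (compact, front-loaded schedule)
        i, _ = candidates[0]
    elif strategy == "best":           # tightest feasible gap (fewest leftover slots around the task)
        i, _ = min(candidates, key=lambda c: c[1])
    else:                              # "worst": loosest gap, keeps the load spread out
        i, _ = max(candidates, key=lambda c: c[1])
    for k in range(i, i + length):
        calendar[k] = True
    return i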
Utility of Scheduling a Task
To decide whether to schedule a newly arrived task t, the agent considers the days on the calendar C between and including est(t) and dl(t). For each of these days d, the agent generates two sets of task combinations, T^k_{s(d)} and T^k_{s(d)−l(t)}, where T^k_x = {T : Σ_{t'∈T} l(t') = x, P(t') > k for all t' ∈ T} is the set of all task combinations in which the lengths of the tasks add up to x and each task has a priority higher than k, and s(d) denotes the number of empty slots on day d. We compare the utility of scheduling this task now, with the remaining slots scheduled later, against all possible ways of scheduling the currently empty slots. We choose to schedule the task now if the corresponding utility dominates all other ways of filling the empty slots without scheduling the current task.
Let n_ik be the number of tasks in T of length i and priority k, and let Pr(i, n_ik, d, H) be the probability that at least n_ik tasks of length i and priority k will arrive at a later time for day d, where H is the history of tasks that have already arrived. Given the task distribution (discussed in Section 8.5), the probabilities Pr(i, n_ik, d, H) can be calculated using the multinomial cumulative probability function.
The expected utility EU(t, d, H, C) of scheduling the current task t on day d of calendar C, given the history of task arrivals H, combines the utility of t with the average expected utility of filling the remaining empty slots of the day with higher-priority tasks. The average expected utility of scheduling s hours on day d with tasks of higher priority than task t, given the history of task arrivals H, is built from terms of the form Pr(i, n_ik, d, H) n_ik u(i, k), where u(i, k) is the utility of a task of length i and priority k, l_max is the maximum possible length of a task, and the function den returns the density of the calendar, i.e., the percentage of slots that have been scheduled up to a given date (to facilitate scheduling we use den(C, 0) = 1).

For a given day on the calendar, the expected utility expression adds the utility of the current task to the difference between the average utility of all possible ways of filling up the calendar with higher-priority tasks with the current task scheduled and without it. We schedule the task provided the expected utility is positive for at least one of the days considered, and the day chosen is the one providing the maximum expected utility, i.e., argmax_{d∈D} EU(t, d, H, C).
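Since the closed form of EU(t, d, H, C) involves the terms above, the sketch below only illustrates the decision rule built on top of it: evaluate an expected-utility estimate for each feasible day and bid only when the best day is positive. Here expected_utility is a placeholder to be supplied, not the formula itself.

from typing import Callable, Iterable, Optional

def choose_day(task, days: Iterable[int],
               expected_utility: Callable[[object, int], float]) -> Optional[int]:
    """Return the day maximizing expected utility, or None (do not bid) if no day is positive."""
    best_day, best_eu = None, 0.0
    for d in days:
        eu = expected_utility(task, d)   # stands in for EU(t, d, H, C)
        if eu > best_eu:
            best_day, best_eu = d, eu
    return best_day                       # None means the agent declines to bid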
We investigate the ability of suppliers to use adaptive pricing schemes to exploit market opportunities. A supplier may find it beneficial to raise prices when it believes there is less competition for contracts, and vice versa. As current information about competitors may not be available, such schemes may also backfire. While aggressive exploitation of latent market opportunities may be beneficial, a more conservative approach might be prudent in the face of competition.
We have defined three pricing strategies for the suppliers. In all three strategies, a supplier increases the price of its bid when it is winning contracts continuously; a supplier, however, cannot bid more than the reserve price of the secondary manufacturer. Suppose a supplier s is asked to bid for a task t by a secondary manufacturer m whose reservation price is R_m. The length of the task is l(t) and its deadline is dl(t)¹.
- Linear strategy: If the supplier won the previous contract for this type of task t, it will increase its bid by a constant α over its previous bid, provided the new bid does not exceed the reserve price of the secondary manufacturer it is bidding to. (Here, for a fair comparison, we use the same α for all suppliers.)
¹The deadline determines the priority P(t): we assume that a task with an immediate deadline is of high priority (priority 1) and a task with a normal deadline is of ordinary priority (priority 0).
Similarly, if the supplier loses its previous contract for the same type of task, it will reduce its bid by α from its previous bid, provided the bid is not less than its own reserve price R_s. So, if b was its bid for the last contract for a task of the same length and priority as task t, then its bid for task t will be

    bid = min(b + α, R_m), if it won the last contract,
          max(b − α, R_s), if it lost the last contract.        (8.1)

- Defensive strategy: This is a cautious strategy: it increases (or decreases) its bid cautiously while winning (or losing). If a supplier with this strategy has won k contracts in a row for similar tasks, it will increase its bid over its previous bid by

    incr = α − (k × δ), if α > k × δ,
           0,           otherwise,

provided the increased bid is not more than the reserve price of the secondary manufacturer it is bidding to. Here, δ is another constant parameter of the supplier. This supplier will decrease its bid by the same amount, incr, from its previous bid if it loses k contracts in a row for similar tasks, provided the decreased bid is not less than its own reserve price. If the supplier's last bid for the same type of task was b, then the new bid can be found from Equation 8.1 using α − (k × δ) instead of α.
- Impatient strategy: Suppliers following this pricing strategy increase and decrease their bids sharply as a consequence of winning and losing a bid, respectively.
If a supplier with this strategy has won k contracts in a row for similar tasks, it will increase its bid over its previous bid by α + (k × δ), provided the new bid does not exceed the secondary manufacturer's reservation price. Similarly, if the supplier loses k contracts in a row, it will decrease its bid by α + (k × δ), provided the decreased bid is not less than its own reserve price. If its last bid was b, the new bid can be found from Equation 8.1 using α + (k × δ) instead of α.
We have assumed that at the start of the simulation, each supplier has the same bid for a particular type of task.
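A minimal sketch of the three bid-update rules follows; the parameter names alpha and delta mirror α and δ above, streak is the current run of wins or losses, and the reserve prices are passed in explicitly. This is illustrative only, not the simulation code.

def next_bid(prev_bid: float, won_last: bool, streak: int,
             alpha: float, delta: float,
             supplier_reserve: float, manufacturer_reserve: float,
             strategy: str = "linear") -> float:
    """Update a supplier's bid per the linear, defensive, or impatient rule (variants of Eq. 8.1)."""
    if strategy == "linear":
        step = alpha
    elif strategy == "defensive":                 # damped step, never negative
        step = max(alpha - streak * delta, 0.0)
    else:                                         # "impatient": amplified step
        step = alpha + streak * delta

    if won_last:
        return min(prev_bid + step, manufacturer_reserve)   # never exceed R_m
    return max(prev_bid - step, supplier_reserve)           # never go below R_s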
8.5 Experimental Framework

Our experimental framework is designed to facilitate the evaluation of the relative effectiveness of the alternative scheduling and pricing schemes described above under varying environmental conditions. In particular, we expect to identify when market niches are produced by smart scheduling strategies, enabling aggressive pricing to exploit such opportunities.
In our simulations, each period consists of a work week of five days, each day having six slots. We vary the arrival rates of the different task types, with the percentage of priority tasks varying over different simulations. Each task is generated and allocated to a manufacturer depending on whether the manufacturer can accomplish the task using one of its suppliers. The manufacturer awards a contract to one supplier over another through a first-price sealed-bid auction protocol.
The supply chain that we have used for our simulations consists of three levels, each populated by one or more enterprises having similar functional capabilities. There is one main manufacturer at level one, six secondary manufacturers at level two, and twelve suppliers at level three². The supply chain structure is shown in Figure 8.1. In our simulation we consider a whole task as a combination of parts: we can think of a whole task T as T = L_m + L_sm + L_su, where L_m, L_sm, and L_su are the parts of the whole task to be done by the main manufacturer, a secondary manufacturer, and the suppliers, respectively. The main manufacturer contracts a part of each task that it has to complete to one of the secondary manufacturers. Each secondary manufacturer, in turn, contracts part of each task it wins to a supplier in some subset of all suppliers. We call this subset of suppliers the supplier window of the secondary manufacturer. In our experiments, we have chosen the numbers of secondary manufacturers and suppliers such that each supplier can receive contracts from exactly two different secondary manufacturers.

Figure 8.1: The supply chain structure used in the simulations.

²We later change the number of suppliers to show the effect of the presence of too many similar suppliers in the environment.