5.3 Combination of Conditional and Unconditional Approaches


In this section, we want to present a first approach to dealing with uncertain data.

So far, we have always assumed that all estimated probabilities $p^{(a)}_{ss'}$, i.e., $p^{a}_{ss'}$ and $p_{ss'}$, are equally reliable. However, in most applications, for a state $s$, new transitions $s \to s'$ (usually represented as rules) are dynamically added during the process of learning. For example, if $s$ is a long-standing product of a web shop and recently a new product $s''$ was included in the assortment of the shop, then we may dynamically add the transition probabilities $p^{(a)}_{ss''}$ to the existing ones.

Let ${}^{j}p^{(a)}_{ss'}$ represent the estimated probabilities $p^{(a)}_{ss'}$ after $j$ update steps. Then the described dynamic approach means that different target states $s'$ may have different counter values $j$. Obviously, for a state $s_1$ with a large counter $j_1$, the corresponding transition probabilities ${}^{j_1}p^{(a)}_{ss_1}$ can in general be considered more reliable than ${}^{j_2}p^{(a)}_{ss_2}$ for a state $s_2$ with a small counter $j_2$.
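To make this bookkeeping concrete, the following minimal Python sketch keeps, for one state $s$, the transition counts from which the unconditional probabilities $p_{ss'}$, the conditional probabilities $p^{a}_{ss'}$, and their statistical mass can be read off. The class and method names are illustrative assumptions, not notation from the book, and the statistical mass of an estimate is simply taken as the number of observations it rests on.

```python
from collections import defaultdict

class TransitionStats:
    """Illustrative per-state bookkeeping of transition counts for one state s."""

    def __init__(self):
        self.total = 0                    # observed transitions leaving s (mass of p_{ss'})
        self.count = defaultdict(int)     # s' -> number of transitions s -> s'
        self.total_a = defaultdict(int)   # a  -> transitions observed while a was recommended
        self.count_a = defaultdict(int)   # (a, s') -> transitions s -> s' under recommendation a

    def update(self, s_next, recommended=None):
        """Record one observed transition s -> s_next (optionally under a recommendation)."""
        self.total += 1
        self.count[s_next] += 1
        if recommended is not None:
            self.total_a[recommended] += 1
            self.count_a[(recommended, s_next)] += 1

    def p(self, s_next):
        """Unconditional estimate p_{ss'}; self.total serves as its counter here."""
        return self.count[s_next] / self.total if self.total else 0.0

    def p_a(self, a, s_next):
        """Conditional estimate p^a_{ss'}; its counter is self.total_a[a]."""
        n = self.total_a[a]
        return self.count_a[(a, s_next)] / n if n else 0.0
```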

In order to calculate (5.8) meaningfully, the transition probabilities $p^{(a)}_{ss'}$ must be statistically stable, at least to some extent. In other words, adding transition probabilities $p^{(a)}_{ss''}$ of a very new product $s''$ may deteriorate our whole approach (5.8). To overcome this problem, we will present some first ideas here.

To this end, we replace ${}^{n}p^{(a)}_{ss'}$ with the “stabilized” probabilities

$$
{}^{n}\tilde{p}^{(a)}_{ss'} := \begin{cases} {}^{n}p^{(a)}_{ss'} & \text{if } n \geq n_{\min}, \\ 0 & \text{if } n < n_{\min}, \end{cases} \qquad (5.22)
$$

where $n_{\min}$ is a threshold value for the minimum statistical mass (usually 20 or more), and instead of (5.8) now calculate

$$
\tilde{q}_{\pi}(s,a) = \tilde{p}^{a}_{ss_a} r_{ss_a} + \frac{1 - \tilde{p}^{a}_{ss_a}}{1 - \tilde{p}_{ss_a}} \sum_{s' \neq s_a} \tilde{p}_{ss'} r_{ss'}. \qquad (5.23)
$$
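As a concrete illustration, a minimal Python sketch of the stabilization (5.22) and the resulting action value (5.23) might look as follows; the function names, the dictionary-based interface, and the chosen $n_{\min} = 20$ are assumptions for illustration only, and the prefactor assumes $\tilde{p}_{ss_a} < 1$.

```python
N_MIN = 20  # threshold n_min for the minimum statistical mass

def stabilized(p, n, n_min=N_MIN):
    """Stabilized probability (5.22): trust the estimate only once it rests on
    at least n_min observations, otherwise treat it as 0."""
    return p if n >= n_min else 0.0

def q_tilde(s_a, p_cond, n_cond, p_uncond, n_uncond, r):
    """Action value (5.23) for recommending s_a in state s.

    p_cond, n_cond     : conditional estimate p^a_{s s_a} and its counter n
    p_uncond, n_uncond : dicts s' -> unconditional estimate p_{s s'} and its counter
    r                  : dict s' -> reward r_{s s'}
    """
    p_a  = stabilized(p_cond, n_cond)                     # ~p^a_{s s_a}
    p_sa = stabilized(p_uncond[s_a], n_uncond[s_a])       # ~p_{s s_a}
    rest = sum(stabilized(p_uncond[t], n_uncond[t]) * r[t]
               for t in p_uncond if t != s_a)             # sum over s' != s_a
    return p_a * r[s_a] + (1.0 - p_a) / (1.0 - p_sa) * rest
```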

There remains a problem with (5.23), however: for new transitions, the conditional action value $\tilde{p}^{a}_{ss_a} r_{ss_a}$ is initially 0 or small. That means that its recommendations are scarcely delivered, and $\tilde{p}^{a}_{ss_a} r_{ss_a}$ can scarcely grow (unless $\tilde{q}_{\pi}(s,a)$ increases via its unconditional action value). We thus have a vicious circle.

In order to escape this, we modify (5.22) for the conditional probabilities $p^{a}_{ss'}$ as follows:

$$
{}^{n}\hat{p}^{a}_{ss'} := \begin{cases} {}^{n}p^{a}_{ss'} & \text{if } n \geq n_{\min}, \\ s_C \, {}^{m}p_{ss'} & \text{if } n < n_{\min}, \end{cases} \qquad (5.24)
$$

where $s_C \in [1, \infty)$ is a fixed scaling factor. Here the $n$ refers to the counter of the conditional probability (i.e., for delivery of $a$), whereas $m$ is the counter of the unconditional probability! In this way we replace (5.23) with the final estimate

$$
\hat{q}_{\pi}(s,a) = \hat{p}^{a}_{ss_a} r_{ss_a} + \frac{1 - \hat{p}^{a}_{ss_a}}{1 - \tilde{p}_{ss_a}} \sum_{s' \neq s_a} \tilde{p}_{ss'} r_{ss'}. \qquad (5.25)
$$
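Before turning to the interpretation of (5.25), a short Python sketch may help to see how (5.24) plugs into it; it reuses `stabilized` and `N_MIN` from the sketch above, and the default scaling value is an arbitrary illustrative choice.

```python
S_C = 1.5  # scaling factor s_C >= 1 (illustrative value)

def p_hat(p_cond, n_cond, p_uncond_sa, s_c=S_C, n_min=N_MIN):
    """Initialized conditional probability (5.24): before n_min deliveries of a,
    fall back to the scaled unconditional estimate."""
    return p_cond if n_cond >= n_min else s_c * p_uncond_sa

def q_hat(s_a, p_cond, n_cond, p_uncond, n_uncond, r):
    """Final action-value estimate (5.25); assumes ~p_{s s_a} < 1."""
    p_a  = p_hat(p_cond, n_cond, p_uncond[s_a])           # ^p^a_{s s_a}
    p_sa = stabilized(p_uncond[s_a], n_uncond[s_a])       # ~p_{s s_a}
    rest = sum(stabilized(p_uncond[t], n_uncond[t]) * r[t]
               for t in p_uncond if t != s_a)
    return p_a * r[s_a] + (1.0 - p_a) / (1.0 - p_sa) * rest
```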

We now come to the interpretation of (5.25). Since the unconditional probabilities $p_{ss'}$ are continually updated, even without delivery of $s'$, these have real chances of being delivered as recommendations, and the conditional probability counter increases. As soon as it reaches the threshold $n_{\min}$, the initial auxiliary probability $\hat{p}^{a}_{ss_a}$ is replaced by the conditional probability $p^{a}_{ss'}$.

The scaling factor $s_C$ should be motivated in the broader sense. If $s_C = 1$ is set, there is the risk that the new recommendations are often not sufficiently strong to be shown. (Generally $p^{a}_{ss_a} > p_{ss_a}$ is the case, i.e., the probability of the transition to a product $s_a$ is generally higher if it is also recommended.) It follows from this that in general $s_C > 1$ should be selected, so that the transition probabilities $p_{ss_a}$ have a real chance of being delivered.


The selection $s_C > 1$ is also useful in respect of the delivery of competing initial recommendations. For this we initially assume $s_C = 1$. As long as ${}^{n}\hat{p}^{a}_{ss'}$ is in the initialization phase, i.e., $n < n_{\min}$, we have $\hat{p}^{a}_{ss_a} = s_C \, p_{ss_a} = p_{ss_a}$, so under the mostly valid (and not crucial) assumption $\tilde{p}_{ss'} = p_{ss'}$ for all $s' \neq s_a$ (i.e., the other unconditional probabilities are stable), (5.25) takes the form

$$
\hat{q}_{\pi}(s,a) = p_{ss_a} r_{ss_a} + 1 \cdot \sum_{s' \neq s_a} p_{ss'} r_{ss'} = \sum_{s'} p_{ss'} r_{ss'} = q_0(s,a),
$$

and thus $\hat{q}_{\pi}(s,a)$ is the same for all recommendations in the initial phase.

On the other hand, the introduction of the scaling factor $s_C > 1$ yields the desired behavior:

$$
\hat{q}_{\pi}(s,a) > \hat{q}_{\pi}(s,b) \iff p_{ss_a} r_{ss_a} > p_{ss_b} r_{ss_b}.
$$

Thus the method for the initial recommendations works similarly to that of the P-Version. For methodological purposes, we therefore introduce a simplified version of (5.25):

$$
\hat{q}_{s\pi}(s,a) = \hat{p}^{a}_{ss_a} r_{ss_a}. \qquad (5.26)
$$

This therefore combines the P-Version for unconditional and conditional probabilities. As long as $n < n_{\min}$, it corresponds largely to the P-Version for the corresponding recommendation, i.e.,

$$
\hat{q}_{s\pi}(s,a) = s_C \, p_{ss_a} r_{ss_a},
$$

and for $s_C = 1$ it is actually identical:

$$
\hat{q}_{s\pi}(s,a) = p_{ss_a} r_{ss_a} = q_P(s,a).
$$

As soon as the threshold value $n_{\min}$ is reached, it changes into a P-Version operating on the basis of the conditional probability:

$$
\hat{q}_{s\pi}(s,a) = p^{a}_{ss_a} r_{ss_a}.
$$
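A compact Python sketch of the simplified version (5.26) shows the switch from the scaled unconditional P-Version to the conditional one at $n_{\min}$; it reuses `p_hat`, `S_C`, and `N_MIN` from the sketches above, and the numbers in the usage example are made up for illustration.

```python
def q_simplified(p_cond, n_cond, p_uncond_sa, r_sa, s_c=S_C, n_min=N_MIN):
    """Simplified action value (5.26): P-Version on the scaled unconditional
    probability while n < n_min, P-Version on the conditional one afterwards."""
    return p_hat(p_cond, n_cond, p_uncond_sa, s_c, n_min) * r_sa

# Initialization phase (n < n_min): value is s_C * p_{s s_a} * r_{s s_a}
print(q_simplified(p_cond=0.0, n_cond=3, p_uncond_sa=0.04, r_sa=10.0))    # 0.6
# Beyond the threshold: value is p^a_{s s_a} * r_{s s_a}
print(q_simplified(p_cond=0.08, n_cond=25, p_uncond_sa=0.04, r_sa=10.0))  # 0.8
```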

The transition from unconditional to conditional probabilities in (5.25) or (5.26) also makes sense in terms of content: as long as the statistical mass is small, one should not operate with the complex conditional probabilities. Therefore, the unconditional probabilities are used, whose stability increases more quickly, and without requiring the delivery of recommendations. Once the necessary statistical mass is reached, we change over to the qualitatively more demanding conditional probabilities. In this way we achieve a continuous transition from the P- to the DP-Version.

The question of the best value of $s_C$ is difficult; it is a subject of forthcoming investigations and will not be addressed further here.

In closing, let us turn our attention to a further special problem of the conditional version. If a rule is no longer applied for recommendations after exceeding the threshold value $n_{\min}$ (because in the meantime other rules have become preferred), it has, at least in the simplified version (5.26), in general little chance of being applied again, since the conditional probability $p^{a}_{ss'}$ is no longer being updated. This holds even if its potential acceptance has increased again.

In order to get around this, we introduce a special explorative delivery mode for the DP algorithm. For this, similarly to the $\varepsilon$-greedy policy, a percentage rate $\varepsilon_{DP}$ is specified at which, instead of being delivered according to the action-value function $\hat{q}_{\pi}(s,a)$, the recommendations are delivered in descending order according to the following criterion:

$$
\Theta(s,a) = \left( p_{ss_a} - p^{a}_{ss_a} \right) r_{ss_a} = \Delta p_a \, r_a. \qquad (5.27)
$$

Thus the idea is that the difference between the unconditional probability $p_{ss_a}$ and the conditional probability $p^{a}_{ss_a}$ is a good indicator of whether a rule has become more attractive again. For if the difference increases, the user will be more inclined toward product $s_a$ even without a recommendation, and the necessity of its delivery increases.
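One possible reading of this explorative mode is sketched below in Python; the candidate interface, the function name, and the value of $\varepsilon_{DP}$ are illustrative assumptions. With probability $\varepsilon_{DP}$, the candidates are ranked by the criterion $\Theta(s,a)$ from (5.27) instead of by $\hat{q}_{\pi}(s,a)$.

```python
import random

EPS_DP = 0.05  # explorative percentage rate epsilon_DP (illustrative value)

def rank_recommendations(candidates, eps_dp=EPS_DP):
    """candidates: list of dicts with keys 'a', 'q_hat' (value from (5.25)),
    'p_uncond', 'p_cond', and 'r' (all referring to the target product s_a)."""
    if random.random() < eps_dp:
        # Explorative mode: rank by Theta(s, a) = (p_{s s_a} - p^a_{s s_a}) * r_{s s_a}, cf. (5.27)
        key = lambda c: (c['p_uncond'] - c['p_cond']) * c['r']
    else:
        # Normal mode: rank by the action-value estimate q_hat(s, a)
        key = lambda c: c['q_hat']
    return sorted(candidates, key=key, reverse=True)
```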

Let us emphasize that the empirical approach of this section presents only some very first and simple ideas for handling the crucial problem of statistical stability of the DP-Version. Surely, much more advanced instruments can be developed. Indeed, in Chaps. 6, 7, 8, 9, and 10 we will develop mathematically more demanding methods to increase the stability of our RL approach for recommendations.

That concludes our tour of the basic RL methods for our RE framework.

Let us now consider their experimental evaluation.
