Verification of the Environment Model


We consider the following example.

Example 5.2 We use the data of an online furniture shop. We start with the off-line test method of Sect. 4.4. The shop contains approximately 1,900 products. We use the data of one day as training set; it contains 9,736 sessions with 31,349 transactions. The test set consists of the data of the following day; it has 7,430 sessions with 24,161 transactions. Apart from removing multiple clicks, we did not change the data.

We want to check the plausibility of Assumption 5.2. In the shop of our test data, no control sessions exist, and all sessions get recommendations of the prudsys RDE. In order to check the influence of the recommendations on the browsing behavior of the shop visitors as well as possible, the RDE varies the recommendations strongly. This was achieved by applying the softmax policy (Sect. 3.3), where the control parameter τ was adjusted to select approximately 50 % of all recommendations as "greedy," i.e., corresponding to the strongest action values, and the remaining 50 % explorative. There are always 4 recommendations displayed for each product.
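For illustration, here is a minimal sketch of softmax action selection as described in Sect. 3.3. The function names and the calibration helper are assumptions made for this example, not the prudsys RDE implementation; in the setup above, τ would be adjusted until greedy_fraction reports roughly 0.5.

```python
import math
import random

def softmax_select(action_values, tau):
    """Softmax (Boltzmann) exploration: sample an action index with probability
    proportional to exp(value / tau). Small tau -> almost always greedy,
    large tau -> almost uniform exploration."""
    v_max = max(action_values)  # shift for numerical stability
    weights = [math.exp((v - v_max) / tau) for v in action_values]
    return random.choices(range(len(action_values)), weights=weights, k=1)[0]

def greedy_fraction(action_values, tau, trials=10_000):
    """Estimate how often the softmax picks the greedy (highest-value) action;
    tau can be tuned until this is roughly 0.5, as in Example 5.2."""
    best = max(range(len(action_values)), key=lambda i: action_values[i])
    hits = sum(softmax_select(action_values, tau) == best for _ in range(trials))
    return hits / trials
```

random.choices draws proportionally to the exponentiated values; subtracting the maximum merely keeps exp from overflowing and does not change the distribution.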

We use the training data set to determine both the conditional transition probabilities $p^a_{ss_a}$ and the unconditional ones $p_{ss_a}$ by means of the adaptive Algorithms 5.1 and 5.2.
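Algorithms 5.1 and 5.2 are given earlier in the chapter; the following is only a schematic sketch of such an adaptive (running-mean) estimator, with all names and the bookkeeping interface assumed for illustration.

```python
from collections import defaultdict

class TransitionEstimator:
    """Adaptive estimation of transition probabilities by incremental averaging.
    A minimal sketch in the spirit of Algorithms 5.1/5.2 (not the book's exact
    update rules): cond[(s, a, s2)] approximates p^a_{s,s2}, uncond[(s, s2)]
    approximates p_{s,s2}; n and m are the corresponding update counts."""

    def __init__(self):
        self.cond, self.n = defaultdict(float), defaultdict(int)
        self.uncond, self.m = defaultdict(float), defaultdict(int)

    def update_conditional(self, s, a, s2, occurred):
        """One adaptive step for the rule s -> s2 while a was recommended:
        running mean of the 0/1 outcomes, i.e. p += (target - p) / n."""
        key = (s, a, s2)
        self.n[key] += 1
        self.cond[key] += ((1.0 if occurred else 0.0) - self.cond[key]) / self.n[key]

    def update_unconditional(self, s, s2, occurred):
        """Same running-mean step for the rule s -> s2, regardless of the
        displayed recommendations."""
        key = (s, s2)
        self.m[key] += 1
        self.uncond[key] += ((1.0 if occurred else 0.0) - self.uncond[key]) / self.m[key]
```

In such a scheme, for each observed transition s → s*, every tracked rule s → s2 would be updated with occurred = (s2 == s*), the conditional estimates only for the recommendations actually displayed on s.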

We now consider all product views $s$ that actually received at least one recommendation $a$ and for which at least one rule $s \to s_a$ exists that was learned on the training set. We call this set of product views recommendation relevant.

We now follow the notation of Sect. 5.3. Let $n$ be the number of updates of conditional probabilities ${}^n p^a_{ss'}$ and $m$ the number of updates of unconditional probabilities ${}^m p_{ss'}$ of the rule $s \to s'$ on the training data. We define

$$k_{\min} = \min(n, m) \qquad (5.28)$$

as the minimum of both update counts. The higher $k_{\min}$, the better the conditional probability ${}^n p^a_{ss'}$ can be compared with the unconditional ${}^m p_{ss'}$, because for $s$ the recommendation $a$ was delivered sufficiently often (high $n$) and at the same time also not delivered sufficiently often (high $m$).

We want to compare the probability types $p^a_{ss_a} = {}^n p^a_{ss_a}$ and $p_{ss_a} = {}^m p_{ss_a}$ depending on $k_{\min}$ in order to see how the recommendation of $a$ increases the transition to $s_a$. So we calculate the mean values of the transition probabilities over all product-recommendation pairs $(s,a)_{k_{\min}}$ whose update number is not smaller than $k_{\min}$, i.e.,

$$\bar{p}^a = \frac{1}{|(s,a)_{k_{\min}}|} \sum_{(s,a)_{k_{\min}}} {}^n p^a_{ss_a}, \qquad \bar{p} = \frac{1}{|(s,a)_{k_{\min}}|} \sum_{(s,a)_{k_{\min}}} {}^m p_{ss_a},$$

and their coefficient:

$$r_{sC} = \frac{\bar{p}^a}{\bar{p}}.$$
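To make the computation concrete, here is a minimal sketch of this averaging, built on the TransitionEstimator sketch above; the helper name averaged_ratio and the (s, a, s_a) tuple format are illustrative assumptions, not the book's code.

```python
def averaged_ratio(est, pairs, k_min):
    """Mean conditional and unconditional transition probability over all
    product-recommendation pairs (s, a) whose update counts n and m both
    reach k_min, plus their quotient r_sC. Builds on the TransitionEstimator
    sketch above; pairs holds (s, a, s_a) with s_a the product recommended by a."""
    cond_sum = uncond_sum = 0.0
    size = 0
    for s, a, s_a in pairs:
        if min(est.n[(s, a, s_a)], est.m[(s, s_a)]) >= k_min:  # k_min filter
            cond_sum += est.cond[(s, a, s_a)]
            uncond_sum += est.uncond[(s, s_a)]
            size += 1
    if size == 0:
        raise ValueError("no pair reaches k_min")
    p_bar_a, p_bar = cond_sum / size, uncond_sum / size
    return p_bar_a, p_bar, p_bar_a / p_bar  # (mean conditional, mean unconditional, r_sC)
```

Calling averaged_ratio for increasing values of k_min would yield the rows of Table 5.3.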

We use Algorithm 5.1. Since we display multiple recommendations, we additionally calculate the conditional probabilities $p^{\{a\}}_{ss_a}$ by Algorithm 5.2 and denote their averaged conditional probabilities by $\bar{p}^{\{a\}}$. The result is shown in Table 5.3.

The coefficients $r_{sC}$ are graphically represented in Fig. 5.3. The overall behavior looks good: the coefficients are always larger than 1, so displaying recommendations increases the corresponding transition probabilities, and they are not unrealistically large. The graph does not follow any special pattern, which is also expected, since its variations should be distributed randomly. The only trend we might discern is a slight increase once $k_{\min}$ reaches a critical statistical volume between 20 and 30.

Table 5.3 Averaged transition probabilities for different $k_{\min}$

| $k_{\min}$ | $\bar{p}$ | $\bar{p}^a$ | $\bar{p}^{\{a\}}$ | $r_{sC}$ |
|---|---|---|---|---|
| 1 | 0.007 | 0.011 | 0.013 | 1.49 |
| 2 | 0.012 | 0.017 | 0.022 | 1.49 |
| 5 | 0.021 | 0.028 | 0.030 | 1.31 |
| 10 | 0.024 | 0.029 | 0.033 | 1.21 |
| 20 | 0.024 | 0.034 | 0.042 | 1.40 |
| 50 | 0.024 | 0.042 | 0.051 | 1.78 |

Fig. 5.3 Averaged ratio $r_{sC}$ of conditional to unconditional transition probabilities for different minimum update steps $k_{\min}$ (vertical axis: ratio, 0.0 to 2.0; horizontal axis: update step $\min(n, m)$, 0 to 60)


As we see, the special treatment of multiple recommendations does not seem to have a great impact. Of course, the relation $p^{\{a\}}_{ss_a} > p^a_{ss_a}$ holds, but the difference is relatively small.

Now we will compare both our Assumptions 5.1 and 5.2 regarding their prediction quality of product views (clicks): for each recommendation-relevant product view, we first recommend the products $s'$ having the highest unconditional probabilities $p_{ss'} = {}^m p_{ss'}$. We use Algorithm 4.1, but applied to all product transitions (instead of recommendations only). This corresponds to Assumption 5.1 and the P-Version.

For Assumption 5.2 of the DP-Version, we secondly recommend the products of the highest probabilities $p^a_{ss'}$ according to (5.3). Since we have multiple recommendations in the transaction data, we need the probabilities $p^{\{a\}}_{ss'}$ instead of just $p^a_{ss'}$. Their computation was done by Algorithm 5.2. In order to estimate the efficiency of our approach, we will include the unconditional probabilities $p_{ss'}$ calculated by Algorithm 5.2 in the comparison, which we will denote by $p^{\{\}}_{ss'}$ in order to avoid confusion with the unconditional probabilities $p_{ss'}$ of Assumption 5.1.

The comparison of the prediction methods is again provided for different $k_{\min}$, by imposing the requirement that at least one of the recommendations must satisfy $k_{\min}$. The number of these valid product views is denoted by $n_s$. For $k_{\min} = 0$ we obtain all recommendation-relevant product views, $n_s = 15{,}235$. With increasing $k_{\min}$ this number decreases correspondingly. Furthermore, we test one and three recommendations. The result is given in Table 5.4.
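A minimal sketch of the offline prediction test used here, under assumed data structures (a dict of learned probabilities per product view and a list of (view, next click) pairs from the test day); the rates in Table 5.4 are percentages of such hits.

```python
def prediction_rate(test_views, prob, num_recs):
    """Offline hit rate in the spirit of the test method of Sect. 4.4
    (illustrative sketch, names assumed): for each product view s with an
    observed next click, predict the num_recs successors with the highest
    learned probability; count a hit if the clicked product is among them."""
    hits = 0
    for s, next_click in test_views:      # pairs from the test day
        candidates = prob.get(s, {})      # {s2: p} learned on the training day
        top = sorted(candidates, key=candidates.get, reverse=True)[:num_recs]
        hits += next_click in top
    return 100.0 * hits / len(test_views)  # rate in percent
```

Running this once with the unconditional and once with the conditional probability tables, for num_recs of 1 and 3, reproduces the structure of the comparison below.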

As we can see, $p^{\{a\}}_{ss'}$ exhibits comparable prediction rates to $p_{ss'}$. At first sight this may look like a sad result. However, a deeper analysis leads to a more optimistic interpretation. First, we emphasize that our aim is not to make good predictions but to find good recommendations. This means that even if our model does not possess the highest prediction quality, as long as it is applicable in principle, the separation into unconditional and conditional probabilities and their correct treatment provide an increased return. We will see this impressively in the experiment of the next section, where the P-Version exhibits a slightly higher prediction quality than the DP-Version but leads to a much lower return. Having said all this, of course, we do not question the need for good predictions. They are integral to good recommendations.

Table 5.4 Prediction qualities (hit rates in %) for different prediction methods

| $k_{\min}$ | $n_s$ | $p_{ss'}$: 1 rec | $p_{ss'}$: 3 recs | $p^{\{\}}_{ss'}$: 1 rec | $p^{\{\}}_{ss'}$: 3 recs | $p^{\{a\}}_{ss'}$: 1 rec | $p^{\{a\}}_{ss'}$: 3 recs |
|---|---|---|---|---|---|---|---|
| 0 | 15,235 | 8.72 | 18.32 | 7.11 | 15.01 | 8.79 | 19.02 |
| 1 | 13,088 | 8.90 | 18.70 | 7.47 | 15.61 | 8.31 | 17.47 |
| 2 | 10,316 | 8.82 | 19.02 | 7.58 | 16.51 | 7.97 | 17.43 |
| 5 | 6,891 | 9.26 | 20.24 | 7.92 | 17.21 | 8.84 | 18.53 |
| 10 | 4,498 | 9.20 | 19.83 | 8.09 | 17.23 | 8.62 | 17.96 |
| 20 | 1,877 | 8.95 | 20.78 | 9.85 | 19.82 | 10.12 | 20.03 |
| 50 | 172 | 7.56 | 23.84 | 10.47 | 20.93 | 10.47 | 24.42 |

Second, we observe that with increasing $k_{\min}$ the prediction quality of $p^{\{a\}}_{ss'}$ improves, and for higher $k_{\min}$ it even outperforms $p_{ss'}$. This is because Assumption 5.2 requires a more complex treatment of the data, including a partition of the transactions and their different handling for the two probability types. In contrast, Assumption 5.1 makes use of all data for unified learning, and hence its algorithms achieve good prediction results even for small statistical volumes. The higher the statistical mass, however, the more algorithms based on Assumption 5.2 benefit from their structural advantage, and they finally outperform the simple ones.

Of course, this only applies if Assumption 5.2 is actually realistic! But Table 5.4 seems to confirm that, which is another piece of good news. Finally, we emphasize that Assumption 5.1, used in conjunction with simple update schemas like Algorithm 4.1, exhibits a good overall prediction rate that is really hard to top. We will see this, for example, in Sect. 8.4.4, where we will continue this discussion. ■

So first experience supports Assumption 5.2. The presented results are also confirmed by similar tests on other data sets. Nevertheless, it is too early to speak of a full confirmation. Our methodology may still be subject to another critical objection: despite all random variation by the softmax policy, our recommendations are still the result of previous analyses and thus not fully statistically independent. This raises the question of whether the presented results are indeed based on the effect of recommendations rather than on their analytical selection.

Luckily, the effect can be studied by comparison with the control group.

We recall that in the control group no recommendations of the RE algorithm are displayed. In the transaction log files described in Sect. 4.4 (column itemsAction), the RDE also stores the products that it would recommend if it were allowed to do so. Since recommendation and control sessions are always mixed in time, these recommendations represent the current state of the RE algorithm. By treating these would-be recommendations in the same way as "real" recommendations, we can repeat all tests and compare them for both the recommendation and the control group.

Example 5.3 We again used data from a real-world web shop; this time it was a fashion shop. We analyzed data from two days with (in total) about 12,500 different products and 1.6 million transactions. The procedure was exactly like that of Example 5.2, but now carried out separately for the recommendation and the control group.

Although the recommendations were less explorative than those of Example 5.2, we obtained similar results.

Figure 5.4 shows the quotient of conditional and unconditional probabilities for both groups in the same setting as Fig. 5.3.

Not surprisingly, the control group coefficient $r_{sC,\mathrm{ctrl}}$ is about 1, whereas that of the recommendation group is higher, between 2 and 3. As in Fig. 5.3, it clearly increases up to $k_{\min} = 20$, but then only slightly. The recommendation coefficient $r_{sC}$ is about twice as high as that of Example 5.2; this also corresponds to reality (recommendations in the fashion shop are better accepted) and is confirmed by click statistics. ■


We summarize that first tests indicate the correctness of Assumption 5.2.

However, more advanced instruments, like the factorizations presented in Chaps. 8, 9, and 10, are required to increase its effectiveness.
