Dynamic Optimization using Machine Learning

Part of the document Query-Time Optimization Techniques for Structured Queries in Info (Pages 62 - 65)

2.3 State-of-the-art Dynamic Optimization

2.3.2 Dynamic Optimization using Machine Learning

We now turn to two retrieval models that leverage the flexibility of machine learning in order to achieve high levels of efficiency.

2.3.2.1 Cascade Rank Model

The Cascade Rank Model (CRM) was developed by Wang et al. (2011) as an efficiency approach for learning-to-rank (LTR) models, where the input for ranking is a vector of features. For each query, the features are extracted and partitioned into a sequence of stages, ordered according to their time cost relative to their ranking performance. The authors then use a variant of the AdaRank algorithm (Xu & Li, 2007) to assign the features to stages. The intuition is that instead of learning only the best ranking model, the algorithm learns the best ranking model attainable at a given execution cost.

The CRM is one of the few optimization techniques that can explicitly learn under a budgeted time constraint. Given the ordering of stages, the CRM can use the partitioning to honor a time constraint parameter, such as one imposed by a real-time system. After each stage, the algorithm can decide either to continue refining the running scores of the current candidate set, or to terminate and return that set. Note that, by this construction, the CRM cannot guarantee score-safe ranking unless the time constraint is omitted.
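The staged, budget-aware execution described above can be sketched as follows. This is a minimal illustration under assumed interfaces: the `(prune_fn, rank_fn)` stage pairs mirror the {Ji, Hi} components, but the names and structure are illustrative, not the authors' implementation.

```python
import time

def run_cascade(stages, candidates, budget_s=None):
    """Run a learned cascade over a candidate set.

    `stages` is a list of (prune_fn, rank_fn) pairs ordered by
    increasing cost. If `budget_s` (seconds) is given, the cascade may
    terminate early, trading score safety for bounded latency.
    """
    start = time.monotonic()
    scores = {doc: 0.0 for doc in candidates}
    for prune_fn, rank_fn in stages:
        # Terminate with the current candidate set if the optional
        # time budget is exhausted.
        if budget_s is not None and time.monotonic() - start >= budget_s:
            break
        # J_i: discard candidates unlikely to reach the final ranking.
        candidates = prune_fn(candidates, scores)
        # H_i: refine the running score of the surviving candidates.
        for doc in candidates:
            scores[doc] += rank_fn(doc)
    return sorted(candidates, key=lambda d: scores[d], reverse=True)
```

With no budget, every stage runs and the full (score-safe) ranking is produced; with a budget, later, more expensive stages are simply skipped.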

Each stage Si consists of a pair of components {Ji, Hi}, where Ji is a pruning function and Hi is a local ranking function. The correct composition and ordering of stages is what is learned via the AdaRank variant, shown in Algorithm 4.

Algorithm 4 The boosting algorithm for cascade learning.

LearnCascade(Q, J, H)
 1  N = |Q|
 2  S = {}
 3  for each qi ∈ Q
 4      P1(qi) = 1/N
 5  for t = 1 to T
 6      Select St = ⟨Jt(βt), Ht, ·⟩ over the training instances weighted by Pt
 7      Set αt = (1/2) ln [ Σqi Pt(qi) · (1 − γ·C(St, qi)) · (1 + E(St, qi)) / Σqi Pt(qi) · (1 − γ·C(St, qi)) · (1 − E(St, qi)) ]
 8      Add St = ⟨Jt(βt), Ht, αt⟩ to S
 9      Update Pt to Pt+1:
10      for each qi ∈ Q
11          Pt+1(qi) = exp(−E(St, qi)) · exp(γ·C(S, qi)) / Σqi exp(−E(St, qi)) · exp(γ·C(S, qi))
12  return S

The parameters supplied to the algorithm are Q, a set of training queries; J, the set of pruning functions to select from; and H, the set of local retrieval functions to cascade. Lines 1-4 initialize the algorithm; in particular, each training query is weighted uniformly. At each subsequent iteration, we construct the stage St that would provide the greatest improvement in retrieval effectiveness, balanced against the cost of that stage (using the cost function C). We then set the weight αt for the stage, add it to the cascade, and update the weights of the queries (Lines 8-11). Note that in Line 6, the function Jt is sampled from J with replacement, while Ht is sampled without replacement (otherwise we might construct a cascade consisting of only a single ranking function).
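The per-iteration quantities in the boosting loop can be sketched as follows, assuming the per-query effectiveness E and cost C of a candidate stage are available as dictionaries. The function names and dictionary-based interface are illustrative assumptions, not the authors' code.

```python
import math

def stage_weight(P, E, C, gamma):
    """alpha_t (Line 7 of Algorithm 4): a log-ratio that grows when
    the candidate stage is effective (E near +1) on the queries that
    currently carry weight, discounted by its cost C via gamma."""
    num = sum(P[q] * (1 - gamma * C[q]) * (1 + E[q]) for q in P)
    den = sum(P[q] * (1 - gamma * C[q]) * (1 - E[q]) for q in P)
    return 0.5 * math.log(num / den)

def reweight(P, E, C, gamma):
    """Lines 10-11 of Algorithm 4: shift probability mass toward
    queries the cascade still handles poorly (low E) or expensively
    (high C), so the next stage focuses on them."""
    w = {q: math.exp(-E[q]) * math.exp(gamma * C[q]) for q in P}
    z = sum(w.values())
    return {q: w[q] / z for q in w}
```

The normalization in `reweight` keeps the query weights a valid distribution across iterations, exactly as the uniform initialization on Lines 3-4 starts them.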

2.3.2.2 Selective Pruning Strategy

The Selective Pruning Strategy (SPS), proposed by Tonellotto et al. (2013), is a modification to the WAND algorithm. In this learned version of the algorithm, each query component (feature) is given an estimate of its computational cost and its difficulty. Using these estimates, the number of requested results (K) and the aggressiveness (α) of WAND are predicted and used to run the WAND algorithm for an initial retrieval. If a learning component (such as AdaRank) is also part of the model, the learned model's features are extracted from the top-K results and used to re-rank the top-K with the learned model. This is not unlike the CRM, in that low-cost evaluation is performed early in the retrieval, and high-cost evaluation is used to refine the initial result list.

Algorithm 5 The selective pruning algorithm.

SelectivePruning(Q)
1  {K, α} = Select(Predict(Q))
2  S = Wand(Q, K, α)
3  F = Extract(Q, S)
4  R = Apply(F, Model(K, α))
5  return R
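The pipeline in Algorithm 5 can be sketched as a simple composition of the five steps. The parameter names below are placeholders standing in for the components of the algorithm, not an API from the original work.

```python
def selective_pruning(query, predictor, selector, wand, extract, model):
    """Sketch of the SPS pipeline: predict per-query difficulty and
    cost, choose the retrieval depth K and WAND aggressiveness alpha,
    run a cheap first-pass retrieval, then re-rank the top-K with a
    more expensive learned model."""
    estimates = predictor(query)          # Predict(Q): cost/difficulty estimates
    K, alpha = selector(estimates)        # Select(...): choose K and alpha
    top_k = wand(query, K, alpha)         # Wand(Q, K, alpha): initial retrieval
    features = extract(query, top_k)      # Extract(Q, S): features on top-K only
    return model(features, top_k)         # Apply(F, Model(K, alpha)): re-rank
```

The key efficiency property is that the expensive feature extraction and learned re-ranking are applied only to the K documents surviving the pruned first pass.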

Both processes can be seen as parallels to standard result-refinement or re-ranking approaches. The CRM begins with an approximate ranking, then uses additional information external to the initial retrieval to iteratively refine the results. SPS, more directly, uses the base model as a fast approximation of the final results, and then refines that result list using a learned model whose expected execution cost is much higher.

Another noteworthy aspect of both approaches is that they fundamentally rely on the same mechanism for optimization as the algorithmic methods: the query's scoring function must be decomposable in some way. Specifically, all four algorithms receive some set of components that, when taken as a whole (e.g., summed together), represent the entire evaluation of the query. We shall leverage this fact later on to show that the optimizations provided here can transfer from the algorithmic techniques to the machine-learned ones.
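The decomposability requirement can be made concrete with a short sketch: if the whole-query score is a sum of independent component scores, then any prefix of that sum is a cheap partial evaluation against which pruning or early-termination decisions can be made. The function names here are illustrative.

```python
def full_score(doc, components):
    """The complete evaluation of the query on a document: the sum of
    its independent component scores (e.g., per-term or per-feature)."""
    return sum(component(doc) for component in components)

def partial_score(doc, components, k):
    """Evaluating only the first k components yields a cheap partial
    score; both the algorithmic and learned optimizations exploit
    exactly this kind of prefix evaluation."""
    return sum(component(doc) for component in components[:k])
```

Because addition is associative, deferring the remaining components never changes the final score, only when it becomes available.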
