... the average residency time. The adjusted hit ratios are obtained by applying the proportionality relationship expressed by (1.19), with θ = 0.25; for example, the miss ratio for the ERP application is projected as .36 × (262/197)^(−.25) = .34. Based upon the projected residency times and hit ratios, we may then compute the cache storage requirements, as already discussed for the previous table.
In this way, we obtain an objective of 2256 megabytes for the cache size of the target system. If these requirements could be met exactly, then we would project an aggregate hit ratio, for the three applications, of 72 percent.
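The projection arithmetic just described can be sketched in a few lines. The function below assumes, per (1.19), that a workload's miss ratio scales as the average residency time raised to the power −θ; the function name is illustrative, and the figures are simply the ERP numbers quoted above.

```python
def project_miss_ratio(m_current, t_current, t_target, theta=0.25):
    """Scale a miss ratio to a new average residency time, assuming (1.19):
    miss ratio proportional to (residency time) ** -theta."""
    return m_current * (t_target / t_current) ** -theta

# ERP application: current miss ratio .36, residency time 197 -> 262 seconds
print(round(project_miss_ratio(0.36, 197.0, 262.0), 2))  # → 0.34
```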
THE FRACTAL STRUCTURE OF DATA REFERENCE
As a final step, we must also consider how to round off the computed cache memory requirement of 2256 megabytes. Since this requirement is very close to 2 gigabytes, we might choose, in this case, to round down rather than up. Alternatively, it would also be reasonable to round up to, say, 3 gigabytes, on the grounds that the additional cache can be used for growth in the workload. To account for the rounding off of cache memory, we can apply the proportionality expressed by (1.23). Thus, after rounding down to 2 gigabytes, we would expect the aggregate miss ratio of the target system to be .28 × (2048/2256)^(−.25/.75) = .29, identical to the current aggregate miss ratio of the three applications.
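The rounding adjustment can be sketched the same way. The function below assumes, per (1.23), that the miss ratio scales as cache size raised to the power −θ/(1−θ); the function name is illustrative.

```python
def adjust_for_cache_size(m, size_old_mb, size_new_mb, theta=0.25):
    """Scale a miss ratio to a new cache size, assuming (1.23):
    miss ratio proportional to (cache size) ** (-theta / (1 - theta))."""
    return m * (size_new_mb / size_old_mb) ** (-theta / (1.0 - theta))

# Rounding the 2256-megabyte requirement down to 2 gigabytes (2048 MB)
print(round(adjust_for_cache_size(0.28, 2256.0, 2048.0), 2))  # → 0.29
```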
If the available performance reporting tools are sufficiently complete, it is possible to refine the methods presented in the preceding example. In the figures of the example, the stage size was assumed to be equal to 0.04 megabytes (a reasonable approximation for most OS/390 workloads). The capability to support direct measurements of this quantity has recently been incorporated into some storage controls; if such measurements are supported, they can be found in the System Measurement Facility (SMF) record type 74, subtype 5. Also, in the example, we used the total miss ratio as our measure of the percentage of I/O's that require more cache memory to be allocated. A loophole exists in this technique, however, due to the capability of most current storage controls to accept write requests without needing to wait for a stage to occur.
In a storage control of this type, virtually all write requests will typically be reported as "hits," even though some of them may require allocation of memory. For database I/O, this potential source of error is usually not important, since write requests tend to be updates of data already in cache. If, however, it is desired to account for any write "hits" that may nevertheless require allocation of cache memory, counts of these can also be found in SMF record type 74, subtype 5 (they are called write promotions).
Finally, we assumed the guesstimate θ = 0.25. If measurements of the single-reference residency time are available, then θ can be quantified more precisely using (1.16).
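When both residency times are measured, the calibration is a one-liner. This is a minimal sketch, assuming (1.16) relates the single-reference residency time τ to the average residency time T as T = τ/(1 − θ); the function name and the sample figures are hypothetical.

```python
def estimate_theta(tau, avg_residency):
    """Estimate theta from measured residency times, assuming the
    relation T = tau / (1 - theta), i.e. theta = 1 - tau / T."""
    return 1.0 - tau / avg_residency

# e.g. a single-reference residency time of 150 seconds against an
# average residency time of 200 seconds (hypothetical figures)
print(estimate_theta(150.0, 200.0))  # → 0.25
```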
2. ANALYSIS OF THE WORKING HYPOTHESIS
It is beyond the scope of the present chapter to analyze rigorously every potential source of error in a capacity planning exercise of the type just presented in the previous section, nor does a "back of the envelope" approximation method require this. Instead, we now focus on the following claim, central to applications of the working hypothesis: that it makes very little difference in the estimated hit ratio of the cache as a whole, whether the individual workloads within the cache are modeled with their correct average residency times, or whether they are all modeled assuming a common average residency time reflecting the conditions for the cache as a whole.
Obviously, such a statement cannot hold in all cases. Instead, it is a statement about the realistic impact of typical variations between workloads. As the data presented in Chapter 1 suggests, the values of the parameter θ, for distinct workloads within a given environment, often vary over a fairly narrow range. This gives the proposed hypothesis an important head start, since the hypothesis would be exactly correct for a cache in which several workloads share the same value of the parameter θ. In that case, the common value of θ, together with the fact that all the workloads must share a common single-reference residency time τ, would then imply, by (1.12), that the workloads must also share the same average residency time as well.
Consider, now, a cache whose activity can be described by the multiple-workload hierarchical reuse model; that is, the cache provides service to n individual workloads, i = 1, 2, ..., n, each of which can be described by the hierarchical reuse model. The true miss ratio of the cache as a whole is the weighted average of the individual workload miss ratios, weighted by I/O rate:

$$m = \frac{1}{r}\sum_{i=1}^{n} r_i m_i, \qquad (3.2)$$

where r_i and m_i are the I/O rate and miss ratio of workload i, and r = r_1 + r_2 + ... + r_n is the total I/O rate.
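In code, the I/O-rate-weighted average of (3.2) amounts to the following; the two workloads' rates and miss ratios below are hypothetical.

```python
def aggregate_miss_ratio(rates, miss_ratios):
    """I/O-rate-weighted average of per-workload miss ratios, as in (3.2)."""
    total_rate = sum(rates)
    return sum(r_i * m_i for r_i, m_i in zip(rates, miss_ratios)) / total_rate

# Two hypothetical workloads: 100 I/O's per second at a .40 miss ratio,
# and 300 I/O's per second at a .20 miss ratio
print(round(aggregate_miss_ratio([100.0, 300.0], [0.40, 0.20]), 2))  # → 0.25
```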
We must now consider the error that results from replacing the correct miss ratio of each workload by the corresponding estimate m̂_i, calculated using the average residency time of the cache as a whole. Using the proportionality relationship expressed by (1.19), the values m̂_i can be written as

$$\hat{m}_i = m_i\left(\frac{T}{T_i}\right)^{-\theta_i}, \qquad (3.3)$$

where T_i is the average residency time of workload i and T is that of the cache as a whole. Thus, the working hypothesis implies an overall miss ratio of

$$\hat{m} = \frac{1}{r}\sum_{i=1}^{n} r_i m_i\left(\frac{T}{T_i}\right)^{-\theta_i}. \qquad (3.4)$$
To investigate the errors implied by this calculation, we write it in the alternative form

$$\hat{m} = \frac{1}{r}\sum_{i=1}^{n} r_i m_i\,(1-\zeta_i)^{\theta_i},$$

where we define

$$\zeta_i = 1 - \frac{T_i}{T}. \qquad (3.5)$$

This expression for m̂ can be expanded by applying the binomial theorem:

$$\hat{m} = \frac{1}{r}\sum_{i=1}^{n} r_i m_i\left[1 - \theta_i\zeta_i - \frac{\theta_i(1-\theta_i)}{2}\,\zeta_i^2 + o(\zeta_i^2)\right], \qquad (3.6)$$

where the "little-o" notation indicates terms higher than second order.
Using (1.16), we define

$$\theta = 1 - \frac{\tau}{T}$$

to be the aggregate value of θ for the cache as a whole. Note, as a result, that in addition to the definition already given, ζ_i also has the equivalent definition

$$\zeta_i = \frac{\theta - \theta_i}{1 - \theta_i}, \qquad (3.7)$$

where we have applied (1.12) and taken advantage of the fact that each workload must share the same, common value of τ.
By applying (1.16), we may rewrite the first-order terms of (3.6) as follows:

$$-\sum_{i=1}^{n}\frac{r_i m_i}{rm}\,\theta_i\zeta_i = -\theta\sum_{i=1}^{n}\frac{r_i m_i}{rm}\left(1-\frac{T_i}{T}\right) + \sum_{i=1}^{n}\frac{r_i m_i}{rm}\,(1-\theta_i)\,\zeta_i^2. \qquad (3.8)$$

But since each miss corresponds to a cache visit, the aggregate residency time is computed over misses; that is,

$$T = \frac{1}{rm}\sum_{i=1}^{n} r_i m_i T_i, \qquad (3.9)$$
and (3.8) reduces to

$$-\sum_{i=1}^{n}\frac{r_i m_i}{rm}\,\theta_i\zeta_i = \sum_{i=1}^{n}\frac{r_i m_i}{rm}\,(1-\theta_i)\,\zeta_i^2. \qquad (3.10)$$
Combining (3.2), (3.6), and (3.10), we now have

$$\hat{m} = m\left[1 + \sum_{i=1}^{n}\frac{r_i m_i}{rm}\,(1-\theta_i)\,\zeta_i^2 - \sum_{i=1}^{n}\frac{r_i m_i}{rm}\,\frac{\theta_i(1-\theta_i)}{2}\,\zeta_i^2\right] + o(\zeta^2). \qquad (3.11)$$

Thus, m̂ = m except for second-order and higher terms.
In a region sufficiently close to θ_1 = θ_2 = ... = θ_n = θ (or equivalently, ζ_1 = ζ_2 = ... = ζ_n = 0), the second-order terms of (3.11) can be approximated as uniformly zero. The region where these second-order terms have at most a minor impact is that in which |ζ_i| << 1 for i = 1, 2, ..., n. This requirement permits wide variations in the workloads sharing the cache.
For example, suppose that there are two workloads i = 1, 2, with values θ_i equal to .1 and .3 respectively; and suppose that these workloads share a cache in which, overall, we have θ = .2. Then the absolute value of ζ_i is no greater than .1/.7 = .14 for either workload. As a result, the absolute value of either of the second-order summation terms of (3.11), calculated without the summation weights r_i m_i/(rm), does not exceed .02. But the summation of these terms, multiplied by the weights r_i m_i/(rm), is merely a weighted average; so in the case of the example, the quantity just stated is the largest relative error, in either direction, that can be made by neglecting the second-order terms (i.e., the error can be no larger than 2 percent of m). Since the second-order terms are so relatively insignificant, we may conclude that the third-order and higher terms, shown as o(ζ²), must be vanishingly small.
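The two-workload example can also be checked numerically. The sketch below assumes (1.12) in the form T_i = τ/(1 − θ_i), computes the aggregate residency time over misses as in (3.9), and forms the working-hypothesis estimate by applying each workload's θ_i to the aggregate residency time in place of its own; the I/O rates and per-workload miss ratios are hypothetical.

```python
theta_i = [0.1, 0.3]      # the two workloads of the example
rates   = [100.0, 100.0]  # hypothetical I/O rates (per second)
miss    = [0.30, 0.30]    # hypothetical per-workload miss ratios
tau     = 1.0             # common single-reference residency time (arbitrary units)

# Per-workload average residency times, assuming (1.12): T_i = tau / (1 - theta_i)
T_i = [tau / (1.0 - th) for th in theta_i]

# True aggregate miss ratio (3.2) and miss-weighted aggregate residency time (3.9)
r = sum(rates)
m = sum(ri * mi for ri, mi in zip(rates, miss)) / r
T = sum(ri * mi * ti for ri, mi, ti in zip(rates, miss, T_i)) / (r * m)

# Working-hypothesis estimate: model every workload with the aggregate
# residency time T in place of its own T_i, per the proportionality (1.19)
m_hat = sum(ri * mi * (T / ti) ** -th
            for ri, mi, ti, th in zip(rates, miss, T_i, theta_i)) / r

rel_error = abs(m_hat - m) / m
print(f"{rel_error:.3f}")  # → 0.011, comfortably under the 2 percent bound
```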
This chapter's working hypothesis has also proved itself in actual empirical use, without recourse to formal error analysis [22]. Its practical success confirms that the first-order approximation just obtained remains accurate within a wide enough range of conditions to make it an important practical tool.