... the average residency time. The adjusted hit ratios are obtained by applying the proportionality relationship expressed by (1.19), with θ = 0.25; for example, the miss ratio for the ERP application is projected as .36 × (262/197)^(−.25) = .34. Based upon the projected residency times and hit ratios, we may then compute the cache storage requirements, as already discussed for the previous table.
In this way, we obtain an objective of 2256 megabytes for the cache size of the target system. If these requirements could be met exactly, then we would project an aggregate hit ratio, for the three applications, of 72 percent.
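The projection arithmetic just described can be sketched in a few lines. The function below assumes, per (1.19), that a workload's miss ratio scales as the average residency time raised to the power −θ; the function name is illustrative, and the figures are simply the ERP numbers quoted above.

```python
def project_miss_ratio(m_current, t_current, t_target, theta=0.25):
    """Scale a miss ratio to a new average residency time, assuming (1.19):
    miss ratio proportional to (residency time) ** -theta."""
    return m_current * (t_target / t_current) ** -theta

# ERP application: current miss ratio .36, residency time 197 -> 262 seconds
print(round(project_miss_ratio(0.36, 197.0, 262.0), 2))  # → 0.34
```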
THE FRACTAL STRUCTURE OF DATA REFERENCE
As a final step, we must also consider how to round off the computed cache memory requirement of 2256 megabytes. Since this requirement is very close to 2 gigabytes, we might choose, in this case, to round down rather than up. Alternatively, it would also be reasonable to round up to, say, 3 gigabytes, on the grounds that the additional cache can be used for growth in the workload. To account for the rounding off of cache memory, we can apply the proportionality expressed by (1.23). Thus, after rounding down to 2 gigabytes, we would expect the aggregate miss ratio of the target system to be .28 × (2048/2256)^(−.25/.75) = .29, identical to the current aggregate miss ratio of the three applications.
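The rounding adjustment can be sketched the same way. The function below assumes, per (1.23), that the miss ratio scales as cache size raised to the power −θ/(1−θ); the function name is illustrative.

```python
def adjust_for_cache_size(m, size_old_mb, size_new_mb, theta=0.25):
    """Scale a miss ratio to a new cache size, assuming (1.23):
    miss ratio proportional to (cache size) ** (-theta / (1 - theta))."""
    return m * (size_new_mb / size_old_mb) ** (-theta / (1.0 - theta))

# Rounding the 2256-megabyte requirement down to 2 gigabytes (2048 MB)
print(round(adjust_for_cache_size(0.28, 2256.0, 2048.0), 2))  # → 0.29
```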
If the available performance reporting tools are sufficiently complete, it is possible to refine the methods presented in the preceding example. In the figures of the example, the stage size was assumed to be equal to 0.04 megabytes (a reasonable approximation for most OS/390 workloads). The capability to support direct measurements of this quantity has recently been incorporated into some storage controls; if such measurements are supported, they can be found in the System Measurement Facility (SMF) record type 74, subtype 5. Also, in the example, we used the total miss ratio as our measure of the percentage of I/O's that require more cache memory to be allocated. A loophole exists in this technique, however, due to the capability of most current storage controls to accept write requests without needing to wait for a stage to occur.
In a storage control of this type, virtually all write requests will typically be reported as "hits," even though some of them may require allocation of memory. For database I/O, this potential source of error is usually not important, since write requests tend to be updates of data already in cache. If, however, it is desired to account for any write "hits" that may nevertheless require allocation of cache memory, counts of these can also be found in SMF record type 74, subtype 5 (they are called write promotions).
Finally, we assumed the guesstimate θ = 0.25. If measurements of the single-reference residency time are available, then θ can be quantified more precisely using (1.16).
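When both residency times are measured, the calibration is a one-liner. This is a minimal sketch, assuming (1.16) relates the single-reference residency time τ to the average residency time T as T = τ/(1 − θ); the function name and the sample figures are hypothetical.

```python
def estimate_theta(tau, avg_residency):
    """Estimate theta from measured residency times, assuming the
    relation T = tau / (1 - theta), i.e. theta = 1 - tau / T."""
    return 1.0 - tau / avg_residency

# e.g. a single-reference residency time of 150 seconds against an
# average residency time of 200 seconds (hypothetical figures)
print(estimate_theta(150.0, 200.0))  # → 0.25
```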
2. ANALYSIS OF THE WORKING HYPOTHESIS
It is beyond the scope of the present chapter to analyze rigorously every potential source of error in a capacity planning exercise of the type just presented in the previous section, nor does a "back of the envelope" approximation method require this. Instead, we now focus on the following claim, central to applications of the working hypothesis: that it makes very little difference in the estimated hit ratio of the cache as a whole, whether the individual workloads within the cache are modeled with their correct average residency times, or whether they are all modeled assuming a common average residency time reflecting the conditions for the cache as a whole.
Obviously, such a statement cannot hold in all cases. Instead, it is a statement about the realistic impact of typical variations between workloads. As the data presented in Chapter 1 suggests, the values of the parameter θ, for distinct workloads within a given environment, often vary over a fairly narrow range. This gives the proposed hypothesis an important head start, since the hypothesis would be exactly correct for a cache in which several workloads share the same value of the parameter θ. In that case, the common value of θ, together with the fact that all the workloads must share a common single-reference residency time τ, would then imply, by (1.12), that the workloads must also share the same average residency time as well.
Consider, now, a cache whose activity can be described by the multiple-workload hierarchical reuse model; that is, the cache provides service to n individual workloads, i = 1, 2, ..., n, each of which can be described by the hierarchical reuse model. The true miss ratio of the cache as a whole is the weighted average of the individual workload miss ratios, weighted by I/O rate:

$$m = \frac{1}{r}\sum_{i=1}^{n} r_i m_i, \qquad (3.2)$$

where r_i and m_i are the I/O rate and miss ratio of workload i, and r = r_1 + r_2 + ... + r_n is the total I/O rate.
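In code, the I/O-rate-weighted average of (3.2) amounts to the following; the two workloads' rates and miss ratios below are hypothetical.

```python
def aggregate_miss_ratio(rates, miss_ratios):
    """I/O-rate-weighted average of per-workload miss ratios, as in (3.2)."""
    total_rate = sum(rates)
    return sum(r_i * m_i for r_i, m_i in zip(rates, miss_ratios)) / total_rate

# Two hypothetical workloads: 100 I/O's per second at a .40 miss ratio,
# and 300 I/O's per second at a .20 miss ratio
print(round(aggregate_miss_ratio([100.0, 300.0], [0.40, 0.20]), 2))  # → 0.25
```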
We must now consider the error that results from replacing the correct miss ratio of each workload by the corresponding estimate m̂_i, calculated using the average residency time of the cache as a whole. Using the proportionality relationship expressed by (1.19), the values m̂_i can be written as

$$\hat{m}_i = m_i\left(\frac{T}{T_i}\right)^{-\theta_i}, \qquad (3.3)$$

where T_i is the average residency time of workload i and T is that of the cache as a whole. Thus, the working hypothesis implies an overall miss ratio of

$$\hat{m} = \frac{1}{r}\sum_{i=1}^{n} r_i m_i\left(\frac{T}{T_i}\right)^{-\theta_i}. \qquad (3.4)$$
To investigate the errors implied by this calculation, we write it in the alternative form

$$\hat{m} = \frac{1}{r}\sum_{i=1}^{n} r_i m_i\,(1-\zeta_i)^{\theta_i},$$

where we define

$$\zeta_i = 1 - \frac{T_i}{T}. \qquad (3.5)$$

This expression for m̂ can be expanded by applying the binomial theorem:

$$\hat{m} = \frac{1}{r}\sum_{i=1}^{n} r_i m_i\left[1 - \theta_i\zeta_i - \frac{\theta_i(1-\theta_i)}{2}\,\zeta_i^2 + o(\zeta_i^2)\right], \qquad (3.6)$$

where the "little-o" notation indicates terms higher than second order.
Using (1.16), we define

$$\theta = 1 - \frac{\tau}{T}$$

to be the aggregate value of θ for the cache as a whole. Note, as a result, that in addition to the definition already given, ζ_i also has the equivalent definition

$$\zeta_i = \frac{\theta - \theta_i}{1 - \theta_i}, \qquad (3.7)$$

where we have applied (1.12) and taken advantage of the fact that each workload must share the same, common value of τ.
By applying (1.16), we may rewrite the first-order terms of (3.6) as follows:

$$-\sum_{i=1}^{n}\frac{r_i m_i}{rm}\,\theta_i\zeta_i = -\theta\sum_{i=1}^{n}\frac{r_i m_i}{rm}\left(1-\frac{T_i}{T}\right) + \sum_{i=1}^{n}\frac{r_i m_i}{rm}\,(1-\theta_i)\,\zeta_i^2. \qquad (3.8)$$

But since each miss corresponds to a cache visit, the aggregate residency time is computed over misses; that is,

$$T = \frac{1}{rm}\sum_{i=1}^{n} r_i m_i T_i, \qquad (3.9)$$
and (3.8) reduces to

$$-\sum_{i=1}^{n}\frac{r_i m_i}{rm}\,\theta_i\zeta_i = \sum_{i=1}^{n}\frac{r_i m_i}{rm}\,(1-\theta_i)\,\zeta_i^2. \qquad (3.10)$$
Combining (3.2), (3.6), and (3.10), we now have

$$\hat{m} = m\left[1 + \sum_{i=1}^{n}\frac{r_i m_i}{rm}\,(1-\theta_i)\,\zeta_i^2 - \sum_{i=1}^{n}\frac{r_i m_i}{rm}\,\frac{\theta_i(1-\theta_i)}{2}\,\zeta_i^2\right] + o(\zeta^2). \qquad (3.11)$$

Thus, m̂ = m except for second-order and higher terms.
In a region sufficiently close to θ_1 = θ_2 = ... = θ_n = θ (or equivalently, ζ_1 = ζ_2 = ... = ζ_n = 0), the second-order terms of (3.11) can be approximated as uniformly zero. The region where these second-order terms have at most a minor impact is that in which |ζ_i| << 1 for i = 1, 2, ..., n. This requirement permits wide variations in the workloads sharing the cache.
For example, suppose that there are two workloads i = 1, 2, with values θ_i equal to .1 and .3 respectively; and suppose that these workloads share a cache in which, overall, we have θ = .2. Then the absolute value of ζ_i is no greater than .1/.7 = .14 for either workload. As a result, the absolute value of either of the second-order summation terms of (3.11), calculated without the summation weights r_i m_i/(rm), does not exceed .02. But the summation of these terms, multiplied by the weights r_i m_i/(rm), is merely a weighted average; so in the case of the example, the quantity just stated is the largest relative error, in either direction, that can be made by neglecting the second-order terms (i.e., the error can be no larger than 2 percent of m). Since the second-order terms are so relatively insignificant, we may conclude that the third-order and higher terms, shown as o(ζ²), must be vanishingly small.
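The two-workload example can also be checked numerically. The sketch below assumes (1.12) in the form T_i = τ/(1 − θ_i), computes the aggregate residency time over misses as in (3.9), and forms the working-hypothesis estimate by applying each workload's θ_i to the aggregate residency time in place of its own; the I/O rates and per-workload miss ratios are hypothetical.

```python
theta_i = [0.1, 0.3]      # the two workloads of the example
rates   = [100.0, 100.0]  # hypothetical I/O rates (per second)
miss    = [0.30, 0.30]    # hypothetical per-workload miss ratios
tau     = 1.0             # common single-reference residency time (arbitrary units)

# Per-workload average residency times, assuming (1.12): T_i = tau / (1 - theta_i)
T_i = [tau / (1.0 - th) for th in theta_i]

# True aggregate miss ratio (3.2) and miss-weighted aggregate residency time (3.9)
r = sum(rates)
m = sum(ri * mi for ri, mi in zip(rates, miss)) / r
T = sum(ri * mi * ti for ri, mi, ti in zip(rates, miss, T_i)) / (r * m)

# Working-hypothesis estimate: model every workload with the aggregate
# residency time T in place of its own T_i, per the proportionality (1.19)
m_hat = sum(ri * mi * (T / ti) ** -th
            for ri, mi, ti, th in zip(rates, miss, T_i, theta_i)) / r

rel_error = abs(m_hat - m) / m
print(f"{rel_error:.3f}")  # → 0.011, comfortably under the 2 percent bound
```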
This chapter's working hypothesis has also proved itself in actual empirical use, without recourse to formal error analysis [22]. Its practical success confirms that the first-order approximation just obtained remains accurate within a wide enough range of conditions to make it an important practical tool.