In addition, in many cases, the same simulation environment can be used for both function and performance verifications. However, most simulation-based performance estimation methods suffer from insufficient corner-case coverage. This means that they are typically not able to provide worst-case performance guarantees. Moreover, accurate simulations are often computationally expensive.
In other works [5,6], hybrid performance estimation methods have been presented that combine simulation and analytic techniques. While these approaches considerably shorten the simulation run-times, they still cannot guarantee full coverage of corner cases.
To determine guaranteed performance limits, analytic methods must be adopted. These methods provide hard performance bounds; however, they are typically not able to model complex interactions and state-dependent behaviors, which can result in pessimistic performance bounds.
Several models and methods for analytic performance verifications of distributed platforms have been presented so far. These approaches are based on essentially different abstraction concepts. The first idea was to extend well-known results of the classical scheduling theory to distributed systems. This implies the consideration of communication delays, which cannot be neglected in a distributed system. Such a combined analysis of processor and bus scheduling is often referred to as holistic scheduling analysis. Rather than a specific performance analysis method, holistic scheduling is a collection of techniques for the analysis of distributed platforms, each of which is tailored toward a particular combination of an event stream model, a resource-sharing policy, and communication arbitration (see [10,11,15] as examples). Several holistic analysis techniques are aggregated and implemented in the modeling and analysis suite for real-time applications (MAST) [3].∗
In [12], a more general approach to extend the concepts of the classical scheduling theory to distributed systems was presented. In contrast to holistic approaches that extend the monoprocessor scheduling analysis to special classes of distributed systems, this compositional method applies existing analysis techniques in a modular manner: the single components of a distributed system are analyzed with classical algorithms, and the local results are propagated through the system by appropriate interfaces relying on a limited set of event stream models.
In this chapter, we will describe a different analytic and modular approach for performance prediction that does not rely on the classical scheduling theory. The method uses real-time calculus (RTC) [13], which extends the basic concepts of network calculus [7]. The corresponding modular performance analysis (MPA) framework [1] analyzes the flow of event streams through a network of computation and communication resources.
∗Available as Open Source software at http://mast.unican.es
1.2 Application Scenario
In this section, we introduce the reader to the system-level performance analysis by means of a concrete application scenario from the area of video processing. Intentionally, this example is extremely simple in terms of the underlying hardware platform and the application model. On the other hand, it allows us to introduce the concepts that are necessary for a compositional performance analysis (see Section 1.4).
The example system that we consider is a digital set-top box for the decoding of video streams. The architecture of the system is depicted in Figure 1.2. The set-top box implements a picture-in-picture (PiP) application that decodes two concurrent MPEG-2 video streams and displays them on the same output device. The upper stream, VHR, has a higher frame resolution and is displayed in full screen, whereas the lower stream, VLR, has a lower frame resolution and is displayed in a smaller window at the bottom left edge of the screen.

The MPEG-2 video decoding consists of the following tasks: variable length decoding (VLD), inverse quantization (IQ), inverse discrete cosine transformation (IDCT), and motion compensation (MC). In the considered set-top box, the decoding application is partitioned onto three processors: CPU1, CPU2, and CPU3. The tasks VLD and IQ are mapped onto CPU1 for the first video stream (process P1) and onto CPU2 for the second video stream (process P3). The tasks IDCT and MC are mapped onto CPU3 for both video streams (processes P2 and P4). A pre-emptive fixed priority scheduler is adopted for the sharing of CPU3 between the two streams, with the upper stream having higher priority than the lower stream. This reflects the fact that the decoder gives a higher quality of service (QoS) to the stream with a higher frame resolution, VHR.
As shown in the figure, the video streams arrive over a network and enter the system after some initial packet processing at the network interface. The inputs to P1 and P3 are compressed bitstreams and their outputs are partially decoded macroblocks, which serve as inputs to P2 and P4. The fully decoded video streams are then fed into two traffic-shaping components, S1 and S2, respectively. This is necessary because the outputs of P2 and P4 are potentially bursty and need to be smoothed out in order to make sure that no packets are lost by the video interface, which cannot handle more than a certain packet rate per stream.
We assume that the arrival patterns of the two streams, VHR and VLR, from the network as well as the execution demands of the various tasks in the system are known. The performance characteristics that we want to analyze are the worst-case end-to-end delays for the two video streams from the input to the output of the set-top box. Moreover, we want to analyze the memory demand of the system in terms of worst-case packet buffer occupation for the various tasks.
In Section 1.3, we will first formally describe the above system in the concrete time domain. In principle, this formalization could directly be used in order to perform a simulation; in our case, it will be the basis for the MPA described in Section 1.4.
1.3 Representation in the Time Domain
As can be seen from the example described in Section 1.2, the basic model of computation consists of component networks that can be described as a set of components that are communicating via infinite FIFO (first-in first-out) buffers denoted as channels. Components receive streams of tokens via their input channels, operate on the arriving tokens, and produce output tokens that are sent to the output channels. We also assume that the components need resources in order to actually perform operations. Figure 1.3 represents the simple component network corresponding to the video decoding example. Examples of components are tasks that are executed on computing resources or data communication via buses or interconnection networks. Therefore, the token streams that are present at the inputs or outputs of a component could be of different types; for example, they could represent simple events that trigger tasks in the corresponding computation component or they could represent data packets that need to be communicated.
1.3.1 Arrival and Service Functions
In order to describe this model in greater detail, we will first describe streams in the concrete time domain. To this end, we define the concept of arrival functions: R(s, t) ∈ ℝ≥0 denotes the amount of tokens that arrive in the time interval [s, t) for all time instances s, t ∈ ℝ, s < t, and R(t, t) = 0. Depending on the interpretation of a token stream, an arrival function may be integer valued, i.e., R(s, t) ∈ ℤ≥0. In other words, R(s, t) “counts” the number of tokens in a time interval. Note that we are taking a very liberal definition of a token here: It just denotes the amount of data or events that arrive in a channel. Therefore, a token may represent bytes, events, or even demanded processing cycles.
In the component network semantics, tokens are stored in channels that connect inputs and outputs of components. Let us suppose that we had determined the arrival function R(s, t) corresponding to a component output (that writes tokens into a channel) and the arrival function R′(s, t) corresponding to a component input (that removes tokens from the channel); then we can easily determine the buffer fill level, B(t), of this channel at some time t: B(t) = B(s) + R(s, t) − R′(s, t).
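The buffer fill level relation can be tried out directly with cumulative token counts. The following sketch is our own illustration, not code from the chapter; it assumes a discrete time grid and hypothetical writer/reader traces.

```python
# Arrival functions as cumulative token counts on a discrete time grid,
# and the buffer fill level of a channel (illustrative names and data).

def interval(cum, s, t):
    """R(s, t): tokens arriving in [s, t), from a cumulative count."""
    return cum[t] - cum[s]

# Cumulative tokens written into the channel by the upstream component
# (arrival function R) and removed by the downstream component (R').
R_out = [0, 2, 4, 6, 8, 10]   # writer: two tokens per step
R_in  = [0, 1, 2, 4, 6, 8]    # reader: slower at first

def buffer_level(B_s, s, t):
    """B(t) = B(s) + R(s, t) - R'(s, t)."""
    return B_s + interval(R_out, s, t) - interval(R_in, s, t)

print(buffer_level(0, 0, 3))  # -> 2 tokens backlogged after three steps
```

The same relation is what the analysis in Section 1.4 bounds from above with worst-case curves instead of concrete traces.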
As has been described above, one of the major elements of the model is that components can only advance in their operation if there are resources available. As resources are the first-class citizens of the performance analysis, we define the concept of service functions: C(s, t) ∈ ℝ≥0 denotes the amount of available resources in the time interval [s, t) for all time instances s, t ∈ ℝ, s < t, and C(t, t) = 0. Depending on the type of the underlying resource, C(s, t) may denote the accumulated time in which the resource is fully available for communication or computation, the amount of processing cycles, or the amount of information that can be communicated in [s, t).
1.3.2 Simple and Greedy Components
Using the above concept of arrival functions, we can describe a set of very simple components that only perform data conversions and synchronization.

• Tokenizer: A tokenizer receives fractional tokens at the input that may correspond to a partially transmitted packet or a partially executed task. A discrete output token is only generated if the whole processing or communication of the predecessor component is finished. With the input and output arrival functions R(s, t) and R′(s, t), respectively, we obtain as a transfer function R′(s, t) = ⌊R(s, t)⌋.
• Scaler: Sometimes, the units of arrival and service curves do not match. For example, the arrival function, R, describes a number of events and the service function, C, describes resource units. Therefore, we need to introduce the concept of scaling: R′(s, t) = w · R(s, t), with the positive scaling factor, w. For example, w may convert events into processor cycles (in case of computing) or into number of bytes (in case of communication). A much more detailed view on workloads and their modeling can be found in [8], for example, modeling time-varying resource usage or upper and lower bounds (worst-case and best-case resource demands).

• AND and OR: As a last simple example, let us suppose a component that only produces output tokens if there are tokens on all inputs (AND). Then the relation between the arrival functions at the inputs, R1(s, t) and R2(s, t), and the output, R′(s, t), is R′(s, t) = min{B1(s) + R1(s, t), B2(s) + R2(s, t)}, where B1(s) and B2(s) denote the buffer levels in the input channels at time s. If the component produces an output token for every token at any input (OR), we find R′(s, t) = R1(s, t) + R2(s, t).
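The AND and OR relations can be sketched on cumulative counts measured from s = 0. This is our own illustration (names and traces are hypothetical, not from the chapter):

```python
# AND/OR transfer relations on cumulative arrival counts, s = 0.

def and_output(R1, R2, B1_s=0, B2_s=0):
    """R'(s,t) = min{B1(s) + R1(s,t), B2(s) + R2(s,t)}."""
    return [min(B1_s + r1, B2_s + r2) for r1, r2 in zip(R1, R2)]

def or_output(R1, R2):
    """R'(s,t) = R1(s,t) + R2(s,t)."""
    return [r1 + r2 for r1, r2 in zip(R1, R2)]

R1 = [0, 1, 2, 3, 4]       # one token per step
R2 = [0, 0, 2, 2, 4]       # bursty input
print(and_output(R1, R2))  # -> [0, 0, 2, 2, 4]: limited by the slower input
print(or_output(R1, R2))   # -> [0, 1, 4, 5, 8]: sum of both inputs
```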
The elementary components described above do not interact with the available resources at all. On the other hand, it would be highly desirable to express the fact that a component may need resources in order to operate on the available input tokens. A greedy processing component (GPC) takes an input arrival function, R(s, t), and produces an output arrival function, R′(s, t), by means of a service function, C(s, t). It is defined by the input/output relation

R′(s, t) = min{ inf_{s≤λ≤t} {R(s, λ) + B(s) + C(λ, t)}, C(s, t) }

where B(s) denotes the initial buffer level in the input channel. The service function of the remaining resource is given by

C′(s, t) = C(s, t) − R′(s, t)
The above definition can be related to the intuitive notion of a greedy component as follows: The output between some time λ and t cannot be larger than C(λ, t), and, therefore, R′(s, t) ≤ R′(s, λ) + C(λ, t), and also R′(s, t) ≤ C(s, t). As the component cannot output more tokens than are available at the input, we also have R′(s, λ) ≤ R(s, λ) + B(s), and, therefore, R′(s, t) ≤ min{R(s, λ) + C(λ, t) + B(s), C(s, t)}. Let us suppose that there is some last time λ∗ before t when the buffer was empty. At λ∗, we clearly have R′(s, λ∗) = R(s, λ∗) + B(s). In the interval from λ∗ to t, the buffer is never empty and all available resources are used to produce output tokens: R′(s, t) = R(s, λ∗) + B(s) + C(λ∗, t). If the buffer is never empty, we clearly have R′(s, t) = C(s, t), as all available resources are used to produce output tokens.
As a result, we obtain the mentioned input–output relation of a GPC. Note that the above resource and timing semantics model almost all practically relevant processing and communication components (e.g., processors that operate on tasks and use queues to keep ready tasks, communication networks, and buses). As a result, we are not restricted to model the processing time with a fixed delay. The service function can be chosen to represent a resource that is available only in certain time intervals (e.g., time division multiple access [TDMA] scheduling), or which is the remaining service after a resource has performed other tasks (e.g., fixed priority scheduling). Note that a scaler can be used to perform the appropriate conversions between token and resource units. Figure 1.4 depicts the examples of concrete components we considered so far. Note that further models of computation can be described as well, for example, (greedy) shapers that limit the amount of output tokens to a given shaping function, σ, according to R′(s, t) ≤ σ(t − s) (see Section 1.4 and also [19]).
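The GPC input/output relation can be evaluated numerically on a discrete time grid. The sketch below is our own illustration (assumed integer time steps, cumulative functions measured from s = 0, hypothetical traces), not code from the chapter:

```python
# GPC on a discrete grid: R and C are cumulative functions from s = 0,
# B0 is the initial buffer level.
# R'(0,t) = min( min_{0<=lam<=t} { R(0,lam) + B0 + C(lam,t) }, C(0,t) )

def gpc_output(R, C, B0=0):
    out = []
    for t in range(len(R)):
        candidates = [C[t]]  # output never exceeds total service C(0, t)
        for lam in range(t + 1):
            # tokens available up to lam, plus service left after lam
            candidates.append(R[lam] + B0 + (C[t] - C[lam]))
        out.append(min(candidates))
    return out

def remaining_service(C, R_out):
    """C'(0,t) = C(0,t) - R'(0,t): resources left for lower priority."""
    return [c - r for c, r in zip(C, R_out)]

R = [0, 4, 4, 4, 5, 6]    # a burst of four tokens, then a trickle
C = [0, 1, 2, 3, 4, 5]    # one unit of service per step
Rp = gpc_output(R, C)
print(Rp)                      # -> [0, 1, 2, 3, 4, 5]: output limited by service
print(remaining_service(C, Rp))  # -> [0, 0, 0, 0, 0, 0]: nothing left over
```

Feeding `remaining_service` of one GPC into another is exactly how the fixed priority sharing of CPU3 between P2 and P4 is modeled later in the chapter.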
FIGURE 1.4
Concrete components: scaler, tokenizer, AND, OR, GPC, and shaper.
For example, the input events described by the arrival function, RLR, trigger the tasks in the process P3, which runs on CPU2, whose availability is described by the service function, C2. The output drives the task in the process P4, which runs on CPU3 with a second priority. This is modeled by feeding the GPC component with the remaining resources from the process P2.
We can conclude that the flow of event streams is modeled by connecting the “arrival” ports of the components and the scheduling policy is modeled by connecting their “service” ports. Other scheduling policies like the nonpreemptive fixed priority, earliest deadline first, TDMA, general processor share, various servers, as well as any hierarchical composition of these policies can be modeled as well (see Section 1.4).

1.4 Modular Performance Analysis with Real-Time Calculus
In the previous section, we have presented the characterization of event and resource streams, and their transformation by elementary concrete processes. We denote these characterizations as concrete, as they represent components, event streams, and resource availabilities in the time domain and work on concrete stream instances only. However, event and resource streams can exhibit a large variability in their timing behavior because of nondeterminism and interference. The designer of a real-time system has to provide performance guarantees that cover all possible behaviors of a distributed system and its environment. In this section, we introduce the abstraction of the MPA with the RTC [1] (MPA-RTC) that provides the means to capture all possible interactions of event and resource streams in a system, and permits to derive safe bounds on best-case and worst-case behaviors.

This approach was first presented in [13] and has its roots in network calculus [7]. It permits to analyze the flow of event streams through a network of heterogeneous computation and communication resources in an embedded platform, and to derive hard bounds on its performance.

1.4.1 Variability Characterization
In the MPA, the timing characterization of event streams and of the resource availability is based on the abstractions of arrival curves and service curves, respectively. Both the models belong to the general class of variability characterization curves (VCCs), which allow to precisely quantify the best-case and worst-case variabilities of wide-sense-increasing functions [8]. For simplicity, in the rest of the chapter we will use the term VCC if we want to refer to either arrival or service curves.

In the MPA framework, an event stream is described by a tuple of arrival curves, α(Δ) = [αl(Δ), αu(Δ)], where αl : ℝ≥0 → ℝ≥0 denotes the lower arrival curve and αu : ℝ≥0 → ℝ≥0 the upper arrival curve of the event stream. We say that a tuple of arrival curves, α(Δ), conforms to an event stream described by the arrival function, R(s, t), denoted as α |= R, iff for all t > s we have αl(t − s) ≤ R(s, t) ≤ αu(t − s). In other words, there will be at least αl(Δ) events and at most αu(Δ) events in any time interval [s, t) with t − s = Δ.
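The conformance relation α |= R can be checked mechanically for a simple case. The sketch below is our own illustration, assuming a strictly periodic stream with period p, for which the standard curves are αu(Δ) = ⌈Δ/p⌉ and αl(Δ) = ⌊Δ/p⌋:

```python
# Checking alpha |= R for a strictly periodic event stream.
import math

p = 4  # period (illustrative)

def alpha_u(delta):
    return math.ceil(delta / p)

def alpha_l(delta):
    return math.floor(delta / p)

# Concrete trace: events at times 0, p, 2p, ...
events = [0, 4, 8, 12, 16, 20]

def R(s, t):
    """Arrival function: number of events in [s, t)."""
    return sum(1 for e in events if s <= e < t)

# alpha |= R  iff  alpha_l(t-s) <= R(s,t) <= alpha_u(t-s) for all t > s.
conforms = all(
    alpha_l(t - s) <= R(s, t) <= alpha_u(t - s)
    for s in range(0, 20)
    for t in range(s + 1, 21)
)
print(conforms)  # -> True
```

Replacing the trace with a jittered one while keeping the same curves would make the check fail, which is exactly why jittered streams need wider tuples of curves.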
In contrast to arrival functions, which describe one concrete trace of an event stream, a tuple of arrival curves represents all possible traces of a stream. Figure 1.5a shows an example tuple of arrival curves. Note that any event stream can be modeled by an appropriate pair of arrival curves, which means that this abstraction substantially expands the modeling power of standard event arrival patterns such as sporadic, periodic, or periodic with jitter. Similarly, the availability of a resource is described by a tuple of service curves, β(Δ) = [βl(Δ), βu(Δ)], where βl : ℝ≥0 → ℝ≥0 denotes the lower service curve and βu : ℝ≥0 → ℝ≥0 the upper service curve. Again, we say that a tuple of service curves, β(Δ), conforms to a resource described by the service function, C(s, t), denoted as β |= C, iff for all t > s we have βl(t − s) ≤ C(s, t) ≤ βu(t − s). Figure 1.5b shows an example tuple of service curves.
Note that, as defined above, the arrival curves are expressed in terms of events while the service curves are expressed in terms of workload/service units. However, the component model described in Section 1.4.2 requires the arrival and service curves to be expressed in the same unit. The transformation of event-based curves into resource-based curves and vice versa is done by means of so-called workload curves, which are VCCs themselves. Basically, these curves define the minimum and maximum workloads imposed on a resource by a given number of consecutive events, i.e., they capture the variability in execution demands. More details about workload transformations can be found in [8]. In the simplest case of a constant workload w for all events, an event-based curve is transformed into a resource-based curve by simply scaling it by the factor w. This can be done by an appropriate scaler component, as described in Section 1.3.

FIGURE 1.6
(a) Abstract and (b) concrete GPCs.
1.4.2 Abstract Components

In the example system depicted in Figure 1.2, we can identify six components: the four tasks, P1, P2, P3, and P4, as well as the two shaper components, S1 and S2.
In the MPA framework, an abstract component is a model of the processing semantics of a concrete component, for instance, an application task or a concrete dedicated HW/SW unit. An abstract component models the execution of events by a computation or communication resource and can be seen as a transformer of abstract event and resource streams. As an example, Figure 1.6 shows an abstract and a concrete GPC.

Abstract components transform input VCCs into output VCCs, that is, they are characterized by a transfer function that relates input VCCs to output VCCs. We say that an abstract component conforms to a concrete component if the following holds: Given any set of input VCCs, let us choose an arbitrary trace of concrete component inputs (event and resource streams) that conforms to the input VCCs. Then, the resulting output streams must conform to the output VCCs as computed using the abstract transfer function. In other words, for any input that conforms to the corresponding input VCCs, the output must also conform to the corresponding output VCCs.
In the case of the GPC depicted in Figure 1.6, the transfer function Φ of the abstract component is specified by a set of functions that relate the incoming arrival and service curves to the outgoing arrival and service curves. In this case, we have Φ = [fα, fβ] with α′ = fα(α, β) and β′ = fβ(α, β).
1.4.3 Component Examples
In the following, we describe the abstract components of the MPA framework that correspond to the concrete components introduced in Section 1.3: scaler, tokenizer, OR, AND, GPC, and shaper.

Using the above relation between concrete and abstract components, we can easily determine the transfer functions of the simple components, tokenizer, scaler, and OR, which are depicted in Figure 1.4.
• Tokenizer: The tokenizer outputs only integer tokens and is characterized by R′(s, t) = ⌊R(s, t)⌋. Using the definition of arrival curves, we simply obtain as the abstract transfer function α′u(Δ) = ⌈αu(Δ)⌉ and α′l(Δ) = ⌊αl(Δ)⌋.

• GPC: An abstract GPC processes incoming events as fast as permitted by the availability of resources. Such a behavior can be modeled with the following internal relations that are proven in [17]:∗
∗The deconvolutions in min-plus and max-plus algebra are defined as (f ⊘ g)(Δ) = sup_{λ≥0}{f(Δ + λ) − g(λ)} and (f ⊘̄ g)(Δ) = inf_{λ≥0}{f(Δ + λ) − g(λ)}, respectively. The convolution in min-plus algebra is defined as (f ⊗ g)(Δ) = inf_{0≤λ≤Δ}{f(Δ − λ) + g(λ)}.
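The operators in the footnote can be discretized for experimentation. The sketch below is our own finite-horizon approximation (curves sampled at integer Δ, with the λ ranges truncated at the horizon), not the chapter's toolbox:

```python
# Discretized min-plus convolution and deconvolution on curves sampled
# at integer Delta = 0..T-1 (finite-horizon approximation).

def conv(f, g):
    """(f (x) g)(D) = min_{0<=lam<=D} { f(D-lam) + g(lam) }."""
    T = len(f)
    return [min(f[d - lam] + g[lam] for lam in range(d + 1)) for d in range(T)]

def deconv(f, g):
    """(f (/) g)(D) = max_{lam>=0} { f(D+lam) - g(lam) }, lam truncated."""
    T = len(f)
    return [max(f[d + lam] - g[lam] for lam in range(T - d)) for d in range(T)]

# Example: shaping an upper arrival curve with a shaping curve sigma
# (cf. the greedy shaper later in this section: the shaped upper curve
# is the min-plus convolution of alpha_u and sigma).
alpha_u = [0, 3, 3, 4, 5, 6]   # bursty: up to three events at once
sigma   = [0, 1, 2, 3, 4, 5]   # at most one event per time unit
print(conv(alpha_u, sigma))    # -> [0, 1, 2, 3, 4, 5]: burst smoothed out
```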
In the example system of Figure 1.2, the processing semantics of the tasks P1, P2, P3, and P4 can be modeled with abstract GPCs.
Finally, let us consider a component that is used for event stream shaping. A greedy shaper component (GSC) with a shaping curve σ delays events of an input event stream such that the output event stream has σ as an upper arrival curve. Additionally, a greedy shaper guarantees that no events are delayed longer than necessary. Typically, greedy shapers are used to reshape bursty event streams and to reduce global buffer requirements. If the abstract input event stream of a GSC with the shaping curve, σ, is represented by the tuple of arrival curves, [αl, αu], then the output of the GSC can be modeled as an abstract event stream with arrival curves

αu_GSC = αu ⊗ σ,  αl_GSC = αl

Note that a greedy shaper does not need any computation or communication resources. Thus, the transfer function of an abstract GSC considers only the ingoing and the outgoing event stream, as well as the shaping curve, σ. More details about greedy shapers in the context of MPA can be found in [19].
In the example system of Figure 1.2, the semantics of the shapers, S1 and S2, can be modeled with abstract GSCs.
1.4.4 System Performance Model
In order to analyze the performance of a distributed embedded platform, it is necessary to build a system performance model. This model has to represent the hardware architecture of the platform. In particular, it has to reflect the mapping of tasks to computation or communication resources and the scheduling policies adopted by these resources.

To obtain a performance model of a system, we first have to model the event streams that trigger the system, the computation and communication resources that are available, and the processing components. Then, we have to interconnect the arrival and service inputs and outputs of all these elements so that the architecture of the system is correctly represented.

Figure 1.7 depicts the MPA performance model for the example system described in Figure 1.2. Note that the outgoing abstract service stream of GPC2 is used as the ingoing abstract service stream for GPC4, i.e., GPC4 gets only the resources that are left by GPC2. This represents the fact that the two tasks share the same processor and are scheduled according to a pre-emptive fixed priority scheduling policy with GPC2 having a higher priority than GPC4.

FIGURE 1.7
Performance model for the example system in Figure 1.2.

In general, scheduling policies for shared resources can be modeled by the way the abstract resources β are distributed among the different abstract tasks. For some scheduling policies, such as earliest deadline first (EDF) [16], TDMA [20], nonpreemptive fixed priority scheduling [4], various kinds of servers [16], or any hierarchical composition of these elementary policies, abstract components with appropriate transfer functions have been introduced. Figure 1.8 shows some examples of how to model different scheduling policies within the MPA framework.
1.4.5 Performance Analysis
The performance model provides the basis for the performance analysis of a system. Several performance characteristics such as worst-case end-to-end delays of events or buffer requirements can be determined analytically within the MPA framework.

The performance of each abstract component can be determined as a function of the ingoing arrival and service curves by the formulas of the RTC. For instance, the maximum delay, dmax, experienced by an event of an event stream with arrival curves, [αl, αu], that is processed by a GPC on a resource with service curves, [βl, βu], is bounded by

dmax ≤ sup_{λ≥0} inf{τ ≥ 0 : αu(λ) ≤ βl(λ + τ)} =: Del(αu, βl)

The maximum buffer space, bmax, that is required to buffer an event stream with arrival curves, [αl, αu], that is processed by a GPC on a resource with service curves, [βl, βu], is bounded by

bmax ≤ sup_{λ≥0} {αu(λ) − βl(λ)} =: Buf(αu, βl)
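Both bounds have a direct computational reading: Del is the maximum horizontal distance between αu and βl, and Buf the maximum vertical distance. The following sketch is our own finite-horizon approximation with illustrative curves, not values from the example system:

```python
# Delay and backlog bounds Del(alpha_u, beta_l) and Buf(alpha_u, beta_l)
# on a discrete grid (finite horizon).

def Del(alpha_u, beta_l):
    """sup_{lam>=0} inf{tau>=0 : alpha_u(lam) <= beta_l(lam+tau)}."""
    T = len(alpha_u)
    worst = 0
    for lam in range(T):
        tau = 0  # smallest shift after which the service catches up
        while lam + tau < T and beta_l[lam + tau] < alpha_u[lam]:
            tau += 1
        worst = max(worst, tau)
    return worst

def Buf(alpha_u, beta_l):
    """sup_{lam>=0} { alpha_u(lam) - beta_l(lam) }."""
    return max(a - b for a, b in zip(alpha_u, beta_l))

alpha_u = [0, 3, 3, 4, 5, 6, 6, 7]   # bursty arrivals
beta_l  = [0, 1, 2, 3, 4, 5, 6, 7]   # one service unit per step guaranteed
print(Del(alpha_u, beta_l))  # -> 2: worst-case delay bound
print(Buf(alpha_u, beta_l))  # -> 2: worst-case backlog bound
```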
FIGURE 1.8
Modeling of different scheduling policies.
In order to compute the end-to-end delay of an event stream over several consecutive GPCs, one can simply add the single delays at the various components. Besides this strictly modular approach, one can also use a holistic delay analysis that takes into consideration that in a chain of tasks the worst-case burst cannot appear simultaneously in all tasks (this phenomenon is described as “pay burst only once” [7]). For such a task chain, the total delay can be tightened to Del(αu, βl1 ⊗ βl2 ⊗ · · · ⊗ βln).
FIGURE 1.9
Graphical interpretation of dmax and bmax.
Let us come back to the example of Figure 1.2. By applying the above reasoning, the worst-case end-to-end delay for the packets of the two video streams can be analytically bounded.
The performance analysis method presented above relies on computations on arrival and service curves. While the RTC provides compact mathematical representations for the different operations on curves, their computation in practice is typically more involved. The main issue is that the VCCs are defined for the infinite range of positive real numbers. However, any computation on these curves requires a finite representation.
To overcome this problem, we introduce a compact representation for special classes of VCCs. In particular, we consider piecewise linear VCCs that are finite, periodic, or mixed.
• Finite piecewise linear VCCs consist of a finite set of linear segments.

• Periodic piecewise linear VCCs consist of a finite set of linear segments that are repeated periodically with a constant offset between consecutive repetitions.

• Mixed piecewise linear VCCs consist of a finite set of linear segments that are followed by a second finite set of linear segments that are repeated periodically, again with a constant offset between consecutive repetitions.
Figure 1.10a through c shows examples of these three classes of curves. Many practically relevant arrival and service curves are piecewise linear. For example, if a stream consists of discrete tokens, the corresponding arrival curves are staircase functions and, hence, piecewise linear.
Here, we want to note that there are also piecewise linear VCCs that are not covered by the three classes of curves that we have defined above. In particular, we have excluded irregular VCCs, that is, VCCs with an infinite number of linear segments that do not eventually show periodicity. However, most practically relevant timing specifications for event streams and availability specifications for resources can be captured by either finite, periodic, or mixed piecewise linear VCCs.
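A periodic piecewise-linear curve can be stored compactly as one period's worth of segments plus a per-period offset. The data structure below is our own illustration of this idea, not the chapter's actual representation:

```python
# Compact representation of a periodic piecewise-linear VCC: segments
# (x, y, slope) describe one period; each repetition adds a fixed offset.
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class PeriodicPWL:
    segments: list   # [(x, y, slope)] sorted by x, with x in [0, period)
    period: float
    offset: float    # constant increment between consecutive repetitions

    def __call__(self, delta):
        k, r = divmod(delta, self.period)       # repetition index, residue
        xs = [s[0] for s in self.segments]
        x, y, slope = self.segments[bisect_right(xs, r) - 1]
        return k * self.offset + y + slope * (r - x)

# Staircase upper arrival curve of a periodic stream with period 4:
# one flat segment per period, raised by one event each repetition.
alpha_u = PeriodicPWL(segments=[(0.0, 1.0, 0.0)], period=4.0, offset=1.0)
print(alpha_u(0.0), alpha_u(5.0), alpha_u(9.0))  # -> 1.0 2.0 3.0
```

A finite curve is the special case with no periodic part, and a mixed curve prepends an aperiodic prefix of segments before the periodic part; only finitely many numbers need to be stored in every case.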