FIGURE 5.5
System-level model of an embedded system.
Figure 5.5 shows these layers for a very simple example of an embedded system, which will be used to explain the aspects of the model throughout the chapter:
• The application is described by a collection of communicating sequential tasks. Each task is characterized by four timing properties, described later. The dependencies between tasks are captured by a directed acyclic graph (called a “task graph”), which might not be fully connected.
• The execution platform consists of several processing elements of possibly different types and clock frequencies. Each processing element will run its own real-time operating system, scheduling tasks in a priority-driven manner (static or dynamic), according to their priorities, dependencies, and resource usage. When a task needs to communicate with a task on another processing element, it uses a network. The setup of the network between processing elements must also be specified, and is part of the platform.
• The “mapping” between the application and the execution platform (shown as dashed arrows in the figure) is done by placing each task on a specific processing element. In our model, this mapping is static, and tasks cannot migrate during run-time.
The top level of the embedded system consists of an application mapped onto an execution platform. This mapping is depicted in Figure 5.5 with dashed arrows. The timing characteristics in Table 5.2 originate from [SL96], while the memory and power figures (in Table 5.4) are created for the purpose of demonstrating parameters of an embedded system. We will elaborate on the various parameters in the following.
5.3.1 Application Model
The task graph for the application can be thought of as an abstraction of a set of independent sequential programs that are executed on the execution platform. Each program is modeled as a directed acyclic graph of tasks where edges indicate causal dependencies. Dependencies are shown with solid arrows in Figure 5.5. A task is a piece of sequential code and is considered to be an atomic unit for scheduling. A task τ_j is periodic and is characterized by a “period” π_j, a “deadline” δ_j, an initial “offset” ω_j, and a fixed priority fp_j (used when an operating system uses fixed priority scheduling). The properties of periodic tasks (except the fixed priority) can be seen in Table 5.2 and are all given in some time unit.
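To make this characterization concrete, the following minimal Python sketch captures the four properties of a periodic task; the class and field names are our own illustration and are not part of the MoVES notation.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PeriodicTask:
        # The four timing properties of a task, as described above.
        period: int      # pi_j, in time units
        deadline: int    # delta_j, relative to each period start
        offset: int      # omega_j, release time of the first period
        priority: int    # fp_j, used under fixed-priority scheduling

        def period_starts(self, horizon: int):
            # Start times of the task's periods within [0, horizon].
            return range(self.offset, horizon + 1, self.period)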
5.3.2 Execution Platform Model
The execution platform is a heterogeneous system, in which a number of processing elements, pe_1, ..., pe_n, are connected through a network.
5.3.2.1 Processing-Element Model
A processing element pe_i is characterized by a “clock frequency” f_i, a “local memory” m_i with a bounded size, and a “real-time operating system” os_i. The operating system handles synchronization of tasks according to their dependencies using direct synchronization [SL96].
The access to a shared resource r_m (such as a shared memory or a bus) is handled using a resource allocation protocol, which in the current version is one of the following: preemptive critical section, nonpreemptive critical section, or priority inheritance. Tasks are in the current version scheduled using either rate-monotonic, deadline-monotonic, fixed-priority, or earliest-deadline-first scheduling [Liu00]. The properties of a processing element can be seen in Table 5.3. Allocation and scheduling are designed in MoVES for easy extension, that is, new algorithms can easily be added to the current pool.
The interaction between the operating system and the application model is shown in Figure 5.6. The operating system model consists of a controller, a synchronizer, an allocator, and a scheduler. The controller receives ready or …
FIGURE 5.6
The operating system model of a processing element: controller, synchronizer, allocator, and scheduler.
… networks, can be modeled. As a bus transfer is nonpreemptable, message tasks are modeled as run-to-completion. This is achieved by having all message tasks running on the bus, that is, the processing element emulating the bus, using the same resource r_m, thereby preventing the preemption of any message task. Intraprocessor communication is assumed to be included in the execution time of the two communicating tasks, and is therefore modeled without the use of message tasks.
5.3.3 Task Mapping
A mapping is a static allocation of tasks to processing elements of the execution platform. This is depicted by the dashed arrows in Figure 5.5. Suppose that the task τ_j is mapped onto the processing element pe_i. The “execution time” e_ij (measured in cycles), the memory footprint (“static memory” sm_ij and “dynamic memory” dm_ij), and the “power consumption” pw_ij of a task τ_j depend on the characteristics of the processing element pe_i executing the task, and can be seen in Table 5.4. In particular, when selecting the operating frequency f_i of the processing element pe_i, the execution time in seconds, t_ij, of task τ_j can be calculated as

t_ij = e_ij · 1/f_i
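As a quick worked example of this formula (a sketch; the figures are taken from the smart-phone case study in Section 5.6.2, where cycle counts reach 266687 and the general-purpose processors run at 25 MHz):

    def exec_time_seconds(e_ij: int, f_i: float) -> float:
        # t_ij = e_ij * (1 / f_i): cycles divided by clock frequency.
        return e_ij / f_i

    # A task of 266687 cycles on a 25 MHz processing element:
    print(exec_time_seconds(266687, 25e6))  # ~0.0107 s, about 10.7 ms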
5.3.4 Memory and Power Model
In order to be able to verify that memory and power consumption stay within given bounds, the model keeps track of the memory usage and power costs in each cycle. Additional cost parameters can easily be added to the model, as long as the cost can be expressed in terms of the cost of being in a certain state.
The memory model includes both static memory allocation (sm), due to program memory, and dynamic memory allocation (dm), due to the data memory of the task. The example in Figure 5.7 illustrates the memory model for a set of tasks executing on a single processor. It shows the scheduling and the resulting memory profiles (split into static and dynamic memories). The dynamic part is split into private data memory (pdm), needed while executing the task, and communication data memory (cdm), needed to store data exchanged between tasks.
FIGURE 5.7
Memory and power profiles for pe1 when all four tasks in Figure 5.5 are mapped onto pe1. (a) Schedule where τ3 is preempted by τ4. (b) Memory usage on pe1: static memory (sm), private data memory (pdm), and communication data memory (cdm). (c) Power usage. (d) Task graph from Figure 5.5.
The memory needed for the data exchange between τ2 and τ3 must be allocated until it has been read by τ3 at the start of τ3’s execution. When τ3 becomes preempted, the private data memory of the task remains allocated until the task finishes.
Currently, a simple approach to the modeling of power has been taken. When a task is running, it uses power pw; at all other times, the power usage of the task is zero. The different power usages of tasks can be seen as the heights of the execution boxes in Figure 5.7c. This approach can easily be extended to account for different power contributions depending on the state of the task.
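Under this simple model, the power profile follows directly from the schedule: in each time unit, the processing element draws the pw of the running task, and zero when idle. A minimal sketch (task names and power values are illustrative, not taken from Table 5.4):

    def power_profile(schedule, pw):
        # schedule: the running task's name per time unit (None = idle).
        # pw: power drawn by each task while running.
        # Returns the per-unit power usage, i.e., the heights of the
        # execution boxes in a plot like Figure 5.7c.
        return [pw[task] if task is not None else 0 for task in schedule]

    # tau1 runs two time units, then tau2 one, then the processor idles:
    print(power_profile(["tau1", "tau1", "tau2", None],
                        {"tau1": 4, "tau2": 3}))
    # -> [4, 4, 3, 0]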
5.4 Model of Computation
In the following, we will give a rather informal presentation of the model of computation. For a formal and more comprehensive description, please refer to [BHM08]. To model the computations of a system, the notion of a “state”, which is a snapshot of the state of affairs of the individual processing elements, is introduced. For the sake of argument, we will consider a system consisting of a single processing element pe_i and a set of tasks τ_j ∈ T_pe_i assigned to pe_i. Furthermore, we shall assume that each τ_j is characterized by “best-case” and “worst-case” execution times, bcet_τj ∈ N and wcet_τj ∈ N, respectively. At the start of each new period, there is a nondeterministic choice concerning which execution time e_ij ∈ {bcet_τj, bcet_τj + 1, ..., wcet_τj − 1, wcet_τj} is needed by τ_j to finish its job on pe_i in that period.
For the processing element pe_i, the state component must record which task τ_j (if any) is currently executing, and, for every task τ_j ∈ T_pe_i, the execution time e_ij that is needed by τ_j to finish its job in its current period. We denote the state σ, where τ_j is running and where there is a total of n tasks assigned to pe_i, as σ = (τ_j, (e_i1, ..., e_in)). Here, we consider execution time only; other resource aspects, such as memory or power consumption, are disregarded.
A trace is a finite sequence of states, σ_1 σ_2 · · · σ_k, where k ≥ 0 is the length of the trace. A trace of length k describes a system behavior in the interval [0, k]. For every new period of a task, the task execution time for that period can be any of the possible execution times in the natural-number interval [bcet, wcet]. If bcet = wcet for all tasks, there is only one trace of length k, for any k. If bcet ≠ wcet, we may explore all possible extensions of the current trace by creating a new branch for every possible execution time, every time a new period is started for a task. A “computation tree” is an infinite, finitely branching tree, where every finite path starting from the root is a trace, and where the branching of a given node in the tree corresponds to all possible extensions of the trace ending in that node. This is further explained in the following example.
Example 5.2 Let us consider a simple example consisting of three independent tasks assigned to a single processor. The characteristics of each task are shown in Table 5.5. The computation tree for the first 8 time units is shown in Figure 5.8. Here, we will give a short description of how this initial part of the tree is created.
Time t = 0: Only task τ1 is ready, as τ2 and τ3 both have an offset of 2. Hence, τ1 starts executing, and as bcet = wcet = 2, there is only one possible execution time for τ1. The state then becomes σ1 = (τ1, (2, 0, 0)).
FIGURE 5.8
Possible execution traces. A △ indicates a subtree, the details of which are not further elaborated in this example.
Time t = 2: τ1 has finished its execution of 2 time units, but a new period for τ1 has not yet started, as π1 = 3. Both τ2 and τ3 are now ready. Since τ2 has the highest priority (i.e., the lowest number), it gets to execute. As the execution time interval for both τ2 and τ3 is [1, 2], there are two different execution times for each, and hence four different possible states, (τ2, (2, 1, 1)), (τ2, (2, 1, 2)), (τ2, (2, 2, 1)), and (τ2, (2, 2, 2)), which give rise to four branches. In Figure 5.8, we only continue the elaboration from state (τ2, (2, 1, 2)).
Time t = 3: τ2 finishes its execution. τ3 is still ready, and the first period of τ1 has completed, initiating its second iteration; hence, τ1 is also ready. As τ1 has the highest priority, it gets to execute. The state becomes (τ1, (2, 1, 2)).
Time t = 5: τ1 finishes its execution. τ3 is the only task ready, as the first period of τ2 has not yet finished. The state becomes (τ3, (2, 1, 2)).
Time t = 6: Both τ1 and τ2 become ready as a new period starts for each of them. Again, τ1 has the highest priority and gets executed, preempting τ3, which then still needs one time unit of execution to complete its job for the current period. Since the execution time of τ2 can be 1 or 2, while τ1 has only one possible execution time, there are just two branches, that is, the possible new states are (τ1, (2, 1, 2)) and (τ1, (2, 2, 2)).
Time t = 8: τ1 has completed its execution, allowing τ2 to take over. However, at this point the second period of τ3 starts, while τ3 has not yet completed its job for the first period. Hence, τ3 will not meet its deadline, and this example is not schedulable.
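The walkthrough above can be reproduced by a few lines of simulation. The sketch below follows the single branch of the computation tree elaborated in Figure 5.8 (τ2 choosing execution time 1 and τ3 choosing 2). Since Table 5.5 is not reproduced here, the task parameters are inferred from the narrative, and deadlines are assumed to equal periods.

    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        offset: int
        period: int
        exec_time: int  # execution time chosen for every period (one branch)
        priority: int   # lower number = higher priority

    def simulate(tasks, horizon):
        # Discrete-time, fixed-priority, preemptive simulation. A deadline
        # miss is detected when a new period starts while the previous job
        # of that task is unfinished (deadline = period assumed).
        remaining = {t.name: 0 for t in tasks}
        schedule = []
        for now in range(horizon):
            for t in tasks:
                if now >= t.offset and (now - t.offset) % t.period == 0:
                    if remaining[t.name] > 0:
                        return schedule, f"{t.name} misses its deadline at t={now}"
                    remaining[t.name] = t.exec_time
            ready = [t for t in tasks if remaining[t.name] > 0]
            if ready:
                running = min(ready, key=lambda t: t.priority)  # preemption
                remaining[running.name] -= 1
                schedule.append(running.name)
            else:
                schedule.append(None)
        return schedule, "no deadline missed within the horizon"

    # Parameters inferred from Example 5.2 (one branch of the tree):
    tasks = [Job("tau1", 0, 3, 2, 1),   # bcet = wcet = 2
             Job("tau2", 2, 4, 1, 2),   # branch where tau2 takes 1
             Job("tau3", 2, 6, 2, 3)]   # branch where tau3 takes 2
    print(simulate(tasks, 9))
    # -> (['tau1', 'tau1', 'tau2', 'tau1', 'tau1', 'tau3', 'tau1', 'tau1'],
    #     'tau3 misses its deadline at t=8')

The simulated branch reproduces the schedule of the walkthrough, including the preemption of τ3 at t = 6 and the missed deadline at t = 8.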
This model of computation can easily be extended to a system with multiple processors. The system state then becomes the union of the states for each processor.
A run of a system is an infinite sequence of states. We call a system “schedulable” if, for every run, each task finishes its job in all its periods.
In [BHM08], we have shown that the schedulability problem is decidable, and an upper bound on the depth of the part of the computation tree that is sufficient to consider when checking for schedulability has been established. An upper bound for that depth is given by

Ω_M + Π_H · (1 + Σ_{τ∈T} wcet_τ)

where T is the set of all tasks, Ω_M is the maximal offset, Π_H is the hyper-period of the system (i.e., the least common multiple of all task periods in the system), and Σ_{τ∈T} wcet_τ bounds the number of hyper-periods after which any trace of the system will reach a previous state.
The reason why it is necessary to “look deeper” than just one hyper-period can be explained as follows: Prior to the time point Ω_M, some tasks may already have started, while others are still waiting for their first period to start. At the time Ω_M, the currently executing tasks (on the various processing elements) may therefore have been granted more execution time in their current periods than would be the case in periods occurring later than Ω_M; you may say that they have “saved up” some execution time, and this saving is bounded by the sum of the worst-case execution times in the system. In [BHM08], we have provided an example where the saving is reduced by one in each hyper-period following Ω_M until a missed deadline is detected. The upper bound above can be tightened to
Ω_M + Π_H · (1 + Σ_{τ∈T_X} wcet_τ)

where T_X is the set of all tasks that do not have a period starting at Ω_M.
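Both bounds are easy to compute mechanically. A small sketch, anticipating the numbers of Example 5.3 below:

    from math import lcm  # Python 3.9+

    def depth_bound(max_offset, periods, wcets_in_TX):
        # Tightened bound: Omega_M + Pi_H * (1 + sum of wcet over T_X),
        # where Pi_H is the least common multiple of all task periods.
        return max_offset + lcm(*periods) * (1 + sum(wcets_in_TX))

    # Example 5.3: Omega_M = 27, periods {11, 8, 251}, and the tasks in
    # T_X have worst-case execution times 3 and 4:
    print(depth_bound(27, [11, 8, 251], [3, 4]))  # -> 176731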
Example 5.3 Let us illustrate the challenge of analyzing multiprocessor systems by a small example, given in Table 5.6.
We have Ω_M = 27, Π_H = LCM{11, 8, 251} = 22088, and Σ_{τ∈T_X} wcet_τ = 3 + 4 = 7. The upper bound on the depth of the tree is Ω_M + Π_H · (1 + Σ_{τ∈T_X} wcet_τ) = 176731. The number of nodes (states) in the computation tree occurring at a depth ≤ 176731 can be calculated to be approximately 3.9 · 10^13. For details concerning such calculations we refer to [BHM08].
TABLE 5.6
Small Example with a Huge State Space
Execution Time | Period | Offset
5.5 MoVES Analysis Framework
One aim of our work is to establish a verification framework, called the “MoVES analysis framework” (see Figure 5.1), that can be used to provide guarantees, for example about schedulability, for a system-level model of an embedded system. We have chosen to base this verification framework on timed automata [AD94] and, in particular, on the UPPAAL system [BDL04,LPY97] for modeling, verification, and simulation. In this section, we will briefly discuss the rationale behind this choice and give a flavor of the framework. We refer to [BHM08] for more details.
First of all, the timed-automata model for an embedded system must be constructed so that the transition system of this model is a refinement of the computation-tree model of Section 5.4, that is, the timed-automata model must be correct with respect to the model of computation.
Another design criterion is that we want the model to be easily extensible, in the sense that new scheduling, allocation, and synchronization principles, for example, could be added. We therefore structure the timed-automata model in the same way as the ARTS [MVG04,MVM07] model of the multiprocessor platform is structured (cf. Figure 5.6). This, furthermore, has the advantage that the UPPAAL model of the system can also be used for simulation, because an UPPAAL trace directly reflects events on the multiprocessor platform.
The timed-automata model is constructed as a parallel composition of communicating timed automata, one for each component of the embedded system. We shall now give a brief overview of the model (details are found in [BHM08]), where an embedded system is modeled as a parallel composition of an application and an execution platform:

System = Application ∥ ExecutionPlatform
Application = ∥_{τ∈T} TA(τ)
ExecutionPlatform = ∥_{j=1..N} TA(pe_j)

where ∥ denotes the parallel composition of timed automata, TA(τ) the timed automaton for the task τ, and TA(pe) the timed automaton for the processing element pe. Thus, an application consists of a collection of timed automata for tasks combined in parallel, and an execution platform consists of a parallel composition of timed automata for processing elements.
The timed-automata model of a processing element, say pe_j, is structured according to the ARTS model described in Figure 5.6, as a parallel composition of a controller, a synchronizer, an allocator, and a scheduler:

TA(pe_j) = Controller_j ∥ Synchronizer_j ∥ Allocator_j ∥ Scheduler_j
In the UPPAAL model, these timed automata communicate synchronously over channels and via global variables. Furthermore, the procedural language part of UPPAAL proved particularly useful for expressing many algorithms. For example, the implementation of the earliest-deadline-first scheduling principle is directly expressed as a procedure using appropriate data structures.
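To give a flavor of what such a procedure computes, here is the selection logic of earliest deadline first as a Python sketch (an illustration only, not the actual UPPAAL procedure):

    def edf_pick(ready_jobs):
        # Earliest deadline first: among the ready jobs, run the one whose
        # absolute deadline is closest. Each entry is a
        # (task_name, absolute_deadline) pair.
        return min(ready_jobs, key=lambda job: job[1])[0]

    # Three ready jobs with absolute deadlines 14, 9, and 12:
    print(edf_pick([("tau1", 14), ("tau2", 9), ("tau3", 12)]))  # -> tau2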
Although the model of computation in Section 5.4 is discrete in nature, the real-time clock of UPPAAL proved useful for modeling the timing in the system in a natural manner, and the performance in verification examples was promising, as we shall see in Section 5.6. One could have chosen a model checker for discrete systems, such as SPIN [Hol03], instead of UPPAAL. This would result in a more explicit and less natural modeling of the timing in the system. Later experiments must show whether the verification would be more efficient.
The small example in Table 5.6 shows that verification of “real” systems becomes a major challenge because of the state-explosion problem. The MoVES analysis framework is therefore parameterized with respect to the UPPAAL model of the embedded system, in order to be able to experiment with different approaches and to provide efficient support for special cases of systems. In the following, we briefly highlight four of these different models.
1. One model considers the special case where worst-case and best-case execution times are equal. Since scheduling decisions are deterministic, nondeterminism is eliminated, and the computation tree of such a system consists of only one infinite run. Note that for such systems it may still be necessary to analyze a very long initial part of the run before schedulability can be guaranteed. However, it is possible to analyze very large systems. For the implementation of this model, we used a special version of UPPAAL in which no history is saved.
2. Another model extends the previous one by including the notion of resource allocation, to be used in the analysis of memory footprint and power consumption.
3. A third model includes nondeterminism of execution times, as described in the model of computation in Section 5.4. In this timed-automata model, the execution time for tasks was discretized in order to handle preemptive scheduling strategies. This made the timed-automata model of a task less natural than one could wish.
4. A fourth model uses stopwatch automata rather than clocks to model the timing of tasks, which allows preemption to be dealt with in a more natural way. In general, the reachability problem for stopwatch automata is undecidable, and the UPPAAL support for stopwatches is based on overapproximation. But our experience with this model has been good: in the examples we have tried so far, the results were always exact, the verification was more efficient than with the previous model (typically 40% faster), and it used less space; we can thus verify larger systems than with the previous model.
We are currently working toward a model that will reduce the number of clocks used compared to the four models mentioned above. The goal is to have just one clock for each processing element; achieving this, we expect a major efficiency gain for the verification.
5.6 Using the MoVES Analysis Framework
In order to make the model usable for system designers, details of the timed-automata model are encapsulated in the MoVES analysis framework. The system designer needs to have an understanding of the embedded system model, but not necessarily of the timed-automata model. It is assumed that tasks and their properties are already defined; therefore, MoVES is only concerned with helping the system designer configure the execution platform and map tasks onto it.
The timed-automata model is created from a textual description that resembles the embedded system model presented in Section 5.3. MoVES uses UPPAAL as a back-end to analyze the user’s model and to verify properties of the embedded system through model checking, as illustrated in Figure 5.1. UPPAAL can produce a diagnostic trace, and MoVES transforms this trace into a task schedule shown as a Gantt chart.
As MoVES is a framework aimed at exploring different modeling approaches, it is possible to change the core model such that the different modeling approaches described in Section 5.5 can be supported. In the following, we give four examples of using the framework to analyze embedded systems based on the different approaches. The first two examples focus on deterministic models, while the third and the fourth are based on nondeterministic models.
5.6.1 Simple Multi-Core Embedded System
To illustrate the design and verification processes using the MoVES analysis framework, consider the simple multi-core embedded system from Figure 5.5.
FIGURE 5.9
Queries and the resulting Gantt chart from the analysis of the system in Figure 5.5, using rate-monotonic scheduling on both processors and the memory and power figures from Table 5.4. The notation of the schedule is 0 for idle, 1 for running, - for offset, and X for missed deadline.
We will use this example to illustrate cross-layer dependencies and to show how resource costs can be analyzed. In the first experiment, we use rate-monotonic scheduling as the scheduling policy for the real-time operating system on both processors. Figure 5.9 presents the UPPAAL queries on schedulability and resource usage, and the resulting schedule of the system.
The verification results show several properties of the system. First, the system cannot be scheduled in its given form, since it misses a deadline. Second, at no point does the system use more than 7 units of power, but at some point before missing the deadline, 7 units of power are used. Finally, with regard to memory usage, it is verified that pe1 uses 17 units of memory at some point before missing the deadline, but never more, and that pe2 uses 12 units but never more. It is shown that Task 4 misses a deadline after 11 execution cycles. Note that Task 5 is the message task between Task 2 and Task 3.
In order to explore possible improvements of the system, we attempt verification of the same system where pe2 uses earliest-deadline-first scheduling. The verification results can be seen in Figure 5.10.
First, the system is now schedulable, as can be seen from the E<>allFinish() query being true. The system still has the same properties for power usage as with rate-monotonic scheduling on pe2, but the verification shows that at no point will pe1 in the revised system (i.e., where pe2 uses earliest deadline first) use more than 11 units of memory. Recall that with rate-monotonic scheduling on pe2, pe1 had already used 17 units of memory at some point before the deadline miss.
5.6.2 Smart Phone, Handling Large Models
As shown in Section 5.4, seemingly simple systems can result in very large state spaces. In order to analyze a realistic embedded system, we consider an application that is part of a smart phone.
FIGURE 5.10
Queries and the resulting Gantt chart from the analysis of the system in Figure 5.5, using rate-monotonic scheduling on processor pe1 and earliest-deadline-first scheduling on processor pe2:

E<>missedDeadline: false
E<>allFinish(): true
E<>totalCostInSystem(Power) == 7: true
E<>totalCostInSystem(Power) > 7: false
E<>costOnPE[0][Memory] == 11: true
E<>costOnPE[0][Memory] > 11: false
E<>costOnPE[1][Memory] == 12: true
E<>costOnPE[1][Memory] > 12: false

Task 1: 110011001100110011001100110011
Task 2: 001000100000001000100000001000
Task 3: 000011000110000011000110000011
Task 4: 00111001110000111001110000
Task 5: 000100010000000100010000000100
The smart phone includes the following applications: a GSM encoder, a GSM decoder, and an MP3 decoder, with a total of 103 tasks, as seen in Figure 5.11. These applications do not together make up the complete functionality of a smart phone, but are used as an example in which the number of tasks, their dependencies, and their timing properties are realistic. The applications and their properties in the smart phone example originate from experiments done by Schmitz [SAHE04]. The timing properties, the period, and the deadline of the tasks are imposed by the application and can be seen in Table 5.7. The smart phone example has been verified using worst-case execution times only. That is, in order to reduce the state space, we have only considered a deterministic version of the application, where worst-case execution times equal best-case execution times.
The execution cycles, memory usage, and power consumption of each task depend on the processing element. These properties of the tasks have been measured by simulating the execution of each task on different types of processing elements (the GPP, the FPGA, and the ASIC), as seen in Table 5.7. The execution cycles range from 52 to 266687 and the periods range from 0.02 to 0.025 seconds, giving a total number of 504 tasks to be executed in the hyper-period of the system.
The three applications have been mapped onto a platform consisting of four general-purpose processing elements, all of type GPP0 running at 25 MHz, connected by a bus. The parallelism of the MP3 decoder has been exploited to split this application onto two processing elements. The two other applications each run on their own processing element.
Having defined the embedded system with the application, the execution platform, and the mapping described above, the MoVES analysis framework is used to verify schedulability, maximum memory usage, and power consumption. In this case, the system is schedulable, and the maximum memory usage and power consumption are 1500 bytes and 1000 mW, respectively. The verification of this example takes roughly 3 h on a 64-bit Linux server with an AMD dual-core processor and 2 GB of memory.

FIGURE 5.11
Task graph for three applications from a smart phone, taken from [SAHE04].
It is possible that better designs exist, for instance where less power is used. A general-purpose processor could, for example, run at a lower frequency, or be replaced by an FPGA or an ASIC. This is, however, not the focus of this case study.
5.6.3 Handling Nondeterministic Execution Times
When we allow a span of execution times between the best-case execution time and the worst-case execution time of a task, the state space grows dramatically, as explained in Section 5.4. We examine the system given in Table 5.6 using an UPPAAL model capturing the nondeterminism in the choices of execution times in each period and using discretization of the running time of tasks. In Section 5.4, it was shown that the maximal depth of the computation tree that is needed when checking for schedulability is Ω_M + 8 · Π_H (i.e., 176731). The number of states in the initial part of the computation tree up to that depth is approximately 3.9 · 10^13. The verification used 3.1 GB of memory and took less than 11 min on an AMD CPU of 1.8 GHz with 32 GB of RAM.
If the system is changed slightly by adding an extra choice for the execution time of τ3 (i.e., wcet_τ3 = 14), the number of states in the initial part of the computation tree up to depth Ω_M + 8 · Π_H will be approximately 4.2 · 10^13. When attempting verification of this revised system on the same CPU, the verification aborts after 19 min with an “Out of memory” error message, after having used 3.4 GB of memory.
execu-5.6.4 Stopwatch Model
Examining the same system (i.e., with the extra choice for execution time, wcet_τ3 = 14) using an UPPAAL model with stopwatches, this example can now be analyzed without the “Out of memory” error. Even though the verification with stopwatches uses overapproximation, all the experiments we have conducted so far with this model have provided exact results. Furthermore, the tendency for all these experiments is that memory consumption as