FIGURE 5.5
System-level model of an embedded system.
Figure 5.5 shows these layers for a very simple example of an embedded system, which will be used to explain the aspects of the model throughout the chapter:
• The application is described by a collection of communicating sequential tasks. Each task is characterized by four timing properties, described later. The dependencies between tasks are captured by a directed acyclic graph (called a “task graph”), which might not be fully connected.
• The execution platform consists of several processing elements of possibly different types and clock frequencies. Each processing element will run its own real-time operating system, scheduling tasks in a priority-driven manner (static or dynamic), according to their priorities, dependencies, and resource usage. When a task needs to communicate with a task on another processing element, it uses a network. The setup of the network between processing elements must also be specified, and is part of the platform.
• The “mapping” between the application and the execution platform (shown as dashed arrows in the figure) is done by placing each task on a specific processing element. In our model, this mapping is static, and tasks cannot migrate during run-time.
The top level of the embedded system consists of an application mapped onto an execution platform. This mapping is depicted in Figure 5.5 with dashed arrows. The timing characteristics in Table 5.2 originate from [SL96], while the memory and power figures (in Table 5.4) are created for the purpose of demonstrating parameters of an embedded system. We will elaborate on the various parameters in the following.
5.3.1 Application Model
The task graph for the application can be thought of as an abstraction of a set of independent sequential programs that are executed on the execution platform. Each program is modeled as a directed acyclic graph of tasks where edges indicate causal dependencies. Dependencies are shown with solid arrows in Figure 5.5. A task is a piece of sequential code and is considered to be an atomic unit for scheduling. A task τ_j is periodic and is characterized by a “period” π_j, a “deadline” δ_j, an initial “offset” ω_j, and a fixed priority fp_j (used when an operating system uses fixed priority scheduling). The properties of periodic tasks (except the fixed priority) can be seen in Table 5.2 and are all given in some time unit.
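To make this characterization concrete, the following minimal Python sketch captures the four properties of a periodic task; the class and field names are our own illustration and are not part of the MoVES notation.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PeriodicTask:
        # The four timing properties of a task, as described above.
        period: int      # pi_j, in time units
        deadline: int    # delta_j, relative to each period start
        offset: int      # omega_j, release time of the first period
        priority: int    # fp_j, used under fixed-priority scheduling

        def period_starts(self, horizon: int):
            # Start times of the task's periods within [0, horizon].
            return range(self.offset, horizon + 1, self.period)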
5.3.2 Execution Platform Model
The execution platform is a heterogeneous system, in which a number of processing elements, pe_1, ..., pe_n, are connected through a network.
5.3.2.1 Processing-Element Model
A processing element pe_i is characterized by a “clock frequency” f_i, a “local memory” m_i with a bounded size, and a “real-time operating system” os_i. The operating system handles synchronization of tasks according to their dependencies using direct synchronization [SL96].
The access to a shared resource r_m (such as a shared memory or a bus) is handled using a resource allocation protocol, which in the current version is one of the following: preemptive critical section, nonpreemptive critical section, or priority inheritance. Tasks are in the current version scheduled using either rate-monotonic, deadline-monotonic, fixed-priority, or earliest-deadline-first scheduling [Liu00]. The properties of a processing element can be seen in Table 5.3. Allocation and scheduling are designed in MoVES for easy extension, that is, new algorithms can easily be added to the current pool.
The interaction between the operating system and the application model is shown in Figure 5.6. The operating system model consists of a controller, a synchronizer, an allocator, and a scheduler. The controller receives ready or …
FIGURE 5.6
The operating system model of a processing element: controller, synchronizer, allocator, and scheduler.
… networks, can be modeled. As a bus transfer is nonpreemptable, message tasks are modeled as run-to-completion. This is achieved by having all message tasks running on the bus, that is, the processing element emulating the bus, using the same resource r_m, thereby preventing the preemption of any message task. Intraprocessor communication is assumed to be included in the execution time of the two communicating tasks, and is therefore modeled without the use of message tasks.
5.3.3 Task Mapping
A mapping is a static allocation of tasks to processing elements of the execution platform. This is depicted by the dashed arrows in Figure 5.5. Suppose that the task τ_j is mapped onto the processing element pe_i. The “execution time” e_ij (measured in cycles), the memory footprint (“static memory” sm_ij and “dynamic memory” dm_ij), and the “power consumption” pw_ij of a task τ_j depend on the characteristics of the processing element pe_i executing the task, and can be seen in Table 5.4. In particular, when selecting the operating frequency f_i of the processing element pe_i, the execution time in seconds, t_ij, of task τ_j can be calculated as

t_ij = e_ij · 1/f_i
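As a quick worked example of this formula (a sketch; the figures are taken from the smart-phone case study in Section 5.6.2, where cycle counts reach 266687 and the general-purpose processors run at 25 MHz):

    def exec_time_seconds(e_ij: int, f_i: float) -> float:
        # t_ij = e_ij * (1 / f_i): cycles divided by clock frequency.
        return e_ij / f_i

    # A task of 266687 cycles on a 25 MHz processing element:
    print(exec_time_seconds(266687, 25e6))  # ~0.0107 s, about 10.7 ms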
5.3.4 Memory and Power Model
In order to be able to verify that memory and power consumption stay within given bounds, the model keeps track of the memory usage and power costs in each cycle. Additional cost parameters can easily be added to the model, as long as the cost can be expressed in terms of the cost of being in a certain state.
The memory model includes both static memory allocation (sm), due to program memory, and dynamic memory allocation (dm), due to the data memory of the task. The example in Figure 5.7 illustrates the memory model for a set of tasks executing on a single processor. It shows the scheduling and the resulting memory profiles (split into static and dynamic memories). The dynamic part is split into private data memory (pdm), needed while executing the task, and communication data memory (cdm), needed to store data exchanged between tasks.
FIGURE 5.7
Memory and power profiles for pe1 when all four tasks in Figure 5.5 are mapped onto pe1. (a) Schedule where τ3 is preempted by τ4. (b) Memory usage on pe1: static memory (sm), private data memory (pdm), and communication data memory (cdm). (c) Power usage. (d) Task graph from Figure 5.5.
The memory needed for the data exchange between τ2 and τ3 must be allocated until it has been read by τ3 at the start of τ3’s execution. When τ3 becomes preempted, the private data memory of the task remains allocated until the task finishes.
Currently, a simple approach to the modeling of power has been taken. When a task is running, it uses power pw; at all other times, the power usage of the task is zero. The different power usages of tasks can be seen as the heights of the execution boxes in Figure 5.7c. This approach can easily be extended to account for different power contributions depending on the state of the task.
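Under this simple model, the power profile follows directly from the schedule: in each time unit, the processing element draws the pw of the running task, and zero when idle. A minimal sketch (task names and power values are illustrative, not taken from Table 5.4):

    def power_profile(schedule, pw):
        # schedule: the running task's name per time unit (None = idle).
        # pw: power drawn by each task while running.
        # Returns the per-unit power usage, i.e., the heights of the
        # execution boxes in a plot like Figure 5.7c.
        return [pw[task] if task is not None else 0 for task in schedule]

    # tau1 runs two time units, then tau2 one, then the processor idles:
    print(power_profile(["tau1", "tau1", "tau2", None],
                        {"tau1": 4, "tau2": 3}))
    # -> [4, 4, 3, 0]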
5.4 Model of Computation
In the following, we will give a rather informal presentation of the model of computation. For a formal and more comprehensive description, please refer to [BHM08]. To model the computations of a system, the notion of a “state”, which is a snapshot of the state of affairs of the individual processing elements, is introduced. For the sake of argument, we will consider a system consisting of a single processing element pe_i and a set of tasks τ_j ∈ T_pe_i assigned to pe_i. Furthermore, we shall assume that each τ_j is characterized by “best-case” and “worst-case” execution times, bcet_τj ∈ N and wcet_τj ∈ N, respectively. At the start of each new period, there is a nondeterministic choice concerning which execution time e_ij ∈ {bcet_τj, bcet_τj + 1, ..., wcet_τj − 1, wcet_τj} is needed by τ_j to finish its job on pe_i in that period.
For the processing element pe_i, the state component must record which task τ_j (if any) is currently executing, and, for every task τ_j ∈ T_pe_i, the execution time e_ij that is needed by τ_j to finish its job in its current period. We denote the state σ, where τ_j is running and where there is a total of n tasks assigned to pe_i, as σ = (τ_j, (e_i1, ..., e_in)). Here, we consider execution time only; other resource aspects, such as memory or power consumption, are disregarded.
A trace is a finite sequence of states, σ_1 σ_2 · · · σ_k, where k ≥ 0 is the length of the trace. A trace of length k describes a system behavior in the interval [0, k]. For every new period of a task, the task execution time for that period can be any of the possible execution times in the natural-number interval [bcet, wcet]. If bcet = wcet for all tasks, there is only one trace of length k, for any k. If bcet ≠ wcet, we may explore all possible extensions of the current trace by creating a new branch for every possible execution time, every time a new period is started for a task. A “computation tree” is an infinite, finitely branching tree, where every finite path starting from the root is a trace, and where the branching of a given node in the tree corresponds to all possible extensions of the trace ending in that node. This is further explained in the following example.
Example 5.2 Let us consider a simple example consisting of three independent tasks assigned to a single processor. The characteristics of each task are shown in Table 5.5. The computation tree for the first 8 time units is shown in Figure 5.8. Here, we will give a short description of how this initial part of the tree is created.
Time t = 0: Only task τ1 is ready, as τ2 and τ3 both have an offset of 2. Hence, τ1 starts executing, and as bcet = wcet = 2, there is only one possible execution time for τ1. The state then becomes σ1 = (τ1, (2, 0, 0)).
FIGURE 5.8
Possible execution traces. A △ indicates a subtree, the details of which are not further elaborated in this example.
Time t = 2: τ1 has finished its execution of 2 time units, but a new period for τ1 has not yet started, as π1 = 3. Both τ2 and τ3 are now ready. Since τ2 has the highest priority (i.e., the lowest number), it gets to execute. As the execution time interval for both τ2 and τ3 is [1, 2], there are two different execution times for each, and hence four different possible states, (τ2, (2, 1, 1)), (τ2, (2, 1, 2)), (τ2, (2, 2, 1)), and (τ2, (2, 2, 2)), which give rise to four branches. In Figure 5.8, we only continue the elaboration from state (τ2, (2, 1, 2)).
Time t = 3: τ2 finishes its execution. τ3 is still ready, and the first period of τ1 has completed, initiating its second iteration; hence, τ1 is also ready. As τ1 has the highest priority, it gets to execute. The state becomes (τ1, (2, 1, 2)).
Time t = 5: τ1 finishes its execution. τ3 is the only task ready, as the first period of τ2 has not yet finished. The state becomes (τ3, (2, 1, 2)).
Time t = 6: Both τ1 and τ2 become ready as a new period starts for each of them. Again, τ1 has the highest priority and gets executed, preempting τ3, which then still needs one time unit of execution to complete its job for the current period. Since the execution time of τ2 can be 1 or 2, while τ1 has only one possible execution time, there are just two branches, that is, the possible new states are (τ1, (2, 1, 2)) and (τ1, (2, 2, 2)).
Time t = 8: τ1 has completed its execution, allowing τ2 to take over. However, at this point the second period of τ3 starts, while τ3 has not yet completed its job for the first period. Hence, τ3 will not meet its deadline, and this example is not schedulable.
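The walkthrough above can be reproduced by a few lines of simulation. The sketch below follows the single branch of the computation tree elaborated in Figure 5.8 (τ2 choosing execution time 1 and τ3 choosing 2). Since Table 5.5 is not reproduced here, the task parameters are inferred from the narrative, and deadlines are assumed to equal periods.

    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        offset: int
        period: int
        exec_time: int  # execution time chosen for every period (one branch)
        priority: int   # lower number = higher priority

    def simulate(tasks, horizon):
        # Discrete-time, fixed-priority, preemptive simulation. A deadline
        # miss is detected when a new period starts while the previous job
        # of that task is unfinished (deadline = period assumed).
        remaining = {t.name: 0 for t in tasks}
        schedule = []
        for now in range(horizon):
            for t in tasks:
                if now >= t.offset and (now - t.offset) % t.period == 0:
                    if remaining[t.name] > 0:
                        return schedule, f"{t.name} misses its deadline at t={now}"
                    remaining[t.name] = t.exec_time
            ready = [t for t in tasks if remaining[t.name] > 0]
            if ready:
                running = min(ready, key=lambda t: t.priority)  # preemption
                remaining[running.name] -= 1
                schedule.append(running.name)
            else:
                schedule.append(None)
        return schedule, "no deadline missed within the horizon"

    # Parameters inferred from Example 5.2 (one branch of the tree):
    tasks = [Job("tau1", 0, 3, 2, 1),   # bcet = wcet = 2
             Job("tau2", 2, 4, 1, 2),   # branch where tau2 takes 1
             Job("tau3", 2, 6, 2, 3)]   # branch where tau3 takes 2
    print(simulate(tasks, 9))
    # -> (['tau1', 'tau1', 'tau2', 'tau1', 'tau1', 'tau3', 'tau1', 'tau1'],
    #     'tau3 misses its deadline at t=8')

The simulated branch reproduces the schedule of the walkthrough, including the preemption of τ3 at t = 6 and the missed deadline at t = 8.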
This model of computation can easily be extended to a system with multiple processors. The system state then becomes the union of the states for each processor.
A run of a system is an infinite sequence of states. We call a system “schedulable” if, for every run, each task finishes its job in all its periods.
In [BHM08], we have shown that the schedulability problem is decidable, and an upper bound on the depth of the part of the computation tree that is sufficient to consider when checking for schedulability has been established. An upper bound for that depth is given by

Ω_M + Π_H · (1 + Σ_{τ∈T} wcet_τ)

where T is the set of all tasks, Ω_M is the maximal offset, Π_H is the hyper-period of the system (i.e., the least common multiple of all task periods in the system), and Σ_{τ∈T} wcet_τ bounds the number of hyper-periods after which any trace of the system will reach a previous state.
The reason why it is necessary to “look deeper” than just one hyper-period can be explained as follows: Prior to the time point Ω_M, some tasks may already have started, while others are still waiting for their first period to start. At the time Ω_M, the currently executing tasks (on the various processing elements) may therefore have been granted more execution time in their current periods than would be the case in periods occurring later than Ω_M; you may say that they have “saved up” some execution time, and this saving is bounded by the sum of the worst-case execution times in the system. In [BHM08], we have provided an example where the saving is reduced by one in each hyper-period following Ω_M until a missed deadline is detected. The upper bound above can be tightened to
Ω_M + Π_H · (1 + Σ_{τ∈T_X} wcet_τ)

where T_X is the set of all tasks that do not have a period starting at Ω_M.
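Both bounds are easy to compute mechanically. A small sketch, anticipating the numbers of Example 5.3 below:

    from math import lcm  # Python 3.9+

    def depth_bound(max_offset, periods, wcets_in_TX):
        # Tightened bound: Omega_M + Pi_H * (1 + sum of wcet over T_X),
        # where Pi_H is the least common multiple of all task periods.
        return max_offset + lcm(*periods) * (1 + sum(wcets_in_TX))

    # Example 5.3: Omega_M = 27, periods {11, 8, 251}, and the tasks in
    # T_X have worst-case execution times 3 and 4:
    print(depth_bound(27, [11, 8, 251], [3, 4]))  # -> 176731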
Example 5.3 Let us illustrate the challenge of analyzing multiprocessor systems by a small example, given in Table 5.6.
We have Ω_M = 27, Π_H = LCM{11, 8, 251} = 22088, and Σ_{τ∈T_X} wcet_τ = 3 + 4 = 7. The upper bound on the depth of the tree is Ω_M + Π_H · (1 + Σ_{τ∈T_X} wcet_τ) = 176731. The number of nodes (states) in the computation tree occurring at a depth ≤ 176731 can be calculated to be approximately 3.9 · 10^13. For details concerning such calculations we refer to [BHM08].
TABLE 5.6
Small Example with a Huge State Space
Execution Time | Period | Offset
5.5 MoVES Analysis Framework
One aim of our work is to establish a verification framework, called the “MoVES analysis framework” (see Figure 5.1), that can be used to provide guarantees, for example about schedulability, for a system-level model of an embedded system. We have chosen to base this verification framework on timed automata [AD94] and, in particular, on the UPPAAL system [BDL04,LPY97] for modeling, verification, and simulation. In this section, we will briefly discuss the rationale behind this choice and give a flavor of the framework. We refer to [BHM08] for more details.
First of all, the timed-automata model for an embedded system must be constructed so that the transition system of this model is a refinement of the computation-tree model of Section 5.4, that is, the timed-automata model must be correct with respect to the model of computation.
Another design criterion is that we want the model to be easily extensible, in the sense that new scheduling, allocation, and synchronization principles, for example, could be added. We therefore structure the timed-automata model in the same way as the ARTS [MVG04,MVM07] model of the multiprocessor platform is structured (cf. Figure 5.6). This, furthermore, has the advantage that the UPPAAL model of the system can also be used for simulation, because an UPPAAL trace directly reflects events on the multiprocessor platform.
The timed-automata model is constructed as a parallel composition of communicating timed automata, one for each component of the embedded system. We shall now give a brief overview of the model (details are found in [BHM08]), where an embedded system is modeled as a parallel composition of an application and an execution platform:

System = Application ∥ ExecutionPlatform
Application = ∥_{τ∈T} TA(τ)
ExecutionPlatform = ∥_{j=1..N} TA(pe_j)

where ∥ denotes the parallel composition of timed automata, TA(τ) the timed automaton for the task τ, and TA(pe) the timed automaton for the processing element pe. Thus, an application consists of a collection of timed automata for tasks combined in parallel, and an execution platform consists of a parallel composition of timed automata for processing elements.
The timed-automata model of a processing element, say pe_j, is structured according to the ARTS model described in Figure 5.6, as a parallel composition of a controller, a synchronizer, an allocator, and a scheduler:

TA(pe_j) = Controller_j ∥ Synchronizer_j ∥ Allocator_j ∥ Scheduler_j
In the UPPAAL model, these timed automata communicate synchronously over channels and via global variables. Furthermore, the procedural language part of UPPAAL proved particularly useful for expressing many algorithms. For example, the implementation of the earliest-deadline-first scheduling principle is directly expressed as a procedure using appropriate data structures.
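To give a flavor of what such a procedure computes, here is the selection logic of earliest deadline first as a Python sketch (an illustration only, not the actual UPPAAL procedure):

    def edf_pick(ready_jobs):
        # Earliest deadline first: among the ready jobs, run the one whose
        # absolute deadline is closest. Each entry is a
        # (task_name, absolute_deadline) pair.
        return min(ready_jobs, key=lambda job: job[1])[0]

    # Three ready jobs with absolute deadlines 14, 9, and 12:
    print(edf_pick([("tau1", 14), ("tau2", 9), ("tau3", 12)]))  # -> tau2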
Although the model of computation in Section 5.4 is discrete in nature, the real-time clock of UPPAAL proved useful for modeling the timing in the system in a natural manner, and the performance in verification examples was promising, as we shall see in Section 5.6. One could have chosen a model checker for discrete systems, such as SPIN [Hol03], instead of UPPAAL. This would result in a more explicit and less natural modeling of the timing in the system. Later experiments must show whether the verification would be more efficient.
The small example in Table 5.6 shows that verification of “real” systems becomes a major challenge because of the state-explosion problem. The MoVES analysis framework is therefore parameterized with respect to the UPPAAL model of the embedded system, in order to be able to experiment with different approaches and to provide efficient support for special cases of systems. In the following, we briefly highlight four of these different models.
1. One model considers the special case where worst-case and best-case execution times are equal. Since scheduling decisions are deterministic, nondeterminism is eliminated, and the computation tree of such a system consists of only one infinite run. Note that for such systems it may still be necessary to analyze a very long initial part of the run before schedulability can be guaranteed. However, it is possible to analyze very large systems. For the implementation of this model, we used a special version of UPPAAL in which no history is saved.
2. Another model extends the previous one by including the notion of resource allocation, to be used in the analysis of memory footprint and power consumption.
3. A third model includes nondeterminism of execution times, as described in the model of computation in Section 5.4. In this timed-automata model, the execution time for tasks was discretized in order to handle preemptive scheduling strategies. This made the timed-automata model of a task less natural than one could wish.
4. A fourth model uses stopwatch automata rather than clocks to model the timing of tasks, which allows preemption to be dealt with in a more natural way. In general, the reachability problem for stopwatch automata is undecidable, and the UPPAAL support for stopwatches is based on overapproximation. But our experience with this model has been good: in the examples we have tried so far, the results were always exact, the verification was more efficient than with the previous model (typically 40% faster), and it used less space; we can thus verify larger systems than with the previous model.
We are currently working toward a model that will reduce the number of clocks used compared to the four models mentioned above. The goal is to have just one clock for each processing element; achieving this, we expect a major efficiency gain for the verification.
5.6 Using the MoVES Analysis Framework
In order to make the model usable for system designers, details of the timed-automata model are encapsulated in the MoVES analysis framework. The system designer needs to have an understanding of the embedded system model, but not necessarily of the timed-automata model. It is assumed that tasks and their properties are already defined; therefore, MoVES is only concerned with helping the system designer configure the execution platform and map tasks onto it.
The timed-automata model is created from a textual description that resembles the embedded system model presented in Section 5.3. MoVES uses UPPAAL as a back-end to analyze the user’s model and to verify properties of the embedded system through model checking, as illustrated in Figure 5.1. UPPAAL can produce a diagnostic trace, and MoVES transforms this trace into a task schedule shown as a Gantt chart.
As MoVES is a framework aimed at exploring different modeling approaches, it is possible to change the core model such that the different modeling approaches described in Section 5.5 can be supported. In the following, we give four examples of using the framework to analyze embedded systems based on the different approaches. The first two examples focus on deterministic models, while the third and the fourth are based on nondeterministic models.
5.6.1 Simple Multi-Core Embedded System
To illustrate the design and verification processes using the MoVES analysis framework, consider the simple multi-core embedded system from Figure 5.5.
FIGURE 5.9
Queries and the resulting Gantt chart from the analysis of the system in Figure 5.5, using rate-monotonic scheduling on both processors and the memory and power figures from Table 5.4. The notation of the schedule is 0 for idle, 1 for running, - for offset, and X for missed deadline.
We will use this example to illustrate cross-layer dependencies and to show how resource costs can be analyzed. In the first experiment, we use rate-monotonic scheduling as the scheduling policy for the real-time operating system on both processors. Figure 5.9 presents the UPPAAL queries on schedulability and resource usage, and the resulting schedule of the system.
The verification results show several properties of the system. First, the system cannot be scheduled in its given form, since it misses a deadline. Second, at no point does the system use more than 7 units of power, but at some point before missing the deadline, 7 units of power are used. Finally, with regard to memory usage, it is verified that pe1 uses 17 units of memory at some point before missing the deadline, but never more, and that pe2 uses 12 units but never more. It is shown that Task 4 misses a deadline after 11 execution cycles. Note that Task 5 is the message task between Task 2 and Task 3.
In order to explore possible improvements of the system, we attempt verification of the same system where pe2 uses earliest-deadline-first scheduling. The verification results can be seen in Figure 5.10.
First, the system is now schedulable, as can be seen from the E<>allFinish() query being true. The system still has the same properties for power usage as with rate-monotonic scheduling on pe2, but the verification shows that at no point will pe1 in the revised system (i.e., where pe2 uses earliest deadline first) use more than 11 units of memory. Recall that with rate-monotonic scheduling on pe2, pe1 had already used 17 units of memory at some point before the deadline miss.
5.6.2 Smart Phone, Handling Large Models
As shown in Section 5.4, seemingly simple systems can result in very large state spaces. In order to analyze a realistic embedded system, we consider an application that is part of a smart phone.
FIGURE 5.10
Queries and the resulting Gantt chart from the analysis of the system in Figure 5.5, using rate-monotonic scheduling on processor pe1 and earliest-deadline-first scheduling on processor pe2:

E<>missedDeadline: false
E<>allFinish(): true
E<>totalCostInSystem(Power) == 7: true
E<>totalCostInSystem(Power) > 7: false
E<>costOnPE[0][Memory] == 11: true
E<>costOnPE[0][Memory] > 11: false
E<>costOnPE[1][Memory] == 12: true
E<>costOnPE[1][Memory] > 12: false

Task 1: 110011001100110011001100110011
Task 2: 001000100000001000100000001000
Task 3: 000011000110000011000110000011
Task 4: 00111001110000111001110000
Task 5: 000100010000000100010000000100
The smart phone includes the following applications: a GSM encoder, a GSM decoder, and an MP3 decoder, with a total of 103 tasks, as seen in Figure 5.11. These applications do not together make up the complete functionality of a smart phone, but are used as an example in which the number of tasks, their dependencies, and their timing properties are realistic. The applications and their properties in the smart phone example originate from experiments done by Schmitz [SAHE04]. The timing properties, the period, and the deadline of the tasks are imposed by the application and can be seen in Table 5.7. The smart phone example has been verified using worst-case execution times only. That is, in order to reduce the state space, we have only considered a deterministic version of the application, where worst-case execution times equal best-case execution times.
The execution cycles, memory usage, and power consumption of each task depend on the processing element. These properties of the tasks have been measured by simulating the execution of each task on different types of processing elements (the GPP, the FPGA, and the ASIC), as seen in Table 5.7. The execution cycles range from 52 to 266687 and the periods range from 0.02 to 0.025 seconds, giving a total number of 504 tasks to be executed in the hyper-period of the system.
The three applications have been mapped onto a platform consisting of four general-purpose processing elements, all of type GPP0 running at 25 MHz, connected by a bus. The parallelism of the MP3 decoder has been exploited to split this application onto two processing elements. The two other applications each run on their own processing element.
Having defined the embedded system with the application, the execution platform, and the mapping described above, the MoVES analysis framework is used to verify schedulability, maximum memory usage, and power consumption. In this case, the system is schedulable, and the maximum memory usage and power consumption are 1500 bytes and 1000 mW, respectively. The verification of this example takes roughly 3 h on a 64-bit Linux server with an AMD dual-core processor and 2 GB of memory.

FIGURE 5.11
Task graph for three applications from a smart phone, taken from [SAHE04].
It is possible that better designs exist, for instance where less power is used. A general-purpose processor could, for example, run at a lower frequency, or be replaced by an FPGA or an ASIC. This is, however, not the focus of this case study.
5.6.3 Handling Nondeterministic Execution Times
When we allow a span of execution times between the best-case execution time and the worst-case execution time of a task, the state space grows dramatically, as explained in Section 5.4. We examine the system given in Table 5.6 using an UPPAAL model capturing the nondeterminism in the choices of execution times in each period and using discretization of the running time of tasks. In Section 5.4, it was shown that the maximal depth of the computation tree that is needed when checking for schedulability is Ω_M + 8 · Π_H (i.e., 176731). The number of states in the initial part of the computation tree up to that depth is approximately 3.9 · 10^13. The verification used 3.1 GB of memory and took less than 11 min on an AMD CPU of 1.8 GHz with 32 GB of RAM.
If the system is changed slightly by adding an extra choice for the execution time of τ3 (i.e., wcet_τ3 = 14), the number of states in the initial part of the computation tree up to depth Ω_M + 8 · Π_H will be approximately 4.2 · 10^13. When attempting verification of this revised system on the same CPU, the verification aborts after 19 min with an “Out of memory” error message, after having used 3.4 GB of memory.
execu-5.6.4 Stopwatch Model
Examining the same system (i.e., with the extra choice for execution time, wcet_τ3 = 14) using an UPPAAL model with stopwatches, this example can now be analyzed without the “Out of memory” error. Even though the verification with stopwatches uses overapproximation, all the experiments we have conducted so far with this model have provided exact results. Furthermore, the tendency for all these experiments is that memory consumption as