Model-Based Design for Embedded Systems


[Figure: step labels: Partitioning and mapping; Mapping comm. on HW resources; SW adapt to specific HW comm implementation. Software stack layers shown: tasks, HdS API, Comm, OS, HAL API, HAL.]

FIGURE 9.6

MPSoC programming steps

The result of each of these four phases represents a step in the software and communication refinement process. The refinement is an incremental process. At each stage, additional software components and architecture details are integrated with the previously generated and validated components. This results in a gradual transformation of a high-level representation, with abstract components and high-level programming models, into concrete low-level executable software code. The transformation has to be validated at each design step. The validation can be performed by formal analysis, simulation, or a combination of simulation and formal analysis [23]. In the following, we use simulation-based validation to ensure that the system behavior respects the initial specification.

During the partitioning and mapping of the application on the target architecture, the relationship between application and architecture is defined. This refers to the number of application tasks that can be executed in parallel, the granularity of these tasks (coarse grain or fine grain), and the association between tasks and the processors that will execute them.

The result of this step is the decomposition of the application into tasks and the association between tasks and processors. The resulting model is the system architecture model. The system architecture model represents a functional description of the application specification, combined with the partitioning and mapping information. Aspects related to the architecture model (e.g., processing units available in the target hardware platform) are combined into the application model (i.e., multiple tasks executed on the processing units). Thus, the system architecture model expresses parallelism in the target application by capturing the mapping of the functions into tasks and of the tasks into subsystems. It also makes explicit the communication units that abstract the intra-subsystem communication protocols (the communication between the tasks inside a subsystem) and the inter-subsystem communication protocols (the communication between different subsystems).
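As an illustration only, the information held by a system architecture model can be pictured as two mappings (functions to tasks, tasks to subsystems) plus a list of communication units. The sketch below is written in Python under that assumption; every name in it (`function_to_task`, `task_to_subsystem`, `CommUnit`, `classify`, and the function, task, and subsystem labels) is hypothetical and not taken from the described tool flow.

```python
from dataclasses import dataclass

# Hypothetical partitioning: application functions grouped into tasks.
function_to_task = {"f1": "task_a", "f2": "task_a", "f3": "task_b", "f4": "task_c"}

# Hypothetical mapping: tasks assigned to subsystems.
task_to_subsystem = {"task_a": "SS0", "task_b": "SS0", "task_c": "SS1"}


@dataclass(frozen=True)
class CommUnit:
    """An abstract communication unit linking a producer task to a consumer task."""
    name: str
    producer: str
    consumer: str


def classify(unit, mapping):
    """Return 'intra' if both endpoint tasks share a subsystem, else 'inter'."""
    same = mapping[unit.producer] == mapping[unit.consumer]
    return "intra" if same else "inter"


units = [CommUnit("c0", "task_a", "task_b"), CommUnit("c1", "task_b", "task_c")]
kinds = {u.name: classify(u, task_to_subsystem) for u in units}

# Subsystems can run in parallel, so their count bounds the task-level parallelism.
parallel_subsystems = len(set(task_to_subsystem.values()))
```

With this representation, changing one entry of `task_to_subsystem` is enough to turn an intra-subsystem communication unit into an inter-subsystem one, which is the distinction the model makes explicit.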

The second step implements the mapping of communication onto the hardware platform resources. At this phase, the different links used for the communication between the different tasks are mapped on the hardware resources available in the architecture to implement the specified protocol. For example, a FIFO communication unit can be mapped to a hardware queue, a shared memory, or some kind of bus-based device. The task code is adapted to the communication mechanism through the use of adequate HdS communication primitives. The resulting model is named the virtual architecture model.
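The idea that task code stays the same while the communication unit is bound to different resources can be sketched as follows. This is a minimal Python illustration, not the generated code: the primitive names `send_data` and `recv_data` follow those mentioned in Section 9.6.3, but their signatures, the `CHANNELS` table, and both backend classes are assumptions made for the example.

```python
from collections import deque


class HwQueueBackend:
    """Stand-in for a hardware FIFO queue."""

    def __init__(self):
        self._q = deque()

    def write(self, words):
        self._q.extend(words)

    def read(self, n):
        return [self._q.popleft() for _ in range(n)]


class SharedMemoryBackend:
    """Stand-in for a circular buffer placed in a shared memory."""

    def __init__(self, size):
        self._buf = [0] * size
        self._rd = 0      # read index
        self._wr = 0      # write index
        self._count = 0   # words currently stored

    def write(self, words):
        if self._count + len(words) > len(self._buf):
            raise BufferError("buffer full")
        for w in words:
            self._buf[self._wr] = w
            self._wr = (self._wr + 1) % len(self._buf)
        self._count += len(words)

    def read(self, n):
        if n > self._count:
            raise BufferError("not enough data")
        out = []
        for _ in range(n):
            out.append(self._buf[self._rd])
            self._rd = (self._rd + 1) % len(self._buf)
        self._count -= n
        return out


# Hypothetical binding table filled in during communication mapping.
CHANNELS = {}


def send_data(channel, words):
    CHANNELS[channel].write(words)


def recv_data(channel, n):
    return CHANNELS[channel].read(n)


def producer_task(channel):
    send_data(channel, [1, 2, 3])


def consumer_task(channel):
    return sum(recv_data(channel, 3))
```

Rebinding `CHANNELS["c"]` from `HwQueueBackend()` to `SharedMemoryBackend(4)` changes the resource that carries the data without touching `producer_task` or `consumer_task`.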

The next step of the proposed flow consists of software adaptation to the specific hardware communication implementation. At this stage, the details of the communication protocol are specified; for example, the synchronization mechanism between the different processors running in parallel becomes explicit. The software code has to be adapted to the synchronization method, such as events or semaphores. This can be done by using the services of the OS and communication components of the software stack. The resulting model is the Transaction Accurate Architecture model.
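To make the role of explicit synchronization concrete, the sketch below shows a one-slot channel guarded by two semaphores. It uses Python's standard `threading` module as a stand-in for OS services; the class `OneSlotChannel` and the function `run` are hypothetical names, and threads here only imitate processors running in parallel.

```python
import threading


class OneSlotChannel:
    """Single-word buffer whose access is ordered by two semaphores."""

    def __init__(self):
        self._slot = None
        self._empty = threading.Semaphore(1)  # slot is free
        self._full = threading.Semaphore(0)   # slot holds data

    def send(self, value):
        self._empty.acquire()   # wait until the previous value was consumed
        self._slot = value
        self._full.release()    # signal that data is available

    def recv(self):
        self._full.acquire()    # wait until data is available
        value = self._slot
        self._empty.release()   # signal that the slot is free again
        return value


def run(n_items):
    """Transfer n_items values from a producer thread to a consumer thread."""
    chan = OneSlotChannel()
    received = []

    def producer():
        for i in range(n_items):
            chan.send(i)

    def consumer():
        for _ in range(n_items):
            received.append(chan.recv())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because the producer blocks until the slot is free and the consumer blocks until it is full, the values arrive in order regardless of how the two threads are scheduled.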

The final step corresponds to the specific adaptation of the software to the target processors. Processor-dependent software code is integrated into the software stack (HAL) to allow low-level access to the hardware resources and the final memory mapping. The resulting model is called the Virtual Prototype model.

These different steps of the global flow correspond to the generation and validation of different software components at different abstraction levels.

9.6 Experiments with H.264 Encoder Application

In this section, we apply the proposed programming environment to a complex MPSoC architecture. The target application is the H.264 encoder, also called AVC (advanced video coding). First, the specifications of the target architecture and application are given; then, the programming steps at the system architecture, virtual architecture, transaction accurate architecture, and virtual prototype levels are described, in that order.

9.6.1 Application and Architecture Specification

The H.264 encoder application is a video processing multimedia application that supports coding and decoding of 4:2:0 YUV video formats [24]. The main functions of the H.264 encoder are illustrated in Figure 9.7. The input frame is processed in macroblocks, each consisting of 16 × 16 pixels. To encode a macroblock, there are three main steps: (1) prediction, with the main blocks motion estimation (ME), motion compensation (MC), and frame filtering; (2) transformation with quantization (T, Q, and Reorder); and (3) entropy encoding (CABAC in this case). The H.264 standard supports seven sets of capabilities, which are referred to as profiles, each targeting a specific class of applications. In this section, the main profile is used as the application case study.

FIGURE 9.7

H.264 encoder

FIGURE 9.8

Diopsis R2DT with Hermes NoC

The target MPSoC architecture is named the Diopsis R2DT (RISC + 2 DSP) tile [25]. As shown in Figure 9.8, it contains three software subsystems (SW-SS): one ARM9 RISC processor subsystem and two ATMEL magicV VLIW DSP processing subsystems.

The hardware nodes represent the global external memory (DXM) and the POT (peripherals on tile) subsystem. The POT subsystem contains the peripherals of the ARM9 processor and the I/O peripherals of the tile. All three processors may access the local memories and registers of the other processors and also the distributed external memory (DXM). The different subsystems are interconnected using the Hermes network on chip (NoC), which supports two types of topologies: Mesh and Torus [26].

9.6.2 Programming at the System Architecture Level

Programming at the system architecture level consists of functional modeling of the application, partitioning the application into tasks, and mapping them onto the processing subsystems.

The H.264 application functions are mapped onto the available SW-SS, as shown in Figure 9.9. Thus, the DSP1-SS is responsible for encoding a frame of the video sequence. The DSP2-SS compresses the encoded frame. The ARM9-SS creates the final bitstream and computes the bit-rate controller. The application executes in pipeline fashion and requires three application data transfers between the processors: COMM1 between DSP1 and DSP2, COMM2 between DSP2 and ARM9, and COMM3 between ARM9 and DSP1.

The resulting system architecture is modeled using the Simulink environment. To validate the H.264 encoder algorithm, the system architecture model is simulated using a discrete-time simulation engine. The input test video is a 10-frame video sequence in QCIF YUV 420 format. The simulation requires approximately 30 s on a PC running at 1.73 GHz with 1 GB of RAM.

FIGURE 9.9

System architecture model of H.264

The H.264 simulation allowed validating the functionality, but also measuring early execution requirements. Thus, the total number of iterations necessary to encode the 10-frame video sequence was equal to the number of frames. This is because all the application functions implemented in Simulink operate at the frame level. The communication between the DSP1 and DSP2 processors uses a communication unit that requires a buffer of 288,585 words to transmit the encoded frame from the DSP1 processor to the DSP2 in order to be compressed. The DSP2 processor and the ARM9 processor communicate through a communication unit that requires a buffer of 19,998 words. The last communication unit, between the ARM9 and DSP1 processors, requires a buffer of one word to store the quanta value required for the encoder. The total number of words exchanged between the different subsystems during the encoding of the 10-frame video sequence, using the main profile configuration of the encoder algorithm, was approximately 3085 kWords.
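The reported total can be cross-checked from the three buffer sizes. The short Python calculation below assumes that each buffer is transferred exactly once per frame-level iteration; that assumption and the dictionary keys are ours, while the numbers come from the text above.

```python
# Buffer sizes in words, as reported for the three communication units.
COMM_BUFFER_WORDS = {
    "DSP1-DSP2": 288_585,
    "DSP2-ARM9": 19_998,
    "ARM9-DSP1": 1,
}
FRAMES = 10  # one simulation iteration per frame

# Assumption: every buffer is exchanged once per iteration.
words_per_iteration = sum(COMM_BUFFER_WORDS.values())   # 308,584
total_words = words_per_iteration * FRAMES              # 3,085,840
```

Under this assumption the total is 3,085,840 words, which agrees with the figure of approximately 3085 kWords.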

9.6.3 Programming at the Virtual Architecture Level

Programming at the virtual architecture level consists of generating the C code for each task from the system architecture model. The generated task code for the H.264 encoder application uses the send_data()/recv_data() APIs for the communication primitives and is optimized in terms of data memory requirements.

Table 9.4 shows the task code and data size of the software at the virtual architecture level. The first two columns give the code size and the data size, respectively, of the functions that are independent of the design and optimization methods and are part of an independent library. The third and fourth columns show the code and data size obtained with memory optimization techniques.

TABLE 9.4

Task Code Generation for H.264 Encoder

Library Code | Library Data | Multitasking Code | Multitasking Data

FIGURE 9.10

Global view of Diopsis R2DT running H.264

The hardware at the virtual architecture level consists of a SystemC hardware platform made of abstract processor subsystems and interconnect components. Figure 9.10 illustrates a conceptual view of the virtual architecture for the Diopsis R2DT with Hermes NoC.

The virtual architecture can be simulated not only to validate the task code, but also to gather important early performance measurements that profile the interconnect load, for instance, the number of words exchanged between the tasks through the network component or the total number of packets initiated for the transfer by the various subsystems.

Figure 9.11 shows the total number of words passed through the NoC for different communication mapping schemes. When all the communication buffers are mapped on the DXM memory, as shown in Figure 9.10, the NoC is accessed to transfer 6,171,680 words during the encoding of the 10 frames. In another case, comm1 is mapped on DXM, comm2 on REG2, and comm3 on DMEM1. This case required 5,971,690 words to be transferred through the NoC. A third case maps comm1 on DMEM1, comm2 on DMEM2, and comm3 on SRAM, and it generates 3,085,840 words to be handled by the NoC.
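The three totals are consistent with a simple accounting rule, sketched below in Python. The rule is our assumption, not a statement from the text: a buffer placed in a memory local to one of its two communicating processors crosses the NoC once per frame, whereas a buffer placed elsewhere (such as the global DXM) crosses it twice, once when written and once when read. The ownership table (`DMEM1` and `REG1` local to DSP1, `DMEM2` and `REG2` to DSP2, `SRAM` to ARM9) and the endpoints of comm2 and comm3 are likewise inferred from the surrounding description.

```python
FRAMES = 10

# Buffer sizes in words per frame, from Section 9.6.2.
WORDS_PER_FRAME = {"comm1": 288_585, "comm2": 19_998, "comm3": 1}

# Assumed endpoints of each communication unit.
ENDPOINTS = {
    "comm1": ("DSP1", "DSP2"),
    "comm2": ("DSP2", "ARM9"),
    "comm3": ("ARM9", "DSP1"),
}

# Assumed owner of each memory; None marks the global memory.
MEMORY_OWNER = {
    "DMEM1": "DSP1", "REG1": "DSP1",
    "DMEM2": "DSP2", "REG2": "DSP2",
    "SRAM": "ARM9",
    "DXM": None,
}


def noc_words(mapping):
    """Total words crossing the NoC for a {comm unit: memory} mapping."""
    total = 0
    for comm, memory in mapping.items():
        owner = MEMORY_OWNER[memory]
        # One crossing if the buffer is local to an endpoint, otherwise two.
        crossings = 1 if owner in ENDPOINTS[comm] else 2
        total += crossings * WORDS_PER_FRAME[comm] * FRAMES
    return total


all_dxm = noc_words({"comm1": "DXM", "comm2": "DXM", "comm3": "DXM"})      # 6,171,680
mixed = noc_words({"comm1": "DXM", "comm2": "REG2", "comm3": "DMEM1"})     # 5,971,690
local = noc_words({"comm1": "DMEM1", "comm2": "DMEM2", "comm3": "SRAM"})   # 3,085,840
```

Under these assumptions the model reproduces all three reported values, which suggests that the saving comes from avoiding the second NoC traversal when a buffer sits next to one of its users.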

[Figure 9.11: legend entries Read/Write and Total Sent]

In all the communication mapping schemes, the simulation time required to encode the 10 image frames in QCIF YUV 420 format was approximately 40 s on a PC running Linux at 1.73 GHz.

9.6.4 Programming at the Transaction Accurate Architecture Level

Programming at the transaction accurate architecture level means building the software stack that runs on each processor. This consists of combining the task code with the OS and communication libraries. Thus, the H.264 task code previously designed is combined with a tiny OS, necessary for interrupt management and task initialization, and with the implementation of the send_data()/recv_data() communication primitives. Each processor executes a single task on top of the OS.

The transaction accurate architecture of the Diopsis R2DT tile with Hermes NoC is illustrated in Figure 9.12. The hardware platform is composed of the three processor subsystems (ARM9-SS, DSP1-SS, and DSP2-SS), one global MEM-SS, and the peripherals on tile subsystem (POT-SS), with the local architecture of every subsystem detailed. The different subsystems are interconnected through an explicit Hermes NoC.

The simulation of the transaction accurate architecture allows validating the integration of the task code with the OS and communication libraries, and it also provides better performance estimates, such as communication performance.

At this level, in order to analyze the overall system performance, we experimented with several communication architectures by changing the interconnection component and/or the communication mapping scheme. The NoC allows various mapping schemes of the IPs over the NoC, with different impacts on performance. In this work, two different mappings of the IP cores over the Mesh and Torus NoC are tested: Scheme A and Scheme B. Figure 9.13 summarizes these schemes by presenting the correspondence between the network interface and the IP core; for example, the MEM-SS is attached to a network interface whose x and y coordinates are both 1.

[Figure 9.12: blocks shown include MEM-SS (DXM, NI) and ARM9-SS (SRAM, NI, abstract ARM9 with software stack: tasks, HdS API, Comm, OS, HAL API)]

FIGURE 9.13

IP cores mapping schemes A and B over the NoC

Table 9.6 presents the results of the transaction accurate simulations for various interconnection components (AMBA bus, NoC), with different topologies for the NoC (Torus, Mesh), different IP core mappings over the NoC, and diverse communication buffer mapping schemes. The estimated performance indicators are the estimated execution cycles of the H.264 encoder, the simulation time using the different interconnect components on a PC running at 1.73 GHz with 1 GB of RAM, and the total routing requests for the NoC. These results were evaluated for the two considered IP mapping schemes shown in Figure 9.13 (A and B) and for three communication buffer mapping schemes: DXM+DXM+DXM, DMEM1+DMEM2+SRAM, and DMEM1+SRAM+DXM. The AMBA bus had the best performance, as it required the fewest clock cycles during execution for all the communication mapping schemes. The Mesh NoC had the worst performance when all the communication buffers were mapped onto the DXM, and performance similar to the Torus when the local memories were used.

This is explained by the small number of subsystems interconnected through the NoC. In fact, NoCs are very efficient in architectures with more than 10 interconnected IP cores, while they can have performance comparable to the AMBA bus in less complex architectures. Between the NoCs, the Torus has better path diversity than the Mesh. Thus, the Torus reduces network congestion and decreases the routing requests. Also, Scheme A of the IP core mapping performed better than Scheme B. In fact, the ideal IP core mapping scheme would have the communicating IPs separated by only one hop (number of intermediate routers) over the network to reduce latency.
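The effect of IP placement and topology on distance can be illustrated with a small hop-count model. The Python sketch below assumes minimal routing on a 2D grid and counts router-to-router links; the 3 × 3 grid size, the two placements, and all function names are hypothetical and do not reproduce Scheme A or Scheme B.

```python
def mesh_hops(a, b):
    """Minimal link count between routers a and b on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a, b, width, height):
    """Minimal link count on a 2D torus, where each row and column wraps around."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)


def total_hops(placement, flows, hops):
    """Sum the hop counts of all communicating pairs for a given placement."""
    return sum(hops(placement[src], placement[dst]) for src, dst in flows)


# Communicating pairs of the H.264 pipeline.
flows = [("DSP1", "DSP2"), ("DSP2", "ARM9"), ("ARM9", "DSP1")]

# Two hypothetical placements on an assumed 3 x 3 grid of routers.
placement_near = {"DSP1": (0, 0), "DSP2": (1, 0), "ARM9": (0, 1)}
placement_far = {"DSP1": (0, 0), "DSP2": (2, 0), "ARM9": (0, 2)}


def torus_3x3(a, b):
    return torus_hops(a, b, 3, 3)


near_mesh = total_hops(placement_near, flows, mesh_hops)   # 4
far_mesh = total_hops(placement_far, flows, mesh_hops)     # 8
far_torus = total_hops(placement_far, flows, torus_3x3)    # 4
```

In this toy setting, placing the communicating IPs next to each other halves the total distance on the mesh, and the wrap-around links of the torus recover the same distance even for the spread-out placement.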

9.6.5 Programming at the Virtual Prototype Level

Programming at the virtual prototype level consists of integrating the HAL layer into the software stack for each particular processor subsystem and to [...]

[Figure: virtual prototype of the Diopsis R2DT tile with Hermes NoC; blocks shown include ARM9-SS, DSP1-SS, DSP2-SS, MEM-SS, the ARM9 and DSP instruction set simulators (ISS), local memories and registers (SRAM, DMEM1, REG1, DMEM2, REG2), and the per-processor software stacks (tasks, HdS API, Comm, OS, HAL API, HAL)]

References

1. H. Meyr, Application specific processors (ASIP): On design and implementation efficiency, Proceeding of SASIMI 06, Nagoya, Japan, 2006.

6. J. Turley, Survey says: Software tools more important than chips, [...] embedded.com/columns/surveys/160700620?_requestid=177492

7 MPICH—MPI implementation http://www-unix.mcs.anl.gov/mpi/mpich/index.htm

8. W. Wolf, High-Performance Embedded Computing: Architectures, [...] San Francisco, CA, 2006.

9. D. Culler, J.P. Singh, A. Gupta, Parallel Computer Architecture: A Hardware/[...] CA, August 1998, ISBN 1558603433.

10. P. Paulin, C. Pilkington, M. Langevin, E. Bensoudane, D. Lyonnard, O. Benny, B. Lavigueur, D. Lo, G. Beltrame, V. Gagne, G. Nicolescu, Parallel programming models for a multi-processor SoC platform applied to networking and multimedia, IEEE Transactions on VLSI Journal, 14(7), [...]

12. A. Jerraya, W. Wolf, Hardware-software interface codesign for embedded systems, Computer, 38(2), 63–69, February 2005.

13. D. Skillicorn, D. Talia, Models and languages for parallel computation, [...]

14. A. Jerraya, A. Bouchhima, F. Petrot, Programming models and HW-SW interfaces abstraction for multi-processor SoC, Proceeding of DAC 2006, San Francisco, CA, 2006, pp. 280–285.

15. Simulink, The MathWorks Inc., http://www.mathworks.com

16. F. Ghenassia, Transaction-Level Modeling with SystemC: TLM Concepts and [...]

[...] centric approach, Special Session, Proceeding of CODES+ISSS 2004, Stockholm, Sweden, September 2004.

19. D.R. Butenhof, Programming with POSIX Threads, Addison Wesley, Boston, MA, May 1997.

20. E. Cheong, J. Liebman, J. Liu, F. Zhao, TinyGALS: A programming model for event-driven embedded systems, Proceeding of 2003 ACM Symposium [...]

21. J.A. Rowson, Hardware/software cosimulation, Proceeding of DAC 1994, San Diego, CA, June 6–10, 1994, pp. 439–440.

[...] co-verification in C/C++, Proceeding of ASP-DAC 2000, Yokohama, Japan, 2000, pp. 405–408.

23. S. Kunzli, F. Poletti, L. Benini, L. Thiele, Combining simulation and formal methods for system-level performance analysis, Proceeding of DATE [...]

24. J.-W. Chen, C.-Y. Kao, Y.-L. Lin, Introduction to H.264, Proceeding of [...]

25. P.S. Paolucci, A.A. Jerraya, R. Leupers, L. Thiele, P. Vicini, SHAPES: A tiled scalable software hardware architecture platform for embedded systems, [...]

26. F. Moraes et al., HERMES: An infrastructure for low area overhead packet-switching networks-on-chip integration, VLSI Journal, 38(1), 2004, 69–93.


Platform-Based Design and Frameworks:

Felice Balarin, Massimiliano D’Angelo, Abhijit Davare, Douglas Densmore, Trevor Meyerowitz, Roberto Passerone, Alessandro Pinto, Alberto Sangiovanni-Vincentelli, Alena Simalatsar, Yosinori Watanabe, Guang Yang, and Qi Zhu

CONTENTS

10.1 Introduction
10.2 Platform-Based Design
10.2.1 Design Challenge
10.2.2 Principles of Platform-Based Design
10.2.2.1 PBD Flow
10.2.2.2 “Fractal” Nature of PBD: Successive Refinements
10.2.2.3 Design Parameters for PBD
10.3 METROPOLIS Design Environment
10.3.1 Overview
10.3.2 METROPOLIS Meta-Model
10.3.2.1 Function Modeling
10.3.2.2 Architecture Modeling
10.3.2.3 Mapping
10.3.2.4 Recursive Paradigm of Platforms
10.3.3 METROPOLIS Tools
10.3.3.1 Simulation
10.3.3.2 Formal Property Verification
10.3.3.3 Simulation Monitor
10.3.3.4 Quasi-Static Scheduling
10.4 METRO II Design Environment
10.4.1 Overview
10.4.2 METRO II Design Elements
10.4.2.1 Components
10.4.2.2 Ports
10.4.2.3 Constraint Solvers
10.4.2.4 Annotators and Schedulers
10.4.2.5 Mappers
10.4.2.6 Adaptors
10.4.3 METRO II Semantics
10.4.3.1 Three-Phase Execution
10.4.3.2 Semantics of Required/Provided Ports
10.4.3.3 Semantics of Mapping
10.5 Related Work
10.5.1 Origin of METRO II: From Polis to METROPOLIS
10.5.2 Industrial Approaches
10.5.3 Academic Approaches
10.6 Case Studies
10.6.1 UMTS
10.6.1.1 Functional Modeling
10.6.1.2 Architectural Modeling
10.6.1.3 Mapped System
10.6.1.4 Results
10.6.1.5 Conclusions
10.6.2 Intelligent Buildings: Indoor Air Quality
10.7 Conclusions
Acknowledgments
References

10.1 Introduction

System-level design (SLD) means many different things to many different people. In our view, SLD is about the design of a whole that consists of several components, where specifications are given in terms of functionality along with

• Constraints on the properties the design has to satisfy

• Constraints on the components that are available for implementation

• Objective functions that express the desirable features of the design when completed

This definition is general since it relates to many application domains, from semiconductors to systems such as cars, airplanes, buildings, telecommunications, and biological systems. To deal with system-level problems, our view is that the issue to address is not developing new tools, albeit they are essential to advance the state of the art in design; rather, it is the understanding of the principles of system design, the necessary change to design methodologies, and the dynamics of the supply chain. Developing this understanding is necessary to define a sound approach to the needs of the system and component industry as they try to serve their customers better, and to develop their products faster and with higher quality. This chapter is about principles and how a unified methodology together with a supporting software framework, as challenging as it may seem, can be developed to bring the embedded electronics industry to a new level of efficiency.

To demonstrate this view, we will first present the design challenges for future systems and a manifesto espousing the benefits of a unified methodology. We will then summarize a methodology, platform-based design (PBD), that has been developed over the past decade and that we believe can fulfill [...]
