Model based design for embedded systems part 21

Nicolescu/Model-Based Design for Embedded Systems, Chapter 7: MPSoC Platform Mapping Tools for Data-Dominated Applications (2009)


Part II

Design Tools and Methodology for Multiprocessor System-on-Chip


7

MPSoC Platform Mapping Tools for Data-Dominated Applications

Pierre G. Paulin, Olivier Benny, Michel Langevin, Youcef Bouchebaba, Chuck Pilkington, Bruno Lavigueur, David Lo, Vincent Gagne, and Michel Metzger

CONTENTS

7.1 Introduction
    7.1.1 Platform Programming Models
        7.1.1.1 Explicit Capture of Parallelism
    7.1.2 Characteristics of Parallel Multiprocessor SoC Platforms
7.2 MultiFlex Platform Mapping Technology Overview
    7.2.1 Iterative Mapping Flow
    7.2.2 Streaming Programming Model
7.3 MultiFlex Streaming Mapping Flow
    7.3.1 Abstraction Levels
    7.3.2 Application Functional Capture
    7.3.3 Application Constraints
    7.3.4 The High-Level Platform Specification
    7.3.5 Intermediate Format
    7.3.6 Model Assumptions and Distinctive Features
7.4 MultiFlex Streaming Mapping Tools
    7.4.1 Task Assignment Tool
    7.4.2 Task Refinement and Communication Generation Tools
    7.4.3 Component Back-End Compilation
    7.4.4 Runtime Support Components
7.5 Experimental Results
    7.5.1 3G Application Mapping Experiments
    7.5.2 Refinement and Simulation
7.6 Conclusions
    7.6.1 Outlook
References

7.1 Introduction

The current deep submicron technology era, as it applies to low-cost, high-volume consumer digital convergence products, presents two opposing challenges: rising system-on-chip (SoC) platform development costs and shorter product market windows. Compounding the problem is the rate of change due to evolving specifications and the appearance of multiple standards that need to be incorporated into a single platform.

There are three main causes of the rising SoC platform development costs. The first is the continued rise in gate and memory count. Today’s SoCs can have over 100 million transistors, enough to theoretically place the logic of over one thousand 32-bit RISC processors on a single die. Leveraging these capabilities is a major challenge.

The second cause is the increased complexity of dealing with deep submicron effects. These include electromigration, voltage drop, and on-chip variations. These effects are having a dampening impact on design productivity. Also, rising mask set costs (currently over one million dollars) compound the problem and present a nearly insurmountable financial market entry barrier for smaller companies.

The third cause is the rising embedded software development cost in current generation SoCs, driven by an accelerated rate of new feature introduction. This is partly because of the convergence of computing, consumer, and communications domains, which implies supporting a broader range of functionalities and standards for a wide set of geographic markets. While the growth of hardware complexity in SoCs has tracked Moore’s law, with a resulting growth of 56% in transistor count per year, industry studies [22] show that the complexity of embedded S/W is rising at a staggering 140% per year. This software now represents over 50% of development costs in most SoCs and over 75% in emerging multiprocessor SoC (MP-SoC) platforms.
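To make the gap between the two growth rates concrete, the short sketch below compounds the 56% and 140% annual figures quoted above. The five-year horizon and the function name are illustrative assumptions, not figures taken from the cited studies.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Return the growth factor after compounding annual_rate over the given years."""
    return (1.0 + annual_rate) ** years


# Annual growth rates quoted in the text.
HW_RATE = 0.56  # transistor count per year
SW_RATE = 1.40  # embedded S/W complexity per year

# The five-year horizon is an illustrative assumption.
for year in range(1, 6):
    hw = compound_growth(HW_RATE, year)
    sw = compound_growth(SW_RATE, year)
    print(f"year {year}: H/W x{hw:.1f}, S/W x{sw:.1f}, ratio {sw / hw:.1f}")
```

Under these rates, five years of compounding give roughly a ninefold increase in transistor count against roughly an eightyfold increase in software complexity.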

As a result, the significant investment needed to develop the platform, typically between $10M and $100M for today’s 65 nm platforms, makes it necessary to maximize the time-in-market for a given platform. On the other hand, the consumer-led product cycles imply increasingly shorter time-to-market for the applications supported by the platform.

Finally, customers of a given SoC platform increasingly ask to add their own value-added features as a market differentiator. These features are not just superficial additions, such as human-interface and top-level control code. For example, a SoC platform customer may have proprietary multimedia-oriented enhancements that they want to include in the platform (e.g., image noise reduction, face recognition, etc.).

All of these factors lead to the need for a domain-specific flexible platform that can be reused across a wide range of application variants. In addition, time-to-market considerations mean that the platform must come with high-level application-to-platform mapping tools that increase developer productivity. Both of these requirements point in the direction of highly S/W programmable platform solutions. A wide range of general-purpose and domain-specific cores exist, and they come with powerful compilation, debug, and analysis tools. This makes them a key component of the flexible SoC of the future.


From the above market trends, it is clear that multiprocessor-based platforms will play a key role. Of course, this flexibility cannot be delivered at any cost or power. In mobile multimedia products, typical power targets for SoCs used in battery-powered products are a few hundred milliwatts [11]. This suggests the use of domain-optimized heterogeneous MP-SoC platforms that will embody a rich mix of general-purpose processor cores, domain- and application-specific processor cores, and H/W processing elements (PEs) to deliver a solution at a competitive cost and power.

A key question is therefore how to effectively exploit this type of platform. We need to tackle this challenge from three main directions:

1. The development of high-level platform programming models

2. The development of effective platform mapping technologies

3. The design of parallel platforms that support the programming models and facilitate the development of the platform mapping tools

This chapter focuses primarily on the first two objectives.

7.1.1 Platform Programming Models

A SoC platform programming model is an abstraction of a heterogeneous system consisting of a range of loosely and tightly coupled processors, local and shared memory, communication channels, various hardware accelerators, and input/output (I/O). A platform programming model must both hide and expose the functionalities offered by the platform. It must hide the heterogeneity of the underlying PEs and of the tools used to program these PEs, and it must abstract the low-level communication mechanisms between the PEs, the storage elements, and I/O blocks.

However, the programming model should also expose some top-level characteristics of the underlying platform. It needs to capture the type of high-level parallelism supported by the platform. This is because most platforms are designed to naturally support one main class of high-level programming models, for example, symmetric multiprocessing using shared memory, message-passing, or streaming.

Moreover, in the domain of MP-SoCs, the programming model should not only abstract the programmable processors; it should also allow the exploitation of the abstract functionalities provided by all types of platform components, including H/W blocks, communication channels, storage components, and I/O. Figure 7.1 illustrates the programming model as the boundary between the high-level application description and the underlying heterogeneous platform.

FIGURE 7.1
Application, platform, and programming model. [Figure not reproduced: the application components (control, audio, video) and the platform components (RISC, DSP, NoC, I/O, memory, H/W) are separated by the programming model.]

We believe that at least three classes of platform programming models are needed:

1. A symmetric multiprocessor (SMP) model, in the spirit of Unix POSIX threads [15]. This programming model relies on symmetric processing resources that access a shared memory.

2. A distributed client–server programming model, in the spirit of CORBA [16] or DCOM [17]. In this approach, applications are encapsulated into well-defined components with explicit interfaces. It relies on an abstract message-passing communication scheme where all communication between parallel application components is explicit.

3. A dataflow-oriented streaming programming model, as illustrated by StreamIt [3] and Brooks [2]. As with the client–server model, this approach encapsulates applications into well-defined S/W components, but implements dataflow-driven static or dynamic communication semantics. Control is typically fairly simple.
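The explicit message-passing of the client–server model (item 2) can be illustrated with a minimal sketch in which a client stub packs each call into a fixed byte layout and the server unpacks it. This is not the CORBA or DCOM API; the wire format, the method identifiers, and the function names are all assumptions made for illustration, and the channel between client and server is replaced by a direct function call.

```python
import struct

# Wire format (an assumption for this sketch): network byte order,
# one unsigned 16-bit method id followed by two signed 32-bit arguments.
REQUEST_FORMAT = "!Hii"
REPLY_FORMAT = "!i"

# Hypothetical method identifiers of the server's explicit interface.
METHOD_ADD = 1
METHOD_MUL = 2


def marshal_request(method_id: int, a: int, b: int) -> bytes:
    """Client side: pack the call into a platform-independent byte string."""
    return struct.pack(REQUEST_FORMAT, method_id, a, b)


def serve(request: bytes) -> bytes:
    """Server side: unpack the request, run the service, pack the reply."""
    method_id, a, b = struct.unpack(REQUEST_FORMAT, request)
    if method_id == METHOD_ADD:
        result = a + b
    elif method_id == METHOD_MUL:
        result = a * b
    else:
        raise ValueError(f"unknown method id {method_id}")
    return struct.pack(REPLY_FORMAT, result)


def call(method_id: int, a: int, b: int) -> int:
    """Client stub: the server is reachable only through a message."""
    reply = serve(marshal_request(method_id, a, b))
    (result,) = struct.unpack(REPLY_FORMAT, reply)
    return result


print(call(METHOD_ADD, 2, 3))
print(call(METHOD_MUL, 4, 5))
```

Because both sides agree on a byte order and field widths, the client and server could run on PEs with different native data representations; the price is the packing and unpacking work on every call.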

Table 7.1 summarizes the main advantages and drawbacks of these three programming models.

TABLE 7.1
Programming Models for MPSoCs

Programming model   Advantages                                          Drawbacks
SMP                 Natural support of current OS                       Need to maintain coherence of local, shared data
                    Legacy code support                                 High inter-processor data communication bandwidth
                    Load balancing                                      Limited scalability
                                                                        No support for heterogeneous systems
Client–server       Supports heterogeneous systems                      Marshaling problem
                    Potential for scaling and load balancing            Heavy infrastructure
                    Good support for control-oriented applications      Lack of streamlining
Streaming           Low overhead communications                         Timing of control and data
                    Reduced data bandwidth on communication channels    Poor support for control-oriented applications
                    Orthogonal communication and computation
                    Easy to estimate the communication requirements
                    of the application

• In the SMP model, the application is organized as a set of processes that share a common operating system (OS) and memory. This model provides the support of current OSs and facilitates the use of legacy code. Moreover, some form of load balancing of resources is usually supported. However, data coherency has to be maintained, which typically involves expensive cache coherency hardware. In data-dominated applications, this programming model implies high data bandwidth for inter-processor communication unless data movement is controlled carefully. By definition, it is designed for symmetric systems and is hardly applicable to heterogeneous processing resources. In practical implementations of SMP platforms, scalability is limited to between two and eight processors.

• In the client–server model, the application is organized as a set of clients and servers; the client makes a service request to the server, which fulfills the request. Generally, an object request broker (ORB) acts as an agent between the client request and the completion of this request. This model is appropriate for heterogeneous systems and control-oriented applications, and it presents a good potential for scaling and load balancing. However, the client–server model requires data marshaling (the process of gathering data and transforming it into a standard format before it is transmitted over a network) so that the data can transcend network boundaries [8]. This generalization of the communication adds to the complexity of the supporting infrastructure and implies some performance overhead.

• In comparison with the client–server and SMP models, the streaming programming model provides poor support for control-oriented computation, and the timing of control and data is difficult to handle. However, this model is more suitable for data-oriented applications. The streaming model enables low overhead communications and the reduction of data bandwidth. Moreover, communication and computation are orthogonal, and by analyzing the communication edges in a stream computation, it is possible to obtain precise estimates of the communication requirements for a given application. This greatly simplifies analysis and mapping of applications onto parallel architectures [1].
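The claim that communication requirements can be read off the edges of a stream computation can be sketched as follows. The sketch assumes a stream graph with fixed, statically known token rates; the `Edge` fields, the three-stage pipeline, and every numeric value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One communication edge of a stream graph (fixed rates assumed)."""
    producer: str
    consumer: str
    tokens_per_firing: int      # tokens the producer emits each time it fires
    bytes_per_token: int
    firings_per_second: float


def edge_bandwidth(edge: Edge) -> float:
    """Bytes per second carried by one edge."""
    return edge.tokens_per_firing * edge.bytes_per_token * edge.firings_per_second


def inter_pe_bandwidth(edges, assignment) -> float:
    """Sum the bandwidth of edges whose endpoints are mapped to different PEs."""
    return sum(
        edge_bandwidth(e)
        for e in edges
        if assignment[e.producer] != assignment[e.consumer]
    )


# Hypothetical three-stage pipeline; all rates are made-up illustration values.
edges = [
    Edge("source", "filter", tokens_per_firing=64, bytes_per_token=2, firings_per_second=1000.0),
    Edge("filter", "sink", tokens_per_firing=16, bytes_per_token=2, firings_per_second=1000.0),
]

# Two candidate task-to-PE assignments.
mapping_a = {"source": "pe0", "filter": "pe0", "sink": "pe1"}
mapping_b = {"source": "pe0", "filter": "pe1", "sink": "pe1"}

print(inter_pe_bandwidth(edges, mapping_a))
print(inter_pe_bandwidth(edges, mapping_b))
```

Since the computation inside each task does not affect the edge rates, two candidate assignments can be compared on channel traffic alone, without simulating the tasks themselves.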

In summary, there is a continuum of characteristics that need to be considered when moving between SMP on one end, client–server in the middle, and streaming on the other end. SMP is the preferred general-purpose model and is relatively user-friendly, but this ease of use comes at the expense of predictability, performance, and cost. At the opposite end of the continuum, streaming is a more constrained, predictable, and understandable model, but it is more specialized toward dataflow and requires more time to express and optimize. The client–server programming model is more general-purpose than streaming and expresses control applications better. However, automatic load balancing can imply high communication bandwidth between PEs.

Each of these programming models has its advantages and drawbacks, and we have found that, for the consumer-style multimedia and communications SoC platforms we have been working with, we need to use all three, sometimes making use of more than one for a single platform, often in a tightly coupled, interoperable fashion. Due to the tight constraints in the design of MP-SoCs, the designers have to choose the appropriate programming model(s) in order to develop their applications on a particular platform or subsystem.

7.1.1.1 Explicit Capture of Parallelism

A key assumption made here, for all three programming models as we have defined them, is that the application developer is responsible for identifying and explicitly expressing parallelism. In our experience, for domain-specific application code in communications, imaging, video, and audio, this is a reasonable assumption. Parallelism is tractable and well understood in many cases. Moreover, designers have been dealing with this type of parallelism in hardware-based platforms for many years. For an application such as an MPEG4 video encoder consisting of 10,000 lines of sequential C reference code, our experience has shown that the parallelization represents less than one or two person-months of work (for a person already familiar with the application and the programming model).

7.1.2 Characteristics of Parallel Multiprocessor SoC Platforms

While our research work is focused primarily on the programming models and platform mapping tools, the characteristics of the target MP-SoC platform have a significant impact on the complexity of the mapping problem and the efficiency of the end results. From an idealistic, mapping-tools-only perspective, the MP-SoC platforms would embed a homogeneous set of general-purpose RISC-style processors. This is not realistic for the foreseeable future [20]:

• Domain-specific cores such as DSPs offer 2X–4X performance in their domain of application via instruction specialization and wider instruction words. Combining this with SIMD-style word-level parallelism can increase performance by another factor of 2X–8X in certain cases.

• Configurable ASIPs (application-specific instruction-set processors) can offer 10X–100X performance improvements via application-specific instruction sets and tightly coupled H/W coprocessors.

• Hardware coprocessors can offer 100X or more performance advantages and/or significant power and area savings. They will remain essential for highly parallel, regular operations with high data rates, in particular for data processing operations that are fixed for an application domain (e.g., the direct and inverse discrete cosine transforms, DCT and iDCT, used in video processing).

• Legacy code and general-purpose OS support will often dictate the host processor for the platform. The data representation used in this processor is not likely to be compatible with the parallel processor subsystems or the hardware coprocessors.

• Some application tasks will not be parallelizable; therefore, fast general-purpose cores will be necessary to support these.

As a result, we believe that a performance- and power-effective solution for consumer-dominated convergence platforms will be a heterogeneous composition of the following PE types:

• A medium- to high-performance, general-purpose RISC core, typically running a standard general-purpose OS. Increasingly, this host system will consist of a two- to four-core SMP cluster, as these appear in the marketplace. All the top-level control code will run here. Legacy code that is not performance critical will also run on this processor. Finally, customer-specific developments and controlled access to the domain-specific parallel subsystems will usually occur via this general-purpose processor and OS pair.

• Domain-specific subsystems composed of mostly homogeneous, lightweight multiprocessor clusters. Although homogeneous, the instruction set of these processors will typically be optimized toward a broad application domain (e.g., video codec, image quality improvement, wireless communications, and 3D graphics).

• Tightly coupled hardware PEs for domain-specific data processing functions.

• Domain-specific I/O blocks, which are becoming increasingly flexible.

7.2 MultiFlex Platform Mapping Technology Overview

This section introduces the MultiFlex technology, which supports the mapping of user-defined parallel applications, expressed in one or more programming models, onto an MP-SoC platform.

The support in MultiFlex of a lightweight SMP programming model was described in [12]. This uses a hardware-assisted concurrency engine to support small-grain parallelism dynamically.

In MultiFlex, the client–server programming model is referred to as “DSOC” (Distributed System Object Component), and was also described in [12]. This toolset supports static and dynamic load balancing and supports heterogeneous PEs with potentially different data representations. Dynamic load balancing is achieved using either a lightweight S/W-based kernel to dynamically schedule large-grain tasks, or a hardware-assisted
