Nicolescu/Model-Based Design for Embedded Systems 67842_C007 Finals Page 179 2009-10-27 MPSoC Platform Mapping Tools for Data-Dominated Applications Pierre G.. Nicolescu/Model-Based Desi
Part II
Design Tools and Methodology for Multiprocessor System-on-Chip
7
MPSoC Platform Mapping Tools for
Data-Dominated Applications
Pierre G. Paulin, Olivier Benny, Michel Langevin, Youcef Bouchebaba, Chuck Pilkington, Bruno Lavigueur, David Lo, Vincent Gagne, and Michel Metzger
CONTENTS
7.1 Introduction
7.1.1 Platform Programming Models
7.1.1.1 Explicit Capture of Parallelism
7.1.2 Characteristics of Parallel Multiprocessor SoC Platforms
7.2 MultiFlex Platform Mapping Technology Overview
7.2.1 Iterative Mapping Flow
7.2.2 Streaming Programming Model
7.3 MultiFlex Streaming Mapping Flow
7.3.1 Abstraction Levels
7.3.2 Application Functional Capture
7.3.3 Application Constraints
7.3.4 The High-Level Platform Specification
7.3.5 Intermediate Format
7.3.6 Model Assumptions and Distinctive Features
7.4 MultiFlex Streaming Mapping Tools
7.4.1 Task Assignment Tool
7.4.2 Task Refinement and Communication Generation Tools
7.4.3 Component Back-End Compilation
7.4.4 Runtime Support Components
7.5 Experimental Results
7.5.1 3G Application Mapping Experiments
7.5.2 Refinement and Simulation
7.6 Conclusions
7.6.1 Outlook
References
7.1 Introduction
The current deep submicron technology era, as it applies to low-cost, high-volume consumer digital convergence products, presents two opposing challenges: rising system-on-chip (SoC) platform development costs and shorter product market windows. Compounding the problem is the rate of change due to evolving specifications and the appearance of multiple standards that need to be incorporated into a single platform.
There are three main causes of the rising SoC platform development costs. The first is the continued rise in gate and memory count. Today's SoCs can have over 100 million transistors, enough to theoretically place the logic of over one thousand 32-bit RISC processors on a single die. Leveraging these capabilities is a major challenge.
The second cause is the increased complexity of dealing with deep submicron effects. These include electromigration, voltage drop, and on-chip variations. These effects are having a dampening impact on design productivity. Also, rising mask set costs (currently over one million dollars) compound the problem and present a nearly insurmountable financial market entry barrier for smaller companies.
The third cause is the rising embedded software development cost in current generation SoCs, driven by an accelerated rate of new feature introduction. This is partly because of the convergence of the computing, consumer, and communications domains, which implies supporting a broader range of functionalities and standards for a wide set of geographic markets. While the growth of hardware complexity in SoCs has tracked Moore's law, with a resulting growth of 56% in transistor count per year, industry studies [22] show that the complexity of embedded S/W is rising at a staggering 140% per year. This software now represents over 50% of development costs in most SoCs and over 75% in emerging multiprocessor SoC (MP-SoC) platforms.
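To make the gap between the two growth rates concrete, the following minimal sketch converts each annual rate into a doubling time and a relative gap. It assumes, purely for illustration, that both rates compound at a constant annual pace and that "140% per year" means a yearly multiplier of 2.4; the function names are hypothetical and do not come from the cited study.

```python
import math


def doubling_time_months(annual_growth_percent):
    """Months needed to double at a constant compound annual growth rate."""
    factor = 1.0 + annual_growth_percent / 100.0
    return 12.0 * math.log(2.0) / math.log(factor)


def relative_gap(years, hw_growth_percent=56.0, sw_growth_percent=140.0):
    """Ratio of software growth to hardware growth after `years` years.

    Assumption: both quantities start from the same normalized value of 1.
    """
    hw = (1.0 + hw_growth_percent / 100.0) ** years
    sw = (1.0 + sw_growth_percent / 100.0) ** years
    return sw / hw


if __name__ == "__main__":
    print(f"H/W doubling time: {doubling_time_months(56.0):.1f} months")
    print(f"S/W doubling time: {doubling_time_months(140.0):.1f} months")
    print(f"S/W-to-H/W ratio after 3 years: {relative_gap(3):.2f}")
```

Under these assumptions, hardware complexity doubles roughly every year and a half, while software complexity doubles in well under a year, so the gap widens every year.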
As a result, the significant investment needed to develop the platform (typically between 10M$ and 100M$ for today's 65 nm platforms) makes it necessary to maximize the time-in-market for a given platform. On the other hand, the consumer-led product cycles imply increasingly shorter time-to-market for the applications supported by the platform.
Finally, customers of a given SoC platform increasingly request to add their own value-added features as a market differentiator. These features are not just superficial additions, such as human-interface and top-level control code. For example, an SoC platform customer may have proprietary multimedia-oriented enhancements that they want to include in the platform (e.g., image noise reduction, face recognition, etc.).
All of these factors lead to the need for a domain-specific flexible platform that can be reused across a wide range of application variants. In addition, time-to-market considerations mean that the platform must come with high-level application-to-platform mapping tools that increase developer productivity. Both of these requirements point in the direction of highly S/W programmable platform solutions. A wide range of general-purpose and domain-specific cores exist, and they come with powerful compilation, debug, and analysis tools. This makes them a key component of the flexible SoC of the future.
From the above market trends, it is clear that multiprocessor-based platforms will play a key role. Of course, this flexibility cannot be delivered at any cost or power. In mobile multimedia products, typical power targets for SoCs used in battery-powered products are a few hundred milliwatts [11]. This suggests the use of domain-optimized heterogeneous MP-SoC platforms that will embody a rich mix of general-purpose processor cores, domain- and application-specific processor cores, and H/W processing elements (PEs) to deliver a solution at a competitive cost and power.
A key question is therefore how to effectively exploit this type of platform. We need to tackle this challenge from three main directions:
1. The development of high-level platform programming models
2. The development of effective platform mapping technologies
3. The design of parallel platforms that support the programming models and facilitate the development of the platform mapping tools
This chapter focuses primarily on the first two objectives.
7.1.1 Platform Programming Models
An SoC platform programming model is an abstraction of a heterogeneous system consisting of a range of loosely and tightly coupled processors, local and shared memory, communication channels, various hardware accelerators, and input/output (I/O). A platform programming model must both hide and expose the functionalities offered by the platform. It must hide the heterogeneity of the underlying PEs and the heterogeneity of the tools used to program these PEs, and it must abstract the low-level communication mechanisms between the PEs, the storage elements, and the I/O blocks.
However, the programming model should also expose some top-level characteristics of the underlying platform. It needs to capture the type of high-level parallelism supported by the platform. This is because most platforms are designed to naturally support one main class of high-level programming models, for example, symmetric multiprocessing using shared memory, message passing, or streaming.
Moreover, in the domain of MP-SoCs, the programming model should not only abstract the programmable processors; it should also allow the exploitation of the abstract functionalities provided by all types of platform components, including H/W blocks, communication channels, storage components, and I/O. Figure 7.1 illustrates the programming model as the boundary between the high-level application description and the underlying heterogeneous platform.
FIGURE 7.1
Application, platform, and programming model. (Labels in the figure: Application with Control, Audio, and Video; Programming model; Platform with RISC, DSP, NoC, I/O, Mem, and H/W.)
We believe that at least three classes of platform programming models are needed:
1. A symmetric multiprocessor (SMP) model, in the spirit of Unix POSIX threads [15]. This programming model relies on symmetric processing resources that access a shared memory.
2. A distributed client–server programming model, in the spirit of CORBA [16] or DCOM [17]. In this approach, applications are encapsulated into well-defined components with explicit interfaces. It relies on an abstract message-passing communication scheme where all communication between parallel application components is explicit.
3. A dataflow-oriented streaming programming model, as illustrated by StreamIt [3] and Brooks [2]. As with the client–server model, this approach encapsulates applications into well-defined S/W components, but implements a dataflow-driven static or dynamic communication semantic. Control is typically fairly simple.
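As an illustration of the third class, the sketch below expresses a tiny streaming application as three components connected by bounded FIFO channels. It is a minimal, hypothetical example using Python threads and queues; the component names (`source`, `scale`, `sink`), the sentinel convention, and the channel depth are assumptions made here and are not taken from StreamIt or from the MultiFlex tools.

```python
import queue
import threading

_END = object()  # sentinel token marking the end of the stream (assumed convention)


def source(out_q, samples):
    """Producer component: pushes every sample onto its output channel."""
    for sample in samples:
        out_q.put(sample)
    out_q.put(_END)


def scale(in_q, out_q, gain):
    """Filter component: fires whenever a token is available on its input."""
    while True:
        token = in_q.get()
        if token is _END:
            out_q.put(_END)  # propagate end-of-stream downstream
            return
        out_q.put(token * gain)


def sink(in_q, results):
    """Consumer component: collects tokens until the stream ends."""
    while True:
        token = in_q.get()
        if token is _END:
            return
        results.append(token)


def run_pipeline(samples, gain=2):
    """Wire source -> scale -> sink with bounded FIFOs and run to completion."""
    q1 = queue.Queue(maxsize=4)
    q2 = queue.Queue(maxsize=4)
    results = []
    stages = [
        threading.Thread(target=source, args=(q1, samples)),
        threading.Thread(target=scale, args=(q1, q2, gain)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results
```

Each component only reads from and writes to its channels, so all communication is explicit and the control logic stays simple, which is the property the streaming model relies on.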
Table 7.1 summarizes the main advantages and drawbacks of these three programming models.
• In the SMP model, the application is organized as a set of processes that share a common operating system (OS) and memory. This model provides the support of current OSs and facilitates the use of legacy code. Moreover, some form of load balancing of resources is usually supported. However, data coherency has to be maintained. This typically involves expensive cache coherency hardware. In data-dominated applications, this programming model implies high data bandwidth for inter-processor communication unless data movement is controlled carefully. By definition, it is designed for symmetric systems and is hardly applicable to heterogeneous processing resources. In practical implementations of SMP platforms, scalability is limited to between two and eight processors.
• In the client–server model, the application is organized as a set of clients and servers; the client makes a service request to the server, which fulfills the request. Generally, an object request broker (ORB) acts as an agent between the client request and the completion of this request. This model is appropriate for heterogeneous systems and control-oriented applications, and it presents a good potential for scaling and load balancing. However, the client–server model requires data marshaling (the process of gathering data and transforming it into a standard format before it is transmitted over a network) so that the data can transcend network boundaries [8]. This generalization of the communication adds to the complexity of the supporting infrastructure and implies some performance overhead.
• In comparison with the client–server and SMP models, the streaming programming model provides poor support for control-oriented computation, and the timing of control and data is difficult. However, this model is more suitable for data-oriented applications. The streaming model enables low overhead communications and the reduction of data bandwidth. Moreover, communication and computation are orthogonal, and by analyzing the communication edges in a stream computation, it is possible to obtain precise estimates of the communication requirements for a given application. This greatly simplifies the analysis and mapping of applications onto parallel architectures [1].

TABLE 7.1
Programming Models for MPSoCs

Programming model  Advantages                                        Drawbacks
SMP                Natural support of current OS                     Need to maintain coherence of local, shared data
                   Legacy code support                               High inter-processor data communication bandwidth
                   Load balancing                                    Limited scalability
                                                                     No support for heterogeneous systems
Client–server      Supports heterogeneous systems                    Marshaling problem
                   Potential for scaling and load balancing          Heavy infrastructure
                   Good support for control-oriented applications    Lack of streamlining
Streaming          Low overhead communications                       Timing of control and data
                   Reduced data bandwidth on communication channels  Poor support for control-oriented applications
                   Orthogonal communication and computation
                   Easy to estimate the communication requirements
                     of the application
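The point about estimating communication requirements from the edges of a stream computation can be sketched as follows. This is a minimal illustration under stated assumptions: each edge is taken to carry fixed-size tokens at a fixed rate, and the `Edge` type, task names, PE names, and numbers are all hypothetical rather than taken from the tools described in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One communication edge of a stream graph (hypothetical representation)."""
    src: str
    dst: str
    bytes_per_token: int
    tokens_per_second: int


def edge_bandwidth(edge):
    """Bandwidth of a single edge in bytes per second."""
    return edge.bytes_per_token * edge.tokens_per_second


def inter_pe_bandwidth(edges, mapping):
    """Total bandwidth of the edges whose endpoints are mapped to different PEs.

    `mapping` assigns each task name to a PE name. Edges between tasks on the
    same PE are assumed to stay local and are not counted.
    """
    return sum(
        edge_bandwidth(e) for e in edges if mapping[e.src] != mapping[e.dst]
    )


# Example stream graph: capture -> filter -> encode (all values are made up).
EDGES = [
    Edge("capture", "filter", bytes_per_token=1024, tokens_per_second=100),
    Edge("filter", "encode", bytes_per_token=512, tokens_per_second=100),
]

CLUSTERED = {"capture": "pe0", "filter": "pe0", "encode": "pe1"}
SPREAD = {"capture": "pe0", "filter": "pe1", "encode": "pe2"}
```

Comparing `inter_pe_bandwidth(EDGES, CLUSTERED)` with `inter_pe_bandwidth(EDGES, SPREAD)` shows how a candidate task assignment can be scored on communication cost before any code is generated, because the edge rates do not depend on where the tasks run.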
In summary, there is a continuum of characteristics that need to be considered when moving between SMP on one end, client–server in the middle, and streaming on the other end. SMP is the most preferred general-purpose model; it is relatively user-friendly, but this ease of use comes at the expense of predictability, performance, and cost. At the opposite end of the continuum, streaming is a more constrained, predictable, and understandable model, but it is more specialized toward dataflow and requires more time to express and optimize. The client–server programming model is more general-purpose than streaming, and expresses control applications better. However, automatic load balancing can imply high communication bandwidth between PEs.
Each of these programming models has its advantages and drawbacks, and we have found that, for the consumer-style multimedia and communications SoC platforms we have been working with, we need to use all three, sometimes making use of more than one for a single platform, often in a tightly coupled, interoperable fashion. Due to the tight constraints in the design of MP-SoCs, designers have to choose the appropriate programming model(s) in order to develop their applications on a particular platform or subsystem.
7.1.1.1 Explicit Capture of Parallelism
A key assumption made here, for all three programming models as we have defined them, is that the application developer is responsible for identifying and explicitly expressing parallelism. In our experience with domain-specific application code in communications, imaging, video, and audio, this is a reasonable assumption. Parallelism is tractable and well understood in many cases. Moreover, designers have been dealing with this type of parallelism in hardware-based platforms for many years. For an application such as an MPEG4 video encoder consisting of 10,000 lines of sequential C reference code, our experience has shown that the parallelization represents one to two person-months of work at most (for a person already familiar with the application and the programming model).
7.1.2 Characteristics of Parallel Multiprocessor SoC Platforms
While our research work is focused primarily on the programming models and platform mapping tools, the characteristics of the target MP-SoC platform have a significant impact on the complexity of the mapping problem and the efficiency of the end results. From an idealistic mapping tools-only perspective, the MP-SoC platforms would embed a homogeneous set of general-purpose RISC-style processors. This is not realistic for the foreseeable future [20]:
• Domain-specific cores such as DSPs offer 2X–4X performance in their domain of application via instruction specialization and wider instruction words. Combining this with SIMD-style word-level parallelism can increase performance by another factor of 2X–8X in certain cases.
• Configurable ASIPs (application-specific instruction-set processors) can offer 10X–100X performance improvements via application-specific instruction sets and tightly coupled H/W coprocessors.
• Hardware coprocessors can offer 100X or more performance advantages and/or significant power and area savings. They will remain essential for highly parallel, regular operations with high data rates, in particular for data processing operations that are fixed for an application domain (e.g., direct and inverse discrete cosine transforms, DCT and iDCT, used in video processing).
• Legacy code and general-purpose OS support will often dictate the host processor for the platform. The data representation used in this processor is not likely to be compatible with the parallel processor subsystems or the hardware coprocessors.
• Some application tasks will not be parallelizable; therefore, fast general-purpose cores will be necessary to support these.
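The data representation mismatch mentioned in the list above is the reason marshaling is needed when a host processor exchanges data with other PEs. The following minimal sketch shows the idea with Python's standard `struct` module: values are packed into an agreed byte layout before transfer and unpacked on the other side. The message fields (`frame_id`, `width`, `height`) and the choice of big-endian wire order are assumptions made for illustration only.

```python
import struct

# Hypothetical message layout (assumption for illustration):
#   frame_id : unsigned 32-bit integer
#   width    : unsigned 16-bit integer
#   height   : unsigned 16-bit integer
# "!" selects network (big-endian) byte order with standard sizes, so the
# encoding does not depend on the native representation of either side.
WIRE_FORMAT = "!IHH"


def marshal(frame_id, width, height):
    """Convert the message fields into the agreed wire representation."""
    return struct.pack(WIRE_FORMAT, frame_id, width, height)


def unmarshal(payload):
    """Recover the message fields from the wire representation."""
    return struct.unpack(WIRE_FORMAT, payload)


def little_endian_layout(frame_id, width, height):
    """The same fields as a little-endian PE might store them in memory."""
    return struct.pack("<IHH", frame_id, width, height)
```

The wire encoding and the little-endian layout of the same values differ byte for byte, which is why copying raw memory between PEs with different representations is not safe without a conversion step of this kind.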
As a result, we believe that a performance- and power-effective platform for the consumer-dominated convergence platforms will be composed of a heterogeneous composition of the following PE types:
• A medium- to high-performance, general-purpose RISC core, typically running a standard general-purpose OS. Increasingly, this host system will consist of a two- to four-core SMP cluster, as they appear in the marketplace. All the top-level control code will run here. Legacy code that is not performance critical will also run on this processor. Finally, customer-specific developments and controlled access to the domain-specific parallel subsystems will usually occur via this general-purpose processor and OS pair.
• Domain-specific subsystems composed of mostly homogeneous, lightweight multiprocessor clusters. Although homogeneous, the instruction set of these processors will typically be optimized toward a broad application domain (e.g., video codec, image quality improvement, wireless communications, and 3D graphics).
• Tightly coupled hardware PEs for domain-specific data processing functions.
• Domain-specific I/O blocks, which are becoming increasingly flexible.
7.2 MultiFlex Platform Mapping Technology Overview
This section introduces the MultiFlex technology, which supports the mapping of user-defined parallel applications, expressed in one or more programming models, onto an MP-SoC platform.
The support in MultiFlex of a lightweight SMP programming model was described in [12]. This uses a hardware-assisted concurrency engine to support small-grain parallelism dynamically.
In MultiFlex, the client–server programming model is referred to as “DSOC” (Distributed System Object Component), and was also described in [12]. This toolset supports static and dynamic load balancing and supports heterogeneous PEs with potentially different data representations. Dynamic load balancing is achieved using either a lightweight S/W-based kernel to dynamically schedule large-grain tasks, or a hardware-assisted