are different ways in which the cost may be calculated. Steps 6–7 in Figure 10.12 illustrate two different types of processing elements that may be used, and the interface to inform them which processing routine they should compute a cost for. The type of the processing element may be changed easily to provide the necessary balance between the speed of simulation and the required pre-simulation effort.
10.6.1.3 Mapped System
Table 10.4 describes the 48 mappings investigated. These vary from 11 PEs to 1 PE. Partitions are broken down by the Rx, the Tx, the RLC, and the MAC functionalities. Each is categorized into one of nine separate classes based on the number of processing elements and the mix of pre-profiled and runtime processing elements. Mappings are further categorized as purely runtime processing (RTP) elements, purely profiled processing (PP) elements, or a mix (MIX).
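The RTP/PP/MIX categorization can be illustrated with a minimal sketch. The labels "runtime" and "profiled" and the function name are hypothetical, chosen only for illustration; the chapter does not show how the mappings are actually encoded.

```python
from typing import Sequence

def classify_mapping(pe_kinds: Sequence[str]) -> str:
    """Return 'RTP', 'PP', or 'MIX' for one mapping.

    pe_kinds holds one entry per processing element, each either
    'runtime' or 'profiled' (hypothetical labels, not from the chapter).
    """
    kinds = set(pe_kinds)
    if not kinds:
        raise ValueError("a mapping needs at least one processing element")
    unknown = kinds - {"runtime", "profiled"}
    if unknown:
        raise ValueError(f"unknown PE kind(s): {sorted(unknown)}")
    if kinds == {"runtime"}:
        return "RTP"   # purely runtime processing elements
    if kinds == {"profiled"}:
        return "PP"    # purely profiled processing elements
    return "MIX"       # both kinds are present
```

For example, a mapping with eleven runtime elements would be classified as RTP, and one that combines both kinds as MIX.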
10.6.1.4 Results
Results relating to the design effort, the processing time, the framework simulation time, and the event processing are analyzed. Five different models were used: a timed SystemC UMTS model [55], a timed METRO II UMTS model, an untimed METRO II UMTS model, a SystemC runtime processing model, and a METRO II architectural model. In specific configurations, METRO II constraints were used as opposed to explicit synchronization. The selection of constraints, functional model configuration, architectural model parameters, and mapping assignment is all achieved through small changes to the top-level netlist. All results are gathered on a 1.8 GHz Pentium M laptop running Windows XP with 1 GB of RAM.
Figure 10.13 shows the UMTS estimated execution times (cycles) along with the average processing-element utilization. Utilization is calculated as the percentage of simulation rounds that an architectural processing element has enabled outstanding functional model event requests for its services. Low utilization indicates that a processing element is idle despite available, outstanding requests. The x-axis (mapping #) is ordered by increasing execution times. The data is collected for each of the three scheduling algorithms.
For round-robin scheduling, the lowest and highest execution times are obtained with mapping #1 (11 Sparcs) and mapping #46 (1 μBlaze), respectively. Mapping #1 is 2167% faster than mapping #46. This shows a large range in potential performances across mappings. It is interesting to note that there are 23 different mappings that offer better performance than the 11 μBlaze or 11 ARM7 cores (mappings #2 and #3). This illustrates that processor communication is a bottleneck for many designs, and despite having more concurrency those designs cannot keep pace with smaller, more heavily-loaded mappings. Among all four-processor systems, mapping #14 has the lowest execution time (two ARM9s used for the receiver and two Sparcs used for the transmitter). Mapping #31 has a similar execution time with four different processors (Rx MAC on μBlaze, Rx RLC on ARM9, Tx MAC on ARM7, and Tx RLC on Sparc). Many of the execution times are similar and the graph shows that there are essentially four performance groupings.
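The utilization metric defined above can be sketched as follows. This is a minimal illustration that assumes a per-round log of how many requests each processing element enabled; the log format and the function names are assumptions, not part of the METRO II framework.

```python
from typing import Dict, Sequence

def utilization(enabled_per_round: Sequence[int]) -> float:
    """Percentage of rounds in which the PE enabled at least one request.

    enabled_per_round[i] is the number of outstanding functional-model
    requests the PE enabled in round i (hypothetical log format).
    """
    rounds = len(enabled_per_round)
    if rounds == 0:
        return 0.0
    busy = sum(1 for n in enabled_per_round if n > 0)
    return 100.0 * busy / rounds

def average_utilization(per_pe: Dict[str, Sequence[int]]) -> float:
    """Unweighted mean of the per-PE utilization values."""
    return sum(utilization(log) for log in per_pe.values()) / len(per_pe)
```

Under this definition a single-PE mapping that enables a request in every round reports 100%, while a PE that is idle in half of the rounds reports 50%.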
The lowest utilization values for round robin occur in the 11-processor setups (an average of 15%). The highest is 100% for all single-processor setups. The maximum utilization below 100% is 39%. This gap points to inefficiency in the round-robin scheduler. It may be a goal of the other scheduling algorithms to close this gap. Also notice that for similar execution times, utilization can vary as much as 28% (mappings #41 and #32, for example).
The priority-based scheduling keeps the same relative ordering amongst the execution times but reduces them on average by 13%. The highest is an 18% reduction (mapping #22, for example) and the smallest reduction is 9% (mapping #8, for example). The utilization numbers are actually reduced as well by an average of 2%. The largest reduction was 7% (in mapping #6, for example) and the smallest was 1% (in mapping #31, for example). As expected, there was no change in the utilization or execution times for mappings involving either eleven processing elements (fully concurrent) or those with one element (no scheduling options). The utilization drop results from high-priority, data-dependent jobs running before low-priority, data-independent jobs.
The FCFS scheduling also does not change the relative ordering of execution times but is not as successful at reducing them. The average reduction is only 7%. The maximum reduction is 11% (in mapping #24, for example) and the minimum reduction is 4% (in mapping #5, for example). However, utilization is increased by an average of 27%. The maximum increase was 45% (in mapping #31, for example) and the minimum increase was 20% (in mapping #5, for example). The FCFS increases utilization because many jobs that would be low priority often request processing in the same round as high-priority jobs. While technically they are both “first,” the priority would negate this fact. The FCFS’s round-robin tie-breaking scheme helps smaller jobs in this case.
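The difference between the three policies can be made concrete with a small sketch. It assumes that each processing element picks one pending request per round. The Request fields, the convention that a larger priority value is more urgent, and the rotation used to break priority ties are illustrative assumptions; only the round-robin tie-breaking of FCFS is stated in the text.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Request:
    task: int      # index of the requesting functional task
    priority: int  # larger value = more urgent (assumed convention)
    arrival: int   # simulation round in which the request was made

def _next_in_rotation(candidates: List[Request], last_task: int, n_tasks: int) -> Request:
    # Choose the candidate whose task index follows last_task most closely, cyclically.
    return min(candidates, key=lambda r: (r.task - last_task - 1) % n_tasks)

def round_robin(pending: List[Request], last_task: int, n_tasks: int) -> Request:
    return _next_in_rotation(pending, last_task, n_tasks)

def priority_based(pending: List[Request], last_task: int, n_tasks: int) -> Request:
    top = max(r.priority for r in pending)
    # Rotation among equal priorities is an assumption of this sketch.
    return _next_in_rotation([r for r in pending if r.priority == top], last_task, n_tasks)

def fcfs(pending: List[Request], last_task: int, n_tasks: int) -> Request:
    first = min(r.arrival for r in pending)
    # Requests made in the same round are all "first"; break the tie round-robin.
    return _next_in_rotation([r for r in pending if r.arrival == first], last_task, n_tasks)
```

With requests from task 0 (priority 5, round 3), task 1 (priority 1, round 3), and task 2 (priority 9, round 4), and task 0 served last, FCFS selects the low-priority task 1 because it arrived in the same round as task 0, whereas the priority-based policy selects task 2.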
The analysis of execution and utilization for the UMTS shows that high utilization is difficult to obtain due to the data dependencies in the application. Also, some of the partitions explored do not balance computation well amongst the different processing elements in the architecture. Many of the coarser mappings only make this problem worse. A solution is to further refine the functional model to extract more concurrency. From an execution-time standpoint, scheduling can improve the overall execution time but not as much as is needed to make a large majority of these mappings desirable for an actual implementation.
An accuracy comparison was performed with mappings #2, #6, and #46 (pure μBlaze mappings). These designs were created on the Xilinx ML310 development board. For mappings #2 and #46, there was only a 3.1% and a 2% increase, respectively, in execution times in the actual designs. For mapping #6 (when scheduling affects the outcome), the increase was 16.2% (RR), 18% (PR), and 15% (FCFS). Mapping #46 inaccuracy is due to the start-up code and IO operations not captured by the model. Mapping #2 suffers from a slightly oversimplified point-to-point communication scheme in the model as compared to the FSL links used by the MicroBlazes. Finally, mapping #6 requires a more refined OS model to more closely match the scheduling overhead of the actual OS used. This comparison shows that METRO II simulation can closely (within 5%) reflect actual implementations, and in the cases where the differences are greater, a trade-off between the modeling detail, the simulation performance, and the accuracy can be quickly analyzed.
The untimed METRO II UMTS functional model contains 12 processes while the architectural model may contain up to 26 processes. This is a large design, spread across 85 files and 8,300 lines of code. Changing a mapping is trivial, however: it requires only changing a few macros and recompiling two files (2.3% of total; <20 s). All 48 mappings can be done in less than 16 min.
The conversion of the SystemC timed functional model to an untimed METRO II functional model removes 1081 lines of code (related to scheduling and timing, both of which are in the architecture model). METRO II mapping removes much of the overhead associated with the SystemC model synchronization.
METRO II constraints for the read/write semantics of a FIFO only require 60 lines of code, which is 1.4% of the total code cost. The average difference of the entire conversion to METRO II was only 1% per file. More than half of these lines (58%) have to do with registering the constraints with the solvers.
The conversion of a SystemC runtime processing model (the Sparc processing element) to METRO II only requires 92 additional lines. This was a mere 3.4% increase (2681 lines to 2773 lines). This includes adding support for loading new code at runtime, returning the cost of operation to the netlist, and exposing events for mapping. This result is encouraging for importing code.
Figure 10.14 illustrates the percentage of the actual simulation runtime spent in each of METRO II’s simulation phases for the nine classes of mappings. The SystemC entry indicates the time spent in the SystemC simulation infrastructure upon which METRO II is built.
On average, 61% of the time is spent in Phase 1 (lowest section on the bar graph), 5% in Phase 2 (second section), and 17% in Phase 3 (third section). For models with only runtime processing elements (R), the averages are 93%, 0.9%, and 3%, respectively. This indicates that in runtime processing, the METRO II activities of annotation and scheduling are negligible in the runtime picture. For pure profiled (P) mappings, they are 21%, 7%, and 26%. In this case, one can see that METRO II now accounts for a greater percentage of runtime (Phase 1 alone is the representative of other simulation environments). For mixed classes, the numbers are 82%, 2.6%, and 7.6%. Again the runtime processing elements dominate. It should be noted that while P mappings have higher averages, the average runtime to process 7000 bytes of data was 54 seconds. The Phase 1 runtime and the SystemC overhead are the main contributors to overall runtime.
FIGURE 10.14
METRO II phase runtime analysis. (Bar chart titled "Runtime spent in different phases": one bar per class 1 to 9 plus the RTP and PP averages, each split into Phase 1, Phase 2, Phase 3, and SystemC.)
If we consider the SystemC timed functional model, the METRO II timed functional model, and the METRO II untimed functional model mapped to an architecture, the METRO II timed functional model had an average increase of 7.4% in runtime for the nine classes while the mapped version had a 54.8% reduction. This reduction is due to the fact that METRO II Phases 2 and 3 have significantly less overhead than the timer- and scheduler-based system required by the SystemC timed functional model.
Table 10.5 shows the average number of event state changes per phase and the average number of phases an event waits.
On average, only 0.14 events are annotated or scheduled per round. Because of the architectural model integration with the UMTS functional model, there are a limited number of synchronization points (which satisfy a rendezvous constraint, and, hence, an event state change). As shown in Figure 10.14, Phases 2 and 3 do not account for a large portion of the runtime, so, while the event state change activity is low, it does not translate to increased runtime. Runtime is not increased directly by changing an event’s state, but rather by the total number of events in Phases 2 and 3.
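The phase and event-state discussion above can be pictured with a minimal sketch of one simulation round. It assumes, following the text, that Phase 2 performs annotation and Phase 3 performs scheduling, and that Phase 1 is ordinary model execution in which processes propose events. The Event class, the state names, and the run_round function are hypothetical and do not reproduce the METRO II API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Event:
    name: str
    state: str = "proposed"  # proposed -> annotated -> enabled (assumed names)
    tags: Dict[str, float] = field(default_factory=dict)
    rounds_waited: int = 0

def run_round(
    processes: List[Callable[[], List[Event]]],
    waiting: List[Event],
    annotate: Callable[[Event], Dict[str, float]],
    schedule: Callable[[List[Event]], List[Event]],
) -> Tuple[List[Event], List[Event]]:
    # Phase 1: processes run and propose new events (base simulation work).
    proposed = list(waiting)
    for proc in processes:
        proposed.extend(proc())
    # Phase 2: annotators attach quantities (for example, cycles) to new events.
    for ev in proposed:
        if ev.state == "proposed":
            ev.tags.update(annotate(ev))
            ev.state = "annotated"
    # Phase 3: schedulers and constraint solvers enable a subset of the events.
    enabled = schedule(proposed)
    enabled_ids = {id(ev) for ev in enabled}
    still_waiting = []
    for ev in proposed:
        if id(ev) in enabled_ids:
            ev.state = "enabled"
        else:
            ev.rounds_waited += 1  # the event waits for a later round
            still_waiting.append(ev)
    return enabled, still_waiting
```

In this sketch the cost of Phases 2 and 3 grows with the number of events passed to annotate and schedule, which matches the observation that runtime depends on the total number of events in those phases rather than on individual state changes.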
TABLE 10.5
METRO II Phase Event Analysis
Class Event/Ph Comp % Comm % Coord % Avg Wait
Finally, it should be noted that runtime processing vs. pre-profiled processing does not impact this aspect of simulation. Comparing Classes 1 with 2 or 3 with 4 confirms this. This contrasts heavily with the runtime of the simulation (in which the PE type is a key factor). The runtime processing in the microarchitectural model is treated as a black box by METRO II such that the internal events are unseen and do not trigger phase changes. This indicates that SystemC components can be imported quite easily into METRO II without affecting the three-phase execution semantics.
The 3rd, 4th, and 5th columns of Table 10.5 categorize the events in Phase 1. Computational events request processing-element services directly. Communication events transfer data between FIFOs, and coordination events maintain correct simulation semantics and operation. The table indicates that events in the system are heavily related to coordination. Classes 8 and 9 have the lowest percentage of coordination events (64%), since these are single-PE systems.
10.6.1.5 Conclusions
We illustrated how an event-based design framework, METRO II, may be used to carry out architectural modeling and design-space exploration. Experimental results show that METRO II is capable of capturing functional modeling, architectural modeling, and mapping for a UMTS case study with limited overhead as compared with a baseline SystemC model. We showed that the design effort involved in carrying out 48 separate mappings with a variety of architectural models is minimal. Within the framework, we detail the runtime spent in the three different METRO II execution phases and provide an idea of how events move throughout the system.
Future work involves identifying and removing events not relevant for annotation or scheduling from METRO II’s second and third phases, support for a wider variety of declarative constraints, and the analysis of other applications that may be mapped onto similar architectural platforms.
10.6.2 Intelligent Buildings: Indoor Air Quality
The construction of future energy-efficient commercial buildings will make use of sophisticated control architectures that are able to sense several physical quantities, compute control laws, and apply control actions through actuators. Sensors, actuators, and computation units are physically distributed over the buildings. The control algorithm can be run on either distributed controllers or a central controller. The control performance is critically affected by both computation and communication delays that need to be within precise bounds in order to guarantee energy savings while maintaining the comfort level. Thus, a major challenge in designing such systems is to balance the computation and communication efforts. In particular, a designer needs to decide how to map the control algorithm on a set of controllers and needs to find an optimal communication network, meaning the communication medium and the network topology.
The goal of this case study is to model and simulate the control of the temperature in the rooms of a building at a high level of abstraction. The simulation results will be used to partition the sensor–actuator delay into computation and communication latency requirements. The communication latency requirements are then passed to an optimization tool that finds the best communication network that supports the gathering of data from the sensors and the delivery of commands to actuators.
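One way to picture how a communication latency requirement could be extracted from simulation is a simple sweep over candidate delays. The sketch below is only an illustration: meets_spec stands in for a full control-performance simulation, and the function name, the candidate values, and the monotonicity assumption (a longer delay never improves control performance) are all assumptions.

```python
from typing import Callable, Iterable, Optional

def max_tolerable_comm_delay(
    meets_spec: Callable[[float], bool],
    candidates_ms: Iterable[float],
) -> Optional[float]:
    """Return the largest candidate delay that still meets the control spec.

    Assumes monotonicity: once a delay fails, every longer delay fails too.
    Returns None if even the shortest candidate fails.
    """
    best = None
    for delay in sorted(candidates_ms):
        if meets_spec(delay):
            best = delay
        else:
            break
    return best

# Toy stand-in for the simulation: 12 ms of computation plus two
# communication hops (sensor to controller, controller to actuator)
# must fit in a 50 ms sensor-to-actuator budget. All numbers are made up.
toy_spec = lambda d: 12.0 + 2 * d <= 50.0
```

With the toy predicate and candidates of 1, 5, 10, 15, 19, 20, and 25 ms, the sweep returns 19 ms, which would then serve as the least constraining communication latency requirement.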
Our design flow is shown in Figure 10.15. In Step 1, both the functionality of the system and the architecture platform are modeled. The mapping between function and architecture models is carried out where the controllers and the point-to-point communication between sensors, actuators, and controllers are annotated with actual computation delays and virtual communication delays. The performance of the control algorithm is evaluated for different values of the communication delays until the least constraining latency requirements are found. The communication requirements are then passed to an external network synthesis tool, the communication synthesis infrastructure (COSI) [51]. In Step 2, COSI synthesizes the communication network of the system based on the simulation results. Then, in Step 3, the abstract point-to-point communication channels are mapped to the communication network obtained by COSI.
FIGURE 10.15
Design flow of the room temperature control system. (Diagram labels: Step 1: modeling and simulation, with function model, architecture model, and mapping; simulation results; Step 2: synthesis, with COSI; COSI synthesis results; Step 3: refinement.)
Both the functionality and the architecture platforms of the control system are modeled in METRO II, while the environment dynamics is modeled in OpenModelica [27], an external simulation tool. OpenModelica interacts with the function model of the system. The METRO II function model of a two-room example and its interaction with OpenModelica is shown in Figure 10.16. The environment dynamics is described in the Modelica programming language. The Modelica language is designed to allow convenient, component-oriented modeling of complex physical systems, e.g., systems containing mechanical, electrical, electronic, hydraulic, thermal, control, electric power, or process-oriented subcomponents [46]. The Modelica model in the indoor air quality case study deals with pressure and temperature dynamics in an indoor environment. It takes into account the structure of the building, its floorplan, the sizes of the different rooms, and the placement of doors and windows. Moreover, it includes outlet vents that can inject a cold/hot air flow to perform cooling/heating of the environment; they are the actuators of the control system, but expressed in Modelica in terms of their effect on the temperature and pressure dynamics of the system.
FIGURE 10.16
METRO II function model and OpenModelica. (Diagram labels: Modelica model, OpenModelica, CORBA communication, METRO II, interface to OpenModelica, A1, A2.)
The METRO II model and the Modelica model are run together (co-simulation [57]). Sensors and actuators in the functional model interact with the plant to retrieve temperature values in the different rooms and to set the status (closed/open; hot/cold air flow) of the vents. These operations require synchronization and information exchange between the tools. They are managed by the environment functional module, which controls the execution of the Modelica model (starting and stopping the simulation) and is able to set and get the values of its parameters. From an implementation point of view, this interaction is performed by remotely calling a set of services provided by OpenModelica over a CORBA connection [18] established between the tools.
The architecture model includes generic electronic control units (ECUs) communicating with sensors and actuators. During mapping, the controllers in the function model are allocated onto ECUs. If multiple controllers are mapped onto one ECU, a METRO II scheduler is constructed to coordinate their executions. Various scheduling policies can be applied by designing different types of schedulers, while keeping the controller tasks intact. In our example, we use round-robin scheduling. Sensors and actuators in the function model are mapped to architectural sensors and actuators. The communication between ECUs and sensing/actuating units is modeled at an abstract level in Step 1 of the design flow. The services of sensing, computing control algorithms, and actuating are annotated with time by METRO II annotators. The end-to-end delays from sensing to actuating are computed during simulation. The simulation results are sent to COSI, which synthesizes the communication network in Step 2 of the design flow. Then the synthesis results are utilized to refine the abstract communication network in Step 3 of the flow.
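The end-to-end delay computation can be illustrated with a small sketch for several controllers sharing one ECU. It assumes that all controllers become ready at the same time, run to completion in round-robin rotation order, and see identical sensing, communication, and actuation delays. These simplifications, the function name, and all numbers are assumptions for illustration and are not taken from the case study.

```python
from typing import Dict

def end_to_end_delays(
    compute_ms: Dict[str, float],
    sense_ms: float,
    comm_ms: float,
    actuate_ms: float,
) -> Dict[str, float]:
    """Sensing-to-actuating delay per controller on one shared ECU.

    The dict order of compute_ms is taken as the round-robin rotation
    order; each controller waits for those ahead of it in the rotation.
    """
    delays = {}
    elapsed = 0.0
    for name, cost in compute_ms.items():
        elapsed += cost  # ECU time consumed up to the end of this controller's turn
        delays[name] = sense_ms + comm_ms + elapsed + comm_ms + actuate_ms
    return delays
```

For two controllers with 2 ms and 3 ms of computation, 1 ms sensing, 0.5 ms per communication hop, and 1 ms actuation, the sketch yields 5 ms and 8 ms, showing how sharing an ECU lengthens the delay of the controller served later.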
10.7 Conclusions
We discussed the trends and challenges of system design from a broad perspective that covers both semiconductor and industrial segments that use embedded systems. We argued in favor of the need of a unified way of thinking about system design as the basis for a novel system science. One approach was presented, the PBD, that aims at achieving that unifying role. We discussed some of the most promising approaches for chip and embedded system design in the PBD perspective. METROPOLIS and its successor METRO II frameworks were presented. Some examples of METRO II applications to different industrial domains were then described.
While we believe we are making significant inroads, much work remains to be done to transfer the ideas and approaches that are flourishing today in research and in advanced companies to the generality of IC and embedded system designers. To be able to do so,
• We need to further advance the understanding of the relationships among parts of a heterogeneous design and its interaction with the physical environment.
• The efficiency of algorithms and tools must be improved to offer a solid foundation to the users.
• Models and use cases have to be developed.
• The scope of system-level design must be extended to include fault tolerance, security, and resiliency.
• The EDA industry has to embrace the new paradigms and venture into uncharted waters to grow beyond where it is today. It must create the necessary tools to help engineers to apply the new paradigms.
• Academia must develop new curricula (e.g., [13]) that favor a broader approach to engineering while emphasizing the importance of foundational disciplines such as mathematics and physics; embedded system designers require a broad view and the capability of mastering heterogeneous technologies.
• The system and semiconductor industry must recognize the importance of investing in training and tools for their engineers to be able to bring new products and services to market.
Acknowledgments
We wish to acknowledge the support of the Gigascale System Research Center, the support of NSF-sponsored Center for Hybrid and Embedded Software Systems, the support of the EU networks of excellence ARTIST and HYCON, and of the European community project SPEEDS. The past and the present support of General Motors, Infineon, Intel, Pirelli, ST, Telecom Italia (in particular, Marco Sgroi, Fabio Bellifemine, and Fulvio Faraci), UMC, and United Technologies Corporation (in particular, the strong interaction with Clas Jacobson, John F. Cassidy Jr., and Michael McQuade) is also gratefully acknowledged.
References
1 A Agrawal Graph rewriting and transformation (GReAT): A solution
for the model integrated computing (MIC) bottleneck In Proceedings of the 18th IEEE International Conference on Automated Software Engineering
2 P Alexander System Level Design with Rosetta Elsevier, San Francisco,
CA, 2006
3 K Arnold and J Gosling The Java Programming Language Addison
Wesley, Reading, MA, 1996
4 A Bakshi, V K Prasanna, A Ledeczi, V Mathur, S Mohanty, C S Raghavendra, M Singh, A Agrawal, J Davis, B Eames, S Neema, and
G Nordstrom MILAN: A model based integrated simulation framework
for design of embedded systems In Proceedings of the Workshop on
UT, June 2001
5 F Balarin, M Chiodo, P Giusto, H Hsieh, A Jurecska, L Lavagno,
C Passerone, A Sangiovanni-Vincentelli, E Sentovich, K Suzuki, and
B Tabbara Hardware-Software Co-Design of Embedded Systems: The Polis
6 F Balarin, L Lavagno, C Passerone, A Sangiovanni-Vincentelli,
G Yang, and Y Watanabe Concurrent execution semantics and sequential simulation algorithms for the metropolis meta-model In Proceedings of the Tenth International Symposium on Hardware/Software Codesign.
Society Press, 2002
7 F Balarin, L Lavagno, C Passerone, A Sangiovanni-Vincentelli,
M Sgroi, and Y Watanabe Modeling and designing heterogeneous systems In J Cortadella, A Yakovlev, and G Rozenberg, editors,
2002 LNCS2549
8 F Balarin, H Hsieh, L Lavagno, C Passerone, A Sangiovanni-Vincentelli, and Y Watanabe Metropolis: An integrated environment for electronic system design IEEE Computer, 36(4): 45–52, April 2003.
9 A Basu, M Bozga, and J Sifakis Modeling heterogeneous real-time components in BIP In Proceedings of the Fourth IEEE International Conference on
DC, 2006 IEEE Computer Society
10 G Berry and G Gonthier The ESTEREL synchronous programming language: Design, semantics, implementation Science of Computer Programming, 19(2):87–152, November 1992
11 S Bliudze and J Sifakis The algebra of connectors—structuring interactions in BIP In Proceedings of the 7th ACM & IEEE International
30–October 3, 2007
12 C Brooks, E A Lee, X Liu, S Neuendorffer, Y Zhao, and H Zheng (eds.) Heterogeneous concurrent modeling and design in Java (Volume 1: Introduction to Ptolemy II) Technical Report UCB/ERL M05/21, University of California, Berkeley, CA, July 2005
13 A Burns and A Sangiovanni-Vincentelli Editorial ACM Transactions
August 2005
14 San Jose Mercury News (CA) Census counts on pencils, not computers. April 4, 2008
15 X Chen, F Chen, H Hsieh, F Balarin, and Y Watanabe Formal verification of embedded system designs at multiple levels of abstraction
Cannes, France, September 2002
16 X Chen, H Hsieh, F Balarin, and Y Watanabe Automatic generation of simulation monitors from quantitative constraint formula Design
17 CoFluent Design CoFluent Studio World Wide Web, http://www.
In Proceedings of the 23rd International Conference on Application and Theory
20 P Cumming The TI OMAP platform approach to SOC In G Martin
and H Chang, editors, Winning the SoC Revolution, Kluwer Academic,
Norwell, MA, 2003
21 A Davare, D Densmore, T Meyerowitz, A Pinto, A Sangiovanni-Vincentelli, G Yang, and Q Zhu A next-generation design framework for platform-based design In Design and Verification Conference
22 A Davare, Q Zhu, J Moondanos, and A Sangiovanni-Vincentelli JPEG encoding on the Intel MXP5800: A platform-based design case study In
September 2005
23 J A de Oliveira and H van Antwerpen The Philips Nexperia digital
video platform In G Martin and H Chang, editors, Winning the SoC
24 D Densmore, A Donlin, and A L Sangiovanni-Vincentelli FPGA architecture characterization for system level performance analysis In
25 D Densmore, R Passerone, and A L Sangiovanni-Vincentelli A
platform-based taxonomy for ESL design IEEE Design & Test of Computers, 23(5):359–374, May 2006
26 J Eker, J W Janneck, E A Lee, J Liu, X Liu, J Ludvig, S Neuendorffer,
S Sachs, and Y Xiong Taming heterogeneity—the Ptolemy approach
27 P Fritzson, P Aronsson, A Pop, H Lundvall, K Nystrom, L Saldamli,
D Broman, and A Sandholm Openmodelica—a free open-source environment for system modeling, simulation, and teaching 2006 IEEE
Germany, pp 1588–1595, October 2006
28 G J Holzmann The model checker spin IEEE Transactions on Software
29 S Ito Convergence and divergence in parallel for the ubiquitous era
pp 143–143, November 2007
30 A Jantsch Modeling Embedded Systems and SOC’s: Concurrency and Time
CA, 2003
31 G Kahn The semantics of a simple language for parallel programming
In J L Rosenfeld, editor, Proceedings of the IFIP Congress 74, Information
1974
32 G Karsai, J Sztipanovits, A Ledeczi, and T Bapty Model-integrated
development of embedded software Proceedings of the IEEE, 91(1):145–
184, January 2003
33 K Keutzer, S Malik, A R Newton, J M Rabaey, and A Sangiovanni-Vincentelli System-level design: Orthogonalization of concerns and platform-based design IEEE Transactions on Computer-Aided Design of
34 C Kong and P Alexander The Rosetta meta-model framework In Proceedings of the IEEE Engineering of Computer-Based Systems Symposium and
35 M Krigsman IT failure at Heathrow T5: What really happened April 7,
2008 http://blogs.zdnet.com/projectfailures/?p=681
36 A Ledeczi, J Davis, S Neema, and A Agrawal Modeling methodology for integrated simulation of embedded systems ACM Transactions
37 A Lee and A Sangiovanni-Vincentelli A framework for comparing
models of computation IEEE Transactions on Computer-Aided Design of
38 X Liu, Y Xiong, and E A Lee The Ptolemy II framework for visual
languages In Proceedings of the IEEE 2001 Symposia on Human Centric
Computer Society, 2001
39 D Mathaikutty, H Patel, and S Shukla EWD: A metamodeling driven customizable multi-MoC system modeling environment FERMAT Technical Report 2004-20, Virginia Tech, 2004
40 D A Mathaikutty, H Patel, and S Shukla A functional programming framework of heterogeneous model of computation for system design
In Forum on Specification and Design Languages (FDL’04), Lille, France,
September 13–17, 2004
41 D A Mathaikutty, H D Patel, S K Shukla, and A Jantsch UMoC++:
A C++-based multi-MoC modeling environment In A Vachoux, editor,
Application of Specification and Design Languages for SoCs - Selected paper
42 T Meyerowitz, A Sangiovanni-Vincentelli, M Sauermann, and D Langen Source level timing annotation and simulation for a heterogeneous multiprocessor In DATE08, Munich, Germany, March 10–14, 2008.
43 J Miller and J Mukerji, editors MDA guide version 1.0.1 Technical Report omg/2003-06-01, OMG, 2003
44 Mirabilis Design Visual Sim World Wide Web, http://www.
mirabilisdesign.com, 2007
45 MLDesign Technologies MLDesigner World Wide Web, http://www.
mldesigner.com, 2007