As shown in Figure 1.2, the simulator now supports 3D NoC architectures (3D mesh and 3D torus, as shown in Figure 1.3) and vertical link interconnection patterns. The type of input traffic
NETWORKS-ON-CHIPS
Theory and Practice
Edited by
FAYEZ GEBALI
HAYTHAM ELMILIGI
MOHAMED WATHEQ EL-KHARASHI
CRC Press
Taylor & Francis Group
Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2009 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number-13: 978-1-4200-7978-4 (Hardcover)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Networks-on-chips : theory and practice / editors, Fayez Gebali, Haytham
Elmiligi, Mohamed Watheq El-Kharashi.
p. cm.
“A CRC title.”
Includes bibliographical references and index.
ISBN 978-1-4200-7978-4 (hardcover : alk. paper)
1. Networks on a chip. I. Gebali, Fayez. II. Elmiligi, Haytham. III. El-Kharashi, Mohamed Watheq. IV. Title.
Contents
Preface vii
About the Editors xi
Contributors xiii
1 Three-Dimensional Networks-on-Chip Architectures 1
Alexandros Bartzas, Kostas Siozios, and Dimitrios Soudris
2 Resource Allocation for QoS On-Chip Communication 29
Axel Jantsch and Zhonghai Lu
3 Networks-on-Chip Protocols 65
Michihiro Koibuchi and Hiroki Matsutani
4 On-Chip Processor Traffic Modeling for Networks-on-Chip Design 95
Antoine Scherrer, Antoine Fraboulet, and Tanguy Risset
5 Security in Networks-on-Chips 123
Leandro Fiorin, Gianluca Palermo, Cristina Silvano, and Mariagiovanna Sami
6 Formal Verification of Communications in Networks-on-Chips 155
Dominique Borrione, Amr Helmy, Laurence Pierre, and Julien Schmaltz
7 Test and Fault Tolerance for Networks-on-Chip Infrastructures 191
Partha Pratim Pande, Cristian Grecu, Amlan Ganguly, Andre Ivanov, and Resve Saleh
8 Monitoring Services for Networks-on-Chips 223
George Kornaros, Ioannis Papaeystathiou, and Dionysios Pnevmatikatos
9 Energy and Power Issues in Networks-on-Chips 255
Seung Eun Lee and Nader Bagherzadeh
10 The CHAINworks Tool Suite: A Complete Industrial Design Flow for Networks-on-Chips 281
John Bainbridge
Coding Applications 307
Dragomir Milojevic, Anthony Leroy, Frederic Robert, Philippe Martin, and Diederik Verkest
Preface
Networks-on-chip (NoC) is the latest development in VLSI integration. Increasing levels of integration resulted in systems with different types of applications, each having its own I/O traffic characteristics. Since the early days of VLSI, communication within the chip dominated the die area and dictated clock speed and power consumption. Using buses is becoming less desirable, especially with the ever growing complexity of single-die multiprocessor systems. As a consequence, the main feature of NoC is the use of networking technology to establish data exchange within the chip.
Using this NoC paradigm has several advantages, the main being the separation of IP design and functionality from chip communication requirements and interfacing. This has a side benefit of allowing the designer to use different IPs without worrying about IP interfacing because wrapper modules can be used to interface IPs to the communication network. Needless to say, the design of complex systems, such as NoC-based applications, involves many disciplines and specializations spanning the range of system design methodologies, CAD tool development, system testing, communication protocol design, and physical design such as using photonics.
This book addresses many challenging topics related to the NoC research area. The book starts by studying 3D NoC architectures and progresses to a discussion on NoC resource allocation, processor traffic modeling, and formal verification. NoC protocols are examined at different layers of abstraction. Several emerging research issues in NoC are highlighted such as NoC quality of service (QoS), testing and verification methodologies, NoC security requirements, and real-time monitoring. The book also tackles power and energy issues in NoC-based designs, as power constraints are currently considered among the bottlenecks that limit embedding more processing elements on a single chip. Following that, CHAINworks, an industrial design flow from Silistix, is introduced to address the complexity issues of combining various design techniques using NoC technology. A case study of a Multiprocessor SoC (MPSoC) for video coding applications is presented using Arteris NoC. The proposed MPSoC is a flexible platform, which allows designers to easily implement other multimedia applications and evaluate future video encoding standards.
This book is organized as follows. Chapter 1 discusses the design of 3D NoCs, which are multi-layer-architecture networks with each layer designed as a 2D NoC grid. The chapter explores the design space of 3D NoCs, taking into account consumed energy, packet latency, and area overhead as cost factors. Aiming at the best performance for incoming traffic, the authors present a methodology for designing heterogeneous 3D NoC topologies with a combination of 2D and 3D routers and vertical links.
Chapter 2 studies resource allocation schemes that provide shared NoC communication resources, where well-defined QoS characteristics are analyzed. The chapter considers delay, throughput, and jitter as the performance measures. The authors consider three main categories for resource allocation techniques: circuit switching, time division multiplexing (TDM), and aggregate resource allocation. The first technique, circuit switching, allocates all necessary resources during the lifetime of a connection. The second technique, TDM, allocates resources to a specific user during well-defined time periods, whereas the third one, aggregate resource allocation, provides a flexible allocation scheme. The chapter also elaborates on some aspects of priority schemes and fairness of resource allocation. As a case study, an example of a complex telecom system is presented at the end of the chapter.
Chapter 3 examines NoC protocol issues, including flow control. These issues are vital for any on-chip interconnection network because they affect transfer latency, silicon area, power consumption, and overall performance. Switch-to-switch and end-to-end flow control techniques are discussed with emphasis on switching and channel buffer management. Different algorithms are also explained with a focus on performance metrics. The chapter concludes with a detailed list of practical issues including a discussion on research trends in relevant areas. The trends discussed are reliability and fault tolerance, power consumption and its relation to routing algorithms, and advanced flow control mechanisms.
Chapter 4 addresses on-chip processor traffic modeling for evaluating NoC performance. Predictable communication schemes are required for traffic modeling and generation of dedicated IPs (e.g., for multimedia and signal processing applications). Precise traffic modeling is essential to build an efficient tool for predicting communication performance. Although it is possible to generate traffic that is similar to that produced by an application IP, it is much more difficult to model processor traffic because of the difficulty in predicting cache behavior and operating system interrupts. A common way to model communication performance is using traffic generators instead of real IPs. This chapter discusses the details of traffic generators. It first details the various steps involved in the design of a traffic generation environment. Then, as an example, an MPEG environment is presented.
Chapter 5 discusses security in NoCs, whose scalability, efficiency, and reliability could be undermined by a security weakness. However, NoCs could contribute to the overall security of any system by providing additional means to monitor system behavior and detect specific attacks. The chapter presents and analyzes security solutions to counteract various security threats. It overviews typical attacks that could be carried out against the communication subsystem of an embedded system. The authors focus on three main aspects: data protection for NoC-based systems, security in NoC-based reconfigurable architectures, and protection from side-channel attacks.
Chapter 6 addresses the verification of communications in NoCs, with an emphasis on the application of formal methods. The authors formalize two dimensions of the NoC design space: the communication infrastructure and the communication paradigm as a functional model in the ACL2 logic. For each essential design decision (topology, routing algorithm, and scheduling policy), a meta-model is given. Meta-model properties and constraints are identified to guarantee the overall correctness of the message delivery over the NoC. Results presented are general and thus application-independent. To ensure correct message delivery on a particular NoC design, one has to instantiate the meta-model with the specific topology, routing, and scheduling, and demonstrate that each one of these main instantiated functions satisfies the expected properties and constraints.
Chapter 7 covers test and fault tolerance for NoC infrastructures. Because of their particular nature, NoCs are exposed to a range of faults that can escape the classic test procedures. Among such faults are crosstalk, faults in the buffers of the NoC routers, and higher-level faults such as packet misrouting and data scrambling. These fault types add to the classic faults that must be tested postfabrication for all ICs. Moreover, an issue of concern in the case of communication-intensive platforms, such as NoCs, is the integrity of the communication infrastructure. By incorporating novel error correcting codes (ECC), it is possible to protect the NoC communication fabric against transient errors and at the same time lower the energy dissipation.
Chapter 8 covers monitoring services for NoCs. Network monitoring is the process of extracting information regarding the operation of a network for purposes that range from management functions to debugging and diagnostics. NoC monitoring faces a number of challenges, including the volume of information to be monitored and the distributed operation of the system. The chapter details the objectives and opportunities of network monitoring and the required interfaces to extract information from the distributed monitor points. It then describes the overall NoC monitoring architecture and the implementation issues of monitoring in NoCs, such as cost and the effects on the design process. A case study is presented, where several approaches to provide complete NoC monitoring services are discussed.
Chapter 9 addresses energy and power issues in NoCs. Power sources, including dynamic and static power consumption, and the energy model for NoCs are studied. The techniques for managing power and energy consumption on NoC are discussed, starting with micro-architectural-level techniques, followed by system-level power and energy optimizations. Micro-architectural-level power-reduction methodologies are highlighted based on the power model for CMOS technology. Parameters such as low-swing signaling, link encoding, RTL optimization, multi-threshold voltage, buffer allocation, and performance enhancement of a switch are investigated to reduce the power consumption of the network. On the other hand, system-level approaches, such as dynamic voltage scaling (DVS), on–off links, topology selection, and application mapping, are addressed. For each technique, recent efforts to solve the power problem in NoC are presented. To evaluate the dissipation of communication energy in NoC, energy models for each NoC component are used. Power modeling methodologies, which are capable of providing a cycle-accurate power profile and enable power exploration at the system level, are also introduced in this chapter.
Chapter 10 presents the CHAINworks tool suite, a set of tools and clockless NoC IP blocks that fit into the existing ASIC flows and are used for the design and synthesis of CHAIN networks that meet the critical challenges in complex devices. This chapter takes the reader on a guided tour through the steps involved in the design of an NoC-based system using the CHAINworks tool suite. As part of this process, aspects of the vast range of trade-offs possible in building an NoC-based design are investigated. Also, some of the additional challenges and benefits of using a self-timed NoC to achieve true top-level asynchrony between endpoint blocks are highlighted in this chapter.
Chapter 11 presents an MPSoC platform developed at the Interuniversity Microelectronics Center (IMEC), Leuven, Belgium, in partnership with Samsung Electronics and Freescale, using Arteris NoC as communication infrastructure. This MPSoC platform is dedicated to high-performance HDTV image resolution, low-power, real-time video coding applications using state-of-the-art video encoding algorithms such as MPEG-4, AVC/H.264, and Scalable Video Coding (SVC). The presented MPSoC platform is built using six Coarse Grain Array ADRES processors, also developed at IMEC, four on-chip memory nodes, one external memory interface, one control processor, one node that handles input and output of the video stream, and Arteris NoC as communication infrastructure. The proposed MPSoC platform is designed to be flexible, allowing easy implementation of different multimedia applications, and scalable to the future evolutions of video encoding standards and other mobile applications in general.
The editors would like to give special thanks to all authors who contributed to this book. Also, special thanks to Nora Konopka and Jill Jurgensen from Taylor & Francis Group for their ongoing help and support.
Fayez Gebali
Haytham El-Miligi
M. Watheq El-Kharashi
Victoria, BC, Canada
About the Editors
Fayez Gebali received a B.Sc. degree in electrical engineering (first class honors) from Cairo University, Cairo, Egypt, a B.Sc. degree in applied mathematics from Ain Shams University, Cairo, Egypt, and a Ph.D. degree in electrical engineering from the University of British Columbia, Vancouver, BC, Canada, in 1972, 1974, and 1979, respectively. For the Ph.D. degree he was a holder of an NSERC postgraduate scholarship. He is currently a professor in the Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada. He joined the department at its inception in 1984, where he was an assistant professor from 1984 to 1986, associate professor from 1986 to 1991, and professor from 1991 to the present. Gebali has been a registered professional engineer in the Province of British Columbia, Canada, since 1985 and a senior member of the IEEE since 1983. His research interests include networks-on-chips, computer communications, computer arithmetic, computer security, parallel algorithms, processor array design for DSP, and optical holographic systems.
Haytham Elmiligi has been with the Electrical and Computer Engineering Department, University of Victoria, Victoria, BC, Canada, since January 2006. His research interests include Networks-on-Chip (NoC) modeling, optimization, and performance analysis and reconfigurable Systems-on-Chip (SoC) design. Elmiligi worked in the industry for four years as a hardware design engineer. He also acted as an advisory committee member for the Wighton Engineering Product Development Fund (Spring 2008) at the University of Victoria, a publication chair for the 2007 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM’07), Victoria, BC, Canada, and a reviewer for the International Journal of Communication Networks and Distributed Systems (IJCNDS), Journal of Circuits, Systems and Computers (JCSC), and Transactions on HiPEAC.
Mohamed Watheq El-Kharashi received a Ph.D. degree from the University of Victoria, Victoria, BC, Canada, in 2002, and B.Sc. (first class honors) and M.Sc. degrees in computer engineering from Ain Shams University, Cairo, Egypt, in 1992 and 1996, respectively. He is currently an associate professor in the Department of Computer and Systems Engineering, Ain Shams University, Cairo, Egypt, and an adjunct assistant professor in the Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada. His research interests include advanced microprocessor design, simulation, performance evaluation, and testability, Systems-on-Chip (SoC), Networks-on-Chip (NoC), and computer architecture and computer networks education. El-Kharashi has published about 70 papers in refereed international journals and conferences.
Contributors
John Bainbridge
Manchester Technology Centre
Manchester, United Kingdom
John.bainbridge@silistix.com
Alexandros Bartzas
VLSI Design and Testing Center
Department of Electrical
and Computer Engineering
Democritus University of Thrace
Thrace, Greece
ampartza@ee.duth.gr
Dominique Borrione
TIMA Laboratory, VDS Group
Grenoble Cedex, France
kornaros@epp.teiher.gr
Seung Eun Lee
The Henry Samueli School
of Engineering
University of California
Irvine, California
seunglee@uci.edu
Partha Pratim Pande
Washington State University
Pullman, Washington
pande@eecs.wsu.edu
Ioannis Papaeystathiou
Technical University of Crete
Kounoupidiana, Chania, Greece
sami@elet.polimi.it
Antoine Scherrer
Laboratoire de Physique
Université de Lyon
ENS-Lyon, France
antoine.scherrer@ens-lyon.fr
Julien Schmaltz
Radboud University Nijmegen
Institute for Computing and Information Sciences
Heijendaalseweg, The Netherlands
julien@cs.ru.nl
and Computer Engineering
Democritus University of Thrace
dsoudris@ee.duth.gr
Diederik Verkest
Interuniversity Microelectronics Centre - IMEC
Leuven, Belgium
Diederik.Verkest@imec.be
1 Three-Dimensional Networks-on-Chip Architectures
Alexandros Bartzas, Kostas Siozios, and Dimitrios Soudris
CONTENTS
1.1 Introduction 1
1.2 Related Work 3
1.3 Alternative Vertical Interconnection Topologies 5
1.4 Overview of the Exploration Methodology 7
1.5 Evaluation—Experimental Results 9
1.5.1 Experimental Setup 9
1.5.2 Routing Procedure 12
1.5.3 Impact of Traffic Load 13
1.5.4 3D NoC Performance under Uniform Traffic 14
1.5.5 3D NoC Performance under Hotspot Traffic 16
1.5.6 3D NoC Performance under Transpose Traffic 19
1.5.7 Energy Dissipation Breakdown 19
1.5.8 Summary 22
1.6 Conclusions 23
Acknowledgments 23
References 24
1.1 Introduction
Future integrated systems will contain billions of transistors [1], composing tens to hundreds of IP cores. These IP cores, implementing emerging complex multimedia and network applications, should be able to deliver rich multimedia and networking services. An efficient cooperation among these IP cores (e.g., efficient data transfers) can be achieved through innovations of on-chip communication strategies.
The design of such complex systems includes several challenges. One challenge is designing on-chip interconnection networks that efficiently connect the IP cores. Another challenge is application mapping that makes efficient use of available hardware resources [2,3]. An architecture that is able to accommodate such a high number of cores, satisfying the need for communication and data transfers, is the networks-on-chip (NoC) architecture [4,5]. For these reasons NoC became a popular choice for designing the on-chip interconnect. The industry has initiated different NoC-based designs such as the Æthereal NoC [6] from Philips, the STNoC [7] from STMicroelectronics, and an 80-core NoC from Intel [8]. The key design challenges of emerging NoC designs, as presented by Ogras and Marculescu [9], are (a) the communication infrastructure, (b) the communication paradigm selection, and (c) the application mapping optimization.
The type of IP cores, as well as the topology and interconnection scheme, plays an important role in determining how efficiently an NoC will perform for a certain application or a set of applications. Furthermore, the application features (e.g., data transfers, communication, and computation needs) play an equally important role in the overall performance of the NoC system. An overview of the cost considerations for the design of NoCs is given by Bolotin et al. [10].
Up to now, NoC designs were limited to two dimensions. But emerging 3D integration technology exhibits two major advantages, namely, higher performance and smaller energy consumption [11]. A survey of the existing 3D fabrication technologies is presented by Beyne [12]. The survey shows the available 3D interconnection architectures and illustrates the main research issues in current and future 3D technologies. Through process/integration technology advances, it is feasible to design and manufacture NoCs that will expand in the third dimension (3D NoCs). Thus, it is expected that 3D integration will satisfy the demands of the emerging systems for scaling, performance, and functionality. A considerable reduction in the number and length of global interconnects using 3D integration is expected [13].
In this chapter, we present a methodology for designing alternative 3D NoC architectures. We define 3D NoCs as architectures that use several active silicon planes. Each plane is divided into a grid where 2D or 3D router modules are placed. The main objective of the methodology is to derive 3D NoC topologies with a mix of 2D and 3D routers and vertical link interconnection patterns that offer the best performance for the given chip traffic. The cost factors we consider are (i) energy consumption, (ii) average packet latency, and (iii) total switch block area. We make comparisons with an NoC in which all the routers are 3D ones. We have employed and extended the Worm_Sim NoC simulator [14], which is able to model these heterogeneous architectures and simulate them, gathering information on their performance. The heterogeneous NoC architecture can be achieved using a combined implementation of 2D and 3D routers in each layer.
The rest of the chapter is organized as follows: In Section 1.2 the related work is described. In Section 1.3 we present the 3D NoC topologies under consideration, whereas in Section 1.4 the proposed methodology is introduced. In Section 1.5 the simulation process and the achieved results are presented. Finally, in Section 1.6 the conclusions are drawn and future work is outlined.
1.2 Related Work
On-chip interconnection is a widely studied research field and good overviews are presented [15,16], which illustrate the various interconnection schemes available for present ICs and emerging Multiprocessor Systems-on-Chip (MPSoC) architectures. An NoC-based interconnection is able to provide an efficient and scalable infrastructure, which is able to handle the increased communication needs. Lee et al. [17] present a quantitative evaluation of 2D point-to-point, bus, and NoC interconnection approaches. In this work, an MPEG-2 implementation is studied and it is shown that the NoC-based solution scales very well in terms of area, performance, and power consumption.
To evaluate NoC designs, a number of simulators have been developed, such as Nostrum [18], Polaris [19], XPipes [20], and Worm_Sim [14], using C++ and/or SystemC [21]. To provide adequate input/stimuli to an NoC design, synthetic traffic is usually used. Several synthetic traffic generators have been proposed [22–25] to provide adequate inputs to NoC simulators for evaluation and exploration of proposed designs.
A methodology that synthesizes NoC architectures is proposed by Ogras, Hu, and Marculescu [26], where long-range links are inserted on top of a mesh network. In this methodology, the NoC design is addressed using an application-specific approach, but it is limited to two dimensions. Li et al. [27] presented a mesh-based 3D network-in-memory architecture, using a hybrid NoC/bus interconnection fabric, to accommodate efficiently processors and L2 cache memories in 3D NoCs. It is demonstrated that by using a 3D L2 memory architecture, better results are achieved compared to 2D designs. Koyanagi et al. [28] presented a 3D integration technique of vertical stacking and gluing of several wafers. By utilizing this technology, the authors were able to increase the connectivity while reducing the number of long interconnections. A fabricated 3D shared memory is presented by Lee et al. [29]. The memory module has three planes and can perform wafer stacking using the following technologies: (i) formation of buried interconnection, (ii) microbumps, (iii) wafer thinning, (iv) wafer alignment, and (v) wafer bonding. Another 3D integration scheme is proposed by Iwata et al. [30], where wireless interconnections are employed to offer connectivity.
An overview of the available interconnect solutions for Systems-on-Chip (SoC) is presented by Meindl [31]. This study includes interconnects for 3D ICs and shows that 3D integration reduces the length of the longest global interconnects [32] and reduces the total required wire length, and thus the dissipated energy [33].
Benkart et al. [34] presented an overview of the 3D chip stacking technology using through-chip interconnects. In their work, the trade-off between the high number of vertical interconnects versus the circuit density is highlighted. Furthermore, Davis et al. [35] show the implementation of an FFT in a 3D IC achieving 33% reduction in maximum wire length, thereby proving that the move to 3D ICs is beneficial. However, the heat dissipation is highlighted as one of the limiting factors.
The placement and routing in 3D integrated circuits are studied by Ababei et al. [36]. Also, a system-on-package solution for 3D networks is presented by Lim [37]. However, the heat dissipation of 3D circuits remains a big challenge [38]. To tackle this challenge, several analysis techniques have been proposed [39–41]. One approach is to perform thermal-aware placement and mapping for 3D NoCs, such as the work presented by Quaye [42]. Furthermore, the insertion of thermal vias can lower the chip temperature as illustrated in several texts [43,44].
A generalized NoC router model is presented by Ogras and Marculescu, who use it to perform NoC performance analysis. Using the aforementioned router model, it is feasible to perform NoC evaluation, which is significantly faster than performing simulation. Additionally, Pande et al. [46] presented an evaluation methodology to compare the performance and other metrics of a variety of NoC architectures. However, this comparison is made only among 2D NoC architectures. The work of Feero and Pande [47] extended the aforementioned work to 3D NoCs, and illustrated that 3D NoCs are advantageous when compared to 2D ones (with both having the same number of components in total). It is demonstrated that besides reducing the footprint in a fabricated design, 3D network structures provide a better performance compared to traditional 2D architectures. This work shows that, at the cost of a small area penalty, 3D NoCs achieve significant gains in terms of energy, latency, and throughput.
Pavlidis and Friedman [48] presented and evaluated various 3D NoC topologies. They also proposed an analytic model for 3D NoCs where a mesh topology is considered under a zero-load latency. Kim et al. [49] presented an exploration of communication architectures on 3D NoCs. A dimensionally decomposed router and its comparison with a hop-by-hop router connection and hybrid NoC-bus architecture is presented. The aforementioned works, both from the physical level as well as adding more communication architectures, such as full 3D crossbar and bus-based communication, are complementary to the one presented here and can be used for the extension of the methodology.
The main difference between the related work and the one presented here is that we do not assume full vertical interconnection (as shown in Figure 1.1), but rather a heterogeneous interconnection fabric, composed of a mix of 3D and 2D routers. The motivation for this heterogeneous design is not only the reduction of total interconnection network length, but also the reduced size of the 2D routers when compared to the 3D ones [47]. Reducing the number of vertical interconnection links simplifies the fabrication of the design and frees up more active chip area for available logic/memory blocks. Two-dimensional routers are routers that have connections with neighboring ones of the same grid. By comparison, a 3D router has direct, hop-by-hop connections with neighboring routers belonging to the same grid and those belonging to the adjacent planes. This difference between
Trang 18(a) Full vertical interconnection (100%)
for a 3D NoC.
(b) Uniform distribution of vertical links.
(c) Positioning of vertical links at the
center of the NoC.
(d) Positioning of vertical links at the periphery of the NoC.
3D Router Y
3D Router Y
Z
X
2D Router
3D Router Y
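The distinction between 2D and 3D routers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a mesh without wraparound links, a vertical link that exists only when the routers at both ends are 3D routers, and hypothetical names (`neighbors`, `is_3d`) that do not come from the chapter or from any simulator.

```python
def neighbors(pos, dims, is_3d):
    """Return the routers directly linked to `pos` in a 3D mesh.

    pos   -- (x, y, z) coordinates of the router
    dims  -- (X, Y, Z) size of the NoC
    is_3d -- function (x, y, z) -> bool, True if that router is a 3D router
    Assumption: a vertical link needs a 3D router at both of its ends.
    """
    x, y, z = pos
    x_dim, y_dim, z_dim = dims
    linked = []
    # Every router (2D or 3D) connects to its neighbors in the same plane.
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < x_dim and 0 <= ny < y_dim:
            linked.append((nx, ny, z))
    # Only 3D routers also connect to the adjacent planes.
    if is_3d(x, y, z):
        for dz in (1, -1):
            nz = z + dz
            if 0 <= nz < z_dim and is_3d(x, y, nz):
                linked.append((x, y, nz))
    return linked


# Example: in a 4 x 4 x 4 NoC, only the column at (1, 1) has vertical links.
only_column = lambda x, y, z: (x, y) == (1, 1)
print(len(neighbors((1, 1, 1), (4, 4, 4), only_column)))  # 3D router
print(len(neighbors((2, 1, 1), (4, 4, 4), only_column)))  # 2D router
```

An interior 3D router therefore has up to six links, whereas a 2D router never has more than four, which is the source of the smaller 2D router size mentioned in the text.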
1.3 Alternative Vertical Interconnection Topologies
We consider four different groups of interconnection patterns, as well as 10 vertical interconnection topologies in the context of this work. Consider a 3D NoC composed of Z 2D active silicon planes. Each 2D plane has dimensions X × Y. We also denote by K, with 0 ≤ K ≤ 100, the percentage of the routers that have connections in the vertical direction (called 3D routers). The available scenarios of how these 3D routers can be placed on a grid in each plane are as follows:
1. Uniform: 3D routers are uniformly distributed over the different planes. Using this scheme, we “spread” the 3D routers along every plane of the 3D NoC. To find the place of each router we proceed as follows:
• Place the first 3D router at position (0, 0, z), where z = 0, 1, ..., Z − 1.
• Place the four neighboring 3D routers in the positions (x + r + 1, y, z), (x − r − 1, y, z), (x, y + r + 1, z), and (x, y − r − 1, z). The step size r is defined as:
\[ r = \frac{1}{K} - 1 \]
where K is expressed as a fraction. r represents the number of 2D routers between consecutive 3D ones. This scheme is illustrated in Figure 1.1(b), showing one plane of a 3D NoC, with K = 25% and r = 3.
2. Center: All the 3D routers are positioned at the center of each plane, as shown in Figure 1.1(c). Because the 3D routers are located in the center of the plane, the 2D routers are distributed in the outer region of the NoC grid, connecting only to the neighboring routers of the same plane.
3. Periphery: The 3D routers are positioned at the periphery of each plane [as shown in Figure 1.1(d)]. In this case, the NoC is focused on best serving the communication needs of the outer cores.
4. Full custom: The position of the 3D routers is fully customized, matching the needs of the application with the NoC architecture. This solution fits best the needs of the application, while it minimizes the number of 3D routers. However, derivation of a full custom solution requires high design time, because this exploration has to be performed for every application. Furthermore, this will create a nonregular design that will not adjust well to potential changes in functionality, in the number of applications that are going to be executed, etc.
The aforementioned patterns are based on the 3D FPGAs work presented by Siozios et al. [50]. To perform an exploration toward full customized interconnection schemes, real applications and/or application traces are needed. In this chapter, we adopt various types of synthetic traffic, so the exploration for full customized interconnection schemes is out of scope. More specifically, we focus on pattern-based vertical interconnection topologies (categories 1–3). We consider 10 different vertical link interconnection topologies. For each of these topologies, the number of 3D routers is given, with the value of K in parentheses. For a 4 × 4 × 4 NoC architecture, we use the notation 64 (K).
• Full: Where all the routers of the NoC are 3D ones [number of 3D routers: 64 (100%)].
• Uniform based: Pattern-based topologies with r value equal to three [by_three pattern, as shown in Figure 1.1(b)], four (by_four), and five (by_five). Correspondingly, the number of 3D routers is 44 (68.75%), 48 (75%), and 52 (81.25%).
• Odd: In this pattern, all the routers belonging to the same row are of the same type. Two adjacent rows never have the same type of router [number of 3D routers: 32 (50%)].
• Edges: Where the center (dimensions x × y) of the 3D NoC has only 2D routers [number of 3D routers: 48 (75%)].
• Center: Where only the center (dimensions x × y) of the 3D NoC has 3D routers [number of 3D routers: 16 (25%)].
• Side based: Where a side (e.g., outer row) of each plane has 2D routers. Patterns evaluated have one (one_side), two (two_side), or three (three_side) sides as “2D routers only.” The number of 3D routers for each pattern is 48 (75%), 36 (56.25%), and 24 (37.5%), respectively.
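The router counts quoted in the list above can be cross-checked with a short sketch. It is only an illustration under stated assumptions: the predicate names are hypothetical, the choice of which rows and sides hold 2D routers is assumed (two_side is modeled as two adjacent sides, because that is the variant that reproduces the quoted 36), and the uniform-based patterns are left out because their exact placement is not fully specified here.

```python
from typing import Callable, Dict

X, Y, Z = 4, 4, 4  # the 4 x 4 x 4 NoC used in the text


def in_center(x: int, y: int) -> bool:
    """True for routers that are not on the outer ring of a plane."""
    return 0 < x < X - 1 and 0 < y < Y - 1


# Each predicate answers: is the router at (x, y) of a plane a 3D router?
# Which rows/sides are picked is an assumption; only the counts come from the text.
PATTERNS: Dict[str, Callable[[int, int], bool]] = {
    "full": lambda x, y: True,
    "odd": lambda x, y: y % 2 == 1,             # alternate rows
    "edges": lambda x, y: not in_center(x, y),  # center has only 2D routers
    "center": in_center,                        # only the center has 3D routers
    "one_side": lambda x, y: y != 0,
    "two_side": lambda x, y: y != 0 and x != 0,
    "three_side": lambda x, y: y != 0 and x != 0 and x != X - 1,
}


def count_3d_routers(name: str) -> int:
    """Number of 3D routers in the whole NoC (same pattern on every plane)."""
    is_3d = PATTERNS[name]
    per_plane = sum(1 for x in range(X) for y in range(Y) if is_3d(x, y))
    return per_plane * Z


def k_percent(name: str) -> float:
    """Percentage K of routers that are 3D routers."""
    return 100.0 * count_3d_routers(name) / (X * Y * Z)


if __name__ == "__main__":
    for name in PATTERNS:
        print(f"{name:10s} {count_3d_routers(name):3d} ({k_percent(name):g}%)")
```

Under these assumptions the counts agree with the figures given for the full, odd, edges, center, and side-based patterns.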
Each of the aforementioned vertical interconnection schemes has advantages and disadvantages. How well each scheme performs depends on the behavior of the applications that are implemented on the NoC. Experimental results in Section 1.5 show that a wrong choice may diminish the gains of using a 3D architecture.
1.4 Overview of the Exploration Methodology
An overview of the proposed methodology is shown in Figure 1.2. To perform the exploration of alternative topologies for 3D NoC architectures, the Worm_Sim NoC simulator [14], which utilizes wormhole switching, is used [51] (this is the center block in Figure 1.2).

To support 3D architectures/topologies, we have extended this simulator to adapt to the provided routing schemes and to be compatible with the Trident traffic format [23]. As shown in Figure 1.2, the simulator now supports 3D NoC architectures (3D mesh and 3D torus, as shown in Figure 1.3) and vertical link interconnection patterns. Each of these 3D architectures is composed of many grids, and each grid is composed of tiles that are connected to each other using mesh or torus interconnection networks. Each tile is composed of a processing core and a router. Because we are considering 3D architectures, the router is connected to the neighboring tiles and its local processing core via channels consisting of bidirectional point-to-point links.

FIGURE 1.2
An overview of the exploration methodology of alternative topologies for 3D Networks-on-Chip.
The NoC simulator can be configured using the following parameters (shown in Figure 1.2):
1. The NoC architecture (2D or 3D mesh or torus), as well as the specific x, y, and z parameters
2. The type of input traffic (uniform, transpose, or hotspot), as well as how heavy the traffic load will be
3. The routing scheme
4. The vertical link configuration file, which defines the locations of the vertical links
5. The router model, as well as the models used to calculate the energy and delay figures

FIGURE 1.3
3D NoC architectures. (Legend: link to upper layer; link to lower layer.)
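The five configuration inputs listed above could be gathered in a single record, as in the following minimal sketch. The class name, field names, and string values are hypothetical and chosen only for illustration; they do not reflect the actual input format of Worm_Sim or of its extended version.

```python
from dataclasses import dataclass

# Allowed values taken from the text; the spelling of the keys is an assumption.
TOPOLOGIES = {"mesh", "torus"}
TRAFFIC = {"uniform", "transpose", "hotspot"}
LOADS = {"low", "medium", "heavy"}
ROUTING = {"xyz-old", "xyz", "odd-even"}


@dataclass(frozen=True)
class SimConfig:
    """Hypothetical container for the simulator parameters (items 1-5)."""
    topology: str             # item 1: mesh or torus
    x: int                    # item 1: plane width
    y: int                    # item 1: plane height
    z: int                    # item 1: number of planes (1 describes a 2D NoC)
    traffic: str              # item 2: synthetic traffic type
    load: str                 # item 2: traffic load
    routing: str              # item 3: routing scheme
    vertical_links_file: str  # item 4: path to the vertical link pattern file
    router_model: str         # item 5: label of the router/energy/delay model

    def __post_init__(self) -> None:
        if self.topology not in TOPOLOGIES:
            raise ValueError(f"unknown topology: {self.topology}")
        if self.traffic not in TRAFFIC:
            raise ValueError(f"unknown traffic type: {self.traffic}")
        if self.load not in LOADS:
            raise ValueError(f"unknown load: {self.load}")
        if self.routing not in ROUTING:
            raise ValueError(f"unknown routing scheme: {self.routing}")
        if min(self.x, self.y, self.z) < 1:
            raise ValueError("x, y, and z must be at least 1")

    @property
    def num_nodes(self) -> int:
        return self.x * self.y * self.z


if __name__ == "__main__":
    cfg = SimConfig("mesh", 4, 4, 4, "uniform", "medium", "xyz-old",
                    "by_five.links", "default")
    print(cfg.num_nodes)
```

Validating the values at construction time keeps an invalid combination from reaching a long simulation run.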
The output of the simulation is a log file that contains the relevant evaluated cost factors, such as overall latency, average latency per packet, and the energy breakdown of the NoC, providing values for link energy consumption, crossbar and router energy consumption, etc. From these energy figures, we calculate the total energy consumption of the 3D NoCs.

The 3D architectures to be explored may have a mix of 2D and 3D routers, ranging from very few 3D routers to only 3D routers. To steer the exploration, we use different patterns (as presented in Section 1.3). The proposed 3D NoCs can be constructed by placing a number of identical 2D NoCs on individual planes, providing communication by interplane vias among vertically adjacent routers. This means that the position of silicon vias is exactly the same for each plane. Hence, the router configuration is extended to the third dimension, whereas the structure of the individual logic blocks (IP cores) remains unchanged.
1.5 Evaluation—Experimental Results
The main objective of the methodology and the exploration process is to find alternative irregular 3D NoC topologies with a mix of 2D and 3D routers. The new topologies exhibit vertical link interconnection patterns that achieve the best performance. Our primary cost function is the energy consumption, with the other cost factors being the average packet latency and total switch block area. We compare these patterns against the fully vertically interconnected 3D NoC as well as the 2D one (all having the same number of nodes).
1.5.1 Experimental Setup
The 3D router used here has a 7 × 7 crossbar switch, whereas the 2D router uses a 5 × 5 crossbar switch. Additionally, each router has a routing table and, based on the source/destination address, the routing table decides which output link the outgoing packet should use. The routing table is built using the algorithm described in Figure 1.4.
The NoC simulator uses the Ebit energy model proposed by Benini and de Micheli [52]. We make the assumption (based on the work presented by Reif et al. [53]) that the vertical communication links between the planes are electrically equivalent to horizontal routing tracks with the same length. Based on this assumption, the energy consumption of a vertical link between two routers equals the consumption of a link between two neighboring routers at the same plane (if they have the same length).

function ROUTINGXYZ
    ...
    findCoordinates(); // returns src.x, src.y, src.z, dst.x, dst.y and dst.z
    for all plane ∈ NoC do
        ...
        findTmpDestination(); // find a temporary destination of the packet for each plane of the NoC that the packet passes from
        ...
        for all valid Nodes ∈ plane do
            ... // ... the vertical interconnections patterns input file.
            ... // ... with the smallest Manhattan distance
    ...

FIGURE 1.4
Routing algorithm modifications (// denotes a comment in the algorithm).
More specifically, because the 3D integration technology, which provides communication among layers using through-silicon vias (TSVs), has not been explored sufficiently yet, 3D-based systems design still needs to be addressed. Due to the large variation of the 3D TSV parameters, such as diameter, length, dielectric thickness, and fill material among alternative process technologies, a wide range of measured resistances, capacitances, and inductances have been reported in the literature. A typical size (diameter) for TSVs is about 4 × 4 μm, with a minimum pitch around 8–10 μm, whereas their total length starting from plane T1 and terminating on plane T3 is 17.94 μm, implying wafer thinning of planes T2 and T3 to approximately 10–15 μm [54–56].

The different TSV fabrication processes lead to a high variation in the corresponding electrical characteristics. The resistance of a single 3D via varies from 20 mΩ to as high as 600 mΩ [55,56], with a feasible value (in terms of fabrication) around 30 mΩ. Regarding the capacitances of these vias, their values vary from 40 fF to over 1 pF [57], with a feasible value for fabrication being around 180 fF. In the context of this work, we assume a resistance of 350 mΩ and a capacitance of 2.5 fF.
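To get a feel for the magnitude of the assumed via parameters, the sketch below computes a first-order lumped RC time constant from the resistance and capacitance assumed above. The lumped RC model is an assumption made here purely for illustration; the chapter itself does not state how these values enter the delay model of the simulator.

```python
R_TSV_OHM = 350e-3   # assumed via resistance from the text: 350 milliohm
C_TSV_F = 2.5e-15    # assumed via capacitance from the text: 2.5 fF


def rc_time_constant(resistance_ohm: float, capacitance_f: float) -> float:
    """First-order lumped RC time constant in seconds (illustrative model only)."""
    return resistance_ohm * capacitance_f


if __name__ == "__main__":
    tau = rc_time_constant(R_TSV_OHM, C_TSV_F)
    print(f"tau = {tau:.3e} s")
```

The resulting time constant is well below a femtosecond, which is consistent with treating a vertical link like a short horizontal track.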
Using our extended version of the NoC simulator, we have performed simulations involving a 64-node and a 144-node architecture with 3D mesh and torus topologies with synthetic traffic patterns. The configuration files describing the corresponding link patterns are supplied to the simulator as an input. The sizes of the 3D NoCs we simulated were 4 × 4 × 4 and 6 × 6 × 4, whereas the equivalent 2D ones were 8 × 8 and 12 × 12. We have used three types of input (synthetic traffic) and three traffic loads (heavy, normal, and low). The traffic schemes used are as follows:
• Uniform: Where we have uniform distribution of the traffic across the NoC, with the nodes receiving approximately the same number of packets.
• Transpose: In this traffic scheme, packets originating from node (a, b, c) are destined to node (X − a, Y − b, Z − c), where X, Y, and Z are the dimensions of the 3D NoC.
• Hotspot: Where some nodes (a minority) receive more packets than the majority of the nodes. The hotspot nodes in the 2D grids are positioned in the middle of every quadrant, where the size of the quadrant is specified by the dimensions of each plane in the 3D NoC architecture under simulation, whereas in the 3D NoC, a hotspot is located in the middle of each plane.
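The transpose and hotspot schemes described above can be sketched as two small helper functions. The function names are hypothetical. The sketch assumes zero-based coordinates, so the mapping (X − a, Y − b, Z − c) of the text becomes (X − 1 − a, Y − 1 − b, Z − 1 − c) to stay inside the grid, and it assumes that "the middle of each plane" means the node at integer half of each dimension, which is only one possible reading for even-sized planes.

```python
from typing import Tuple

Coord = Tuple[int, int, int]


def transpose_destination(src: Coord, dims: Coord) -> Coord:
    """Destination of a packet under transpose traffic (zero-based coordinates)."""
    a, b, c = src
    x_dim, y_dim, z_dim = dims
    return (x_dim - 1 - a, y_dim - 1 - b, z_dim - 1 - c)


def plane_hotspot(dims: Coord, z: int) -> Coord:
    """Assumed hotspot position in plane z: the node at half of each plane dimension."""
    x_dim, y_dim, _ = dims
    return (x_dim // 2, y_dim // 2, z)


if __name__ == "__main__":
    dims = (4, 4, 4)
    print(transpose_destination((0, 0, 0), dims))
    print([plane_hotspot(dims, z) for z in range(dims[2])])
```

With this mapping, applying the transpose twice returns the original node, so every source has exactly one partner.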
We have used the three routing schemes presented in Worm_Sim [14], and extended them in order to function in a 3D NoC as follows:
• XYZ-old: Which is an extended version of XY routing.
• XYZ: Which is based on XY routing but routes the packet along the direction with least delay.
• Odd-even: Which is the odd-even routing scheme presented by Chiu [58]. In this scheme, the packets take some turns in order to avoid deadlock situations.
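As an illustration of what extending XY routing to a third dimension can look like, the sketch below implements plain dimension-order routing on a 3D mesh: a packet first corrects its x coordinate, then y, then z. This is an assumed reading of the XYZ-old scheme, not the simulator's code; the function names are hypothetical, and the sketch ignores torus wraparound links, the delay-aware choice of the XYZ scheme, and missing vertical links.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def xyz_next_hop(cur: Coord, dst: Coord) -> Coord:
    """Next node on a dimension-order (x, then y, then z) route in a 3D mesh."""
    for axis in range(3):
        if cur[axis] != dst[axis]:
            step = 1 if dst[axis] > cur[axis] else -1
            nxt = list(cur)
            nxt[axis] += step
            return (nxt[0], nxt[1], nxt[2])
    return cur  # already at the destination


def xyz_route(src: Coord, dst: Coord) -> List[Coord]:
    """Full list of nodes visited from src to dst, both included."""
    path = [src]
    while path[-1] != dst:
        path.append(xyz_next_hop(path[-1], dst))
    return path


if __name__ == "__main__":
    print(xyz_route((0, 0, 0), (2, 1, 1)))
```

The route length always equals the Manhattan distance between source and destination, which is what makes this scheme a convenient baseline.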
From the simulations performed, we have extracted figures regarding the energy consumption (in joules) and the average packet latency (in clock cycles). Additionally, for each vertical interconnection pattern, as well as for the 2D NoC, we calculated the occupied area of the switching block, based on the gate equivalent of the switching fabric presented by Feero and Pande [47]. A good design is the one that exhibits lower values in the aforementioned metrics when compared to the 2D NoC as well as to the 3D NoC which has full vertical connectivity (all the routers are 3D ones). Furthermore, all the simulation measurements were taken for the same number of operational cycles (200,000 cycles).

1.5.2 Routing Procedure
To route packets over the 3D topologies, we modified the routing procedure, as shown in Figure 1.4. The modified routing procedure is valid for all routing schemes. This modification allows customization of the routing scheme to efficiently cope with the heterogeneous topologies, based on vertical link connectivity patterns.
The steps of the routing algorithm are as follows:
1. For each packet, we know the source and destination nodes and can find the positions of these nodes in the topology. The on-chip “coordinates” of the nodes for the destination one are dst.x, dst.y,
2. By doing so, we can formulate the temporary destinations, one for each plane. For the number of planes a packet has to traverse to arrive at its final destination, the algorithm initially sets the route to a temporary destination located at position dst.x, dst.y,
the packet is going to follow across the planes (i.e., whether it is going to an upper or lower plane relative to its “source” plane) and finds the nearest valid link at each plane. The outcome is used to properly update the z coefficient of the temporary destination’s position. A valid link is every vertical interconnection link available in the plane that the packet traverses. This information is obtained from the vertical interconnection patterns file. A link is uniquely identified by the node to which it is connected and its direction. So, for all the specified valid links that are located at the same plane, the header flit of the packet checks if the desired route is matched to the destination up or down link.
3. If there is no match between them, compute the Manhattan distance (in case of 3D torus topology, we have modified it to produce the correct Manhattan distance between the two nodes).
4. Finally, the valid link with the smallest Manhattan distance is chosen, and its corresponding node is chosen to be the temporary destination at each plane the packet is going to traverse.
5. After finding a set of temporary destinations (each one located at a different plane), they are stored into the header flit of the packet. The aforementioned temporary destinations may or may not be used as the packet is being routed during the simulation, so they are “candidate” temporary destinations. The decision of being just a candidate or the actual destination per plane is taken based on one of two scenarios: (1) if a set of vertical links, which exhibited relatively high utilization during a previous simulation with the same network parameters, achieved the desired minimum link communication volume or (2) according to a given vertical link pattern such as the one presented in Section 1.1.

The modification of the algorithm essentially checks if a vertical link exists in the temporary destination of the packet; otherwise, the closest router with such a link is chosen. Thus the routing complexity is kept low.
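The core of the procedure above, choosing a temporary destination in one plane, can be sketched as follows. This is a simplified illustration and not the simulator's code: the function names are hypothetical, the set of valid links is passed in directly instead of being read from the vertical interconnection patterns file, ties are broken by coordinate order (an assumption), and the torus variant assumes that the modified distance takes the shorter way around each ring.

```python
from typing import Iterable, Tuple

XY = Tuple[int, int]


def axis_distance(a: int, b: int, size: int, torus: bool) -> int:
    """Hop distance along one axis; on a torus the wraparound may be shorter."""
    d = abs(a - b)
    return min(d, size - d) if torus else d


def manhattan(p: XY, q: XY, plane_dims: XY, torus: bool = False) -> int:
    """In-plane Manhattan distance, with an assumed wraparound-aware torus variant."""
    return sum(axis_distance(a, b, s, torus) for a, b, s in zip(p, q, plane_dims))


def temporary_destination(target: XY, z: int, valid_links: Iterable[XY],
                          plane_dims: XY, torus: bool = False) -> Tuple[int, int, int]:
    """Node in plane z through which the packet should leave the plane.

    valid_links holds the (x, y) positions in plane z that have a vertical
    link in the direction the packet needs.
    """
    links = set(valid_links)
    if not links:
        raise ValueError("no vertical link available in this plane")
    if target in links:  # a link exists right at the desired position
        return (target[0], target[1], z)
    # Otherwise pick the closest router that has such a link.
    best = min(links, key=lambda xy: (manhattan(xy, target, plane_dims, torus), xy))
    return (best[0], best[1], z)


if __name__ == "__main__":
    links = {(1, 1), (3, 0)}
    print(temporary_destination((0, 0), 2, links, (4, 4)))
    print(temporary_destination((0, 0), 2, links, (4, 4), torus=True))
```

In the usage example, the mesh picks the link at (1, 1), whereas the torus prefers (3, 0) because the wraparound link makes it only one hop away.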
1.5.3 Impact of Traffic Load
Three different traffic loads were used (heavy, medium/normal, low). By altering the packet generation rate in this way, it is possible to test the performance of the NoC. Compared to the medium load, the heavy load has 50% increased traffic, whereas the low one has 90% decreased traffic. The behavior of the NoCs in terms of the average packet latency is shown in Figure 1.5. In this figure, the latency is normalized to the average packet latency of the full_connectivity 3D NoC under medium load and for each traffic scheme. The impact of the traffic load (latency increases as the load increases) can be observed, and we can also see that NoCs can cope with the increased traffic, as well as the differences between different traffic schemes.

Mesh topologies exhibit similar behavior, though the latency figures are higher due to the decreased connectivity when compared to torus topologies. This is shown in Figure 1.6, where the latency of 64-node mesh and torus NoCs are compared (the basis for the latency normalization is the average packet latency of the full_connectivity 3D torus). From this comparison, it is shown that the mesh topologies have an increased packet latency of 34% compared to the torus ones (for the same traffic scheme, load, and routing algorithm).
FIGURE 1.5
Impact of traffic load on 2D and 3D NoCs (for all different types of traffic used). Panel title: Latency behavior for 64-node NoCs (torus topology, xyz routing). Series: hotspot, transpose, and uniform traffic under heavy, normal, and low load. x-axis: 8×8/torus, by_five, by_four, by_three, center, edges, odd, one_side, three_side, two_side, full_connectivity.

FIGURE 1.6
Impact of traffic load on 2D and 3D mesh and torus NoCs (for uniform traffic). Panel title: Latency behavior for 64-node mesh and torus NoCs (uniform traffic, xyz routing). Series: mesh and torus under heavy, medium, and low uniform load. x-axis: 8×8/mesh, by_five, by_four, by_three, center, edges, odd, one_side, three_side, two_side.
1.5.4 3D NoC Performance under Uniform Traffic
Figure 1.7 compares 2D to 3D mesh networks using uniform traffic, medium load, and xyz-old routing. We compared the total energy consumption, average packet latency, total area of the switching blocks (routers), and the percentage of 2D routers (having 5 I/O ports instead of 7) under 4 × 4 × 4 [Figure 1.7(a)] and 6 × 6 × 4 [Figure 1.7(b)] mesh architectures. The x-axis lists all the interconnection patterns. The y-axis presents, normalized to the figures of the fully vertically interconnected 3D NoC, the cost factors for total energy consumption, average packet latency, total switching block area, and percentage of vertical links.
The advantages of 3D NoCs when compared to 2D ones are shown in Figure 1.7(a). In this case, the 8 × 8 mesh dissipates 39% more energy and has 29% higher packet delivery latency. However, the switching area is 71% of the area of the fully interconnected 3D NoC because all its routers are 2D ones. Employing the by_five link pattern results in a 3% reduction in energy and a 5% increase in latency. In this pattern, only 81% of the routers are 3D ones, so the area of the switching logic is reduced by 5% (when compared to the area of the fully interconnected 3D NoC). Figure 1.7(b) shows that more patterns exhibit better results. It is worth noticing that the overall performance of the 2D NoC significantly decreases, exhibiting around 50% increase in energy and latency.

FIGURE 1.7
(a) Experimental results for a 4 × 4 × 4 3D mesh. (b) Experimental results for a 6 × 6 × 4 3D mesh.

When we increase the traffic load by increasing the packet generation rate by 50%, we see that all patterns have worse behavior than the full_connectivity 3D NoC. The reason is that by using a pattern-based 3D NoC, we decrease the number of 3D routers by decreasing the number of vertical links, thereby reducing the connectivity within the NoC. As expected, this reduced connectivity has a negative impact in cases where there is increased traffic.

For low traffic load, the patterns can become beneficial because there is less need for communication resources. This effect is illustrated in Figure 1.8 for 2D and 3D NoCs under low uniform traffic and xyz routing. The exception is the edges pattern in the 64-node 3D NoC [Figure 1.8(a)], where all the 3D routers reside on the edges of each plane of the 3D NoC. This results in a 7% increase in the packet latency. Again, it is worth noticing that as the NoC dimensions increase, the performance of the 2D NoC decreases. This can be clearly seen in Figure 1.8(b), where the 2D NoC has 38% increased energy dissipation.
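The switching-area figures quoted in this section for the 4 × 4 × 4 mesh (71% for the all-2D NoC and a 5% reduction for by_five) can be approximated with a back-of-the-envelope sketch. It assumes that router area scales linearly with the number of crossbar ports (5 for a 2D router, 7 for a 3D one). That scaling is an assumption chosen here because it reproduces the two quoted numbers; it is not the chapter's area model, which relies on the gate equivalents of [47].

```python
PORTS_3D = 7  # 7 x 7 crossbar of a 3D router
PORTS_2D = 5  # 5 x 5 crossbar of a 2D router


def relative_switch_area(fraction_3d: float) -> float:
    """Switching area relative to a NoC made only of 3D routers.

    Assumes area proportional to port count (illustrative assumption).
    """
    if not 0.0 <= fraction_3d <= 1.0:
        raise ValueError("fraction_3d must lie in [0, 1]")
    mixed = fraction_3d * PORTS_3D + (1.0 - fraction_3d) * PORTS_2D
    return mixed / PORTS_3D


if __name__ == "__main__":
    for label, frac in [("2D NoC", 0.0), ("by_five", 0.8125), ("full", 1.0)]:
        print(f"{label:8s} {100 * relative_switch_area(frac):.1f}% of full 3D area")
```

The same function can be applied to the other patterns by passing their share of 3D routers, keeping in mind that the result is only as good as the linear-scaling assumption.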
We have also compared the performance of the proposed approach against that achievable with a torus network, which provides wraparound links added in a systematic manner. Note that the vertical links connecting the bottom with the upper planes are not removed, as this is the additional feature of the torus topology when compared to the mesh. Our simulations show that using the transpose traffic scheme, the vertical link patterns exhibit notable results; this trend continues as the dimensions of the NoC get bigger. The explanation is that the flow of packets between a source and a destination follows a diagonal course among the nodes at each plane. At the same time, the wraparound links of the torus topology play a significant role in preserving the performance even when some vertical links are removed. The results show that increasing the dimensions of the NoC increases the energy savings when the link patterns are applied. But this is not true for the case of mesh topology. In particular, in the 6 × 6 × 4 3D torus architecture, the by_five, by_four, by_three, one_side, and two_side patterns show better results as far as the energy consumption is concerned. For instance, the two_side pattern exhibits 7.5% energy savings, with latency increasing to 32.84 cycles from the 30 cycles of the fully vertically connected 3D torus topology.
1.5.5 3D NoC Performance under Hotspot Traffic
In the case of hotspot traffic (Figure 1.9), testing the 4 × 4 × 4 3D mesh architecture, seven out of the nine link patterns perform better relative to the fully vertically connected topology. For instance, the two_side pattern exhibits a 2% decrease in network energy consumption, whereas the increase in latency is 2.5 cycles. Note that only 56.25% of the vertical links are present. The hotspot traffic in 3D mesh topologies favors cube topologies (e.g., 6 × 6 × 6). Even so, in the 6 × 6 × 4 mesh architecture, the center and two_side patterns exhibit similar performance regarding average cycles per packet compared to that of the fully vertically connected architecture (this was expected due to the location where the hotspot nodes were positioned).

FIGURE 1.8
Uniform traffic (low load) on a 3D NoC for alternative interconnection topologies. (a) Experimental results for a 4 × 4 × 4 3D mesh. (b) Experimental results for a 6 × 6 × 4 3D mesh.

FIGURE 1.9
Hotspot traffic (low load) on a 3D NoC for alternative interconnection topologies. (a) Experimental results for a 4 × 4 × 4 3D mesh. (b) Experimental results for a 6 × 6 × 4 3D mesh.
triggered by a hotspot-type traffic are presented. Figures 1.10(a) and 1.10(b) present the results for the mesh and torus architectures, respectively, showing gains in energy consumption and area, with a negligible penalty in latency. Again, the architectures where congestion is experienced are highlighted. These results are also compared to their equivalent 2D architectures. The 8 × 8 2D NoC (same number of cores as the 4 × 4 × 4 architecture) shows 25% increased latency and 40% increased energy consumption compared to the one_side link pattern, whereas the 12 × 12 mesh (same number of cores as the 6 × 6 × 4 architecture) shows a 46% increase in latency and a 49% increase in energy consumption compared to the same pattern using uniform traffic. In addition, comparing the by_four pattern on the 64-node architecture under transpose traffic shows 31% and 18% reduced latency and total network consumption, respectively. However, in the case of hotspot traffic and employing the two_side link pattern, these numbers change to 24% reduced latency and 56% reduced energy consumption.
1.5.6 3D NoC Performance under Transpose Traffic
Under the transpose traffic scheme, the by_four link pattern shows a 6.5% decrease in total network energy consumption at the expense of 3 cycles increased latency. In Figure 1.11, the simulation results for the 3D 4 × 4 × 4 mesh and 6 × 6 × 4 torus NoCs are presented for transpose traffic. In Figure 1.11(a), we can see that we have a 4% gain in the energy consumption of the 3D NoCs with a 5% increase in the packet latency. Additionally, we gain 6% in the area occupied by the switching blocks of the NoC. Comparing these patterns to the 2D NoC (having the same number of nodes), we obtain on average a 14% decrease in energy consumption and a 33% decrease in total packet latency. In terms of area, however, the cost of the 3D NoC is higher by 23%.

In Figure 1.11(b), we can see that the 2D NoC experiences traffic contention and is not able to cope with that amount of traffic (the actual value of the latency is close to 5000 cycles per packet). Additionally, gains of 47% are achieved in energy consumption. When this torus architecture is compared to the “full” 3D one, it shows 5% gains in energy consumption with 8% increased latency and 9% reduced switching block area.
1.5.7 Energy Dissipation Breakdown
The analytical results of the Ebit [52] energy model indicate that, when moving to 3D architectures, the energy consumption of the links, crossbars, arbiters, and buffer read energy decreases, whereas there is an increase in the energy consumed when writing to the buffer and taking the routing decisions.

On average, the link energy consumption accounts for 8% of the total energy, the crossbar 6%, the buffer’s read energy 23%, and the buffer’s write
FIGURE 1.10
Hotspot traffic (medium load) on a 3D NoC for alternative interconnection topologies. (b) Experimental results for a 4 × 4 × 4 3D torus.

FIGURE 1.11
Transpose traffic on a 3D NoC for alternative interconnection topologies. (b) Experimental results for a 6 × 6 × 4 3D torus. Architectures experiencing congestion are marked.

Energy Dissipation Breakdown (link, crossbar, router, arbiter, buffer read, and buffer write) for the 8x8/mesh, by_five, by_four, by_three, center, edges, odd, one_side, three_side, two_side, and full_connectivity configurations.
in the first column. The next two columns present the gains [min to max values (in %)] for the energy dissipation. The fourth and fifth columns show the min to max values for the average packet latency, respectively. It can be seen that energy reduction up to 29% can be achieved. But gains in energy dissipation cannot be reached without paying a penalty in average packet latency. It is the responsibility of the designer, utilizing this exploration methodology, to choose a 3D NoC topology and vertical interconnection patterns that best meet the requirements of the system.

TABLE 1.1
Experimental Results: Min-Max Impact on Costs (Energy and Latency) with Medium Traffic Load
1.6 Conclusions
Networks-on-Chips are becoming more and more popular as a solution able to accommodate large numbers of IP cores, offering an efficient and scalable interconnection network. Three-dimensional NoCs are taking advantage of the progress of integration and packaging technologies, offering advantages when compared to 2D ones. Existing 3D NoCs assume that every router of a grid can communicate directly with the neighboring routers of the same grid and with the ones of the adjacent planes. This communication can be achieved by employing wire bonding, microbump, or through-silicon vias [35].

All of these technologies have their advantages and disadvantages. Reducing the number of vertical connections makes the design and final fabrication of 3D systems easier. The goal of the proposed methodology is to find heterogeneous 3D NoC topologies with a mix of 2D and 3D routers and vertical link interconnection patterns that perform best for the incoming traffic. In this way, the exploration process evaluates the incoming traffic and the interconnection network, proposing an incoming traffic-specific alternative 3D NoC. Aiming in this direction, we have presented a methodology that shows that by employing an alternative 3D NoC vertical link interconnection network, in essence proposing an NoC with fewer vertical links, we can achieve gains in energy consumption (up to 29%), in the average packet latency (up to 2%), and in the area occupied by the routers of the NoC (up to 18%).

Extensions of this work could include not only more heterogeneous 3D architectures but also different router architectures, providing better adaptive routing algorithms and performing further customizations targeting heterogeneous NoC architectures. In this way, it would be possible to create even more heterogeneous 3D NoCs. For providing stimuli to the NoCs, a move toward using real applications would be useful, apart from using even more types of synthetic traffic. By doing so, it would become feasible to propose application-domain-specific 3D NoC architectures.
Acknowledgments
The authors would like to thank Dr. Antonis Papanikolaou (IMEC vzw., Belgium) for his helpful comments and suggestions. This research is supported by the 03ED593 research project, implemented within the framework of the “Reinforcement Program of Human Research Manpower” (PENED) and cofinanced by national and community funds: 75% from the European Union (European Social Fund) and 25% from the Greek Ministry of Development (General Secretariat of Research and Technology).
References
1. Semiconductor Industry Association, “International technology roadmap for semiconductors,” 2006. [Online]. Available: http://www.itrs.net/Links/
2. S. Murali and G. D. Micheli, “Bandwidth-constrained mapping of cores onto NoC architectures,” In Proc. of DATE. Washington, DC: IEEE Computer Society, 2004, 896–901.
3. J. Hu and R. Marculescu, “Energy- and performance-aware mapping for regular NoC architectures,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 24 (2005) (4): 551–562.
4. L. Benini and G. de Micheli, “Networks on chips: a new SoC paradigm,” Computer 35 (2002) (1): 70–78.
5. A. Jantsch and H. Tenhunen, eds., Networks on Chip. New York: Kluwer Academic Publishers, 2003.
6. K. Goossens, J. Dielissen, and A. Radulescu, “The Æthereal network on chip: Concepts, architectures, and implementations,” IEEE Des. Test, 22 (2005) (5): 414–421.
7. STMicroelectronics, “STNoC: Building a new system-on-chip paradigm,” White Paper, 2005.
8. S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, et al., “An 80-tile 1.28 TFLOPS network-on-chip in 65nm CMOS,” In Proc. of International Solid-State Circuits Conference (ISSCC). IEEE, 2007, 98–589.
9. U. Ogras and R. Marculescu, “Application-specific network-on-chip architecture customization via long-range link insertion,” In Proc. of ICCAD (6–10 Nov.) 2005, 246–253.
10. E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, “Cost considerations in network on chip,” Integr. VLSI J. 38 (2004) (1): 19–42.
11. E. Beyne, “3D system integration technologies,” In International Symposium on VLSI Technology, Systems, and Applications, Hsinchu, Taiwan, April 2006, 1–9.
12. E. Beyne, “The rise of the 3rd dimension for system integration,” In Proc. of International Interconnect Technology Conference, Burlingame, CA, 5–7 June, 2006, 1–5.
13. J. Joyner, R. Venkatesan, P. Zarkesh-Ha, J. Davis, and J. Meindl, “Impact of three-dimensional architectures on interconnects in gigascale integration,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9 (Dec. 2001) (6): 922–928.
14. R. Marculescu, U. Y. Ogras, and N. H. Zamora, “Computation and communication refinement for multiprocessor SoC design: A system-level perspective,” In Proc. of DAC. New York: ACM Press, 2004, 564–592.
15. J. Duato, S. Yalamanchili, and N. Lionel, Interconnection Networks: An Engineering Approach. San Francisco, CA: Morgan Kaufmann Publishers Inc., 2002.
16. W. Dally and B. Towles, Principles and Practices of Interconnection Networks. San Francisco, CA: Morgan Kaufmann Publishers Inc., 2003.
17. H. G. Lee, N. Chang, U. Y. Ogras, and R. Marculescu, “On-chip communication architecture exploration: A quantitative evaluation of point-to-point, bus, and network-on-chip approaches,” ACM Trans. Des. Autom. Electron. Syst., 12 (2007) (3): 23.
18. Z. Lu, R. Thid, M. Millberg, E. Nilsson, and A. Jantsch, “NNSE: Nostrum network-on-chip simulation environment,” In Proc. of SSoCC, April 2005.
19. V. Soteriou, N. Eisley, H. Wang, B. Li, and L.-S. Peh, “Polaris: A system-level roadmap for on-chip interconnection networks,” In Proc. of ICCD, October 2006. [Online]. Available: http://www.gigascale.org/pubs/930.html
20. M. Dall’Osso, G. Biccari, L. Giovannini, D. Bertozzi, and L. Benini, “xPipes: a latency insensitive parameterized network-on-chip architecture for multi-processor SoCs,” In Proc. of ICCD. IEEE Computer Society, 2003.
21. Open SystemC Initiative, IEEE Std 1666-2005: IEEE Standard SystemC Language Reference Manual. IEEE Computer Society, March 2006.
22. V. Puente, J. Gregorio, and R. Beivide, “SICOSYS: An integrated framework for studying interconnection network performance in multiprocessor systems,” In Proc. of 10th Euromicro Workshop on Parallel, Distributed and Network-Based Processing, 2002, 15–22.
23. V. Soteriou, H. Wang, and L.-S. Peh, “A statistical traffic model for on-chip interconnection networks,” In Proc. of MASCOTS. Washington, DC: IEEE Computer Society, 2006, 104–116.
24. W. Heirman, J. Dambre, and J. V. Campenhout, “Synthetic traffic generation as a tool for dynamic interconnect evaluation,” In Proc. of SLIP. New York: ACM Press, 2007, 65–72.
25. F. Ridruejo and J. Miguel-Alonso, “INSEE: An interconnection network simulation and evaluation environment,” In Proc. of Euro-Par Parallel Processing, 3648/2005. Berlin: Springer, 2005, 1014–1023.
26. U. Y. Ogras, J. Hu, and R. Marculescu, “Key research problems in NoC design: A holistic perspective,” In Proc. of CODES+ISSS, 2005, 69–74.
27 F Li, C Nicopoulos, T Richardson, Y Xie, V Narayanan, and M Kandemir,
“Design and management of 3D chip multiprocessors using
network-in-memory,” In Proc of ISCA Washington, DC: IEEE Computer Society, 2006,
130–141
28 M Koyanagi, H Kurino, K W Lee, K Sakuma, N Miyakawa, and H Itani,
“Future system-on-silicon LSI chips,” IEEE Micro 18 (1998) (4): 17–22.
29 K Lee, T Nakamura, T Ono, Y Yamada, T Mizukusa, H Hashimoto, K Park,
H Kurino, and M Koyanagi, “Three-dimensional shared memory fabricated
using wafer stacking technology,” IEDM Technical Digest, Electron Devices
Meeting (2000) 165–168
30 A Iwata, M Sasaki, T Kikkawa, S Kameda, H Ando, K Kimoto, D Arizono, and H Sunami, “A 3D integration scheme utilizing wireless interconnections for implementing hyper brains,” 2005
31 J Meindl, “Interconnect opportunities for gigascale integration,” IEEE Micro
23 (IEEE Computer Society Press, May/June 2003) (3): 28–35
32 J Joyner, P Zarkesh-Ha, J Davis, and J Meindl, “A three-dimensional stochastic
wire-length distribution for variable separation of strata,” In Proc of the IEEE
2000 International Interconnect Technology Conference IEEE, 2000, 126–128.
33 J Joyner and J Meindl, “Opportunities for reduced power dissipation using
three-dimensional integration,” In Proc of the IEEE 2002 International Interconnect Technology Conference IEEE, 2002, 148–150.
34 P Benkart, A Kaiser, A Munding, M Bschorr, H.-J Pfleiderer, E Kohn,
A Heittmann, H Huebner, and U Ramacher, “3D chip stack technology
using through-chip interconnects,” IEEE Des Test 22 (2005) (6): 512–518.
35 W R Davis, J Wilson, S Mick, J Xu, H Hua, C Mineo, A M Sule, M Steer, and P D Franzon, “Demystifying 3D ICs: The pros and cons of going vertical,”
IEEE Des Test 22 (2005) (6): 498–510.
36 C Ababei, Y Feng, B Goplen, H Mogal, T Zhang, K Bazargan, and S
Sapatnekar, “Placement and routing in 3D integrated circuits,” IEEE Des Test
39 S Im and K Banerjee, “Full chip thermal analysis of planar (2-D) and vertically
integrated (3-D) high performance ICs,” In International Electron Devices Meeting, IEDM Technical Digest., 2000, 727–730.
40 T.-Y Chiang, S Souri, C O Chui, and K Saraswat, “Thermal analysis of
heterogeneous 3D ICs with various integration scenarios,” In Proc of International Electron Devices Meeting, 2001.
41 K Puttaswamy and G H Loh, “Thermal analysis of a 3D die-stacked
high-performance microprocessor,” In Proc of the 16th ACM Great Lakes Symposium
on VLSI New York: ACM, 2006, 19–24.
42 C Addo-Quaye, “Thermal-aware mapping and placement for 3-D NoC
designs,” In Proc of IEEE SOC, 2005, 25–28.
43 B Goplen and S Sapatnekar, “Thermal via placement in 3D ICs,” In Proc of the
2005 International Symposium on Physical Design ACM, 2005, 167–174.
44 J Cong and Y Zhang, “Thermal via planning for 3-D ICs,” In Proc of the 2005 IEEE/ACM International Conference on Computer-Aided Design Washington, DC:
IEEE Computer Society, 2005, 745–752
45 U Y Ogras and R Marculescu, “Analytical router modeling for
networks-on-chip performance analysis,” In Proc of the Conference on Design, Automation and Test in Europe EDA Consortium, 2007, 1096–1101.
46 P P Pande, C Grecu, M Jones, A Ivanov, and R Saleh, “Performance evaluation
and design trade-offs for networks-on-chip interconnect architectures,” IEEE Trans on Comp., 54 (Aug 2005) (8): 1025–1040.
47 B Feero and P P Pande, “Performance evaluation for three-dimensional
networks-on-chip,” In Proc of ISVLSI, 2007, 305–310.
48 V F Pavlidis and E G Friedman, “3-D topologies for networks-on-chip,” IEEE Trans on VLSI Sys., 15 (2007) (10): 1081–1090.
49 J Kim, C Nicopoulos, D Park, R Das, Y Xie, V Narayanan, M S Yousif, and
C R Das, “A novel dimensionally-decomposed router for on-chip
communication in 3D architectures,” In Proc of ISCA ACM Press, 2007, 138–149.
50 K Siozios, K Sotiriadis, V F Pavlidis, and D Soudris, “Exploring alternative
3D FPGA architectures: Design methodology and CAD tool support,” In Proc.
of FPL, 2007.
51 L M Ni and P K McKinley, “A survey of wormhole routing techniques in
direct networks,” Computer 26 (1993) (2): 62–76.
52 T Ye, L Benini, and G De Micheli, “Analysis of power consumption on switch
fabrics in network routers,” In Proc of DAC (10–14 June) 2002, 524–529.
53 R Reif, A Fan, K.-N Chen, and S Das, “Fabrication technologies for
three-dimensional integrated circuits,” In Proc of International Symposium on Quality Electronic Design (18–21 March) 2002, 33–37.
54 MIT Lincoln Labs, MITLL Low-Power FDSOI CMOS Process Design Guide,
September 2006
55 A W Topol, J D C La Tulipe, L Shi, D J Frank, K Bernstein, S E Steen,
A Kumar, et al., “Three-dimensional integrated circuits,” IBM J Res Dev 50
(2006) (4/5): 491–506
56 A W Topol, J D C La Tulipe, L Shi, D J Frank, K Bernstein, S E Steen,
A Kumar, “Techniques for producing 3D ICs with high-density interconnect,”
In VLSI Multi-Level Interconnection Conference, 2004.
57 S M Alam, R E Jones, S Rauf, and R Chatterjee, “Inter-strata connection characteristics and signal transmission in three-dimensional (3D) integration
technology,” In ISQED ’07: Proceedings of the 8th International Symposium on Quality Electronic Design Washington, DC: IEEE Computer Society, 2007,
580–585
58 G.-M Chiu, “The odd-even turn model for adaptive routing,” IEEE Trans Parallel Distrib Syst 11 (2000) (7): 729–738.