Networks-on-Chips: Theory and Practice


Edited by

Fayez Gebali

Haytham Elmiligi

Mohamed Watheq El-Kharashi

CRC Press

Taylor & Francis Group

Boca Raton London New York

CRC Press is an imprint of the Taylor & Francis Group, an Informa business

6000 Broken Sound Parkway NW, Suite 300

Boca Raton, FL 33487-2742

CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper

10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-1-4200-7978-4 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Networks-on-chips : theory and practice / editors, Fayez Gebali, Haytham

Elmiligi, Mohamed Watheq El-Kharashi.

p. cm.

“A CRC title.”

Includes bibliographical references and index.

ISBN 978-1-4200-7978-4 (hardcover : alk. paper)

1. Networks on a chip. I. Gebali, Fayez. II. Elmiligi, Haytham. III. El-Kharashi, Mohamed Watheq. IV. Title.

Preface

About the Editors

Contributors

1 Three-Dimensional Networks-on-Chip Architectures

Alexandros Bartzas, Kostas Siozios, and Dimitrios Soudris

2 Resource Allocation for QoS On-Chip Communication

Axel Jantsch and Zhonghai Lu

3 Networks-on-Chip Protocols

Michihiro Koibuchi and Hiroki Matsutani

4 On-Chip Processor Traffic Modeling for Networks-on-Chip Design

Antoine Scherrer, Antoine Fraboulet, and Tanguy Risset

5 Security in Networks-on-Chips

Leandro Fiorin, Gianluca Palermo, Cristina Silvano, and Mariagiovanna Sami

6 Formal Verification of Communications in Networks-on-Chips

Dominique Borrione, Amr Helmy, Laurence Pierre, and Julien Schmaltz

7 Test and Fault Tolerance for Networks-on-Chip Infrastructures

Partha Pratim Pande, Cristian Grecu, Amlan Ganguly, Andre Ivanov, and Resve Saleh

8 Monitoring Services for Networks-on-Chips

George Kornaros, Ioannis Papaeystathiou, and Dionysios Pnevmatikatos

9 Energy and Power Issues in Networks-on-Chips

Seung Eun Lee and Nader Bagherzadeh

10 The CHAINworks Tool Suite: A Complete Industrial Design Flow for Networks-on-Chips

John Bainbridge

Coding Applications

Dragomir Milojevic, Anthony Leroy, Frederic Robert, Philippe Martin, and Diederik Verkest

Networks-on-chip (NoC) is the latest development in VLSI integration. Increasing levels of integration resulted in systems with different types of applications, each having its own I/O traffic characteristics. Since the early days of VLSI, communication within the chip dominated the die area and dictated clock speed and power consumption. Using buses is becoming less desirable, especially with the ever growing complexity of single-die multiprocessor systems. As a consequence, the main feature of NoC is the use of networking technology to establish data exchange within the chip.

Using this NoC paradigm has several advantages, the main being the separation of IP design and functionality from chip communication requirements and interfacing. This has a side benefit of allowing the designer to use different IPs without worrying about IP interfacing because wrapper modules can be used to interface IPs to the communication network. Needless to say, the design of complex systems, such as NoC-based applications, involves many disciplines and specializations spanning the range of system design methodologies, CAD tool development, system testing, communication protocol design, and physical design such as using photonics.

This book addresses many challenging topics related to the NoC research area. The book starts by studying 3D NoC architectures and progresses to a discussion on NoC resource allocation, processor traffic modeling, and formal verification. NoC protocols are examined at different layers of abstraction. Several emerging research issues in NoC are highlighted such as NoC quality of service (QoS), testing and verification methodologies, NoC security requirements, and real-time monitoring. The book also tackles power and energy issues in NoC-based designs, as power constraints are currently considered among the bottlenecks that limit embedding more processing elements on a single chip. Following that, the CHAINworks, an industrial design flow from Silistix, is introduced to address the complexity issues of combining various design techniques using NoC technology. A case study of Multiprocessor SoC (MPSoC) for video coding applications is presented using Arteris NoC. The proposed MPSoC is a flexible platform, which allows designers to easily implement other multimedia applications and evaluate the future video encoding standards.

This book is organized as follows. Chapter 1 discusses the design of 3D NoCs, which are multi-layer-architecture networks with each layer designed as a 2D NoC grid. The chapter explores the design space of 3D NoCs, taking into account consumed energy, packet latency, and area overhead as cost factors. Aiming at the best performance for incoming traffic, the authors present a methodology for designing heterogeneous 3D NoC topologies with a combination of 2D and 3D routers and vertical links.

Chapter 2 studies resource allocation schemes that provide shared NoC communication resources, where well-defined QoS characteristics are analyzed. The chapter considers delay, throughput, and jitter as the performance measures. The authors consider three main categories for resource allocation techniques: circuit switching, time division multiplexing (TDM), and aggregate resource allocation. The first technique, circuit switching, allocates all necessary resources during the lifetime of a connection. The second technique, TDM, allocates resources to a specific user during well-defined time periods, whereas the third one, aggregate resource allocation, provides a flexible allocation scheme. The chapter also elaborates on some aspects of priority schemes and fairness of resource allocation. As a case study, an example of a complex telecom system is presented at the end of the chapter.

flow control. These issues are vital for any on-chip interconnection network because they affect transfer latency, silicon area, power consumption, and overall performance. Switch-to-switch and end-to-end flow control techniques are discussed with emphasis on switching and channel buffer management. Different algorithms are also explained with a focus on performance metrics. The chapter concludes with a detailed list of practical issues including a discussion on research trends in relevant areas. Following are the trends discussed: reliability and fault tolerance, power consumption and its relation to routing algorithms, and advanced flow control mechanisms.

performance. Predictable communication schemes are required for traffic modeling and generation of dedicated IPs (e.g., for multimedia and signal processing applications). Precise traffic modeling is essential to build an efficient tool for predicting communication performance. Although it is possible to generate traffic that is similar to that produced by an application IP, it is much more difficult to model processor traffic because of the difficulty in predicting cache behavior and operating system interrupts. A common way to model communication performance is using traffic generators instead of real IPs. This chapter discusses the details of traffic generators. It first details various steps involved in the design of traffic generation environment. Then, as an example, an MPEG environment is presented.

scalability, efficiency, and reliability could be undermined by a security weakness. However, NoCs could contribute to the overall security of any system by providing additional means to monitor system behavior and detect specific attacks. The chapter presents and analyzes security solutions to counteract various security threats. It overviews typical attacks that could be carried out against the communication subsystem of an embedded system. The authors focus on three main aspects: data protection for NoC-based systems, security in NoC-based reconfigurable architectures, and protection from side-channel attacks.

with an emphasis on the application of formal methods. The authors formalize two dimensions of the NoC design space: the communication infrastructure and the communication paradigm as a functional model in the ACL2 logic. For each essential design decision (topology, routing algorithm, and scheduling policy), a meta-model is given. Meta-model properties and constraints are identified to guarantee the overall correctness of the message delivery over the NoC. Results presented are general and thus application-independent. To ensure correct message delivery on a particular NoC design, one has to instantiate the meta-model with the specific topology, routing, and scheduling, and demonstrate that each one of these main instantiated functions satisfies the expected properties and constraints.

their particular nature, NoCs are exposed to a range of faults that can escape the classic test procedures. Among such faults: crosstalk, faults in the buffers of the NoC routers, and higher-level faults such as packet misrouting and data scrambling. These fault types add to the classic faults that must be tested postfabrication for all ICs. Moreover, an issue of concern in the case of communication-intensive platforms, such as NoCs, is the integrity of the communication infrastructure. By incorporating novel error correcting codes (ECC), it is possible to protect the NoC communication fabric against transient errors and at the same time lower the energy dissipation.

Network monitoring is the process of extracting information regarding the operation of a network for purposes that range from management functions to debugging and diagnostics. NoC monitoring faces a number of challenges, including the volume of information to be monitored and the distributed operation of the system. The chapter details the objectives and opportunities of network monitoring and the required interfaces to extract information from the distributed monitor points. It then describes the overall NoC monitoring architecture and the implementation issues of monitoring in NoCs, such as cost, the effects on the design process, etc. A case study is presented, where several approaches to provide complete NoC monitoring services are discussed.

including dynamic and static power consumptions, and the energy model for NoC are studied. The techniques for managing power and energy consumption on NoC are discussed, starting with micro-architectural-level techniques, followed by system-level power and energy optimizations. Micro-architectural-level power-reduction methodologies are highlighted based on the power model for CMOS technology. Parameters such as low-swing signaling, link encoding, RTL optimization, multi-threshold voltage, buffer allocation, and performance enhancement of a switch are investigated to reduce the power consumption of the network. On the other hand, system-level approaches, such as dynamic voltage scaling (DVS), on–off links, topology selection, and application mapping, are addressed. For each technique, recent efforts to solve the power problem in NoC are presented. To evaluate the dissipation of communication energy in NoC, energy models for each NoC component are used. Power modeling methodologies, which are capable of providing a cycle accurate power profile and enable power exploration at the system level, are also introduced in this chapter.

clock-less NoC IP blocks that fit into the existing ASIC flows and are used for the design and synthesis of CHAIN networks that meet the critical challenges in complex devices. This chapter takes the reader on a guided tour through the steps involved in the design of an NoC-based system using the CHAINworks tool suite. As part of this process, aspects of the vast range of trade-offs possible in building an NoC-based design are investigated. Also, some of the additional challenges and benefits of using a self-timed NoC to achieve true top-level asynchrony between endpoint blocks are highlighted in this chapter.

Interuniversity Microelectronics Center (IMEC), Leuven, Belgium in partnership with Samsung Electronics and Freescale, using Arteris NoC as communication infrastructure. This MPSoC platform is dedicated to high-performance HDTV image resolution, low-power, real-time video coding applications using state-of-the-art video encoding algorithms such as MPEG-4, AVC/H.264, and Scalable Video Coding (SVC). The presented MPSoC platform is built using six Coarse Grain Array ADRES processors, also developed at IMEC, four on-chip memory nodes, one external memory interface, one control processor, one node that handles input and output of the video stream, and Arteris NoC as communication infrastructure. The proposed MPSoC platform is designed to be flexible, allowing easy implementation of different multimedia applications, and scalable to the future evolutions of video encoding standards and other mobile applications in general.

The editors would like to give special thanks to all authors who contributed to this book. Also, special thanks to Nora Konopka and Jill Jurgensen from Taylor & Francis Group for their ongoing help and support.

Fayez Gebali Haytham El-Miligi

M Watheq El-Kharashi

Victoria, BC, Canada


Fayez Gebali received a B.Sc. degree in electrical engineering (first class honors) from Cairo University, Cairo, Egypt, a B.Sc. degree in applied mathematics from Ain Shams University, Cairo, Egypt, and a Ph.D. degree in electrical engineering from the University of British Columbia, Vancouver, BC, Canada, in 1972, 1974, and 1979, respectively. For the Ph.D. degree he was a holder of an NSERC postgraduate scholarship. He is currently a professor in the Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada. He joined the department at its inception in 1984, where he was an assistant professor from 1984 to 1986, associate professor from 1986 to 1991, and professor from 1991 to the present. Gebali is a registered professional engineer in the Province of British Columbia, Canada, since 1985 and a senior member of the IEEE since 1983. His research interests include networks-on-chips, computer communications, computer arithmetic, computer security, parallel algorithms, processor array design for DSP, and optical holographic systems.

Engineering Department, University of Victoria, Victoria, BC, Canada, since January 2006. His research interests include Networks-on-Chip (NoC) modeling, optimization, and performance analysis and reconfigurable Systems-on-Chip (SoC) design. Elmiligi worked in the industry for four years as a hardware design engineer. He also acted as an advisory committee member for the Wighton Engineering Product Development Fund (Spring 2008) at the University of Victoria, a publication chair for the 2007 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM’07), Victoria, BC, Canada, and a reviewer for the International Journal of Communication Networks and Distributed Systems (IJCNDS), Journal of Circuits, Systems and Computers (JCSC), and Transactions on HiPEAC.

from the University of Victoria, Victoria, BC, Canada, in 2002, and B.Sc. (first class honors) and M.Sc. degrees in computer engineering from Ain Shams University, Cairo, Egypt, in 1992 and 1996, respectively. He is currently an associate professor in the Department of Computer and Systems Engineering, Ain Shams University, Cairo, Egypt and an adjunct assistant professor in the Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada. His research interests include advanced microprocessor design, simulation, performance evaluation, and testability, Systems-on-Chip (SoC), Networks-on-Chip (NoC), and computer architecture and computer networks education. El-Kharashi has published about 70 papers in refereed international journals and conferences.

Manchester Technology Centre

Manchester, United Kingdom

John.bainbridge@silistix.com

Alexandros Bartzas

VLSI Design and Testing Center

Department of Electrical

and Computer Engineering

Democritus University of Thrace

Thrace, Greece

ampartza@ee.duth.gr

Dominique Borrione

TIMA Laboratory, VDS Group

Grenoble Cedex, France

kornaros@epp.teiher.gr

Seung Eun Lee

The Henry Samueli School of Engineering

University of California

Irvine, California

seunglee@uci.edu

Partha Pratim Pande

Washington State University

Pullman, Washington

pande@eecs.wsu.edu

Ioannis Papaeystathiou

Technical University of Crete

Kounoupidiana, Chania, Greece

sami@elet.polimi.it

Antoine Scherrer

Laboratoire de Physique

Université de Lyon

ENS-Lyon, France

antoine.scherrer@ens-lyon.fr

Julien Schmaltz

Radboud University Nijmegen

Institute for Computing and Information Sciences

Heijendaalseweg, The Netherlands

julien@cs.ru.nl

and Computer Engineering

Democritus University of Thrace

dsoudris@ee.duth.gr

Diederik Verkest

Interuniversity Microelectronics Centre - IMEC

Leuven, Belgium

Diederik.Verkest@imec.be


Three-Dimensional Networks-on-Chip Architectures

Alexandros Bartzas, Kostas Siozios, and Dimitrios Soudris

CONTENTS

1.1 Introduction

1.2 Related Work

1.3 Alternative Vertical Interconnection Topologies

1.4 Overview of the Exploration Methodology

1.5 Evaluation—Experimental Results

1.5.1 Experimental Setup

1.5.2 Routing Procedure

1.5.3 Impact of Traffic Load

1.5.4 3D NoC Performance under Uniform Traffic

1.5.5 3D NoC Performance under Hotspot Traffic

1.5.6 3D NoC Performance under Transpose Traffic

1.5.7 Energy Dissipation Breakdown

1.5.8 Summary

1.6 Conclusions

Acknowledgments

References

1.1 Introduction

Future integrated systems will contain billions of transistors [1], composing tens to hundreds of IP cores. These IP cores, implementing emerging complex multimedia and network applications, should be able to deliver rich multimedia and networking services. An efficient cooperation among these IP cores (e.g., efficient data transfers) can be achieved through innovations of on-chip communication strategies.

The design of such complex systems includes several challenges. One challenge is designing on-chip interconnection networks that efficiently connect the IP cores. Another challenge is application mapping that makes efficient use of available hardware resources [2,3]. An architecture that is able to accommodate such a high number of cores, satisfying the need for communication and data transfers, is the networks-on-chip (NoC) architecture [4,5]. For these reasons NoC became a popular choice for designing the on-chip interconnect. The industry has initiated different NoC-based designs such as the Æthereal NoC [6] from Philips, the STNoC [7] from STMicroelectronics, and an 80-core NoC from Intel [8]. The key design challenges of emerging NoC designs, as presented by Ogras and Marculescu [9], are (a) the communication infrastructure, (b) the communication paradigm selection, and (c) the application mapping optimization.

The type of IP cores, as well as the topology and interconnection scheme, plays an important role in determining how efficiently an NoC will perform for a certain application or a set of applications. Furthermore, the application features (e.g., data transfers, communication, and computation needs) play an equally important role in the overall performance of the NoC system. An overview of the cost considerations for the design of NoCs is given by Bolotin et al. [10].

Up to now NoC designs were limited to two dimensions. But emerging 3D integration technology exhibits two major advantages, namely, higher performance and smaller energy consumption [11]. A survey of the existing 3D fabrication technologies is presented by Beyne [12]. The survey shows the available 3D interconnection architectures and illustrates the main research issues in current and future 3D technologies. Through process/integration technology advances, it is feasible to design and manufacture NoCs that will expand in the third dimension (3D NoCs). Thus, it is expected that 3D integration will satisfy the demands of the emerging systems for scaling, performance, and functionality. A considerable reduction in the number and length of global interconnect using 3D integration is expected [13].

In this chapter, we present a methodology for designing alternative 3D NoC architectures. We define 3D NoCs as architectures that use several active silicon planes. Each plane is divided into a grid where 2D or 3D router modules are placed. The main objective of the methodology is to derive 3D NoC topologies with a mix of 2D and 3D routers and vertical link interconnection patterns that offer best performance for the given chip traffic. The cost factors we consider are (i) energy consumption, (ii) average packet latency, and (iii) total switch block area. We make comparisons with an NoC in which all the routers are 3D ones. We have employed and extended the Worm_Sim NoC simulator [14], which is able to model these heterogeneous architectures and simulate them, gathering information on their performance. The heterogeneous NoC architecture can be achieved using a combined implementation of 2D and 3D routers in each layer.

The rest of the chapter is organized as follows: In Section 1.2 the related work is described. In Section 1.3 we present the 3D NoC topologies under consideration, whereas in Section 1.4 the proposed methodology is introduced. In Section 1.5 the simulation process and the achieved results are presented. Finally, in Section 1.6 the conclusions are drawn and future work is outlined.

1.2 Related Work

On-chip interconnection is a widely studied research field and good overviews are presented [15,16], which illustrate the various interconnection schemes available for present ICs and emerging Multiprocessor Systems-on-Chip (MPSoC) architectures. An NoC-based interconnection is able to provide an efficient and scalable infrastructure, which is able to handle the increased communication needs. Lee et al. [17] present a quantitative evaluation of 2D point-to-point, bus, and NoC interconnection approaches. In this work, an MPEG-2 implementation is studied and it is shown that the NoC-based solution scales very well in terms of area, performance, and power consumption.

To evaluate NoC designs, a number of simulators have been developed, such as the Nostrum [18], Polaris [19], XPipes [20], and Worm_Sim [14], using C++ and/or SystemC [21]. To provide adequate input/stimuli to an NoC design, synthetic traffic is usually used. Several synthetic traffic generators have been proposed in several texts [22–25] to provide adequate inputs to NoC simulators for evaluation and exploration of proposed designs.

A methodology that synthesizes NoC architectures is proposed by Ogras, Hu, and Marculescu [26], where long-range links are inserted on top of a mesh network. In this methodology, the NoC design is addressed using an application specific approach, but it is limited to two dimensions. Li et al. [27] presented a mesh-based 3D network-in-memory architecture, using a hybrid NoC/bus interconnection fabric, to accommodate efficiently processors and L2 cache memories in 3D NoCs. It is demonstrated that by using a 3D L2 memory architecture, better results are achieved compared to 2D designs. Koyanagi et al. [28] presented a 3D integration technique of vertical stacking and gluing of several wafers. By utilizing this technology, the authors were able to increase the connectivity while reducing the number of long interconnections. A fabricated 3D shared memory is presented by Lee et al. [29]. The memory module has three planes and can perform wafer stacking using the following technologies: (i) formation of buried interconnection, (ii) microbumps, (iii) wafer thinning, (iv) wafer alignment, and (v) wafer bonding. Another 3D integration scheme is proposed by Iwata et al. [30], where wireless interconnections are employed to offer connectivity.

An overview of the available interconnect solutions for Systems-on-Chip (SoC) is presented by Meindl [31]. This study includes interconnects for 3D ICs and shows that 3D integration reduces the length of the longest global interconnects [32] and reduces the total required wire length, and thus the dissipated energy [33].

Benkart et al. [34] presented an overview of the 3D chip stacking technology using through-chip interconnects. In their work, the trade-off between the high number of vertical interconnects versus the circuit density is highlighted. Furthermore, Davis et al. [35] show the implementation of an FFT in a 3D IC achieving 33% reduction in maximum wire length, thereby proving that the move to 3D ICs is beneficial. However, the heat dissipation is highlighted as one of the limiting factors.

The placement and routing in 3D integrated circuits are studied by Ababei et al. [36]. Also, a system on package solution for 3D network is presented by Lim [37]. However, the heat dissipation of 3D circuits remains a big challenge [38]. To tackle this challenge, several analysis techniques have been proposed [39–41]. One approach is to perform thermal-aware placement and mapping for 3D NoCs, such as the work presented by Quaye [42]. Furthermore, the insertion of thermal vias can lower the chip temperature as illustrated in several texts [43,44].

A generalized NoC router model is presented; based on that, Ogras and Marculescu performed NoC performance analysis. Using the aforementioned router model, it is feasible to perform NoC evaluation, which is significantly faster than performing simulation. Additionally, Pande et al. [46] presented an evaluation methodology to compare the performance and other metrics of a variety of NoC architectures. But, this comparison is made only among 2D NoC architectures. The work of Feero and Pande [47] extended the aforementioned work considering 3D NoCs, and illustrated that the 3D NoCs are advantageous when compared to 2D ones (with both having the same number of components in total). It is demonstrated that besides reducing the footprint in a fabricated design, 3D network structures provide a better performance compared to traditional 2D architectures. This work shows that despite the cost of a small area penalty, 3D NoCs achieve significant gains in terms of energy, latency, and throughput.
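The comparison between 2D and 3D NoCs with the same number of components can be made concrete with a small hop-count calculation. The sketch below is not the evaluation of Feero and Pande; it assumes plain mesh topologies, minimal (Manhattan) routing, and hop count as a rough proxy for latency and energy. The function name `average_hops` is introduced here for illustration only.

```python
from itertools import product


def average_hops(dims):
    """Mean Manhattan distance over all ordered pairs of distinct mesh nodes."""
    nodes = list(product(*(range(d) for d in dims)))
    total = 0
    pairs = 0
    for a in nodes:
        for b in nodes:
            if a != b:
                total += sum(abs(p - q) for p, q in zip(a, b))
                pairs += 1
    return total / pairs


# Both networks contain 64 routers.
hops_2d = average_hops((8, 8))      # one 8 x 8 plane
hops_3d = average_hops((4, 4, 4))   # four stacked 4 x 4 planes

print(f"2D 8x8 mesh:   {hops_2d:.2f} hops on average")
print(f"3D 4x4x4 mesh: {hops_3d:.2f} hops on average")
```

Under these assumptions the 3D arrangement needs fewer hops on average (about 3.81 against 5.33), which is in line with the direction of the gains reported above, although real latency and energy also depend on router and link costs that this sketch ignores.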

Pavlidis and Friedman [48] presented and evaluated various 3D NoC topologies. They also proposed an analytic model for 3D NoCs where a mesh topology is considered under a zero-load latency. Kim et al. [49] presented an exploration of communication architectures on 3D NoCs. A dimensionally decomposed router and its comparison with a hop-by-hop router connection and hybrid NoC-bus architecture is presented. The aforementioned works, both from the physical level as well as adding more communication architectures, such as full 3D crossbar and bus-based communication, are complementary to the one presented here and can be used for the extension of the methodology.

The main difference between the related work and the one presented here is that we do not assume full vertical interconnection (as shown in Figure 1.1), but rather a heterogeneous interconnection fabric, composed of a mix of 3D and 2D routers. An additional motivation for this heterogeneous design is not only for the reduction of total interconnection network length, but also to get the reduced size of the 2D routers when compared to the 3D ones [47]. Reducing the number of vertical interconnection links simplifies the fabrication of the design and frees up more active chip area for available logic/memory blocks. Two-dimensional routers are routers that have connections with neighboring ones of the same grid. By comparison, a 3D router has direct, hop-by-hop connections with neighboring routers belonging to the same grid and those belonging to the adjacent planes. This difference between

Figure 1.1: (a) Full vertical interconnection (100%) for a 3D NoC. (b) Uniform distribution of vertical links. (c) Positioning of vertical links at the center of the NoC. (d) Positioning of vertical links at the periphery of the NoC. Legend: 2D router, 3D router; axes X, Y, Z.
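The distinction between 2D and 3D routers described above can be sketched as a neighbor function on a mesh. This is an illustrative sketch, not code from Worm_Sim: the function name `neighbors` and its parameters are hypothetical, and it assumes that the same placement pattern is used on every plane, so the router directly above or below a 3D router also has vertical links.

```python
def neighbors(x, y, z, dims, is_3d):
    """Return the routers directly linked to router (x, y, z) in a mesh.

    dims  -- (X, Y, Z) size of the NoC
    is_3d -- True if this router has vertical links (a 3D router)
    """
    X, Y, Z = dims
    # Every router connects to its neighbors in the same plane.
    candidates = [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z), (x, y - 1, z)]
    if is_3d:
        # A 3D router additionally connects to the adjacent planes.
        candidates += [(x, y, z + 1), (x, y, z - 1)]
    # Drop positions that fall outside the mesh.
    return [
        (i, j, k)
        for i, j, k in candidates
        if 0 <= i < X and 0 <= j < Y and 0 <= k < Z
    ]


dims = (4, 4, 4)
print(len(neighbors(1, 1, 1, dims, is_3d=False)))  # interior 2D router
print(len(neighbors(1, 1, 1, dims, is_3d=True)))   # interior 3D router
```

An interior 2D router therefore has four links and an interior 3D router six, which is one way to see why 2D routers are smaller than 3D ones.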

1.3 Alternative Vertical Interconnection Topologies

We consider four different groups of interconnection patterns, as well as 10 vertical interconnection topologies in the context of this work. Consider a 3D NoC composed of Z 2D active silicon planes. Each 2D plane has dimensions X × Y. We also denote 0 ≤ K ≤ 100 as the percentage of the routers that have connections in the vertical direction (called 3D routers). The available scenarios of how these 3D routers can be placed on a grid in each plane are as follows:

1. Uniform: 3D routers are uniformly distributed over the different planes. Using this scheme, we “spread” the 3D routers along every plane of the 3D NoC. To find the place of each router we work like this:

• Place the first 3D router at position (0, 0, z), where z = 0, 1, ..., Z − 1.

• Place the four neighboring 2D routers in the positions (x + r + 1, y, z), (x − r − 1, y, z), (x, y + r + 1, z), and (x, y − r − 1, z). The step size r is defined as:

r = 1/K − 1 (with K expressed as a fraction)

r represents the number of 2D routers between consecutive 3D ones. This scheme is illustrated in Figure 1.1(b), showing one plane of a 3D NoC, with K = 25% and r = 3.

2. Center: All the 3D routers are positioned at the center of each plane, as shown in Figure 1.1(c). Because the 3D routers are located in the center of the plane, the 2D routers are distributed in the outer region of the NoC grid, connecting only to the neighboring routers of the same plane.

3. Periphery: The 3D routers are positioned at the periphery of each plane [as shown in Figure 1.1(d)]. In this case, the NoC is focused on serving best the communication needs of the outer cores.

4. Full custom: The position of the 3D routers is fully customized, matching the needs of the application with the NoC architecture. This solution fits best the needs of the application, while it minimizes the number of 3D routers. However, derivation of a full custom solution requires high design time, because this exploration is going to be performed for every application. Furthermore, this will create a nonregular design that will not adjust well to the potential change of functionality, the number of applications that are going to be executed, etc.

The aforementioned patterns are based on the 3D FPGAs work presented by Siozios et al. [50]. To perform an exploration toward fully customized interconnection schemes, real applications and/or application traces are needed. In this chapter, we adopt various types of synthetic traffic, so the exploration of fully customized interconnection schemes is out of scope. More specifically, we focus on pattern-based vertical interconnection topologies (categories 1–3). We consider 10 different vertical link interconnection topologies. For each of these topologies, the number of 3D routers is given, with the value of K in parentheses. For a 4 × 4 × 4 NoC architecture, we use the notation 64 (K).

• Full: All the routers of the NoC are 3D ones [number of 3D routers: 64 (100%)].

• Uniform based: Pattern-based topologies with r value equal to three [by_three pattern, as shown in Figure 1.1(b)], four (by_four), and five (by_five). Correspondingly, the number of 3D routers is 44 (68.75%), 48 (75%), and 52 (81.25%).

• Odd: In this pattern, all the routers belonging to the same row are of the same type. Two adjacent rows never have the same type of router [number of 3D routers: 32 (50%)].

• Edges: The center (dimensions x × y) of the 3D NoC has only 2D routers [number of 3D routers: 48 (75%)].

• Center: Only the center (dimensions x × y) of the 3D NoC has 3D routers [number of 3D routers: 16 (25%)].

• Side based: A side (e.g., outer row) of each plane has 2D routers. The patterns evaluated have one (one_side), two (two_side), or three (three_side) sides as “2D routers only.” The number of 3D routers for each pattern is 48 (75%), 36 (56.25%), and 24 (37.5%), respectively.
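The router counts quoted for the 4 × 4 × 4 case can be cross-checked with a short Python sketch. The helper names, the choice of which sides are "2D only" (adjacent sides for two_side), and the way the central block is sized are assumptions made for illustration; they are not taken from the simulator. The uniform-based patterns are left out because their exact construction is not recoverable from the text.

```python
from typing import Callable, Dict


def pattern_predicates(n: int) -> Dict[str, Callable[[int, int], bool]]:
    """Return, per pattern, a predicate telling whether router (x, y) is a 3D router.

    Assumptions (hypothetical, for illustration only):
    - the "center" is the block n//4 <= x, y < n - n//4 (2 x 2 when n = 4);
    - one_side removes vertical links from row y = 0;
    - two_side additionally removes column x = 0 (an adjacent side);
    - three_side additionally removes column x = n - 1.
    """
    lo, hi = n // 4, n - n // 4

    def in_center(x: int, y: int) -> bool:
        return lo <= x < hi and lo <= y < hi

    return {
        "full": lambda x, y: True,
        "odd": lambda x, y: y % 2 == 1,  # every other row is 3D
        "edges": lambda x, y: not in_center(x, y),
        "center": in_center,
        "one_side": lambda x, y: y != 0,
        "two_side": lambda x, y: y != 0 and x != 0,
        "three_side": lambda x, y: y != 0 and x != 0 and x != n - 1,
    }


def count_3d_routers(n: int, planes: int) -> Dict[str, int]:
    """Count 3D routers over all planes (the same pattern is used on every plane)."""
    return {
        name: planes * sum(pred(x, y) for x in range(n) for y in range(n))
        for name, pred in pattern_predicates(n).items()
    }


counts = count_3d_routers(n=4, planes=4)
total = 4 * 4 * 4
for name, c in counts.items():
    print(f"{name}: {c} ({100 * c / total:g}%)")
```

Under these assumptions the sketch reproduces the counts listed above for the full, odd, edges, center, and side-based patterns.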

Each of the aforementioned vertical interconnection schemes has advantages and disadvantages. How well a scheme performs depends on the behavior of the applications that are implemented on the NoC. Experimental results in Section 1.5 show that a wrong choice may diminish the gains of using a 3D architecture.

1.4 Overview of the Exploration Methodology

An overview of the proposed methodology is shown in Figure 1.2. To perform the exploration of alternative topologies for 3D NoC architectures, the Worm_Sim NoC simulator [14], which utilizes wormhole switching, is used [51] (this is the center block in Figure 1.2).

To support 3D architectures/topologies, we have extended this simulator to adapt to the provided routing schemes and to be compatible with the Trident traffic format [23]. As shown in Figure 1.2, the simulator now supports 3D NoC architectures (3D mesh and 3D torus, as shown in Figure 1.3) and vertical link interconnection patterns. Each of these 3D architectures is composed of many grids, and each grid is composed of tiles that are connected to each other using mesh or torus interconnection networks. Each tile is composed of a processing core and a router. Because we are considering 3D architectures, the router is connected to the neighboring tiles and its local processing core via channels, consisting of bidirectional point-to-point links.

FIGURE 1.2

An overview of the exploration methodology of alternative topologies for 3D Networks-on-Chip. (Blocks: NoC simulator, 3D NoC architectures, vertical link interconnection patterns, real application traffic. Legend: existing tools, extensions, output of new tools.)

The NoC simulator can be configured using the following parameters (shown in Figure 1.2):

1. The NoC architecture (2D or 3D mesh or torus), together with the specific x, y, and z parameters.

2. The type of input traffic (uniform, transpose, or hotspot), as well as how heavy the traffic load will be.

3. The routing scheme.

4. The vertical link configuration file, which defines the locations of the vertical links.

5. The router model, as well as the models used to calculate the energy and delay figures.

FIGURE 1.3

3D NoC architectures. (Legend: link to upper layer, link to lower layer.)
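The five configuration parameters above can be pictured as a single record. The following Python sketch is only an illustration: the class name, field names, and accepted string values are hypothetical and do not reflect the actual option names of Worm_Sim or of its 3D extension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    """Hypothetical container for the simulator parameters listed in the text."""

    topology: str             # "mesh" or "torus"
    x: int                    # plane width
    y: int                    # plane height
    z: int                    # number of planes (z = 1 describes a 2D NoC)
    traffic: str              # "uniform", "transpose", or "hotspot"
    load: str                 # "heavy", "normal", or "low"
    routing: str              # "xyz-old", "xyz", or "odd-even"
    vertical_links_file: str  # file defining the locations of the vertical links
    router_model: str         # router model identifier (placeholder)
    energy_model: str         # energy/delay model identifier (placeholder)

    def __post_init__(self) -> None:
        # Reject values outside the options named in the chapter.
        if self.topology not in ("mesh", "torus"):
            raise ValueError(f"unknown topology: {self.topology}")
        if self.traffic not in ("uniform", "transpose", "hotspot"):
            raise ValueError(f"unknown traffic type: {self.traffic}")
        if self.load not in ("heavy", "normal", "low"):
            raise ValueError(f"unknown load: {self.load}")
        if min(self.x, self.y, self.z) < 1:
            raise ValueError("dimensions must be positive")

    @property
    def num_nodes(self) -> int:
        return self.x * self.y * self.z


cfg = SimConfig("mesh", 4, 4, 4, "uniform", "normal", "xyz",
                "by_four.links", "router-placeholder", "ebit")
print(cfg.num_nodes)
```

Keeping the record immutable makes it easy to run the same configuration against several vertical link pattern files by creating modified copies with `dataclasses.replace`.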

The output of the simulation is a log file that contains the relevant evaluated cost factors, such as overall latency, average latency per packet, and the energy breakdown of the NoC, providing values for link energy consumption, crossbar and router energy consumption, etc. From these energy figures, we calculate the total energy consumption of the 3D NoCs.

The 3D architectures to be explored may have a mix of 2D and 3D routers, ranging from very few 3D routers to only 3D routers. To steer the exploration, we use different patterns (as presented in Section 1.3). The proposed 3D NoCs can be constructed by placing a number of identical 2D NoCs on individual planes, providing communication by interplane vias among vertically adjacent routers. This means that the position of silicon vias is exactly the same for each plane. Hence, the router configuration is extended to the third dimension, whereas the structure of the individual logic blocks (IP cores) remains unchanged.

1.5 Evaluation—Experimental Results

The main objective of the methodology and the exploration process is to find alternative irregular 3D NoC topologies with a mix of 2D and 3D routers. The new topologies exhibit vertical link interconnection patterns that achieve the best performance. Our primary cost function is the energy consumption, with the other cost factors being the average packet latency and total switch block area. We compare these patterns against the fully vertically interconnected 3D NoC as well as the 2D one (all having the same number of nodes).

1.5.1 Experimental Setup

The 3D router used here has a 7 × 7 crossbar switch, whereas the 2D router uses a 5 × 5 crossbar switch. Additionally, each router has a routing table; based on the source/destination address, the routing table decides which output link the outgoing packet should use. The routing table is built using the algorithm described in Figure 1.4.

The NoC simulator uses the Ebit energy model, proposed by Benini and de Micheli [52]. We make the assumption (based on the work presented by Reif et al. [53]) that the vertical communication links between the planes are electrically equivalent to horizontal routing tracks with the same length. Based on this assumption, the energy consumption of a vertical link between two routers equals the consumption of a link between two neighboring routers at the same plane (if they have the same length).

function ROUTINGXYZ
    ...
    findCoordinates(); // returns src.x, src.y, src.z, dst.x, dst.y, and dst.z
    for all plane ∈ NoC do
        ...
        findTmpDestination(); // find a temporary destination of the packet for each plane of the NoC that the packet passes from
        ...
        for all valid Nodes ∈ plane do
            ... the vertical interconnection patterns input file
            ... with the smallest Manhattan distance

FIGURE 1.4

Routing algorithm modifications (// denotes a comment in the algorithm).

More specifically, because the 3D integration technology, which provides communication among layers using through-silicon vias (TSVs), has not been explored sufficiently yet, 3D-based systems design still needs to be addressed. Due to the large variation of the 3D TSV parameters, such as diameter, length, dielectric thickness, and fill material among alternative process technologies, a wide range of measured resistances, capacitances, and inductances have been reported in the literature. A typical size (diameter) for TSVs is about 4 × 4 μm, with a minimum pitch around 8–10 μm, whereas their total length starting from plane T1 and terminating on plane T3 is 17.94 μm, implying wafer thinning of planes T2 and T3 to approximately 10–15 μm [54–56].

The different TSV fabrication processes lead to a high variation in the corresponding electrical characteristics. The resistance of a single 3D via varies from 20 mΩ to as high as 600 mΩ [55,56], with a feasible value (in terms of fabrication) around 30 mΩ. Regarding the capacitances of these vias, their values vary from 40 fF to over 1 pF [57], with a feasible value for fabrication around 180 fF. In the context of this work, we assume a resistance of 350 mΩ and a capacitance of 2.5 fF.

Using our extended version of the NoC simulator, we have performed simulations involving a 64-node and a 144-node architecture with 3D mesh and torus topologies with synthetic traffic patterns. The configuration files describing the corresponding link patterns are supplied to the simulator as an input. The sizes of the 3D NoCs we simulated were 4 × 4 × 4 and 6 × 6 × 4, whereas the equivalent 2D ones were 8 × 8 and 12 × 12. We have used three types of input (synthetic traffic) and three traffic loads (heavy, normal, and low). The traffic schemes used are as follows:

• Uniform: The traffic is uniformly distributed across the NoC, with the nodes receiving approximately the same number of packets.

• Transpose: In this traffic scheme, packets originating from node (a, b, c) are destined to node (X − a, Y − b, Z − c), where X, Y, and Z are the dimensions of the 3D NoC.

• Hotspot: Some nodes (a minority) receive more packets than the majority of the nodes. The hotspot nodes in the 2D grids are positioned in the middle of every quadrant, where the size of the quadrant is specified by the dimensions of each plane in the 3D NoC architecture under simulation, whereas in the 3D NoC, a hotspot is located in the middle of each plane.
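The transpose mapping described above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes zero-based coordinates, so the mirror of coordinate a along a dimension of size X is taken as X − 1 − a; with one-based coordinates the offset would differ.

```python
from typing import Tuple

Coord = Tuple[int, int, int]


def transpose_destination(src: Coord, dims: Coord) -> Coord:
    """Mirror every coordinate of src inside a NoC of size dims (zero-based indices)."""
    for c, d in zip(src, dims):
        if not 0 <= c < d:
            raise ValueError(f"coordinate {c} is outside dimension of size {d}")
    x, y, z = (d - 1 - c for c, d in zip(src, dims))
    return (x, y, z)


# A corner node of a 4 x 4 x 4 NoC sends to the opposite corner.
print(transpose_destination((0, 0, 0), (4, 4, 4)))
```

Because the mapping is its own inverse, every node is both the source and the destination of exactly one flow, which is what makes the pattern a permutation traffic scheme.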

We have used the three routing schemes presented in Worm_Sim [14] and extended them in order to function in a 3D NoC as follows:

• XYZ-old: An extended version of XY routing.

• XYZ: Based on XY routing, but routes the packet along the direction with the least delay.

• Odd-even: The odd-even routing scheme presented by Chiu [58]. In this scheme, the packets take some turns in order to avoid deadlock situations.
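As a point of reference for the XY-based schemes, the sketch below shows plain dimension-ordered routing extended to three dimensions on a fully connected 3D mesh. It is a generic textbook-style illustration with hypothetical function names, not the Worm_Sim implementation; in particular, it ignores missing vertical links and the delay-based choice made by the XYZ scheme.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def xyz_next_hop(cur: Coord, dst: Coord) -> Coord:
    """Move one hop along the first dimension (x, then y, then z) that still differs."""
    for axis in range(3):
        if cur[axis] != dst[axis]:
            step = 1 if dst[axis] > cur[axis] else -1
            nxt = list(cur)
            nxt[axis] += step
            return (nxt[0], nxt[1], nxt[2])
    return cur  # already at the destination


def xyz_path(src: Coord, dst: Coord) -> List[Coord]:
    """Full hop-by-hop path from src to dst, both endpoints included."""
    path = [src]
    while path[-1] != dst:
        path.append(xyz_next_hop(path[-1], dst))
    return path


print(xyz_path((0, 0, 0), (2, 1, 1)))
```

The path length always equals the Manhattan distance between source and destination, so the scheme is minimal on a mesh.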

From the simulations performed, we have extracted figures regarding the energy consumption (in joules) and the average packet latency (in clock cycles). Additionally, for each vertical interconnection pattern, as well as for the 2D NoC, we calculated the occupied area of the switching block, based on the gate equivalent of the switching fabric presented by Feero and Pande [47].

A good design is one that exhibits lower values in the aforementioned metrics when compared to the 2D NoC as well as to the 3D NoC that has full vertical connectivity (all the routers are 3D ones). Furthermore, all the simulation measurements were taken for the same number of operational cycles (200,000 cycles).

1.5.2 Routing Procedure

To route packets over the 3D topologies, we modified the routing procedure, as shown in Figure 1.4. The modified routing procedure is valid for all routing schemes. This modification allows customization of the routing scheme to efficiently cope with the heterogeneous topologies, based on vertical link connectivity patterns.

The steps of the routing algorithm are as follows:

1. For each packet, we know the source and destination nodes and can find the positions of these nodes in the topology. The on-chip “coordinates” of the destination node are dst.x, dst.y,

2. By doing so, we can formulate the temporary destinations, one for each plane. For the number of planes a packet has to traverse to arrive at its final destination, the algorithm initially sets the route to a temporary destination located at position dst.x, dst.y,

the packet is going to follow across the planes (i.e., whether it is going to an upper or lower plane relative to its “source” plane) and finds the nearest valid link at each plane. As an outcome, the z coordinate of the temporary destination’s position is updated accordingly. A valid link is any vertical interconnection link available in the plane that the packet traverses. This information is obtained from the vertical interconnection patterns file. A link is uniquely identified by the node to which it is connected and by its direction. So, for all the specified valid links that are located at the same plane, the header flit of the packet checks if the desired route is matched to the destination up or down link.

3. If there is no match between them, compute the Manhattan distance (in the case of a 3D torus topology, we have modified it to produce the correct Manhattan distance between the two nodes).

4. Finally, the valid link with the smallest Manhattan distance is chosen, and its corresponding node is chosen to be the temporary destination at each plane the packet is going to traverse.

5. After finding a set of temporary destinations (each one located at a different plane), they are stored into the header flit of the packet. The aforementioned temporary destinations may or may not be used as the packet is being routed during the simulation, so they are “candidate” temporary destinations. The decision of being just a candidate or the actual destination per plane is taken based on one of two scenarios: (1) if a set of vertical links, which exhibited relatively high utilization during a previous simulation with the same network parameters, achieved the desired minimum link communication volume or (2) according to a given vertical link pattern such as the one presented in Section 1.1.
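Steps 2 to 4 can be sketched as a nearest-link search on one plane. The following Python code is a minimal illustration under stated assumptions: all names are hypothetical, the distance is measured from the in-plane position (dst.x, dst.y) to each router that owns a valid vertical link, and the torus variant simply takes the shorter way around each ring. How the extended simulator breaks ties is not stated in the text; here the first candidate wins.

```python
from typing import Iterable, Tuple

Node = Tuple[int, int]  # (x, y) position of a router within one plane


def axis_distance(a: int, b: int, size: int, torus: bool) -> int:
    """Hop distance along one dimension; wraparound links shorten it on a torus."""
    d = abs(a - b)
    return min(d, size - d) if torus else d


def plane_distance(p: Node, q: Node, dims: Node, torus: bool = False) -> int:
    """Manhattan distance between two routers of the same plane."""
    return (axis_distance(p[0], q[0], dims[0], torus)
            + axis_distance(p[1], q[1], dims[1], torus))


def temporary_destination(target: Node, valid_links: Iterable[Node],
                          dims: Node, torus: bool = False) -> Node:
    """Pick the router with a valid vertical link that is closest to target.

    valid_links holds the routers of the current plane that have a vertical
    link in the required direction (as read from the pattern file).
    """
    links = list(valid_links)
    if not links:
        raise ValueError("no valid vertical link on this plane")
    if target in links:  # a link exists at the target itself
        return target
    return min(links, key=lambda n: plane_distance(target, n, dims, torus))


links = [(0, 0), (2, 3)]
print(temporary_destination((0, 3), links, dims=(4, 4)))              # mesh
print(temporary_destination((0, 3), links, dims=(4, 4), torus=True))  # torus
```

In the usage example the same target resolves to different routers on the mesh and on the torus, because the wraparound link makes (0, 0) only one hop away from (0, 3).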

The modification of the algorithm essentially checks if a vertical link exists in the temporary destination of the packet; otherwise, the closest router with such a link is chosen. Thus, the routing complexity is kept low.

1.5.3 Impact of Traffic Load

Three different traffic loads were used (heavy, medium/normal, low). In this way, by altering the packet generation rate, it is possible to test the performance of the NoC. Compared to the medium load, the heavy load has 50% increased traffic, whereas the low one has 90% decreased traffic. The behavior of the NoCs in terms of the average packet latency is shown in Figure 1.5. In this figure, the latency is normalized to the average packet latency of the full_connectivity 3D NoC under medium load and for each traffic scheme. The figure shows the impact of the traffic load (latency increases as the load increases), how the NoCs cope with the increased traffic, and the differences between the traffic schemes.
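The three load levels translate into simple scale factors on the packet generation rate. The sketch below only illustrates that arithmetic; the function name and the example medium rate of 0.02 packets per cycle per node are assumptions, not values taken from the experiments.

```python
import math

# Scale factors relative to the medium (normal) load, as stated in the text:
# heavy = 50% more traffic, low = 90% less traffic.
LOAD_FACTORS = {"heavy": 1.5, "normal": 1.0, "low": 0.1}


def packet_generation_rate(medium_rate: float, load: str) -> float:
    """Packet generation rate for a given load level, derived from the medium rate."""
    return medium_rate * LOAD_FACTORS[load]


medium = 0.02  # hypothetical rate, packets per cycle per node
for level in ("heavy", "normal", "low"):
    print(level, packet_generation_rate(medium, level))
```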

Mesh topologies exhibit similar behavior, though the latency figures are higher due to the decreased connectivity when compared to torus topologies. This is shown in Figure 1.6, where the latencies of 64-node mesh and torus NoCs are compared (the basis for the latency normalization is the average packet latency of the full_connectivity 3D torus). From this comparison, it is shown that the mesh topologies have an increased packet latency of 34% compared to the torus ones (for the same traffic scheme, load, and routing algorithm).

FIGURE 1.5

Impact of traffic load on 2D and 3D NoCs (for all different types of traffic used). Panel: latency behavior for 64-node NoCs (torus topology, xyz routing) for the 8×8/torus, by_five, by_four, by_three, center, edges, odd, one_side, three_side, two_side, and full_connectivity configurations, under hotspot, transpose, and uniform traffic at heavy, normal, and low load.

FIGURE 1.6

Impact of traffic load on 2D and 3D mesh and torus NoCs (for uniform traffic). Panel: latency behavior for 64-node mesh and torus NoCs (uniform traffic, xyz routing) for the 8×8/mesh, by_five, by_four, by_three, center, edges, odd, one_side, three_side, and two_side configurations, at heavy, medium, and low uniform load.

1.5.4 3D NoC Performance under Uniform Traffic

Figure 1.7 presents the results for 3D mesh networks using uniform traffic, medium load, and xyz-old routing. We compared the total energy consumption, average packet latency, total area of the switching blocks (routers), and the percentage of 2D routers (having 5 I/O ports instead of 7) under 4 × 4 × 4 [Figure 1.7(a)] and 6 × 6 × 4 [Figure 1.7(b)] mesh architectures. The x-axis presents all the interconnection patterns. The y-axis presents the cost factors for total energy consumption, average packet latency, total switching block area, and percentage of vertical links, normalized to the figures of the fully vertically interconnected 3D NoC.

The advantages of 3D NoCs when compared to 2D ones are shown in Figure 1.7(a). In this case, the 8 × 8 mesh dissipates 39% more energy and has 29% higher packet delivery latency. However, its switching area is 71% of the area of the fully interconnected 3D NoC because all its routers are 2D ones. Employing the by_five link pattern results in a 3% reduction in energy and a 5% increase in latency. In this pattern, only 81% of the routers are 3D ones, so the area of the switching logic is reduced by 5% (when compared to the area of the fully interconnected 3D NoC). Figure 1.7(b) shows that more patterns exhibit better results. It is worth noticing that the overall performance of the 2D NoC significantly decreases, exhibiting around 50% increase in energy and latency.

When we increase the traffic load by increasing the packet generation rate by 50%, we see that all patterns have worse behavior than the full_connectivity 3D NoC. The reason is that by using a pattern-based 3D NoC, we decrease the number of 3D routers by decreasing the number of vertical links, thereby reducing the connectivity within the NoC. As expected, this reduced connectivity has a negative impact in cases where there is increased traffic.

For a low traffic load, the patterns can become beneficial because there is less need for communication resources. This effect is illustrated in Figure 1.8 for 2D and 3D NoCs under low uniform traffic and xyz routing. The exception is the edges pattern in the 64-node 3D NoC [Figure 1.8(a)], where all the 3D routers reside on the edges of each plane of the 3D NoC. This results in a 7% increase in the packet latency. Again, it is worth noticing that as the NoC dimensions increase, the performance of the 2D NoC decreases. This can be clearly seen in Figure 1.8(b), where the 2D NoC has 38% increased energy dissipation.

FIGURE 1.7

(a) Experimental results for a 4 × 4 × 4 3D mesh. (b) Experimental results for a 6 × 6 × 4 3D mesh.

We have also compared the performance of the proposed approach against that achievable with a torus network, which provides wraparound links added in a systematic manner. Note that the vertical links connecting the bottom with the upper planes are not removed, as this is the additional feature of the torus topology when compared to the mesh. Our simulations show that, using the transpose traffic scheme, the vertical link patterns exhibit notable results; this trend continues as the dimensions of the NoC get bigger. The explanation is that the flow of packets between a source and a destination follows a diagonal course among the nodes at each plane. At the same time, the wraparound links of the torus topology play a significant role in preserving the performance even when some vertical links are removed. The results show that increasing the dimensions of the NoC increases the energy savings when the link patterns are applied. However, this is not true for the mesh topology. In particular, in the 6 × 6 × 4 3D torus architecture, the by_five, by_four, by_three, one_side, and two_side patterns show better results as far as the energy consumption is concerned. For instance, the two_side pattern exhibits 7.5% energy savings and a latency of 32.84 cycles, compared with the 30 cycles of the fully vertically connected 3D torus topology.
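To put the latency figure in the same relative terms as the energy savings, the change can be expressed as a fraction of the baseline. The short sketch below assumes the reading that the two_side pattern has a latency of 32.84 cycles against 30 cycles for the fully connected torus; the function name is hypothetical.

```python
def relative_change(value: float, baseline: float) -> float:
    """Relative change of value with respect to baseline (0.05 means +5%)."""
    return (value - baseline) / baseline


latency_increase = relative_change(32.84, 30.0)
print(f"{100 * latency_increase:.2f}% higher latency")
```

Under that reading, the 7.5% energy saving of the two_side pattern is traded for a latency increase of roughly 9.5%.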

1.5.5 3D NoC Performance under Hotspot Traffic

In the case of hotspot traffic (Figure 1.9), testing the 4 × 4 × 4 3D mesh architecture, seven out of the nine link patterns perform better relative to the fully vertically connected topology. For instance, the two_side pattern exhibits a 2% decrease in network energy consumption, whereas the increase in latency is 2.5 cycles. Note that only 56.25% of the vertical links are present. The hotspot traffic in 3D mesh topologies favors cube topologies (e.g., 6 × 6 × 6). Even so, in the 6 × 6 × 4 mesh architecture, the center and two_side patterns exhibit similar performance regarding average cycles per packet compared to that of the fully vertically connected architecture (this was expected due to the location where the hotspot nodes were positioned).

FIGURE 1.8

Uniform traffic (low load) on a 3D NoC for alternative interconnection topologies. (b) Experimental results for a 6 × 6 × 4 3D mesh.

FIGURE 1.9

Hotspot traffic (low load) on a 3D NoC for alternative interconnection topologies. (b) Experimental results for a 6 × 6 × 4 3D mesh.

In Figure 1.10, the results obtained under hotspot-type traffic are presented. Figures 1.10(a) and 1.10(b) present the results for the mesh and torus architectures, respectively, showing gains in energy consumption and area, with a negligible penalty in latency. Again, the architectures where congestion is experienced are highlighted. These results are also compared to their equivalent 2D architectures. The 8 × 8 2D NoC (same number of cores as the 4 × 4 × 4 architecture) shows 25% increased latency and 40% increased energy consumption compared to the one_side link pattern, whereas the 12 × 12 mesh (same number of cores as the 6 × 6 × 4 architecture) shows a 46% increase in latency and a 49% increase in energy consumption compared to the same pattern using uniform traffic. In addition, the by_four pattern on the 64-node architecture under transpose traffic shows 31% and 18% reduced latency and total network consumption, respectively. However, in the case of hotspot traffic and employing the two_side link pattern, these numbers change to 24% reduced latency and 56% reduced energy consumption.

1.5.6 3D NoC Performance under Transpose Traffic

Under the transpose traffic scheme, the by_four link pattern shows a 6.5% decrease in total network energy consumption at the expense of 3 cycles of increased latency. In Figure 1.11, the simulation results for the 3D 4 × 4 × 4 mesh and 6 × 6 × 4 torus NoCs are presented for transpose traffic. In Figure 1.11(a), we can see a 4% gain in the energy consumption of the 3D NoCs with a 5% increase in the packet latency. Additionally, we gain 6% in the area occupied by the switching blocks of the NoC. Comparing these patterns to the 2D NoC (having the same number of nodes), we obtain on average a 14% decrease in energy consumption and a 33% decrease in total packet latency. In terms of area, however, the cost of the 3D NoC is higher by 23%.

In Figure 1.11(b), we can see that the 2D NoC experiences traffic contention and is not able to cope with that amount of traffic (the actual value of the latency is close to 5000 cycles per packet). Additionally, gains of 47% are achieved in energy consumption. When this torus architecture is compared to the “full” 3D one, it shows 5% gains in energy consumption with 8% increased latency and 9% reduced switching block area.

1.5.7 Energy Dissipation Breakdown

The analytical results of the Ebit [52] energy model indicate that, when moving to 3D architectures, the energy consumption of the links, crossbars, arbiters, and buffer read energy decreases, whereas there is an increase in the energy consumed when writing to the buffer and taking the routing decisions.

On average, the link energy consumption accounts for 8% of the total energy, the crossbar 6%, the buffer’s read energy 23%, and the buffer’s write

FIGURE 1.10

Hotspot traffic (medium load) on a 3D NoC for alternative interconnection topologies. (b) Experimental results for a 4 × 4 × 4 3D torus.

FIGURE 1.11

Transpose traffic on a 3D NoC for alternative interconnection topologies. (b) Experimental results for a 6 × 6 × 4 3D torus. Configurations experiencing congestion are marked.

Energy dissipation breakdown (link, crossbar, router, arbiter, buffer read, buffer write) for the 8x8/mesh, by_five, by_four, by_three, center, edges, odd, one_side, three_side, two_side, and full_connectivity configurations.

in the first column. The next two columns present the gains [min to max values (in %)] for the energy dissipation. The fourth and fifth columns show the min to max values for the average packet latency, respectively. It can be seen that an energy reduction of up to 29% can be achieved. However, gains in energy dissipation cannot be reached without paying a penalty in average packet latency. It is the responsibility of the designer, utilizing this exploration methodology, to choose a 3D NoC topology and vertical interconnection patterns that best meet the requirements of the system.

TABLE 1.1

Experimental Results: Min-Max Impact on Costs (Energy and Latency) with Medium Traffic Load

1.6 Conclusions

Networks-on-Chips are becoming more and more popular as a solution able to accommodate large numbers of IP cores, offering an efficient and scalable interconnection network. Three-dimensional NoCs take advantage of the progress of integration and packaging technologies, offering advantages when compared to 2D ones. Existing 3D NoCs assume that every router of a grid can communicate directly with the neighboring routers of the same grid and with the ones of the adjacent planes. This communication can be achieved by employing wire bonding, microbumps, or through-silicon vias [35].

All of these technologies have their advantages and disadvantages. Reducing the number of vertical connections makes the design and final fabrication of 3D systems easier. The goal of the proposed methodology is to find heterogeneous 3D NoC topologies with a mix of 2D and 3D routers and vertical link interconnection patterns that perform best for the incoming traffic. In this way, the exploration process evaluates the incoming traffic and the interconnection network, proposing an incoming traffic-specific alternative 3D NoC. Aiming in this direction, we have presented a methodology that shows that, by employing an alternative 3D NoC vertical link interconnection network, in essence proposing an NoC with fewer vertical links, we can achieve gains in energy consumption (up to 29%), in the average packet latency (up to 2%), and in the area occupied by the routers of the NoC (up to 18%).

Extensions of this work could include not only more heterogeneous 3D architectures but also different router architectures, providing better adaptive routing algorithms and performing further customizations targeting heterogeneous NoC architectures. In this way, it would be possible to create even more heterogeneous 3D NoCs. For providing stimuli to the NoCs, a move toward using real applications would be useful, apart from using even more types of synthetic traffic. By doing so, it would become feasible to propose application-domain-specific 3D NoC architectures.

Acknowledgments

The authors would like to thank Dr. Antonis Papanikolaou (IMEC vzw., Belgium) for his helpful comments and suggestions. This research is supported by the 03ED593 research project, implemented within the framework of the “Reinforcement Program of Human Research Manpower” (PENED) and cofinanced by national and community funds (75% from the European Union, European Social Fund, and 25% from the Greek Ministry of Development, General Secretariat of Research and Technology).

References

1 Semiconductor Industry Association, “International technology roadmap for semiconductors,” 2006 [Online] Available: http://www.itrs.net/Links/

2 S Murali and G D Micheli, “Bandwidth-constrained mapping of cores onto

NoC architectures,” In Proc of DATE Washington, DC: IEEE Computer Society,

2004, 896–901

3 J Hu and R Marculescu, “Energy- and performance-aware mapping for regular

NoC architectures,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 24 (2005) (4): 551–562.

4 L Benini and G de Micheli, “Networks on chips: a new SoC paradigm,”

Computer 35 (2002) (1): 70–78.

5 A Jantsch and H Tenhunen, eds., Networks on Chip New York: Kluwer Academic

Publishers, 2003

6 K Goossens, J Dielissen, and A Radulescu, “The Æthereal network on chip:

Concepts, architectures, and implementations,” IEEE Des Test, 22 (2005) (5):

414–421

7 STMicroelectronics, “STNoC: Building a new system-on-chip paradigm,” White Paper, 2005

8 S Vangal, J Howard, G Ruhl, S Dighe, H Wilson, J Tschanz, D Finan, et al., “An 80-tile 1.28 TFLOPS network-on-chip in 65nm CMOS,” In Proc of International Solid-State Circuits Conference (ISSCC) IEEE, 2007, 98–589.

9 U Ogras and R Marculescu, “Application-specific network-on-chip architecture customization via long-range link insertion,” In Proc of ICCAD (6–10 Nov.) 2005, 246–253

10 E Bolotin, I Cidon, R Ginosar, and A Kolodny, “Cost considerations in network

on chip,” Integr VLSI J 38 (2004) (1): 19–42.

11 E Beyne, “3D system integration technologies,” In International Symposium on VLSI Technology, Systems, and Applications, Hsinchu, Taiwan, April 2006, 1–9.

12 E Beyne, “The rise of the 3rd dimension for system integration,” In Proc of International Interconnect Technology Conference, Burlingame, CA 5–7 June, 2006, 1–5.

13 J Joyner, R Venkatesan, P Zarkesh-Ha, J Davis, and J Meindl, “Impact of three-dimensional architectures on interconnects in gigascale integration,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9 (Dec 2001) (6): 922–928.

14 R Marculescu, U Y Ogras, and N H Zamora, “Computation and communication refinement for multiprocessor SoC design: A system-level perspective,” In Proc of DAC New York: ACM Press, 2004, 564–592.

15 J Duato, S Yalamanchili, and N Lionel, Interconnection Networks: An Engineering Approach San Francisco, CA: Morgan Kaufmann Publishers Inc., 2002.

16 W Dally and B Towles, Principles and Practices of Interconnection Networks.

San Francisco, CA: Morgan Kaufmann Publishers Inc., 2003

17 H G Lee, N Chang, U Y Ogras, and R Marculescu, “On-chip communication architecture exploration: A quantitative evaluation of point-to-point, bus, and network-on-chip approaches,” ACM Trans Des Autom Electron Syst., 12 (2007) (3): 23

18 Z Lu, R Thid, M Millberg, E Nilsson, and A Jantsch, “NNSE: Nostrum

network-on-chip simulation environment,” In Proc of SSoCC, April 2005.

19 V Soteriou, N Eisley, H Wang, B Li, and L.-S Peh, “Polaris: A system-level roadmap for on-chip interconnection networks,” In Proc of ICCD, October 2006. [Online] Available: http://www.gigascale.org/pubs/930.html

20 M Dall’Osso, G Biccari, L Giovannini, D Bertozzi, and L Benini, “xPipes:

a latency insensitive parameterized network-on-chip architecture for

multi-processor SoCs,” In Proc of ICCD IEEE Computer Society, 2003.

21 Open SystemC Initiative, IEEE Std 1666-2005: IEEE Standard SystemC Language Reference Manual IEEE Computer Society, March 2006.

22 V Puente, J Gregorio, and R Beivide, “SICOSYS: An integrated framework for studying interconnection network performance in multiprocessor systems,” In Proc of 10th Euromicro Workshop on Parallel, Distributed and Network-Based Processing, 2002, 15–22.

23 V Soteriou, H Wang, and L.-S Peh, “A statistical traffic model for on-chip interconnection networks,” In Proc of MASCOTS Washington, DC: IEEE Computer Society, 2006, 104–116

24 W Heirman, J Dambre, and J V Campenhout, “Synthetic traffic generation as

a tool for dynamic interconnect evaluation,” In Proc of SLIP New York: ACM

Press, 2007, 65–72

25 F Ridruejo and J Miguel-Alonso, “INSEE: An interconnection network simulation and evaluation environment,” In Proc of Euro-Par Parallel Processing, 3648/2005 Berlin: Springer, 2005, 1014–1023

26 U Y Ogras, J Hu, and R Marculescu, “Key research problems in NoC design:

A holistic perspective,” In Proc of CODES+ISSS, 2005, 69–74.

27 F Li, C Nicopoulos, T Richardson, Y Xie, V Narayanan, and M Kandemir,

“Design and management of 3D chip multiprocessors using

network-in-memory,” In Proc of ISCA Washington, DC: IEEE Computer Society, 2006,

130–141

28 M Koyanagi, H Kurino, K W Lee, K Sakuma, N Miyakawa, and H Itani,

“Future system-on-silicon LSI chips,” IEEE Micro 18 (1998) (4): 17–22.

29 K Lee, T Nakamura, T Ono, Y Yamada, T Mizukusa, H Hashimoto, K Park,

H Kurino, and M Koyanagi, “Three-dimensional shared memory fabricated

using wafer stacking technology,” IEDM Technical Digest, Electron Devices

Meeting (2000) 165–168

30 A Iwata, M Sasaki, T Kikkawa, S Kameda, H Ando, K Kimoto, D Arizono, and H Sunami, “A 3D integration scheme utilizing wireless interconnections for implementing hyper brains,” 2005

31 J Meindl, “Interconnect opportunities for gigascale integration,” IEEE Micro

23 (IEEE Computer Society Press, May/June 2003) (3): 28–35

32 J Joyner, P Zarkesh-Ha, J Davis, and J Meindl, “A three-dimensional stochastic

wire-length distribution for variable separation of strata,” In Proc of the IEEE

2000 International Interconnect Technology Conference IEEE, 2000, 126–128.

33 J Joyner and J Meindl, “Opportunities for reduced power dissipation using

three-dimensional integration,” In Proc of the IEEE 2002 International Interconnect Technology Conference IEEE, 2002, 148–150.


34 P Benkart, A Kaiser, A Munding, M Bschorr, H.-J Pfleiderer, E Kohn,

A Heittmann, H Huebner, and U Ramacher, “3D chip stack technology

using through-chip interconnects,” IEEE Des Test 22 (2005) (6): 512–518.

35 W R Davis, J Wilson, S Mick, J Xu, H Hua, C Mineo, A M Sule, M Steer, and P D Franzon, “Demystifying 3D ICs: The pros and cons of going vertical,”

IEEE Des Test 22 (2005) (6): 498–510.

36 C Ababei, Y Feng, B Goplen, H Mogal, T Zhang, K Bazargan, and S

Sapatnekar, “Placement and routing in 3D integrated circuits,” IEEE Des Test

39 S Im and K Banerjee, “Full chip thermal analysis of planar (2-D) and vertically

integrated (3-D) high performance ICs,” In International Electron Devices Meeting, IEDM Technical Digest, 2000, 727–730.

40 T.-Y Chiang, S Souri, C O Chui, and K Saraswat, “Thermal analysis of

heterogeneous 3D ICs with various integration scenarios,” In Proc of International Electron Devices Meeting, 2001.

41 K Puttaswamy and G H Loh, “Thermal analysis of a 3D die-stacked

high-performance microprocessor,” In Proc of the 16th ACM Great Lakes Symposium

on VLSI New York: ACM, 2006, 19–24.

42 C Addo-Quaye, “Thermal-aware mapping and placement for 3-D NoC

designs,” In Proc of IEEE SOC, 2005, 25–28.

43 B Goplen and S Sapatnekar, “Thermal via placement in 3D ICs,” In Proc of the

2005 International Symposium on Physical Design ACM, 2005, 167–174.

44 J Cong and Y Zhang, “Thermal via planning for 3-D ICs,” In Proc of the 2005 IEEE/ACM International Conference on Computer-Aided Design Washington, DC:

IEEE Computer Society, 2005, 745–752

45 U Y Ogras and R Marculescu, “Analytical router modeling for

networks-on-chip performance analysis,” In Proc of the Conference on Design, Automation and Test in Europe EDA Consortium, 2007, 1096–1101.

46 P P Pande, C Grecu, M Jones, A Ivanov, and R Saleh, “Performance evaluation

and design trade-offs for networks-on-chip interconnect architectures,” IEEE Trans on Comp., 54 (Aug 2005) (8): 1025–1040.

47 B Feero and P P Pande, “Performance evaluation for three-dimensional

networks-on-chip,” In Proc of ISVLSI, 2007, 305–310.

48 V F Pavlidis and E G Friedman, “3-D topologies for networks-on-chip,” IEEE Trans on VLSI Sys., 15 (2007) (10): 1081–1090.

49 J Kim, C Nicopoulos, D Park, R Das, Y Xie, V Narayanan, M S Yousif, and

C R Das, “A novel dimensionally-decomposed router for on-chip

communication in 3D architectures,” In Proc of ISCA ACM Press, 2007, 138–149.

50 K Siozios, K Sotiriadis, V F Pavlidis, and D Soudris, “Exploring alternative

3D FPGA architectures: Design methodology and CAD tool support,” In Proc.

of FPL, 2007.

51 L M Ni and P K McKinley, “A survey of wormhole routing techniques in

direct networks,” Computer 26 (1993) (2): 62–76.


52 T Ye, L Benini, and G De Micheli, “Analysis of power consumption on switch

fabrics in network routers,” In Proc of DAC (10–14 June) 2002, 524–529.

53 R Reif, A Fan, K.-N Chen, and S Das, “Fabrication technologies for

three-dimensional integrated circuits,” In Proc of International Symposium on Quality Electronic Design (18–21 March) 2002, 33–37.

54 MIT Lincoln Labs, MITLL Low-Power FDSOI CMOS Process Design Guide,

September 2006

55 A W Topol, J D C La Tulipe, L Shi, D J Frank, K Bernstein, S E Steen,

A Kumar, et al., “Three-dimensional integrated circuits,” IBM J Res Dev 50

(2006) (4/5): 491–506

56 A W Topol, J D C La Tulipe, L Shi, D J Frank, K Bernstein, S E Steen,

A Kumar, “Techniques for producing 3D ICs with high-density interconnect,”

In VLSI Multi-Level Interconnection Conference, 2004.

57 S M Alam, R E Jones, S Rauf, and R Chatterjee, “Inter-strata connection characteristics and signal transmission in three-dimensional (3D) integration

technology,” In ISQED ’07: Proceedings of the 8th International Symposium on Quality Electronic Design Washington, DC: IEEE Computer Society, 2007,

580–585

58 G.-M Chiu, “The odd-even turn model for adaptive routing,” IEEE Trans Parallel Distrib Syst 11 (2000) (7): 729–738.
