
ultra low-power electronics and design


DOCUMENT INFORMATION

Basic information

Title Ultra Low-Power Electronics and Design
Advisor Enrico Macii, Editor
Institution Politecnico di Torino
Major Electronics and Design
Type Edited volume
Year of publication 2004
City Milan
Format
Pages 290
File size 5.28 MB


ULTRA LOW-POWER ELECTRONICS AND DESIGN


Ultra Low-Power Electronics and Design

Edited by

Enrico Macii

Politecnico di Torino,

Italy

KLUWER ACADEMIC PUBLISHERS

NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW


eBook ISBN: 1-4020-8076-X

Print ISBN: 1-4020-8075-1

©2004 Springer Science + Business Media, Inc.

Print © 2004 Kluwer Academic Publishers

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Springer's eBookstore at: http://www.ebooks.kluweronline.com

and the Springer Global Website Online at: http://www.springeronline.com

Dordrecht


CONTRIBUTORS

PREFACE

INTRODUCTION

1 ULTRA-LOW-POWER DESIGN: DEVICE AND LOGIC DESIGN APPROACHES

2 ON-CHIP OPTICAL INTERCONNECT FOR LOW-POWER

3 NANOTECHNOLOGIES FOR LOW POWER

4 STATIC LEAKAGE REDUCTION THROUGH SIMULTANEOUS Vt/Tox AND STATE ASSIGNMENT

5 ENERGY-EFFICIENT SHARED MEMORY ARCHITECTURES FOR

8 ARCHITECTURES AND DESIGN TECHNIQUES FOR ENERGY EFFICIENT EMBEDDED DSP AND MULTIMEDIA PROCESSING

9 SOURCE-LEVEL MODELS FOR SOFTWARE POWER OPTIMIZATION

10 TRANSMITTANCE SCALING FOR REDUCING POWER DISSIPATION OF A BACKLIT TFT-LCD

11 POWER-AWARE NETWORK SWAPPING FOR WIRELESS PALMTOP PCS

12 ENERGY EFFICIENT NETWORK-ON-CHIP DESIGN

13 SYSTEM LEVEL POWER MODELING AND SIMULATION OF HIGH-END INDUSTRIAL NETWORK-ON-CHIP

14 ENERGY AWARE ADAPTATIONS FOR END-TO-END VIDEO STREAMING TO MOBILE HANDHELD DEVICES


Contributors


F Vahid University of California, Riverside

and University of California, Irvine

and K.U.Leuven


Today we are beginning to have to face up to the consequences of the stunning success of Moore’s Law, that astute observation by Intel’s Gordon Moore which predicts that integrated circuit transistor densities will double every 12 to 18 months. This observation has now held true for the last 25 years or more, and there are many indications that it will continue to hold true for many years to come. This book appears at a time when the first examples of complex circuits in 65nm CMOS technology are beginning to appear, and these products already must take advantage of many of the techniques to be discussed and developed in this book. So why then should our increasing success at miniaturization, as evidenced by the success of Moore’s Law, be creating so many new difficulties in power management in circuit designs?

The principal source and the physical origin of the problem lies in the differential scaling rates of the many factors that contribute to power dissipation in an IC – transistor speed/density product goes up faster than the energy per transition comes down, so the power dissipation per unit area increases in a general sense as the technology evolves.

Secondly, the “natural” transistor switching speed increase from one generation to the next is becoming downgraded due to the greater parasitic losses in the wiring of the devices. The technologists are offsetting this problem to some extent by introducing lower permittivity dielectrics (“low-k”) and lower resistivity conductors (copper) – but nonetheless to get the needed circuit performance, higher speed devices using techniques such as silicon-on-insulator (SOI) substrates, enhanced carrier mobility (“strained silicon”) and higher field (“overdrive”) operation are driving power densities ever upwards. In many cases, these new device architectures are increasingly leaky, so static power dissipation becomes a major headache in power management, especially for portable applications.

A third factor is system or application driven – having all this integration capability available encourages us to combine many different functional blocks into one system IC. This means that in many cases, a large part of the chip’s required functionality will come from software executing on and between multiple on-chip execution units; how the optimum partitioning between hardware architecture and software implementation is obtained is a vast subject, but clearly some implementations will be more energy efficient than others. Given that, in many of today’s designs, more than 50% of the total development effort is on the software that runs on the chip, getting this partitioning right in terms of power dissipation can be critical to the success of (or instrumental in the failure of!) the product.

A final motivation comes from the practical and environmental consequences of how we design our chips – state-of-the-art high performance circuits are dissipating up to 100W per square centimeter – we only need 500 square meters of such silicon to soak up the output of a small nuclear power station. A related argument, based on battery lifetime, shows that the “converged” mobile phone application combining telephony, data transmission, multimedia and PDA functions that will appear shortly is demanding power at the limit of lithium-ion or even methanol-water fuel cell battery technology. We have to solve the power issue by a combination of design and process technology innovations; examples of current approaches to power management include multiple transistor thresholds, triple gate oxide, dynamic supply voltage adjustment and memory architectures.
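As a quick sanity check of the power-station comparison, the arithmetic can be sketched as follows. The only inputs are the two figures quoted in the text; the function and constant names are ours, chosen purely for illustration.

```python
# Sanity check of the foreword's comparison; both inputs are quoted in the text.
POWER_DENSITY_W_PER_CM2 = 100   # "up to 100W per square centimeter"
SILICON_AREA_M2 = 500           # "500 square meters of such silicon"

CM2_PER_M2 = 100 * 100          # 1 m^2 = 10,000 cm^2


def total_power_watts(density_w_per_cm2, area_m2):
    """Total dissipated power for a given power density and silicon area."""
    return density_w_per_cm2 * area_m2 * CM2_PER_M2


total_mw = total_power_watts(POWER_DENSITY_W_PER_CM2, SILICON_AREA_M2) / 1e6
print(f"Total dissipation: {total_mw:.0f} MW")
```

The result is 500 MW, which is the order of magnitude the foreword has in mind when it speaks of a small power station.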

Multiple transistor thresholds is a technique, practiced for several years now, that allows the designer to use high performance (low Vt) devices where he needs the speed, and low leakage (high Vt) devices elsewhere. This benefits both static power consumption (through less sub-threshold leakage) and dynamic power consumption (through lower overall switching currents). High threshold devices can also be used to gate the supplies to different parts of the circuit, allowing blocks to be put to sleep until needed.

Similar to the previous technique, triple gate oxide (TGO) allows circuit partitioning between those parts that need performance and other areas of the circuit that don’t. It has the additional benefit of acting on both sub-threshold leakage and gate leakage. The third oxide is used for I/O and possibly mixed-signal. It is expected over the next few years that the process technologists will eventually replace the traditional silicon dioxide gate dielectric of the CMOS devices by new materials such as rare earth oxides with much higher dielectric constants that will allow the gate leakage problem to be completely suppressed.

Dynamic supply voltage adjustment allows the supply voltage to different blocks of the circuit to be adjusted dynamically in response to the immediate performance needs for the block – this very sophisticated technique will take some time to mature.

Finally, many, if not most, advanced devices use very large amounts of memory for which the contents may have to be maintained during standby; this consumes a substantial amount of power, either through refreshing dynamic RAM or through the array leakage for static RAM. Traditional non-volatile memories have writing times that are orders of magnitude too slow to allow them to substitute for these on-chip memories. New developments, such as MRAM, offer the possibility of SRAM-like performance coupled with unlimited endurance and data retention, making them potential candidates to replace the traditional on-chip memories and remove this component of standby power consumption.

Most of the approaches to power management described briefly above will be employed in 65nm circuits, but there are a lot more good ideas waiting to be applied to the problem, many of which you will find clearly and concisely explained in this book.

Mike Thompson, Philippe Magarshack

STMicroelectronics, Central R&D Crolles, France


The 2004 edition of the DATE (Design Automation and Test in Europe) conference has devoted an entire Special Focus Day to the power problem and its implications on the design of future electronic systems. In particular, keynote presentations and invited talks by outstanding researchers in the field of low-power design, as well as several technical papers from the regular conference sessions, have addressed the difficulties ahead and advanced strategies and principles for achieving ultra low-power design solutions. The purpose of this book is to integrate into a single volume a selection of these contributions, duly extended and transformed by the authors into chapters proposing a mix of tutorial material and advanced research results.

The manuscript consists of a total of 14 chapters, addressing different aspects of ultra low-power electronics and design. Chapter 1 opens the volume by providing an insight into innovative transistor devices that are capable of operating with a very low threshold voltage, thus contributing to a significant reduction of the dynamic component of power consumption. Solutions for limiting leakage power during stand-by mode are also discussed. The chapter closes with a quick overview of low-power design techniques applicable at

Chapter 2 focuses on the problem of reducing power in the interconnect network by investigating alternatives to traditional metal wires. In fact, according to the 2003 ITRS roadmap, metallic interconnections may not be able to provide enough transmission speed and to keep power under control for the upcoming technology nodes (65nm and below). A possible solution, explored in the chapter, consists of the adoption of optical interconnect networks. Two applications are presented: clock distribution and data communication using wavelength division multiplexing.

In Chapter 3, the power consumption problem is faced from the technology point of view by looking at innovative nano-devices, such as single-electron or few-electron transistors. The low-power characteristics and potential of these devices are reviewed in detail. Other devices, including carbon nano-tube transistors, resonant tunnelling diodes and quantum cellular automata, are also treated.

Chapter 4 is entirely dedicated to advanced design methodologies for reducing sub-threshold and gate leakage currents in deep-submicron CMOS circuits by properly choosing the states to which gates have to be driven when in stand-by mode, as well as the values of the threshold voltage and of the gate oxide thickness. The authors formulate the optimization problem for

and propose both an exact method for its optimal solution and two practical heuristics with reasonable run-time. Experimental results obtained on a number of benchmark circuits demonstrate the viability of the proposed methodology.

Chapter 5 is concerned with the issue of minimizing power consumption of the memory subsystem in complex, multi-processor systems-on-chip (MPSoCs), such as those employed in multi-media applications. The focus is on design solutions and methods for synthesizing memory architectures containing both single-ported and multi-ported memory banks. Power efficiency is achieved by casting the memory partitioning design paradigm to the case of heterogeneous memory structures, in which data need to be accessed in a shared manner by different processing units.

Chapter 6 addresses the relevant problem of minimizing the power consumed by the cache hierarchy of a microprocessor. Several design techniques are discussed, including application-driven automatic and dynamic cache parameter tuning, adoption of configurable victim buffers and frequent-value data encoding and compression.

Power optimization for parallel, variable-voltage/frequency processors is the subject of Chapter 7. Given a processor with such an architecture, this chapter investigates the energy/performance tradeoffs that can be spanned in parallelizing array-intensive applications, taking into account the possibility that individual processing units can operate at different voltage/frequency levels. In assigning voltage levels to processing units, compiler analysis is used to reveal heterogeneity between the loads of the different units in parallel execution.

Chapter 8 provides guidelines for the design and implementation of DSP and multi-media applications onto programmable embedded platforms. The RINGS architecture is first introduced, followed by a detailed discussion on power-efficient design of some of the platform components, namely, the DSPs. Next, design exploration, co-design and co-simulation challenges are addressed, with the goal of offering to the designers the capability of including into the final architecture the right level of programmability (or reconfigurability) to guarantee the required balance between system performance and power consumption.

Chapter 9 targets software power minimization through source code optimization. Different classes of code transformations are first reviewed; next, the chapter outlines a flow for the estimation of the effects that the application of such transformations may have on the power consumed by a software application. At the core of the estimation methodology there is the development of power models that allow the decoupling of processor-independent analysis from all the aspects that are tightly related to processor architecture and implementation. The proposed approach to software power minimization is validated through several experiments conducted on a number of embedded processors for different types of benchmark applications.

Reduction of the power consumed by TFT liquid crystal displays, such as those commonly used in consumer electronic products, is the subject of Chapter 10. More specifically, techniques for reducing power consumption of transmissive TFT-LCDs using a cold cathode fluorescent lamp backlight are proposed. The rationale behind such techniques is that the transmittance function of the TFT-LCD panel can be adjusted (i.e., scaled) while meeting an upper bound on a contrast distortion metric. Experimental results show that significant power savings can be achieved for still images with very little penalty in image contrast.

Chapter 11 addresses the issue of efficiently accessing remote memories from wireless systems. This problem is particularly important for devices such as palmtops and PDAs, for which local memory space is at a premium and networked memory access is required to support virtual memory swapping. The chapter explores performance and energy of network swapping in comparison with swapping on local microdrives and FLASH memories. Results show that remote swapping over power-manageable wireless network interface cards can be more efficient than local swapping and that both energy and performance can be optimized by means of power-aware reshaping of data requests. In other words, dummy data accesses can be preemptively inserted in the source code to reshape page requests in order to significantly improve the effectiveness of dynamic power management.

Chapter 12 focuses on communication architectures for multi-processor SoCs. The network-on-chip (NoC) paradigm is reviewed, touching upon several issues related to power optimization of such kinds of communication architectures. The analysis proceeds on a layer-by-layer basis, and particular emphasis is given to customized, domain-specific networks, which represent the most promising scenario for communication-energy minimization in multi-processor platforms.

Chapter 13 provides a natural follow-up to the theory of NoCs covered in the previous chapter by describing an industrial application of this type of communication architecture. In particular, the authors introduce an innovative methodology for automatically generating the power models of a versatile and parametric on-chip communication IP, namely the STBus by STMicroelectronics. The methodology is validated on a multi-processor hardware platform including four ARM cores accessing a number of peripheral targets, such as SRAM banks, interrupt slaves and ROM memories.

The last contribution, offered in Chapter 14, proposes an integrated end-to-end power management approach for mobile video streaming applications that unifies low-level architectural optimizations (e.g., CPU, memory, registers), OS power-saving mechanisms (e.g., dynamic voltage scaling) and adaptive middleware techniques (e.g., admission control, trans-coding, network traffic regulation). Specifically, interaction parameters between the different levels are identified and optimized to achieve a reduction in the power consumption.

Closing this introductory chapter, the editor would like to thank all the authors for their effort in producing their outstanding contributions in a very short time. Special thanks go to Mike Thompson and Philippe Magarshack of STMicroelectronics for their keynote presentation at DATE 2004 and for writing the foreword to this book. The editor would also like to acknowledge the support offered by Mark De Jongh and the Kluwer staff during the preparation of the final version of the manuscript. Last, but not least, the editor is grateful to Agnieszka Furman for taking care of most of the “dirty work” related to book editing, paging and preparation of the camera-ready material.


Infineon Technologies AG; 2 Technische Universität München

Abstract Power consumption increasingly is becoming the bottleneck in the design of ICs in advanced process technologies. We give a brief introduction into the major causes of power consumption. Then we report on experiments in an advanced process technology with ultra-low threshold voltage (Vth) devices. It turns out that, in contrast to older process technologies, this approach is becoming less suitable for industrial usage in advanced process technologies. We then describe methodologies to reduce power consumption by optimizations in logic design, specifically by utilizing multiple levels of supply voltage Vdd and threshold voltage Vth. We evaluate them from an industrial product development perspective. We also give a brief outlook on proposals on other levels in the design flow and on future work.

Keywords: Low-power design, dynamic power reduction, leakage power reduction, ultra-low-Vth devices, multi-Vdd, multi-Vth, CVS

1.1 INTRODUCTION

The progress of silicon process technology marches on relentlessly. As predicted by Gordon Moore decades ago, silicon process technology continues to achieve improvements at an astonishing pace [1]. The number of transistors that can be integrated on a single IC approximately doubles every 2 years [2,3]. This engineering success has created innovative new industries (e.g. personal computers and peripherals, consumer electronics) and revolutionized other industries (e.g. communications).

Today, however, it is becoming increasingly difficult to achieve improvements at the pace that the industry has become accustomed to. More and more technical challenges appear that require increasing resources to be solved [4]. One such problem is the increasing power consumption of integrated circuits. It becomes even more critical as an increasing number of today’s high-volume consumer products are battery-powered.

In the following, we will consider the sources of power consumption and their development over time. We will show why reduction of power consumption increasingly is becoming critical to product success and will review traditional approaches in Sections 1.1 and 1.2. In Section 1.3 we will then analyze a potential solution based on introduction of an optimized

and discuss logic-level design optimizations for power reduction in Section 1.4. Also, we will briefly point out potential optimizations on higher levels. Our observations are made from the perspective of industrial IC product development where technical optimizations must be carefully evaluated against the cost associated with achieving and implementing them. Mostly, the presented methodologies are already being utilized in leading-edge industrial ICs.

Depending on the type of end-product and its application, different aspects of power consumption are the primary concern: dynamic power or leakage power.

Reduction of dynamic power consumption is a concern for almost all IC products today. For battery-powered products, reduced power consumption directly results in longer operating time for the product, which is a very desirable characteristic. Even for non-battery-powered products, reduced power consumption brings many advantages, such as reduced cost because of cheaper packaging or higher performance because of lower temperatures. Finally, reduced power consumption often leads to lower system cost (no fans required; no or cheaper air conditioning for data / telecom center etc.).

Dynamic power consumption is caused by the charging and discharging of capacitances when a circuit switches. In addition, during switching a short-circuit current flows, but this current is typically much smaller, and will therefore be neglected in the following. The dynamic power due to capacitance charging and discharging is determined by the following well-known relationship:

$$P_{dyn} = \alpha \cdot f \cdot C_L \cdot V_{dd}^2$$

where α is the switching activity, f the operating frequency, C_L the switched load capacitance and V_dd the supply voltage.

Based on constant electrical field scaling, Vdd and CL each are reduced by 30% in each successive process generation. Also, delay decreases by 30%, resulting in a 43% increase in frequency. Therefore, the dynamic power consumption per device is reduced by 50% from one process generation to the next. As scaling also doubles the number of devices that can be implemented in a given die area, dynamic power consumption per area should stay roughly identical. However, historically frequency has increased by significantly more than 43% from one process generation to the next (e.g. in microprocessors, it has roughly doubled, due to architectural optimizations, such as deeper pipeline stages), and in addition, die sizes have increased with each new process technology, further increasing the power consumption, due to an increased number of active devices [5]. For these reasons, dynamic power consumption has increased exponentially, as is shown in Figure 1-1 for the example of microprocessors.
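The scaling argument above can be checked numerically. The following is a minimal sketch, assuming that dynamic power per device is proportional to f · CL · Vdd² with unchanged switching activity; the function name and default arguments are illustrative and not taken from the chapter.

```python
def scaled_dynamic_power(vdd_scale=0.7, cl_scale=0.7, delay_scale=0.7,
                         freq_scale=None):
    """Scaling factors for one process generation.

    Returns (frequency factor, power-per-device factor, power-per-area factor).
    If freq_scale is not given, frequency is taken as 1 / delay (ideal scaling).
    """
    if freq_scale is None:
        freq_scale = 1.0 / delay_scale
    # Assumed model: P_dyn ~ f * C_L * Vdd^2 (activity unchanged).
    per_device = freq_scale * cl_scale * vdd_scale ** 2
    # Scaling doubles the number of devices in a given die area.
    per_area = 2.0 * per_device
    return freq_scale, per_device, per_area


ideal = scaled_dynamic_power()
historical = scaled_dynamic_power(freq_scale=2.0)  # frequency roughly doubled
print("ideal:      f x%.2f, per device x%.2f, per area x%.2f" % ideal)
print("historical: f x%.2f, per device x%.2f, per area x%.2f" % historical)
```

Under ideal scaling the per-device power roughly halves (factor of about 0.49) and the per-area power stays nearly constant (about 0.98). With frequency doubling instead, the per-area factor rises to about 1.37 per generation, before any increase in die size is counted.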

Reduction of leakage power consumption today is primarily a concern for products that are powered by battery and spend most of their operating hours in some type of standby mode, such as cell phones.

For many process generations, however, leakage has increased roughly by a factor of 10 for every two process nodes [6]. Due to this dramatic increase with newer process generations, leakage is becoming a significant contribution to overall IC power consumption even in normal operating mode, as can be seen in Figure 1-1 as well. Leakage was estimated to increase from 0.01% of overall power consumption in a 1.0µm technology, to 10% in a 0.1µm technology [6]. For a microprocessor, Intel estimated leakage power consumption at more than 50W for a 100nm technology node [3]. This figure probably is extreme, and leakage depends strongly on a

temperature T). Nevertheless, for an increasing number of products leakage power consumption is turning into a problem, even when they are not battery-powered.
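To give a feeling for the growth rate quoted above, the following sketch converts "a factor of 10 for every two process nodes" into a per-node factor and counts the nodes between two feature sizes. The 0.7x linear shrink per node is our assumption (it matches the 30% scaling used earlier), and both function names are hypothetical.

```python
import math


def leakage_growth(nodes, factor_per_two_nodes=10.0):
    """Leakage multiplier after a number of process nodes."""
    return factor_per_two_nodes ** (nodes / 2.0)


def nodes_between(start_um, end_um, shrink_per_node=0.7):
    """Number of process nodes between two feature sizes.

    Assumes a constant linear shrink per node (0.7x is an assumption).
    """
    return math.log(end_um / start_um) / math.log(shrink_per_node)


per_node = leakage_growth(1)
n = nodes_between(1.0, 0.1)
print(f"Leakage growth per node: x{per_node:.2f}")
print(f"Nodes from 1.0 um to 0.1 um: {n:.1f}")
print(f"Leakage growth over that span: x{leakage_growth(n):.0f}")
```

The per-node factor is the square root of 10, a little over 3. Note that the last printed value is the growth in absolute leakage under these assumptions, which is a different quantity from the share of overall power consumption quoted in the text.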


Figure 1-1 Development of dynamic and leakage power consumption over time [3,7]

the key levers to reduce dynamic power:

• Reduce operating frequency

• Reduce driven capacitance

• Reduce supply voltage

has the side effect of reducing performance as well, primarily because gate


overdrive (the difference between Vdd and Vth) diminishes if the threshold

$$t_d = \frac{C_L \cdot V_{dd}}{(V_{dd} - V_{th})^{\alpha}}$$

1.0V, the reductions in gate overdrive are more pronounced than previously.

In addition, newer process technologies give significantly less of a performance boost compared to the previous process generation than has traditionally been the case; therefore, a further reduction in performance is highly undesirable. Finally, the power reduction achieved by moving to a new process generation has trended down over time, since supply voltages have been scaled by increasingly less than the 30% prescribed by the constant electrical field scaling paradigm.
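The effect of a shrinking gate overdrive on performance can be illustrated with the alpha-power delay model, in which gate delay is taken as proportional to CL · Vdd / (Vdd − Vth)^α. This is a sketch under stated assumptions: the exponent α = 1.3 and the operating points are hypothetical values chosen for illustration, not figures from the chapter.

```python
def relative_delay(vdd, vth, alpha=1.3, c_load=1.0):
    """Gate delay up to a constant factor, using the alpha-power model.

    alpha = 1.3 is an assumed velocity-saturation exponent.
    """
    if vdd <= vth:
        raise ValueError("Vdd must exceed Vth for the gate to switch")
    return c_load * vdd / (vdd - vth) ** alpha


# Hypothetical operating points: Vth fixed at 0.3 V, Vdd lowered stepwise.
baseline = relative_delay(vdd=1.2, vth=0.3)
for vdd in (1.2, 1.1, 1.0, 0.9):
    slowdown = relative_delay(vdd, 0.3) / baseline
    print(f"Vdd = {vdd:.1f} V, overdrive = {vdd - 0.3:.1f} V: delay x{slowdown:.2f}")
```

In this model, lowering Vdd at a fixed Vth slows the gate disproportionately as the overdrive shrinks, while lowering Vth at a fixed Vdd restores speed. That is the motivation for the low-threshold devices discussed next.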

Consequently, more advanced approaches are required.

In the following, our main focus will be on dynamic power consumption, but we will also consider leakage power consumption.

1.4 ZERO-VTH DEVICES

overcomes the diminishing gate overdrive by radically setting the threshold voltage of the active devices to zero. It has been shown [9] that the optimum

the devices will never completely switch off. But from an overall power perspective the gain in active power consumption is tremendous.

Using these transistors the supply voltage of 130nm circuits can be

performance degradation. Alternatively, the circuit can be operated at twice the clock frequency when keeping the supply voltage at 1.2V, as shown in

the complete circuits are switched-off or are set into a low leakage mode to cope with the very high leakage contribution. The low leakage mode is achieved by ‘active well’ control, which denotes the use of the body effect


reverse back biasing: a negative well-to-source voltage Usb is used

be generated. Furthermore, active well is required to compensate the

below 40°C. For some high-end computer equipment the costs for active chip cooling are affordable to achieve this junction temperature. But this is definitely not the case for cost-driven consumer products. For this

in some applications the specified worst-case ambient temperature is even

changes and adaptations

Figure 1-2 Simulated performance curves of transistors with ultra-low Vth. Compared to low-Vth, either a performance gain or a Vdd reduction can be achieved. Curves for reg-Vth and high-Vth transistors of a 130 nm technology are included.

device with about 150mV threshold voltage proved to be the best compromise between zero-Vth and current low-Vth of about 300mV within a 130 nm CMOS technology.

is shown for a high-activity circuit (α = 20%) with various options for the

for the other transistor options were reduced to meet that reference performance

Figure 1-3 Power dissipation at T=125°C in active mode for several transistor options with reduced Vth. A minimum power consumption is achieved at 150mV Vth. (At T=55°C the minimum is achieved for the same option but process variations show less impact.)

The reduced supply voltage leads to lower overall active power

a rule of thumb, a 100mV reduction of the threshold voltage allows for a Vdd reduction by ≈ 0.15V but on the other hand results in a tenfold increase of the leakage current. From Figure 1-3 the impact of technology variations is also visible. Due to the high leakage contribution, a power reduction of only 25% is achieved under fast process conditions. Using back biasing in reverse mode, the high performance of fast transistors can be reduced

decreases and allows a power reduction by 50% (stippled arrow).
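The rule of thumb quoted above (100mV less threshold voltage buys about 0.15V of supply voltage but costs a tenfold leakage increase) already implies that total power has a minimum at some intermediate threshold voltage. The sketch below illustrates this under simplifying assumptions of our own: fixed frequency and capacitance, dynamic power proportional to Vdd², leakage power proportional to leakage current times Vdd, and a hypothetical leakage share of 1% at the 1.2V starting point. It is not a reproduction of Figure 1-3.

```python
def total_power(vth_reduction_v, vdd0=1.2, leak_share0=0.01):
    """Relative total power after lowering Vth by vth_reduction_v volts.

    Rule of thumb from the text: each 0.1 V of Vth reduction allows about
    0.15 V of Vdd reduction and multiplies the leakage current by 10.
    leak_share0 (leakage relative to dynamic power at the start) is a
    hypothetical value chosen for illustration.
    """
    vdd = vdd0 - 1.5 * vth_reduction_v
    dynamic = (vdd / vdd0) ** 2                       # P_dyn ~ Vdd^2
    leakage = leak_share0 * 10.0 ** (vth_reduction_v / 0.1) * (vdd / vdd0)
    return dynamic + leakage


for mv in (0, 50, 100, 150, 200, 300):
    power = total_power(mv / 1000)
    print(f"Vth lowered by {mv:3d} mV: relative power {power:.2f}")
```

Where exactly the minimum falls depends strongly on the assumed starting leakage share and on temperature, which is why the measured optimum in the text is tied to specific conditions.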

A process modification has been developed to manufacture devices with the threshold voltage of 150 mV, which proves to be the most efficient for the target application domain of mobile consumer products [10]. In Table 1-1 the key transistor parameters of our ultra-low-Vth FETs (ulv) and of the

which translates into an average decrease of the CV/I-metric delay by 29%

Table 1-1 Extracted key parameters of the ulv-FETs in comparison with the target values and the low-Vth FETs

NFET / PFET

130nm ulv-FET NFET / PFET

compensation, back biasing has also to be used to compensate for this strong technology variation.


The values of the body effect are also included in Table 1-1. The body

decrease of body effect in combination with the increased roll-off reduces the leverage of back biasing for ulv-FETs very significantly. The leverage is not even sufficient to compensate the technology variation, since the value of the roll-off is higher than that of the body effect. As an example, the NFET shows roll-off values of 65mV/10nm and 100mV/15nm and a body effect of only 60mV/V.
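The NFET example can be made concrete with a short calculation. This is a minimal sketch assuming a linear body effect (threshold shift = body effect × back-bias voltage); the 0.5V reverse bias is the value quoted later in this section, and the function and constant names are ours.

```python
def vth_shift_mv(body_effect_mv_per_v, back_bias_v):
    """Threshold voltage shift from back biasing (linear model, an assumption)."""
    return body_effect_mv_per_v * back_bias_v


ROLL_OFF_MV = 65.0            # NFET roll-off for a 10 nm channel length change
BODY_EFFECT_MV_PER_V = 60.0   # NFET body effect
REVERSE_BIAS_V = 0.5          # reverse back-bias voltage quoted in this section

shift = vth_shift_mv(BODY_EFFECT_MV_PER_V, REVERSE_BIAS_V)
print(f"Vth shift available from back biasing: {shift:.0f} mV")
print(f"Vth roll-off to be compensated:        {ROLL_OFF_MV:.0f} mV")
print("compensated" if shift >= ROLL_OFF_MV else "not compensated")
```

With these numbers back biasing moves the threshold by 30mV, less than half of the 65mV roll-off, so the variation cannot be compensated.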

To investigate the migration potential of the ulv-FETs for future

90nm hardware, were used. Based on this measurement data the leverage of

been analyzed. For supply voltages of 1.2V and 0.75V a reverse back biasing voltage of 0.5V has been applied. For the NFET, the back biasing results in a leakage reduction by 50% to 70% for all transistor widths and for both

similar (60% to 80%) for transistors with W > 0.5µm. For very narrow

narrow FETs are used within SRAMs, which contribute a major part of the circuit’s standby current, this small reduction for narrow transistors in addition significantly reduces the leverage of active well. The root cause is an additional leakage mechanism based on tunnelling currents across the drain-well junction, which limits the reverse back biasing to 0.5V. This tunnelling current depends exponentially on the drain-well voltage and works against any reduction of the sub-threshold current via active well

is therefore lower. In this case the effect of back biasing is not compensated by a rising tunnelling current and a leakage current reduction by 70% is still achieved.

For a 90nm technology the limit of 0.5V for the well potential swing limits the reduction of the leakage currents to a factor between 2 and 4. This is still a major contribution of all feasible measures to reduce standby power consumption, but the leverage becomes quite small compared to the reduction ratios of several orders of magnitude obtained in previous

This is due to the ever decreasing gate oxide thickness and also due to the

by well biasing reducing the leverage of active well even further


In summary, the zero-Vth devices have become very susceptible to process and temperature variations. Significant yield is only achievable with back biasing via active well control and with active cooling. The latter approach is not feasible for mobile applications. Therefore a more

150mV threshold voltage proved to be the best compromise between

affects some standard methods to overcome short-channel effects. The so-called halo- or pocket-implantation had to be removed to bring the threshold voltage down. Unfortunately short-channel effects are now heavily

of the channel length. Finally this effect was prohibitive for the overall

For leading-edge products which need to optimize both power consumption and system performance, optimization techniques on architecture and design level have been proposed and partly already been implemented. While academic research often focuses on the tradeoff between power consumption and performance, industrial product development must also take other variables into consideration.

• Product cost: often, power optimization design techniques increase die area, directly affecting manufacturing cost. Also, utilization of additional

consequently manufacturing cost, and additionally requires up-front expenditures for the development of such devices. Finally, increased manufacturing complexity poses the risk of lowered manufacturing yield.

• Product robustness: it must be ensured that optimized products still work across the specified range of operating conditions, also taking manufacturing variations into account.


1.5.1 Multi-Vdd Design

preferred option to reduce dynamic power consumption. However, as

check by the need to maintain performance.

design. Most effective regarding power reduction, and also easiest to

performance of the IC design, this often is not an option. On a lower

rather simple to implement, but if only modules are chosen such that overall IC performance is not impacted, the achieved gains in power reduction will often be very moderate.

Finally, a reduction in supply voltage can be applied specifically to individual gates, such that the overall system performance is not reduced. This approach, as shown in Figure 1-4, recognizes that in a typical design, most logic paths are not critical. They can be slowed down, often significantly, without reducing the overall system performance. This slowing

non-critical paths, which results in lowered power consumption.
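To make the effect of a second, lower supply concrete, here is a rough Python sketch. It assumes the standard first-order model in which the dynamic power of a gate is proportional to its switched capacitance times the square of its supply voltage, at fixed activity and clock frequency. The voltages and the fractions of capacitance moved to the lower supply are illustrative values, not figures from the studies cited in this chapter, and the function name is hypothetical. Level-shifter and converter overheads are ignored.

```python
def relative_dynamic_power(vdd_high, vdd_low, low_fraction):
    """Dynamic power of a dual-Vdd design relative to a single-Vdd design.

    low_fraction is the share of switched capacitance (0..1) moved to the
    lower supply. Assumes P_dyn is proportional to C * Vdd**2 at fixed
    switching activity and clock frequency (first-order model).
    """
    if not 0.0 <= low_fraction <= 1.0:
        raise ValueError("low_fraction must be between 0 and 1")
    ratio = (vdd_low / vdd_high) ** 2
    return (1.0 - low_fraction) + low_fraction * ratio


if __name__ == "__main__":
    # Illustrative supply values only.
    for frac in (0.25, 0.5, 0.75):
        rel = relative_dynamic_power(1.2, 0.8, frac)
        print(f"{frac:.0%} of capacitance at low Vdd: {1 - rel:.1%} saving")
```

Under this model the saving grows linearly with the share of capacitance that can tolerate the lower supply, which is why the number of non-critical paths matters so much.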


Figure 1-4 Multi-Vdd design

This technique will modify the distribution of path delays in a design to a distribution skewed towards paths with higher delay, as indicated in Figure 1-5 [14].

Figure 1-5 Distribution of path delays under single and multiple supply voltages

Non-critical path runs with reduced supply voltage


A number of studies have shown significant variation in dynamic power

from less than 10% up to almost 50%, with 40% being the average [15,16]. Rules of thumb for selecting appropriate supply voltage levels have been

The benefit of using multiple supply voltages quickly saturates. The

this to ever more supply voltage levels yields only small incremental benefits [18,19], even when the overhead introduced by multiple supply voltages (see below) is not taken into consideration.

The power reduction achieved by this technique roughly depends on two

is applied.

Regarding the first parameter, it was pointed out some years ago that the leverage of this concept decreases as process technologies are scaled down further [18].

devices, which are essential for low standby power design due to their lower

system performance is greatly reduced. It is shown that from 0.25µm down

introduction of variable threshold voltages, as will be seen later.

Regarding the second parameter, experience has shown that especially in

skewed to higher delays already, thus reducing the number of gates that can be slowed down further [14].

For the selection of those gates which will receive the lower supply

is the concept of clustered voltage scaling (CVS). It recognizes that it is desirable to have clusters of gates assigned to the same voltage, since

This concept has been enhanced by extended clustered voltage scaling (ECVS) [17], which essentially allows an arbitrary assignment of supply voltage levels to gates. This strategy implies more frequent insertion of level shifters into the design. However, usually only power consumption and delay are considered in the literature. The additional area cost is neglected. In industry, this certainly is not feasible.
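The clustering idea can be illustrated with a small Python sketch. This is a simplified illustration written for this text, not the algorithm from the cited papers: gates are visited from the outputs backwards, and a gate is moved to the low supply only if every gate it drives is already on the low supply and the timing constraint still holds. The first condition keeps low-Vdd gates from driving high-Vdd gates inside the logic, so level shifting is needed only at the cluster boundary. The netlist format, the per-gate delay tables, and all function names are assumptions made for the example. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def critical_delay(fanin, delay):
    """Longest path delay through a combinational DAG.

    fanin maps every gate to the list of gates driving it.
    """
    arrival = {}
    for g in TopologicalSorter(fanin).static_order():
        arrival[g] = delay[g] + max((arrival[p] for p in fanin[g]), default=0.0)
    return max(arrival.values())


def clustered_voltage_scaling(fanin, d_high, d_low, t_required):
    """Return the set of gates that can run at the low supply voltage."""
    fanout = {g: [] for g in fanin}
    for g, preds in fanin.items():
        for p in preds:
            fanout[p].append(g)

    delay = dict(d_high)  # start with every gate at the high supply
    low = set()
    order = list(TopologicalSorter(fanin).static_order())
    for g in reversed(order):  # outputs first, inputs last
        # Only grow the cluster: all driven gates must already be low-Vdd.
        if all(s in low for s in fanout[g]):
            delay[g] = d_low[g]
            if critical_delay(fanin, delay) <= t_required:
                low.add(g)
            else:
                delay[g] = d_high[g]  # timing violated, revert
    return low


if __name__ == "__main__":
    # Hypothetical four-gate netlist: a -> c -> d, b -> d.
    fanin = {"a": [], "b": [], "c": ["a"], "d": ["c", "b"]}
    d_high = {g: 1.0 for g in fanin}
    d_low = {g: 1.5 for g in fanin}
    print(sorted(clustered_voltage_scaling(fanin, d_high, d_low, 3.5)))
```

Recomputing the full critical path after every trial assignment is quadratic in the number of gates; it keeps the sketch short, whereas a production flow would use incremental timing analysis.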

poses a number of challenges

dc-to-dc converter, unless the voltage already exists externally. This results in area overhead, and in power consumption for the converter.

• Level shifters are required between different supply domains. It is feasible to integrate level shifters into flip-flops [21].

The penalties in area, power consumption and delay resulting from these effects are not always taken into account by work published in the literature. Studies indicate that a 10% area overhead will result from implementing a

An additional consideration for industrial IC product development is that

rudimentary. It is not sufficient to have a single point tool which can perform power-performance tradeoffs. Instead, this methodology needs to encompass the entire design flow (e.g. power distribution in layout, automated insertion of level shifters, etc.).

1.5.2 Multi-Vth Design

Another essential technique is the use of different transistor threshold

consumption, thus increasing standby time of battery-powered ICs. As leakage power consumption becomes an increasingly important component of overall power consumption in modern process technologies, this technique also helps more and more to reduce overall power consumption significantly as design moves to more advanced process technologies. The

performance are implemented with special leakage-reduced transistors

in Figure 1-6


Figure 1-6 Multi-Vth design

A typical industrial approach today is to first create a design using lower

to reduce leakage
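A flow of this kind can be sketched in a few lines of Python. The sketch below assumes a common reading of the approach: the design starts with fast, leaky low-Vth cells everywhere, and gates are then swapped to slower high-Vth cells, largest leakage saving first, as long as the timing constraint is still met. The netlist format, the delay and leakage tables, and the function names are hypothetical, and the greedy ordering is a simplification rather than the method of any cited study. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def longest_path(fanin, delay):
    """Longest path delay; fanin maps every gate to its driving gates."""
    arrival = {}
    for g in TopologicalSorter(fanin).static_order():
        arrival[g] = delay[g] + max((arrival[p] for p in fanin[g]), default=0.0)
    return max(arrival.values())


def assign_high_vth(fanin, d_lvt, d_hvt, leak_lvt, leak_hvt, t_required):
    """Greedy dual-Vth assignment; returns (high-Vth gates, total leakage)."""
    delay = dict(d_lvt)  # start with low-Vth (fast, leaky) cells everywhere
    high = set()
    # Try the gates with the largest leakage saving first.
    candidates = sorted(
        fanin, key=lambda g: leak_lvt[g] - leak_hvt[g], reverse=True
    )
    for g in candidates:
        delay[g] = d_hvt[g]
        if longest_path(fanin, delay) <= t_required:
            high.add(g)
        else:
            delay[g] = d_lvt[g]  # swap would break timing, keep low-Vth
    leakage = sum(leak_hvt[g] if g in high else leak_lvt[g] for g in fanin)
    return high, leakage


if __name__ == "__main__":
    # Hypothetical netlist: a -> b on one path, c on its own.
    fanin = {"a": [], "b": ["a"], "c": []}
    d_lvt = {"a": 1.0, "b": 1.0, "c": 1.0}
    d_hvt = {"a": 1.5, "b": 1.5, "c": 1.5}
    leak_lvt = {"a": 10, "b": 8, "c": 10}
    leak_hvt = {"a": 1, "b": 1, "c": 1}
    print(assign_high_vth(fanin, d_lvt, d_hvt, leak_lvt, leak_hvt, 2.5))
```

Because the swapped cells have identical functionality and size, such a pass changes only electrical properties and leaves placement untouched.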

Studies in the literature have reported reductions in leakage of around

provided by the process technology (through doping variations) and propose

performance is not compromised [23, 24]. Recently, it has also been

gate oxide thickness Tox [25].

Design-tool support for this technique is also rudimentary at best. While it is becoming established to design different modules of an IC with different

transistors within a module. The primary reason is that the entire design flow must be able to handle cells with identical functionality and size, which differ in their electrical properties. This poses no fundamental algorithmic problems, but must be consistently implemented in all EDA tools within a design flow.


1.5.3 Hybrid Approaches

Recently, approaches have been suggested in the literature which combine implementation of multiple supply voltages and multiple threshold voltages for further power reduction. Especially for designs where minimization of total power consumption is key (as compared to e.g. minimization of standby power for mobile products), it is possible to trade off leakage and dynamic

literature indicate a total power optimum when leakage power contributes 10% to 30% [26,12]. This ratio depends significantly on the process technology, operating environment, and clock frequency of a design.

For applications where leakage power minimization is critical (e.g. mobile products), this approach usually is not feasible, as it requires a

With the increasing significance of gate leakage currents, variations of gate oxide thickness Tox have also been proposed.

An overall framework for using two supply voltages and two threshold voltages as well has been presented [19]. Theoretically, it is shown that more than 60% of total power consumption can be saved this way (not considering required overhead such as level shifters, routing etc.). Rules of

performance.

This approach has been applied to the practical example of an ARM processor in [27]. Due to specific layout considerations it was not possible to

different libraries were implemented. Using a CVS algorithm, a reduction in dynamic power by 15% was achieved for a 0.18µm process technology. Leakage power was reduced by 40%. As leakage power was more than 1000x smaller than dynamic power, overall active power reduction was 15%. To achieve this, a 14% increase in area was required.
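The arithmetic behind the overall figure can be checked with a short Python sketch. It takes the ratio of dynamic to leakage power as exactly 1000 (the text says "more than 1000x", so this is a conservative assumption) and weights each reduction by its share of total active power. The function name is hypothetical.

```python
def overall_reduction(dynamic_share, dynamic_cut, leakage_cut):
    """Total power reduction as the share-weighted sum of both reductions."""
    leakage_share = 1.0 - dynamic_share
    return dynamic_share * dynamic_cut + leakage_share * leakage_cut


# Assumed ratio of dynamic to leakage power (text: "more than 1000x").
ratio = 1000.0
dynamic_share = ratio / (ratio + 1.0)

total = overall_reduction(dynamic_share, dynamic_cut=0.15, leakage_cut=0.40)
print(f"overall active power reduction: {total:.2%}")  # prints 15.02%
```

With leakage at roughly a thousandth of the total, even a 40% leakage reduction moves the overall figure by only about 0.02 percentage points, which is why the overall reduction equals the dynamic reduction to the reported precision.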

A very recent approach considers also transistor width sizing in addition

approach, total power savings of 37% on average over a suite of benchmark circuits are reported. In this study, the threshold voltage is chosen rather low, so that leakage represents 20-50% of total power consumption. Therefore, optimization of both leakage and dynamic power consumption is essential, which is achieved with the presented approach.


An enhanced approach for leakage power consumption considers multiple gate oxide thicknesses Tox in addition to multi-Vth [29]. It is motivated by the fact that gate leakage increases very dramatically with newer process technologies. Gate leakage is of the same order of magnitude as subthreshold leakage at the 90nm process node. Their relationship also depends significantly on the operating temperature T. The key observation that an OFF transistor suffers from subthreshold leakage, and an ON transistor from gate leakage, motivates the approach to analyze transistor states in

is minimized. Leakage reductions of 5-6x are obtained on benchmark

Previous approaches that included Tox in the optimization varied Tox only for different design modules, not on critical paths within modules. These newer approaches promise further reductions in power consumption. This will come, however, at a price (as seen e.g. in the ARM example). Design complexity increases significantly when variations in many parameters are made available at the same time. In some studies, the resulting overhead is not considered.

1.5.4 Cost Tradeoffs

This overhead must be considered, however, since it is quite significant:

additional supply voltages (area)

• Multi-Tox: additional masks (manufacturing costs)

• In addition, IC development costs increase due to more complex design

qualified and continuously monitored. For each such option, the design library must be electrically characterized, modelled for all EDA tools, and potentially optimized regarding circuit design and layout. It must be maintained and regularly updated (changes in electrical parameters, changes in tools in the design flow) over a long period of time as well. If a very specialized manufacturing flow is developed to fully optimize a given product, it will be very difficult to shift manufacturing of this product to a different fab (e.g. a foundry in case additional capacity is required).

For these and potentially other reasons, we are not yet aware of industrial products that have implemented such proposals in a fine-grained manner (i.e


Some approaches in the literature also determine optimum levels of threshold voltages depending on a given design. In industry, this is rarely feasible. Typically, a manufacturing process has to be taken as given, with

LEVELS

The approaches outlined above on gate level and device level can be (and often must be) supported by measures on higher levels of abstraction.

Some of the most promising concepts are as follows:

• partitioning the system such that large areas can be powered off for significant periods of time (block turnoff)

• especially partitioning memory systems such that large parts can be turned off in standby mode

• clock gating is an essential method which reduces dynamic power consumption by local off-switching of non-active gates

• coding strategies (e.g. for buses) can reduce switching and thus dynamic power consumption
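As an illustration of the last item, the following Python sketch implements bus-invert coding, one well-known bus coding scheme. The chapter does not name a specific scheme, so this choice is an assumption. When more than half of the data lines would toggle, the sender transmits the inverted word and raises one extra "invert" line, so the receiver can undo the inversion. The function names and the sample traffic are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def count_transitions(values):
    """Total number of bit toggles over a sequence of bus values."""
    return sum(hamming(a, b) for a, b in zip(values, values[1:]))


def bus_invert_encode(words, width):
    """Encode words (each fitting in `width` bits) as (bus_value, invert_bit)."""
    mask = (1 << width) - 1
    bus = 0  # data lines assumed to start at all zeros
    encoded = []
    for w in words:
        if 2 * hamming(bus, w) > width:  # more than half the lines would toggle
            bus, inv = ~w & mask, 1
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(~b & mask) if inv else b for b, inv in encoded]


if __name__ == "__main__":
    words = [0x00, 0xFF, 0x00, 0xFF]  # worst-case alternating traffic
    enc = bus_invert_encode(words, 8)
    # Pack the invert line next to the data lines to count all toggles.
    lines = [(b << 1) | inv for b, inv in enc]
    print(count_transitions(words), count_transitions(lines))
```

For this deliberately adversarial sequence the toggle count drops from 24 to 3, at the cost of one additional wire; on typical traffic the gain is smaller.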

There is no single “silver bullet” to solve the challenge of power

devices is a conceptually very convincing concept, its widespread implementation is hindered by manufacturing concerns. An extrapolation of current technology trends indicates that such a concept will become even more difficult in the future.

Today, design techniques are the most promising approach to reduce power, both dynamic and leakage.

The concepts outlined here can be further extended. It is feasible to dynamically adjust supply and threshold voltages. These are theoretically promising concepts which, however, still require more investigation, especially with regard to feasibility under industrial boundary conditions. Quite likely, in the future even more emphasis than today will have to be placed on power reduction schemes on algorithmic and system level. On these levels, the levers to reduce power consumption are largest.

Acknowledgement

The authors wish to acknowledge and thank Jörg Berthold and Tim Schönauer for their contributions and fruitful discussions.


[4] U. Schlichtmann, Systems are Made from Transistors: UDSM Technology Creates New Challenges for Library and IC Development, IEEE Euromicro Symposium on Digital System Design, 2002, pp. 1-2.

[5] S. Borkar, Design Challenges of Technology Scaling, IEEE Micro, July/August 1999, pp. 23-29.

[6] S. Thompson, P. Packan, and M. Bohr, MOS Scaling: Transistor Challenges for the 21st Century, Intel Technology Journal, Q3 1998.

[7] N. Kim et al., Leakage Current: Moore's Law Meets Static Power, IEEE Computer, Vol

[15] K. Usami, M. Igarashi, Low-Power Design Methodology and Applications utilizing Dual Supply Voltages, Proceedings of the Asia and South Pacific Design Automation Conference 2000, pp. 123-128.

[16] M. Donno, L. Macchiarulo, A. Macii, E. Macii, M. Poncino, Enhanced Clustered Voltage Scaling for Low Power, Proceedings of the 12th ACM Great Lakes Symposium on VLSI, 2002, pp. 18-23.

[17] K. Usami et al., Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor, IEEE Journal of Solid-State Circuits, Vol. 33, No. 3, March 1998, pp. 463-472.

[18] M. Hamada, Y. Ootaguro, T. Kuroda, Utilizing Surplus Timing for Power Reduction, Proceedings IEEE Custom Integrated Circuits Conference CICC, 2001, pp. 89-92.

[19] A. Srivastava, D. Sylvester, Minimizing Total Power by Simultaneous Vdd/Vth Assignment, Proceedings of the Asia and South Pacific Design Automation Conference 2003, pp. 400-403.

[20] K. Usami, M. Horowitz, Clustered Voltage Scaling Technique for Low-Power Design, Proceedings of the International Symposium on Low Power Design ISLPD, 1995, pp. 3-8.

[21] K. Usami et al., Design Methodology of Ultra Low-power MPEG4 Codec Core Exploiting Voltage Scaling Techniques, Proceedings of the 35th Design Automation Conference 1998, pp. 483-488.

[22] C. Yeh, Y.-S. Kang, Layout Techniques Supporting the Use of Dual Supply Voltages for Cell-Based Designs, Proceedings of the 36th Design Automation Conference 1999, pp. 62-67.

[23] Q. Wang, S. Vrudhula, Algorithms for Minimizing Standby Power in Deep Submicrometer, Dual-Vt CMOS Circuits, IEEE Transactions on CAD, Vol. 21, No. 3, March 2002, pp. 306-318.

[24] L. Wei, Z. Chen, K. Roy, M. Johnson, Y. Ye, V. De, Design and Optimization of Dual-Threshold Circuits for Low-Voltage Low-Power Applications, IEEE Transactions on Very Large Scale Integration (VLSI), Vol. 7, No. 1, March 1999, pp. 16-24.

[25] N. Sirisantana, K. Roy, Low-Power Design Using Multiple Channel Lengths and Oxide Thicknesses, IEEE Design & Test of Computers, January-February 2004, pp. 56-63.

[26] K. Nose, T. Sakurai, Optimization of VDD and VTH for Low-Power and High-Speed Applications, Proceedings of the Asia and South Pacific Design Automation Conference 2000, pp. 469-474.

[27] R. Bai, S. Kulkarni, W. Kwong, A. Srivastava, D. Sylvester, D. Blaauw, An Implementation of a 32-bit ARM Processor Using Dual Power Supplies and Dual Threshold Voltages, IEEE International Symposium on VLSI, 2003, pp. 149-154.

[28] A. Srivastava, D. Sylvester, D. Blaauw, Concurrent Sizing, Vdd and Vth Assignment for Low-Power Design, Proceedings of the Design, Automation and Test in Europe Conference DATE, 2003, pp. 718-719.

[29] D. Lee, H. Deogun, D. Blaauw, D. Sylvester, Simultaneous State, Vt and Tox Assignment for Total Standby Power Minimization, Proceedings of the Design, Automation and Test in Europe Conference DATE, 2003, pp. 494-499.


Chapter 2

ON-CHIP OPTICAL INTERCONNECT FOR LOW-POWER

Ian O’Connor and Frédéric Gaffiot

Ecole Centrale de Lyon

Abstract: It is an accepted fact that process scaling and operating frequency both contribute to increasing integrated circuit power dissipation due to interconnect. Extrapolating this trend leads to a red brick wall which only radically different interconnect architectures and/or technologies will be able to overcome. The aim of this chapter is to explain how, by exploiting recent advances in integrated optical devices, optical interconnect within systems on chip can be realised. We describe our vision for heterogeneous integration of a photonic “above-IC” communication layer. Two applications are detailed: clock distribution and data communication using wavelength division multiplexing. For the first application, a design method will be described, enabling quantitative comparisons with electrical clock trees. For the second, more long-term, application, our views will be given on the use of various photonic devices to realize a network on chip that is reconfigurable in terms of the wavelength used.

Keywords: Interconnect technology, optical interconnect, optical network on chip

In the 2003 edition of the ITRS roadmap [17], the interconnect problem was summarised thus: “For the long term, material innovation with traditional scaling will no longer satisfy performance requirements. Interconnect innovation with optical, RF, or vertical integration will deliver the solution”. Continually shrinking feature sizes, higher clock frequencies, and growth in complexity are all negative factors as far as switching charges on metallic interconnect are concerned. Even with low resistance metals such as copper and low dielectric constant materials, bandwidths for long interconnect will be insufficient for future operating frequencies. Already the use of metal tracks to transport a signal over a chip has a high cost in terms of power: clock distribution, for instance, requires a significant part (30-50%) of total chip power in high-performance microprocessors.

A promising approach to the interconnect problem is the use of an optical interconnect layer, which could empower an increase in the ratio between data rate and power dissipation. At the same time it would enable synchronous operation within the circuit and with other circuits, relax constraints on thermal dissipation and sensitivity, signal interference and distortion, and also free up routing resources for complex systems. However, this comes at a price. Firstly, high-speed and low-power interface circuits are required, design of which is not easy and has a direct influence on the overall performance of optical interconnect. Another important constraint is the fact that all fabrication steps have to be compatible with future IC technology and also that the additional cost incurred remains affordable. Additionally, predictive design technology is required to quantify the performance gain of optical interconnect solutions, where information is scant and disparate concerning not only the optical technology, but also the CMOS technologies for which optics could be used (post-45nm node).

In section 2.2, we will describe the “above-IC” optical technology. Sections 2.3 and 2.4 describe an optical clock distribution network and a quantitative electrical-optical power comparison respectively. A proposal for a novel optical network on chip is discussed in section 2.5.

Various technological solutions may be proposed for integrating an optical transport layer in a standard CMOS system. In our opinion, the most promising approach makes use of hybrid (3D) integration of the optical layer above a complete CMOS IC, as shown in fig. 2.1. The basic CMOS process remains the same, since the optical layer can be fabricated independently. The weakness of this approach is in the complex electrical link between the CMOS interface circuits and the optical sources (via stack and advanced bonding).

In the system shown in fig. 2.1, a CMOS source driver circuit modulates the current flowing through a biased III-V microsource through a via stack making the electrical connection between the CMOS devices and the optical layer. III-V active devices are chosen in preference to Si-based optical devices for high-speed and high-wavelength operation. The microsource is coupled to

silicon technology and silicon is an excellent material for transmitting

dB/cm has been demonstrated [10]). The waveguide structure transports the optical signal to a III-V photodetector (or possibly to several, as in the case of a broadcast function) where it is converted to an electrical photocurrent, which flows through another via stack to a CMOS receiver circuit which regenerates the digital output signal. This signal can then, if necessary, be distributed over a small zone by a local electrical interconnect network.

Figure 2.1. Cross-section of hybridised interconnection structure. Labels recovered from the figure: receiver circuit, electrical contact, III-V photodetector, III-V laser source, Si photonic waveguide (n=3.5), SiO2 waveguide cladding (n=1.5), CMOS IC.

NETWORK

In this section we present the structure of the optical clock distribution network, and detail the characteristics of each component part in the system: active optoelectronic devices (external VCSEL source and PIN detector), passive waveguides, and interface (driver and receiver) circuits. The latter represent extremely critical parts to the operation of the overall link and require particularly careful design.

An optical clock distribution network, shown in fig. 2.2, requires a single photonic source coupled to a symmetrical waveguide structure routing to a number of optical receivers. At the receivers the high-speed optical signal is converted to an electrical one and provided to local electrical networks. Hence the primary tree is optical, while the secondary tree is electrical. It is not feasible to route the optical signal all the way down to the individual gate level since each drop point requires a receiver circuit which consumes area and power. The clock signal is thus routed optically to a number of drop points which will cover a zone over which the last part of the clock distribution will be carried out

Ngày đăng: 01/06/2014, 11:43

Nguồn tham khảo

Tài liệu tham khảo Loại Chi tiết
[1] P. Aldworth, “System-on-a-Chip Bus Architecture for Embedded Applications,” IEEE International Conference on Computer Design, pp. 297-298, 1999 Sách, tạp chí
Tiêu đề: System-on-a-Chip Bus Architecture for Embedded Applications
Tác giả: P. Aldworth
Nhà XB: IEEE International Conference on Computer Design
Năm: 1999
[2] “ × pipes: a Latency Insensitive Parameterized Network-on-chip A rchitecture For Multi-Processor SoCs”, M.Dall’Osso, G.Biccari, L.Giovannini, D.Bertozzi, L.Benini, Int. Conf. on Computer Design, pp.536-541, October 2003 Sách, tạp chí
Tiêu đề: × pipes: a Latency Insensitive Parameterized Network-on-chip A rchitecture For Multi-Processor SoCs
Tác giả: M.Dall’Osso, G.Biccari, L.Giovannini, D.Bertozzi, L.Benini
Nhà XB: Int. Conf. on Computer Design
Năm: 2003
[3] E.Bolotin, I.Cidon, R.Ginosar, A.Kolodny, "QNoC: QoS architecture and design process for Network on Chip", Journal on Systems Architecture, Special Issue on Networks on Chip, December 2003 Sách, tạp chí
Tiêu đề: QNoC: QoS architecture and design process for Network on Chip
Tác giả: E.Bolotin, I.Cidon, R.Ginosar, A.Kolodny
Nhà XB: Journal on Systems Architecture
Năm: 2003
[4] I.Saastamoinen, D.S.Tortosa, J.Nurmi, "Interconnect IP Node for Future System-on-Chip Designs", IEEE Int. Work. on Electronic Design, Test and Applications, pp.116-120, January 2002 Sách, tạp chí
Tiêu đề: Interconnect IP Node for Future System-on-Chip Designs
[5] C.T. Hsieh, M. Pedram, ”Architectural Energy Optimization by Bus Splitting,” IEEE Trans. CAD, Vol.21, issue 4, pp.408-414, April 2002 Sách, tạp chí
Tiêu đề: IEEE Trans. CAD
[7] S.Murali, G.De Micheli, "Bandwidth-Constrained Mapping of Cores onto NoC Architectures", De- sign Automation and Testing in Europe, 2004, pp.20896-20901 Sách, tạp chí
Tiêu đề: Bandwidth-Constrained Mapping of Cores onto NoC Architectures
[8] L.Bisdounis, C.Dre, S.Blionas, D.Metafas, A.Tatsaki, F.Ieromninon, E.Macii, P.Rouzet, R.Zafalon, L.Benini "Low-Power System-on-Chip Architecture for Wireless LANs," IEE Proc.-Comput. Digit.Tech., Vol.151, no1, January 2004 Sách, tạp chí
Tiêu đề: Low-Power System-on-Chip Architecture for Wireless LANs
Tác giả: L.Bisdounis, C.Dre, S.Blionas, D.Metafas, A.Tatsaki, F.Ieromninon, E.Macii, P.Rouzet, R.Zafalon, L.Benini
Nhà XB: IEE Proc.-Comput. Digit.Tech.
Năm: 2004
[9] K. Lee, S.J. Lee, S.E. Kim, H.M. Choi, D. Kim, S. Kim, M.W. Lee, H.J. Yoo, "A 51mW 1.6GHz On-Chip Network for Low Power Heterogeneous SoC Platform", IEEE Int.Solid-State Circuits Con- ference, pp.1-3, 2004 Sách, tạp chí
Tiêu đề: A 51mW 1.6GHz On-Chip Network for Low Power Heterogeneous SoC Platform
Tác giả: K. Lee, S.J. Lee, S.E. Kim, H.M. Choi, D. Kim, S. Kim, M.W. Lee, H.J. Yoo
Nhà XB: IEEE Int.Solid-State Circuits Conference
Năm: 2004
[10] S.J. Lee et al., "An 800MHz Star-Connected On-Chip Network for Application to Systems on a Chip", IEEE Int.Solid-State Circuits Conference, pp.468-469, February 2003 Sách, tạp chí
Tiêu đề: An 800MHz Star-Connected On-Chip Network for Application to Systems on a Chip
Tác giả: S.J. Lee, et al
Nhà XB: IEEE Int.Solid-State Circuits Conference
Năm: 2003
[11] W. Bainbridge, S. Furber, “Delay insensitive system-on-chip interconnect using 1-of-4 data encod- ing,” IEEE International Symposium on Asynchronous Circuits and Systems, pp. 118-126, 2001 Sách, tạp chí
Tiêu đề: Delay insensitive system-on-chip interconnect using 1-of-4 data encoding
Tác giả: W. Bainbridge, S. Furber
Nhà XB: IEEE International Symposium on Asynchronous Circuits and Systems
Năm: 2001
[12] D. Bertozzi, L. Benini and G. De Micheli, “Low-Power Error-Resilient Encoding for On-chip Data Busses,” DATE, International Conference on Design and Test Europe Paris, 2000, pp. 102-109 Sách, tạp chí
Tiêu đề: Low-Power Error-Resilient Encoding for On-chip Data Busses
Tác giả: D. Bertozzi, L. Benini, G. De Micheli
Nhà XB: DATE, International Conference on Design and Test Europe
Năm: 2000
[13] Dally, W.; Towles, B.; “Route Packets, Not Wires: On-Chip Interconnection Networks” 38th Design Automation Conference, 2001. Proceedings Sách, tạp chí
Tiêu đề: Route Packets, Not Wires: On-Chip Interconnection Networks”"38th Design
[14] B. Cordan, “An efficient bus architecture for system-on-chip design,” IEEE Custom Integrated Cir- cuits Conference, pp. 623–626, 1999 Sách, tạp chí
Tiêu đề: An efficient bus architecture for system-on-chip design,”"IEEE Custom Integrated Cir-"cuits Conference
[15] Dally, W.J; Aoki, H. “Deadlock -free adaptive routing in multicomputer networks using virtual channels” IEEE Trans. on Parallel and Distributed Systems, April 1993 Sách, tạp chí
Tiêu đề: Deadlock -free adaptive routing in multicomputer networks using virtualchannels”
[16] W. Dally and J. Poulton, Digital Systems Engineering, Cambridge University Press, 1998 Sách, tạp chí
Tiêu đề: Digital Systems Engineering
[17] J. Duato, S. Yalamanchili, L. Ni, Interconnection Networks: an Engineering Approach. IEEE Com- puter Society Press, 1997 Sách, tạp chí
Tiêu đề: Interconnection Networks: an Engineering Approach
Tác giả: J. Duato, S. Yalamanchili, L. Ni
Nhà XB: IEEE Computer Society Press
Năm: 1997
[18] R. Hegde, N. Shanbhag, “Toward Achieving Energy Efficiency in Presence of Deep Submicron Noise,” IEEE Transactions on VLSI Systems, pp. 379–391, vol. 8, no. 4, August 2000 Sách, tạp chí
Tiêu đề: Toward Achieving Energy Efficiency in Presence of Deep SubmicronNoise,”"IEEE Transactions on VLSI Systems
[19] R. Hegde, N. Shanbhag, “Toward achieving energy efficiency in presence of deep submicron noise,”IEEE Transactions on VLSI Systems, pp. 379–391, vol. 8, no. 4, August 2000 Sách, tạp chí
Tiêu đề: Toward achieving energy efficiency in presence of deep submicron noise,”"IEEE Transactions on VLSI Systems
[33] IBM CoreConnect bus architecture,”http://www-3.ibm.com/chips/products/coreconnect” Link
[34] AMBA Multi-Layer AHB and AHB-Lite,”http://www.arm.com/products/solutions/AMBAAHBandLite.html” Link

TỪ KHÓA LIÊN QUAN

w