ULTRA LOW-POWER ELECTRONICS AND DESIGN
Ultra Low-Power Electronics and Design
Edited by
Enrico Macii
Politecnico di Torino,
Italy
KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 1-4020-8076-X
Print ISBN: 1-4020-8075-1
©2004 Springer Science + Business Media, Inc.
Print © 2004 Kluwer Academic Publishers
All rights reserved
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher
Created in the United States of America
Visit Springer's eBookstore at: http://www.ebooks.kluweronline.com
and the Springer Global Website Online at: http://www.springeronline.com
Dordrecht
CONTRIBUTORS
PREFACE
INTRODUCTION
1 ULTRA-LOW-POWER DESIGN: DEVICE AND LOGIC DESIGN APPROACHES
2 ON-CHIP OPTICAL INTERCONNECT FOR LOW-POWER
3 NANOTECHNOLOGIES FOR LOW POWER
4 STATIC LEAKAGE REDUCTION THROUGH SIMULTANEOUS Vt/Tox AND STATE ASSIGNMENT
5 ENERGY-EFFICIENT SHARED MEMORY ARCHITECTURES FOR
8 ARCHITECTURES AND DESIGN TECHNIQUES FOR ENERGY EFFICIENT EMBEDDED DSP AND MULTIMEDIA PROCESSING
9 SOURCE-LEVEL MODELS FOR SOFTWARE POWER OPTIMIZATION
10 TRANSMITTANCE SCALING FOR REDUCING POWER DISSIPATION OF A BACKLIT TFT-LCD
11 POWER-AWARE NETWORK SWAPPING FOR WIRELESS PALMTOP PCS
12 ENERGY EFFICIENT NETWORK-ON-CHIP DESIGN
13 SYSTEM LEVEL POWER MODELING AND SIMULATION OF HIGH-END INDUSTRIAL NETWORK-ON-CHIP
14 ENERGY AWARE ADAPTATIONS FOR END-TO-END VIDEO STREAMING TO MOBILE HANDHELD DEVICES
Contributors
F. Vahid, University of California, Riverside
and University of California, Irvine
and K.U.Leuven
Today we are beginning to have to face up to the consequences of the stunning success of Moore’s Law, that astute observation by Intel’s Gordon Moore which predicts that integrated circuit transistor densities will double every 12 to 18 months. This observation has now held true for the last 25 years or more, and there are many indications that it will continue to hold true for many years to come. This book appears at a time when the first examples of complex circuits in 65nm CMOS technology are beginning to appear, and these products already must take advantage of many of the techniques to be discussed and developed in this book. So why then should our increasing success at miniaturization, as evidenced by the success of Moore’s Law, be creating so many new difficulties in power management in circuit designs?
The principal source and the physical origin of the problem lies in the differential scaling rates of the many factors that contribute to power dissipation in an IC – transistor speed/density product goes up faster than the energy per transition comes down, so the power dissipation per unit area increases in a general sense as the technology evolves.
Secondly, the “natural” transistor switching speed increase from one generation to the next is becoming downgraded due to the greater parasitic losses in the wiring of the devices. The technologists are offsetting this problem to some extent by introducing lower permittivity dielectrics (“low-k”) and lower resistivity conductors (copper) – but nonetheless to get the needed circuit performance, higher speed devices using techniques such as silicon-on-insulator (SOI) substrates, enhanced carrier mobility (“strained silicon”) and higher field (“overdrive”) operation are driving power densities ever upwards. In many cases, these new device architectures are increasingly leaky, so static power dissipation becomes a major headache in power management, especially for portable applications.
A third factor is system or application driven – having all this integration capability available encourages us to combine many different functional blocks into one system IC. This means that in many cases, a large part of the chip’s required functionality will come from software executing on and between multiple on-chip execution units; how the optimum partitioning between hardware architecture and software implementation is obtained is a vast subject, but clearly some implementations will be more energy efficient than others. Given that, in many of today’s designs, more than 50% of the total development effort is on the software that runs on the chip, getting this partitioning right in terms of power dissipation can be critical to the success of (or instrumental in the failure of!) the product.
A final motivation comes from the practical and environmental consequences of how we design our chips – state-of-the-art high performance circuits are dissipating up to 100W per square centimeter – we only need 500 square meters of such silicon to soak up the output of a small nuclear power station. A related argument, based on battery lifetime, shows that the “converged” mobile phone application combining telephony, data transmission, multimedia and PDA functions that will appear shortly is demanding power at the limit of lithium-ion or even methanol-water fuel cell battery technology. We have to solve the power issue by a combination of design and process technology innovations; examples of current approaches to power management include multiple transistor thresholds, triple gate oxide, dynamic supply voltage adjustment and memory architectures.
Multiple transistor thresholds is a technique, practiced for several years now, that allows the designer to use high performance (low Vt) devices where speed is needed, and low leakage (high Vt) devices elsewhere. This benefits both static power consumption (through less sub-threshold leakage) and dynamic power consumption (through lower overall switching currents). High threshold devices can also be used to gate the supplies to different parts of the circuit, allowing blocks to be put to sleep until needed.
Similar to the previous technique, triple gate oxide (TGO) allows circuit partitioning between those parts that need performance and other areas of the circuit that don’t. It has the additional benefit of acting on both sub-threshold leakage and gate leakage. The third oxide is used for I/O and possibly mixed-signal. It is expected over the next few years that the process technologists will eventually replace the traditional silicon dioxide gate dielectric of the CMOS devices by new materials such as rare earth oxides with much higher dielectric constants that will allow the gate leakage problem to be completely suppressed.
Dynamic supply voltage adjustment allows the supply voltage to different blocks of the circuit to be adjusted dynamically in response to the immediate performance needs for the block – this very sophisticated technique will take some time to mature.
Finally, many, if not most, advanced devices use very large amounts of memory for which the contents may have to be maintained during standby; this consumes a substantial amount of power, either through refreshing dynamic RAM or through the array leakage for static RAM. Traditional non-volatile memories have writing times that are orders of magnitude too slow to allow them to substitute for these on-chip memories. New developments, such as MRAM, offer the possibility of SRAM-like performance coupled with unlimited endurance and data retention, making them potential candidates to replace the traditional on-chip memories and remove this component of standby power consumption.
Most of the approaches to power management described briefly above will be employed in 65nm circuits, but there are a lot more good ideas waiting to be applied to the problem, many of which you will find clearly and concisely explained in this book.
Mike Thompson, Philippe Magarshack
STMicroelectronics, Central R&D Crolles, France
The 2004 edition of the DATE (Design Automation and Test in Europe) conference has devoted an entire Special Focus Day to the power problem and its implications on the design of future electronic systems. In particular, keynote presentations and invited talks by outstanding researchers in the field of low-power design, as well as several technical papers from the regular conference sessions, have addressed the difficulties ahead and advanced strategies and principles for achieving ultra low-power design solutions. The purpose of this book is to integrate into a single volume a selection of these contributions, duly extended and transformed by the authors into chapters proposing a mix of tutorial material and advanced research results.
The manuscript consists of a total of 14 chapters, addressing different aspects of ultra low-power electronics and design. Chapter 1 opens the volume by providing an insight into innovative transistor devices that are capable of operating with a very low threshold voltage, thus contributing to a significant reduction of the dynamic component of power consumption. Solutions for limiting leakage power during stand-by mode are also discussed. The chapter closes with a quick overview of low-power design techniques applicable at
Chapter 2 focuses on the problem of reducing power in the interconnect network by investigating alternatives to traditional metal wires. In fact, according to the 2003 ITRS roadmap, metallic interconnections may not be able to provide enough transmission speed and to keep power under control for the upcoming technology nodes (65nm and below). A possible solution, explored in the chapter, consists of the adoption of optical interconnect networks. Two applications are presented: clock distribution and data communication using wavelength division multiplexing.
In Chapter 3, the power consumption problem is approached from the technology point of view by looking at innovative nano-devices, such as single-electron or few-electron transistors. The low-power characteristics and potential of these devices are reviewed in detail. Other devices, including carbon nano-tube transistors, resonant tunnelling diodes and quantum cellular automata, are also treated.
Chapter 4 is entirely dedicated to advanced design methodologies for reducing sub-threshold and gate leakage currents in deep-submicron CMOS circuits by properly choosing the states to which gates have to be driven when in stand-by mode, as well as the values of the threshold voltage and of the gate oxide thickness. The authors formulate the optimization problem for
and propose both an exact method for its optimal solution and two practical heuristics with reasonable run-time. Experimental results obtained on a number of benchmark circuits demonstrate the viability of the proposed methodology.
Chapter 5 is concerned with the issue of minimizing power consumption of the memory subsystem in complex, multi-processor systems-on-chip (MPSoCs), such as those employed in multi-media applications. The focus is on design solutions and methods for synthesizing memory architectures containing both single-ported and multi-ported memory banks. Power efficiency is achieved by casting the memory partitioning design paradigm to the case of heterogeneous memory structures, in which data need to be accessed in a shared manner by different processing units.
Chapter 6 addresses the relevant problem of minimizing the power consumed by the cache hierarchy of a microprocessor. Several design techniques are discussed, including application-driven automatic and dynamic cache parameter tuning, adoption of configurable victim buffers and frequent-value data encoding and compression.
Power optimization for parallel, variable-voltage/frequency processors is the subject of Chapter 7. Given a processor with such an architecture, this chapter investigates the energy/performance tradeoffs that can be spanned in parallelizing array-intensive applications, taking into account the possibility that individual processing units can operate at different voltage/frequency levels. In assigning voltage levels to processing units, compiler analysis is used to reveal heterogeneity between the loads of the different units in parallel execution.
Chapter 8 provides guidelines for the design and implementation of DSP and multi-media applications onto programmable embedded platforms. The RINGS architecture is first introduced, followed by a detailed discussion on power-efficient design of some of the platform components, namely, the DSPs. Next, design exploration, co-design and co-simulation challenges are addressed, with the goal of offering to the designers the capability of including into the final architecture the right level of programmability (or reconfigurability) to guarantee the required balance between system performance and power consumption.
Chapter 9 targets software power minimization through source code optimization. Different classes of code transformations are first reviewed; next, the chapter outlines a flow for the estimation of the effects that the application of such transformations may have on the power consumed by a software application. At the core of the estimation methodology there is the development of power models that allow the decoupling of processor-independent analysis from all the aspects that are tightly related to processor architecture and implementation. The proposed approach to software power minimization is validated through several experiments conducted on a number of embedded processors for different types of benchmark applications.
Reduction of the power consumed by TFT liquid crystal displays, such as those commonly used in consumer electronic products, is the subject of Chapter 10. More specifically, techniques for reducing power consumption of transmissive TFT-LCDs using a cold cathode fluorescent lamp backlight are proposed. The rationale behind such techniques is that the transmittance function of the TFT-LCD panel can be adjusted (i.e., scaled) while meeting an upper bound on a contrast distortion metric. Experimental results show that significant power savings can be achieved for still images with very little penalty in image contrast.
Chapter 11 addresses the issue of efficiently accessing remote memories from wireless systems. This problem is particularly important for devices such as palmtops and PDAs, for which local memory space is at a premium and networked memory access is required to support virtual memory swapping. The chapter explores performance and energy of network swapping in comparison with swapping on local microdrives and FLASH memories. Results show that remote swapping over power-manageable wireless network interface cards can be more efficient than local swapping and that both energy and performance can be optimized by means of power-aware reshaping of data requests. In other words, dummy data accesses can be preemptively inserted in the source code to reshape page requests in order to significantly improve the effectiveness of dynamic power management.
Chapter 12 focuses on communication architectures for multi-processor SoCs. The network-on-chip (NoC) paradigm is reviewed, touching upon several issues related to power optimization of such kinds of communication architectures. The analysis proceeds on a layer-by-layer basis, and particular emphasis is given to customized, domain-specific networks, which represent the most promising scenario for communication-energy minimization in multi-processor platforms.
Chapter 13 provides a natural follow up to the theory of NoCs covered in the previous chapter by describing an industrial application of this type of communication architecture. In particular, the authors introduce an innovative methodology for automatically generating the power models of a versatile and parametric on-chip communication IP, namely the STBus by STMicroelectronics. The methodology is validated on a multi-processor hardware platform including four ARM cores accessing a number of peripheral targets, such as SRAM banks, interrupt slaves and ROM memories.
The last contribution, offered in Chapter 14, proposes an integrated end-to-end power management approach for mobile video streaming applications that unifies low-level architectural optimizations (e.g., CPU, memory, registers), OS power-saving mechanisms (e.g., dynamic voltage scaling) and adaptive middleware techniques (e.g., admission control, trans-coding, network traffic regulation). Specifically, interaction parameters between the different levels are identified and optimized to achieve a reduction in the power consumption.
Closing this introductory chapter, the editor would like to thank all the authors for their effort in producing their outstanding contributions in a very short time. Special thanks go to Mike Thompson and Philippe Magarshack of STMicroelectronics for their keynote presentation at DATE 2004 and for writing the foreword to this book. The editor would also like to acknowledge the support offered by Mark De Jongh and the Kluwer staff during the preparation of the final version of the manuscript. Last, but not least, the editor is grateful to Agnieszka Furman for taking care of most of the “dirty work” related to book editing, paging and preparation of the camera-ready material.
1 Infineon Technologies AG; 2 Technische Universität München
Abstract: Power consumption increasingly is becoming the bottleneck in the design of ICs in advanced process technologies. We give a brief introduction into the major causes of power consumption. Then we report on experiments in an advanced process technology with ultra-low threshold voltage (Vth) devices. It turns out that, in contrast to older process technologies, this approach increasingly is becoming less suitable for industrial usage in advanced process technologies. Following this, we describe methodologies to reduce power consumption by optimizations in logic design, specifically by utilizing multiple levels of supply voltage Vdd and threshold voltage Vth. We evaluate them from an industrial product development perspective. We also give a brief outlook on proposals at other levels in the design flow and on future work.
Keywords: Low-power design, dynamic power reduction, leakage power reduction,
ultra-low-Vth devices, multi-Vdd, multi-Vth, CVS
1.1 INTRODUCTION
The progress of silicon process technology marches on relentlessly. As predicted by Gordon Moore decades ago, silicon process technology continues to achieve improvements at an astonishing pace [1]. The number of transistors that can be integrated on a single IC approximately doubles every 2 years [2,3]. This engineering success has created innovative new industries (e.g. personal computers and peripherals, consumer electronics) and revolutionized other industries (e.g. communications).
Today, however, it is becoming increasingly difficult to achieve improvements at the pace that the industry has become accustomed to. More and more technical challenges appear that require increasing resources to be solved [4]. One such problem is the increasing power consumption of integrated circuits. It becomes even more critical as an increasing number of today’s high-volume consumer products are battery-powered.
In the following, we will consider the sources of power consumption and their development over time. We will show why reduction of power consumption increasingly is becoming critical to product success and will review traditional approaches in Sections 1.1 and 1.2. In Section 1.3 we will then analyze a potential solution based on introduction of an optimized
and discuss logic-level design optimizations for power reduction in Section 1.4. Also, we will briefly point out potential optimizations on higher levels. Our observations are made from the perspective of industrial IC product development where technical optimizations must be carefully evaluated against the cost associated with achieving and implementing them. Mostly, the presented methodologies are already being utilized in leading-edge industrial ICs.
Depending on the type of end-product and its application, different aspects of power consumption are the primary concern: dynamic power or leakage power.
Reduction of dynamic power consumption is a concern for almost all IC products today. For battery-powered products, reduced power consumption directly results in longer operating time for the product, which is a very desirable characteristic. Even for non-battery-powered products, reduced power consumption brings many advantages, such as reduced cost because of cheaper packaging or higher performance because of lower temperatures. Finally, reduced power consumption often leads to lower system cost (no fans required; no or cheaper air conditioning for data / telecom center etc.).
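Dynamic power consumption is caused by the charging and discharging of capacitances when a circuit switches. In addition, during switching a short-circuit current flows, but this current is typically much smaller, and will therefore be neglected in the following. The dynamic power due to capacitance charging and discharging is determined by the following well-known relationship:
$$P_{dyn} = \alpha \cdot f \cdot C_L \cdot V_{dd}^2$$
where $\alpha$ is the switching activity, $f$ the operating frequency, $C_L$ the switched load capacitance and $V_{dd}$ the supply voltage.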
Based on constant electrical field scaling, Vdd and CL each are reduced by 30% in each successive process generation. Also, delay decreases by 30%, resulting in 43% increase in frequency. Therefore, the dynamic power consumption per device is reduced by 50% from one process generation to the next. As scaling also doubles the number of devices that can be implemented in a given die area, dynamic power consumption per area should stay roughly identical. However, historically frequency has increased by significantly more than 43% from one process generation to the next (e.g. in microprocessors, it has roughly doubled, due to architectural optimizations, such as deeper pipeline stages), and in addition, die sizes have increased with each new process technology, further increasing the power consumption, due to an increased number of active devices [5]. For these reasons, dynamic power consumption has increased exponentially, as is shown in Figure 1-1 for the example of microprocessors.
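The scaling arithmetic in this paragraph can be checked with a few lines of Python. This is a minimal sketch that assumes dynamic power is proportional to f · CL · Vdd² with unchanged switching activity; the function names and the normalized starting values are illustrative and not taken from the chapter.

```python
# Constant-field scaling arithmetic for one process generation.
# Assumption: dynamic power is proportional to f * C_L * Vdd**2,
# with the switching activity left unchanged by scaling.

SCALE = 0.7  # Vdd, C_L and gate delay each shrink by 30% per generation


def scale_one_generation(vdd, c_load, freq):
    """Return (vdd, c_load, freq) after one ideal constant-field scaling step."""
    return vdd * SCALE, c_load * SCALE, freq / SCALE


def relative_dynamic_power(vdd, c_load, freq):
    """Dynamic power up to a constant factor (activity omitted)."""
    return freq * c_load * vdd ** 2


before = (1.0, 1.0, 1.0)  # normalized Vdd, C_L and frequency
after = scale_one_generation(*before)

freq_gain = after[2] / before[2]
power_per_device = relative_dynamic_power(*after) / relative_dynamic_power(*before)
devices_per_area = 1.0 / SCALE ** 2          # linear dimensions shrink by 30%
power_density = power_per_device * devices_per_area

print(f"frequency gain:      {freq_gain:.2f}x")
print(f"power per device:    {power_per_device:.2f}x")
print(f"devices per area:    {devices_per_area:.2f}x")
print(f"power per unit area: {power_density:.2f}x")
```

Under these idealized assumptions the per-device power roughly halves while the device count per area roughly doubles, so power density stays flat. Doubling the frequency instead of raising it by 43%, as the text notes for microprocessors, breaks that balance.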
Reduction of leakage power consumption today is primarily a concern for products that are powered by battery and spend most of their operating hours in some type of standby mode, such as cell phones.
For many process generations, however, leakage has increased roughly by a factor of 10 for every two process nodes [6]. Due to this dramatic increase with newer process generations, leakage is becoming a significant contribution to overall IC power consumption even in normal operating mode, as can be seen in Figure 1-1 as well. Leakage was estimated to increase from 0.01% of overall power consumption in a 1.0µm technology, to 10% in a 0.1µm technology [6]. For a microprocessor, Intel estimated leakage power consumption at more than 50W for a 100nm technology node [3]. This figure probably is extreme, and leakage depends strongly on a
temperature T). Nevertheless, for an increasing number of products leakage power consumption is turning into a problem, even when they are not battery-powered.
Figure 1-1 Development of dynamic and leakage power consumption over time [3,7]
the key levers to reduce dynamic power:
• Reduce operating frequency
• Reduce driven capacitance
• Reduce supply voltage
has the side effect of reducing performance as well, primarily because gate overdrive (the difference between Vdd and Vth) diminishes if the threshold
$$t_d = \frac{C_L \cdot V_{dd}}{(V_{dd} - V_{th})^{\alpha}}$$
1.0V, the reductions in gate overdrive are more pronounced than previously.
In addition, newer process technologies give significantly less of a performance boost compared to the previous process generation than has traditionally been the case, therefore a further reduction in performance is highly undesirable. Finally, the power reduction achieved by moving to a new process generation has trended down over time, since supply voltages have been scaled by increasingly less than the 30% prescribed by the constant electrical field scaling paradigm.
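To make the effect of a shrinking gate overdrive concrete, the following sketch evaluates the widely used alpha-power delay model, t_d = CL · Vdd / (Vdd − Vth)^α, at a fixed threshold voltage. The exponent value of 1.3, the 300mV threshold and the normalized load are assumptions chosen for illustration only; they are not fitted to any process discussed in the chapter.

```python
# Gate delay under the alpha-power model at fixed threshold voltage.
# Assumptions (illustrative only): alpha = 1.3, Vth = 0.3 V, normalized C_L.

def gate_delay(c_load, vdd, vth, alpha=1.3):
    """Delay ~ C_L * Vdd / (Vdd - Vth)**alpha; requires Vdd > Vth."""
    overdrive = vdd - vth
    if overdrive <= 0:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return c_load * vdd / overdrive ** alpha


VTH = 0.3  # volts, held constant while Vdd is lowered
reference = gate_delay(1.0, 1.2, VTH)

# Relative delay for progressively lower supply voltages.
slowdowns = {vdd: gate_delay(1.0, vdd, VTH) / reference for vdd in (1.2, 1.0, 0.8)}

for vdd, ratio in slowdowns.items():
    print(f"Vdd = {vdd:.1f} V -> delay x{ratio:.2f} relative to 1.2 V")
```

The delay penalty grows faster than the voltage reduction itself, which is why lowering Vdd without also lowering Vth becomes progressively less attractive.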
Consequently, more advanced approaches are required.
In the following, our main focus will be on dynamic power consumption, but we will also consider leakage power consumption.
1.4 ZERO-VTH DEVICES
overcomes the diminishing gate overdrive by radically setting the threshold voltage of the active devices to zero. It has been shown [9] that the optimum
the devices will never completely switch off. But from an overall power perspective the gain in active power consumption is tremendous.
Using these transistors the supply voltage of 130nm circuits can be
performance degradation. Alternatively, the circuit can be operated at twice the clock frequency when keeping the supply voltage at 1.2V, as shown in
the complete circuits are switched-off or are set into a low leakage mode to cope with the very high leakage contribution. The low leakage mode is achieved by ‘active well’ control, which denotes the use of the body effect
reverse back biasing: a negative well-to-source voltage Usb is used
be generated. Furthermore, active well is required to compensate the
below 40°C. For some high-end computer equipment the costs for active chip cooling are affordable to achieve this junction temperature. But this is definitely not the case for cost-driven consumer products. For this
in some applications the specified worst-case ambient temperature is even
changes and adaptations
Figure 1-2 Simulated performance curves of transistors with ultra-low Vth. Compared to low-Vth, either a performance gain or a Vdd reduction can be achieved. Curves for reg-Vth and high-Vth transistors of a 130 nm technology are included.
device with about 150mV threshold voltage proved to be the best compromise between zero-Vth and current low-Vth of about 300mV within a 130 nm CMOS technology.
is shown for a high activity circuit (α = 20%) with various options for the
for the other transistor options were reduced to meet that reference performance
Figure 1-3 Power dissipation at T=125°C in active mode for several transistor options with reduced Vth. A minimum power consumption is achieved at 150mV Vth. (At T=55°C the minimum is achieved for the same option but process variations show less impact.)
The reduced supply voltage leads to lower overall active power
a rule of thumb a 100mV reduction of the threshold voltage allows for a Vdd reduction by ≈ 0.15V but on the other hand results in a tenfold increase of the leakage current. From Figure 1-3 the impact of technology variations is also visible. Due to the high leakage contribution a power reduction of only 25% is achieved under fast process conditions. Using back biasing in reverse mode, the high performance of fast transistors can be reduced
decreases and allows a power reduction by 50% (stippled arrow).
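The rule of thumb quoted above (100mV lower Vth permits roughly 0.15V lower Vdd but costs a tenfold leakage increase) can be turned into a small calculator. This is a hedged sketch: it assumes the rule extrapolates linearly in Vdd and exponentially in leakage over the range of interest, and that dynamic power scales with Vdd² at fixed frequency. The function name and the 1.2V reference supply are illustrative choices.

```python
# Trade-off implied by the rule of thumb: 100 mV lower Vth allows about
# 0.15 V lower Vdd at equal speed, but multiplies leakage current by 10.
# Assumption: the rule extrapolates smoothly between and beyond 100 mV steps.

LEAKAGE_DECADE_MV = 100.0    # Vth reduction (mV) per tenfold leakage increase
VDD_GAIN_PER_100MV = 0.15    # Vdd reduction (V) enabled per 100 mV of Vth


def apply_rule_of_thumb(vth_drop_mv, vdd_ref):
    """Return (leakage factor, new Vdd, dynamic power factor) for a Vth drop."""
    leakage_factor = 10.0 ** (vth_drop_mv / LEAKAGE_DECADE_MV)
    vdd_new = vdd_ref - (vth_drop_mv / 100.0) * VDD_GAIN_PER_100MV
    dynamic_factor = (vdd_new / vdd_ref) ** 2   # same frequency and capacitance
    return leakage_factor, vdd_new, dynamic_factor


# Example: moving from a 300 mV low-Vth device to a 150 mV device at 1.2 V.
leak, vdd, dyn = apply_rule_of_thumb(vth_drop_mv=150.0, vdd_ref=1.2)

print(f"leakage current:  x{leak:.1f}")
print(f"supply voltage:   {vdd:.3f} V")
print(f"dynamic power:    x{dyn:.2f}")
```

Whether the net effect is a saving depends on how large the leakage share already is, which is why the text reports a smaller benefit under fast process conditions.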
A process modification has been developed to manufacture devices with the threshold voltage of 150 mV, which proves to be the most efficient for the target application domain of mobile consumer products [10]. In Table 1-1 the key transistor parameters of our ultra-low-Vth FETs (ulv) and of the
which translates into an average decrease of the CV/I-metric delay by 29%.
Table 1-1 Extracted key parameters of the ulv-FETs in comparison with the target values and the low-Vth FETs
NFET / PFET
130nm ulv-FET NFET / PFET
compensation, back biasing has also to be used to compensate for this strong technology variation.
The values of the body effect are also included in Table 1-1. The body
decrease of body effect in combination with the increased roll-off reduces the leverage of back biasing for ulv-FETs very significantly. The leverage is not even sufficient to compensate the technology variation, since the value of the roll-off is higher than that of the body effect. As an example, the ulv-NFET shows roll-off values of 65mV/10nm and 100mV/15nm and a body effect of only 60mV/V.
To investigate the migration potential of the ulv-FETs for future
90nm hardware, were used. Based on this measurement data the leverage of
been analyzed. For supply voltages of 1.2V and 0.75V a reverse back biasing voltage of 0.5V has been applied. For the NFET, the back biasing results in a leakage reduction by 50% to 70% for all transistor widths and for both
similar (60% to 80%) for transistors with W > 0.5µm. For very narrow
narrow FETs are used within SRAMs, which contribute a major part of the circuit’s standby current, this small reduction for narrow transistors in addition reduces significantly the leverage of active well. The root cause is an additional leakage mechanism based on tunnelling currents across the drain-well junction, which limits the reverse back biasing to 0.5V. This tunnelling current depends exponentially on the drain-well voltage and is working against any reduction of the sub-threshold current via active well
is therefore lower. In this case the effect of back biasing is not compensated by a rising tunnelling current and a leakage current reduction by 70% is still achieved.
For a 90nm technology the limit of 0.5V for the well potential swing limits the reduction of the leakage currents to a factor between 2 and 4. This is still a major contribution of all feasible measures to reduce standby power consumption, but the leverage becomes quite small compared to the reduction ratios of several orders of magnitude obtained in previous
This is due to the ever decreasing gate oxide thickness and also due to the
by well biasing reducing the leverage of active well even further.
In summary the zero-Vth-devices have become very susceptible to process and temperature variations. Significant yield is only achievable with back biasing via active well control and with active cooling. The latter approach is not feasible for mobile applications. Therefore a more
150mV threshold voltage proved to be the best compromise between
affects some standard methods to overcome short-channel effects. The so-called halo- or pocket-implantation had to be removed to bring the threshold voltage down. Unfortunately short-channel effects are now heavily
of the channel length. Finally this effect was prohibitive for the overall
For leading-edge products which need to optimize both power consumption and system performance, optimization techniques on architecture and design level have been proposed and partly already been implemented. While academic research often focuses on the tradeoff between power consumption and performance, industrial product development must also take other variables into consideration:
• Product cost: often, power optimization design techniques increase die area, directly affecting manufacturing cost. Also, utilization of additional
consequently manufacturing cost, and additionally requires up-front expenditures for the development of such devices. Finally, increased manufacturing complexity poses the risk of lowered manufacturing yield.
• Product robustness: it must be ensured that optimized products still work across the specified range of operating conditions, also taking manufacturing variations into account.
1.5.1 Multi-Vdd Design
preferred option to reduce dynamic power consumption. However, as
check by the need to maintain performance
design. Most effective regarding power reduction, and also easiest to
performance of the IC design, this often is not an option. On a lower
rather simple to implement, but if only modules are chosen such that overall IC performance is not impacted, the achieved gains in power reduction will often be very moderate.
Finally, a reduction in supply voltage can be applied specifically to individual gates, such that the overall system performance is not reduced. This approach, as shown in Figure 1-4, recognizes that in a typical design, most logic paths are not critical. They can be slowed down, often significantly, without reducing the overall system performance. This slowing
non-critical paths, which results in lowered power consumption.
Figure 1-4 Multi-Vdd design
This technique will modify the distribution of path delays in a design to a distribution skewed towards paths with higher delay, as indicated in Figure 1-5 [14].
Figure 1-5 Distribution of path delays under single and multiple supply voltages
Non-critical path runs with reduced supply voltage
A number of studies have shown significant variation in dynamic power
from less than 10% up to almost 50%, with 40% being the average [15,16]. Rules of thumb for selecting appropriate supply voltage levels have been
The benefit of using multiple supply voltages quickly saturates. The
this to ever more supply voltage levels yields only small incremental benefits [18,19], even when the overhead introduced by multiple supply voltages (see below) is not taken into consideration.
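As a rough illustration of why lowering the supply on non-critical gates pays off, the following sketch uses the standard first-order model of dynamic power, P = a·C·Vdd²·f. The function names, the example voltages, and the share of switched capacitance moved to the lower supply are assumptions made for illustration; they are not taken from the cited studies, and level-shifter and routing overhead are ignored.

```python
def dynamic_power(activity, capacitance, vdd, frequency):
    """First-order CMOS dynamic power: P = a * C * Vdd^2 * f."""
    return activity * capacitance * vdd ** 2 * frequency


def dual_vdd_saving(vdd_high, vdd_low, low_fraction):
    """Relative dynamic power saving of a dual-Vdd design.

    low_fraction is the share of the switched capacitance (weighted by
    activity) that is moved from vdd_high to vdd_low. Overheads such as
    level shifters are deliberately ignored in this sketch.
    """
    if not 0.0 <= low_fraction <= 1.0:
        raise ValueError("low_fraction must be between 0 and 1")
    ratio = (vdd_low / vdd_high) ** 2  # quadratic dependence on supply voltage
    return low_fraction * (1.0 - ratio)


if __name__ == "__main__":
    # Hypothetical operating point, chosen only to show the calculation.
    saving = dual_vdd_saving(vdd_high=1.8, vdd_low=1.2, low_fraction=0.6)
    print(f"dynamic power saving: {saving:.1%}")
```

Because the saving scales with the square of the voltage ratio, most of the gain comes from the first additional supply level, which is in line with the saturation described above.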
The power reduction achieved by this technique roughly depends on two
is applied.
Regarding the first parameter, it was pointed out some years ago that the leverage of this concept decreases as process technologies are scaled down further [18].
devices, which are essential for low standby power design due to their lower
system performance is greatly reduced. It is shown that from 0.25µm down
introduction of variable threshold voltages, as will be seen later.
Regarding the second parameter, experience has shown that especially in
skewed to higher delays already, thus reducing the number of gates that can
be slowed down further [14].
For the selection of those gates which will receive the lower supply
is the concept of clustered voltage scaling (CVS). It recognizes that it is desirable to have clusters of gates assigned to the same voltage, since
This concept has been enhanced by extended clustered voltage scaling (ECVS) [17], which essentially allows an arbitrary assignment of supply
voltage levels to gates. This strategy implies more frequent insertion of level shifters into the design. However, usually only power consumption and delay are considered in the literature. The additional area cost is neglected.
In industry, this certainly is not feasible.
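The clustering idea can be sketched with a simple greedy pass. This is a simplified illustration and not the algorithm published in [20]: it assumes that a low-voltage gate may only drive other low-voltage gates or an output (so that no level shifter is needed inside the logic), and it re-runs a plain longest-path timing check after each tentative assignment. The netlist, the delay values, and all function names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def predecessors(fanout):
    """Invert a gate -> fanout-gates map into a gate -> fanin-gates map."""
    preds = {gate: set() for gate in fanout}
    for gate, sinks in fanout.items():
        for sink in sinks:
            preds[sink].add(gate)
    return preds


def longest_path_delay(fanout, delay):
    """Delay of the slowest path through the combinational network."""
    preds = predecessors(fanout)
    arrival = {}
    for gate in TopologicalSorter(preds).static_order():
        latest_input = max((arrival[p] for p in preds[gate]), default=0.0)
        arrival[gate] = latest_input + delay[gate]
    return max(arrival.values())


def clustered_voltage_scaling(fanout, delay_high, delay_low, clock_period):
    """Greedily pick gates for the low supply, walking back from the outputs."""
    order = list(TopologicalSorter(predecessors(fanout)).static_order())
    delay = dict(delay_high)
    low_vdd = set()
    for gate in reversed(order):  # every fanout is visited before its driver
        # Clustering rule (assumed): a low-Vdd gate may only drive low-Vdd
        # gates or an output, so the low-Vdd gates form one cluster per output.
        if not all(sink in low_vdd for sink in fanout[gate]):
            continue
        delay[gate] = delay_low[gate]
        if longest_path_delay(fanout, delay) <= clock_period:
            low_vdd.add(gate)
        else:
            delay[gate] = delay_high[gate]  # timing violated, revert
    return low_vdd


if __name__ == "__main__":
    # Hypothetical netlist: critical path a -> b -> c, short path d -> e.
    fanout = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": []}
    delay_high = {gate: 1.0 for gate in fanout}
    delay_low = {gate: 1.5 for gate in fanout}
    print(sorted(clustered_voltage_scaling(fanout, delay_high, delay_low, 3.0)))
```

In this toy netlist only the gates on the short path can be moved to the lower supply; the critical path has no slack to give away. Dropping the clustering rule, as ECVS does, would enlarge the set of candidate gates at the price of additional level shifters.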
poses a number of challenges
dc-to-dc converter, unless the voltage already exists externally. This results
in area overhead, and in power consumption for the converter.
• Level-shifters are required between different supply domains. It is feasible to integrate level shifters into flip-flops [21].
The penalties in area, power consumption and delay resulting from these effects are not always taken into account by work published in the literature. Studies indicate that a 10% area overhead will result from implementing a
An additional consideration for industrial IC product development is that
rudimentary. It is not sufficient to have a single point tool which can perform power-performance tradeoffs. Instead, this methodology needs to encompass the entire design flow (e.g. power distribution in layout; automated insertion
of level shifters, etc.).
1.5.2 Multi-Vth Design
Another essential technique is the use of different transistor threshold
consumption, thus increasing standby time of battery-powered ICs. As leakage power consumption becomes an increasingly important component
of overall power consumption in modern process technologies, this technique increasingly also helps to reduce overall power consumption significantly, as design moves to more advanced process technologies. The
performance are implemented with special leakage-reduced transistors
in Figure 1-6.
Figure 1-6 Multi-Vth design
A typical industrial approach today is to first create a design using lower
to reduce leakage.
Studies in the literature have reported reductions in leakage of around
provided by the process technology (through doping variations) and propose
performance is not compromised [23, 24]. Recently, it has also been
gate oxide thickness Tox [25].
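To give a feel for the leverage of a higher threshold voltage, the sketch below uses the usual first-order model in which subthreshold current falls by one decade for every S millivolts of additional threshold voltage, where S is the subthreshold swing. The default swing of 100 mV/decade, the function names, and the example numbers are assumptions for illustration only; gate leakage and the delay penalty of high-Vth cells are not modelled.

```python
def subthreshold_leakage_factor(delta_vth_mv, swing_mv_per_decade=100.0):
    """Factor by which subthreshold current falls when Vth rises by delta_vth_mv.

    Uses I_sub ~ 10 ** (-Vth / S); the default swing S is an assumed value.
    """
    return 10.0 ** (delta_vth_mv / swing_mv_per_decade)


def remaining_leakage(high_vth_fraction, delta_vth_mv, swing_mv_per_decade=100.0):
    """Leakage after swapping cells, relative to an all-low-Vth design.

    high_vth_fraction is the share of the original leakage that came from
    transistors which are now implemented with the higher threshold.
    """
    if not 0.0 <= high_vth_fraction <= 1.0:
        raise ValueError("high_vth_fraction must be between 0 and 1")
    factor = subthreshold_leakage_factor(delta_vth_mv, swing_mv_per_decade)
    return (1.0 - high_vth_fraction) + high_vth_fraction / factor


if __name__ == "__main__":
    # Hypothetical case: 80% of the leaking width sits on non-critical paths
    # and is swapped to cells with a 100 mV higher threshold.
    left = remaining_leakage(high_vth_fraction=0.8, delta_vth_mv=100.0)
    print(f"remaining subthreshold leakage: {left:.0%} of the original")
```

The remaining leakage is dominated by the cells that must stay at low Vth for performance, which is why the share of non-critical gates matters as much as the threshold difference itself.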
Design-tool support for this technique is also rudimentary at best. While
it is becoming established to design different modules of an IC with different
transistors within a module. The primary reason is that the entire design flow must be able to handle cells with identical functionality and size, which differ in their electrical properties. This poses no algorithmic problems in principle, but must be consistently implemented in all EDA tools within a design flow.
1.5.3 Hybrid Approaches
Recently, approaches have been suggested in the literature which combine implementation of multiple supply voltages and multiple threshold voltages for further power reduction. Especially for designs where minimization of total power consumption is key (as compared to e.g. minimization of standby power for mobile products), it is possible to trade off leakage and dynamic
literature indicate a total power optimum when leakage power contributes 10% to 30% [26,12]. This ratio depends significantly on the process technology, operating environment, and clock frequency of a design.
For applications where leakage power minimization is critical (e.g. mobile products), this approach usually is not feasible, as it requires a
With the increasing significance of gate leakage currents, variations of gate oxide thickness Tox have also been proposed.
An overall framework for using two supply voltages and two threshold voltages as well has been presented [19]. Theoretically, it is shown that more than 60% of total power consumption can be saved this way (not considering required overhead such as level shifters, routing etc.). Rules of
performance.
This approach has been applied to the practical example of an ARM processor in [27]. Due to specific layout considerations it was not possible to
different libraries were implemented. Using a CVS algorithm, a reduction in dynamic power by 15% was achieved for a 0.18µm process technology. Leakage power was reduced by 40%. As leakage power was more than 1000x smaller than dynamic power, overall active power reduction was 15%. To achieve this, a 14% increase in area was required.
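The arithmetic behind the 15% overall figure can be checked with a short calculation. The sketch below takes the reported reductions (15% dynamic, 40% leakage) and uses a dynamic-to-leakage ratio of exactly 1000 as a lower bound for the "more than 1000x" stated above; the function name is hypothetical.

```python
def total_power_reduction(dyn_reduction, leak_reduction, dyn_to_leak_ratio):
    """Overall power reduction when dynamic and leakage power shrink separately.

    Power is expressed in units of the original leakage power, so the original
    dynamic power equals dyn_to_leak_ratio.
    """
    before = dyn_to_leak_ratio + 1.0
    after = dyn_to_leak_ratio * (1.0 - dyn_reduction) + (1.0 - leak_reduction)
    return 1.0 - after / before


if __name__ == "__main__":
    overall = total_power_reduction(0.15, 0.40, 1000.0)
    print(f"overall active power reduction: {overall:.2%}")
```

With leakage this small relative to dynamic power, the 40% leakage saving moves the total by only a few hundredths of a percentage point, so the overall reduction is essentially the dynamic one.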
A very recent approach considers also transistor width sizing in addition
approach, total power savings of 37% on average over a suite of benchmark circuits are reported. In this study, the threshold voltage is chosen rather low,
so that leakage represents 20-50% of total power consumption. Therefore, optimization of both leakage and dynamic power consumption is essential, which is achieved with the presented approach.
An enhanced approach for leakage power consumption considers multiple gate oxide thicknesses Tox in addition to multi-Vth [29]. It is motivated by the fact that gate leakage increases very dramatically with newer process technologies. Gate leakage is of the same order of magnitude
as subthreshold leakage at the 90nm process node. Their relationship also depends significantly on the operating temperature T. The key observation that an OFF transistor suffers from subthreshold leakage and an ON transistor from gate leakage motivates the approach to analyze transistor states in
is minimized. Leakage reductions of 5-6x are obtained on benchmark
Previous approaches that included Tox into the optimization varied Tox only for different design modules, not on critical paths within modules. These newer approaches promise further reductions in power consumption. This will come, however, at a price (as seen e.g. in the ARM example). Design complexity increases significantly when variations in many parameters are made available at the same time. In some studies, the resulting overhead is not considered.
1.5.4 Cost Tradeoffs
This overhead must be considered, however, since it is quite significant:
additional supply voltages (area)
• Multi-Tox: additional masks (manufacturing costs)
• In addition, IC development costs increase due to more complex design
qualified and continuously monitored. For each such option, the design library must be electrically characterized, modelled for all EDA tools, and potentially optimized regarding circuit design and layout. It must be maintained and regularly updated (changes in electrical parameters, changes in tools in the design flow) over a long period of time as well. If
a very specialized manufacturing flow is developed to fully optimize a given product, it will be very difficult to shift manufacturing of this product to a different fab (e.g. a foundry in case additional capacity is required).
For these and potentially other reasons, we are not yet aware of industrial products that have implemented such proposals in a fine-grained manner (i.e
Some approaches in the literature also determine optimum levels of threshold voltages depending on a given design. In industry, this is rarely feasible. Typically, a manufacturing process has to be taken as given, with
LEVELS
The approaches outlined above on gate level and device level can be (and often must be) supported by measures on higher levels of abstraction.
Some of the most promising concepts are as follows:
• partitioning the system such that large areas can be powered off for significant periods of time (block turnoff);
• in particular, partitioning memory systems such that large parts can be turned off in standby mode;
• clock gating, an essential method which reduces dynamic power consumption by locally switching off the clock to parts of the circuit that are not active;
• coding strategies (e.g. for buses), which can reduce switching and thus dynamic power consumption.
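As an illustration of the last item, the sketch below implements bus-invert coding, one well-known bus coding scheme. The text above does not name a particular scheme, so its choice here is an assumption, as are the function names and the example data. Each word is sent inverted, together with an extra invert line, whenever that toggles fewer bus wires than sending it unchanged.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Encode words for a bus of `width` data lines plus one invert line."""
    mask = (1 << width) - 1
    previous = 0  # assumed initial state of the data lines
    encoded = []
    for word in words:
        word &= mask
        invert = 1 if hamming(previous, word) > width // 2 else 0
        if invert:
            word ^= mask  # send the complement, it toggles fewer lines
        encoded.append((word, invert))
        previous = word
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (data, invert) pairs."""
    mask = (1 << width) - 1
    return [data ^ mask if invert else data for data, invert in encoded]


def count_transitions(values, start=0):
    """Total number of wire toggles when `values` are driven in sequence."""
    total, previous = 0, start
    for value in values:
        total += hamming(previous, value)
        previous = value
    return total


if __name__ == "__main__":
    width = 8
    words = [0x00, 0xFF, 0x00, 0xFF]  # hypothetical worst-case pattern
    encoded = bus_invert_encode(words, width)
    # Treat the invert line as one more wire above the data lines.
    wires = [data | (invert << width) for data, invert in encoded]
    print("uncoded toggles:", count_transitions(words))
    print("coded toggles:  ", count_transitions(wires))
```

The saving depends strongly on the data statistics, and the extra wire and the encoder and decoder logic add their own area and power, so the net benefit has to be evaluated for each bus.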
There is no single “silver bullet” to solve the challenge of power
devices is a conceptually very convincing concept, its widespread implementation is hindered by manufacturing concerns. An extrapolation of current technology trends indicates that such a concept will become even more difficult in the future.
Today, design techniques are the most promising approach to reduce power, both dynamic and leakage.
The concepts outlined here can be further extended. It is feasible to dynamically adjust supply and threshold voltages. These are theoretically promising concepts which, however, still require more investigation, especially with regard to feasibility under industrial boundary conditions. Quite likely, in the future even more emphasis than today will have to be placed on power reduction schemes at the algorithmic and system levels. On these levels, the levers to reduce power consumption are largest.
Acknowledgement
The authors wish to acknowledge and thank Jörg Berthold and Tim Schönauer for their contributions and fruitful discussions.
[4] U Schlichtmann, Systems are Made from Transistors: UDSM Technology Creates New Challenges for Library and IC Development, IEEE Euromicro Symposium on Digital System Design, 2002, pp 1-2.
[5] S Borkar, Design Challenges of Technology Scaling, IEEE Micro, July/August 1999, pp 23-29.
[6] S Thompson, P Packan, and M Bohr, MOS Scaling: Transistor Challenges for the 21st Century, Intel Technology Journal, Q3 1998.
[7] N Kim et al., Leakage Current: Moore's Law Meets Static Power, IEEE Computer, Vol
[15] K Usami, M Igarashi, Low-Power Design Methodology and Applications utilizing Dual Supply Voltages, Proceedings of the Asia and South Pacific Design Automation Conference 2000, pp 123-128.
[16] M Donno, L Macchiarulo, A Macii, E Macii, M Poncino, Enhanced Clustered Voltage Scaling for Low Power, Proceedings of the 12th ACM Great Lakes Symposium on VLSI, 2002, pp 18-23.
[17] K Usami et al., Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor, IEEE Journal of Solid-State Circuits, Vol 33, No 3, March 1998, pp 463-472.
[18] M Hamada, Y Ootaguro, T Kuroda, Utilizing Surplus Timing for Power Reduction, Proceedings IEEE Custom Integrated Circuits Conference CICC, 2001, pp 89-92.
[19] A Srivastava, D Sylvester, Minimizing Total Power by Simultaneous Vdd/Vth Assignment, Proceedings of the Asia and South Pacific Design Automation Conference 2003, pp 400-403.
[20] K Usami, M Horowitz, Clustered Voltage Scaling Technique for Low-Power Design, Proceedings of the International Symposium on Low Power Design ISLPD, 1995, pp 3-8.
[21] K Usami et al., Design Methodology of Ultra Low-power MPEG4 Codec Core Exploiting Voltage Scaling Techniques, Proceedings of the 35th Design Automation Conference 1998, pp 483-488.
[22] C Yeh, Y.-S Kang, Layout Techniques Supporting the Use of Dual Supply Voltages for Cell-Based Designs, Proceedings of the 36th Design Automation Conference 1999, pp 62-67.
[23] Q Wang, S Vrudhula, Algorithms for Minimizing Standby Power in Deep Submicrometer, Dual-Vt CMOS Circuits, IEEE Transactions on CAD, Vol 21, No 3, March 2002, pp 306-318.
[24] L Wei, Z Chen, K Roy, M Johnson, Y Ye, V De, Design and Optimization of Dual-Threshold Circuits for Low-Voltage Low-Power Applications, IEEE Transactions on Very Large Scale Integration (VLSI), Vol 7, No 1, March 1999, pp 16-24.
[25] N Sirisantana, K Roy, Low-Power Design Using Multiple Channel Lengths and Oxide Thicknesses, IEEE Design & Test of Computers, January-February 2004, pp 56-63.
[26] K Nose, T Sakurai, Optimization of VDD and VTH for Low-Power and High-Speed Applications, Proceedings of the Asia and South Pacific Design Automation Conference 2000, pp 469-474.
[27] R Bai, S Kulkarni, W Kwong, A Srivastava, D Sylvester, D Blaauw, An Implementation of a 32-bit ARM Processor Using Dual Power Supplies and Dual Threshold Voltages, IEEE International Symposium on VLSI, 2003, pp 149-154.
[28] A Srivastava, D Sylvester, D Blaauw, Concurrent Sizing, Vdd and Vth Assignment for Low-Power Design, Proceedings of the Design, Automation and Test in Europe Conference DATE, 2003, pp 718-719.
[29] D Lee, H Deogun, D Blaauw, D Sylvester, Simultaneous State, Vt and Tox Assignment for Total Standby Power Minimization, Proceedings of the Design, Automation and Test in Europe Conference DATE, 2003, pp 494-499.
Chapter 2
ON-CHIP OPTICAL INTERCONNECT FOR
LOW-POWER
Ian O’Connor and Frédéric Gaffiot
Ecole Centrale de Lyon
Abstract: It is an accepted fact that process scaling and operating frequency both contribute to increasing integrated circuit power dissipation due to interconnect. Extrapolating this trend leads to a red brick wall which only radically different interconnect architectures and/or technologies will be able to overcome. The aim of this chapter is to explain how, by exploiting recent advances in integrated optical devices, optical interconnect within systems on chip can be realised. We describe our vision for heterogeneous integration of a photonic “above-IC” communication layer. Two applications are detailed: clock distribution and data communication using wavelength division multiplexing. For the first application, a design method will be described, enabling quantitative comparisons with electrical clock trees. For the second, more long-term, application, our views will be given on the use of various photonic devices to realize a network on chip that is reconfigurable in terms of the wavelength used.
Keywords: Interconnect technology, optical interconnect, optical network on chip
In the 2003 edition of the ITRS roadmap [17], the interconnect problem was summarised thus: “For the long term, material innovation with traditional scaling will no longer satisfy performance requirements. Interconnect innovation with optical, RF, or vertical integration will deliver the solution”. Continually shrinking feature sizes, higher clock frequencies, and growth in complexity are all negative factors as far as switching charges on metallic interconnect is concerned. Even with low resistance metals such as copper and low dielectric constant materials, bandwidths for long interconnect will be insufficient for future operating frequencies. Already the use of metal tracks to transport a signal over a chip has a high cost in terms of power: clock distribution for instance requires a significant part (30-50%) of total chip power in high-performance microprocessors.
A promising approach to the interconnect problem is the use of an optical interconnect layer, which could empower an increase in the ratio between data rate and power dissipation. At the same time it would enable synchronous operation within the circuit and with other circuits, relax constraints on thermal dissipation and sensitivity, signal interference and distortion, and also free up routing resources for complex systems. However, this comes at a price. Firstly, high-speed and low-power interface circuits are required, design of which is not easy and has a direct influence on the overall performance of optical interconnect. Another important constraint is the fact that all fabrication steps have to be compatible with future IC technology and also that the additional cost incurred remains affordable. Additionally, predictive design technology is required to quantify the performance gain of optical interconnect solutions, where information is scant and disparate concerning not only the optical technology, but also the CMOS technologies for which optics could be used (post-45nm node).
In section 2.2, we will describe the “above-IC” optical technology. Sections 2.3 and 2.4 describe an optical clock distribution network and a quantitative electrical-optical power comparison respectively. A proposal for a novel optical network on chip is discussed in section 2.5.
Various technological solutions may be proposed for integrating an optical transport layer in a standard CMOS system. In our opinion, the most promising approach makes use of hybrid (3D) integration of the optical layer above a complete CMOS IC, as shown in fig 2.1. The basic CMOS process remains the same, since the optical layer can be fabricated independently. The weakness of this approach is in the complex electrical link between the CMOS interface circuits and the optical sources (via stack and advanced bonding).
In the system shown in fig 2.1, a CMOS source driver circuit modulates the current flowing through a biased III-V microsource through a via stack making the electrical connection between the CMOS devices and the optical layer. III-V active devices are chosen in preference to Si-based optical devices for high-speed and high-wavelength operation. The microsource is coupled to
silicon technology and silicon is an excellent material for transmitting
dB/cm has been demonstrated [10]). The waveguide structure transports the optical signal to a III-V photodetector (or possibly to several, as in the case of
[Figure 2.1 labels: driver circuit, III-V laser source, Si photonic waveguide (n=3.5), SiO2 waveguide cladding (n=1.5), III-V photodetector, electrical contact, receiver circuit, CMOS IC]
Figure 2.1. Cross-section of hybridised interconnection structure
a broadcast function) where it is converted to an electrical photocurrent, which flows through another via stack to a CMOS receiver circuit which regenerates the digital output signal. This signal can then if necessary be distributed over a small zone by a local electrical interconnect network.
NETWORK
In this section we present the structure of the optical clock distribution network, and detail the characteristics of each component part in the system: active optoelectronic devices (external VCSEL source and PIN detector), passive waveguides, interface (driver and receiver) circuits. The latter represent extremely critical parts to the operation of the overall link and require particularly careful design.
An optical clock distribution network, shown in fig 2.2, requires a single photonic source coupled to a symmetrical waveguide structure routing to a number of optical receivers. At the receivers the high-speed optical signal is converted to an electrical one and provided to local electrical networks. Hence the primary tree is optical, while the secondary tree is electrical. It is not feasible to route the optical signal all the way down to the individual gate level since each drop point requires a receiver circuit which consumes area and power. The clock signal is thus routed optically to a number of drop points which will cover a zone over which the last part of the clock distribution will be carried out