Adaptive Techniques for Dynamic Processor Optimization: Theory and Practice



[Figure: capacitive write assist circuit (labels: 2nd Metal, 4th Metal, #WE, WE[n], WE[n+1], WL, P-Tr[n], N-Tr[n], P-Tr[n+1], N-Tr[n+1], Nd-Tr)]

Since extra supplies are not always available in product design, another example [14] uses charge sharing to lower the supply to the columns being written. As shown in Figure 11.4, “downvdd” is precharged to VSS. For a write operation, the supplies to the selected columns are disconnected from VDD and shorted to “downvdd”. The charge sharing lowers the supply voltage to a level determined by the ratio of the capacitances, allowing writes to occur easily.
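The charge-sharing level described above follows from charge conservation. The sketch below is a minimal illustration, assuming ideal lumped capacitances and no leakage; the function name and the capacitance values are hypothetical and are not taken from [14].

```python
def shared_column_supply(vdd, c_column, c_downvdd, v_downvdd=0.0):
    """Voltage after a column supply line (initially at vdd) is shorted
    to the "downvdd" node (precharged to v_downvdd, i.e. VSS = 0 V).

    Charge is conserved, so the final voltage is the total charge
    divided by the total capacitance.
    """
    total_charge = c_column * vdd + c_downvdd * v_downvdd
    return total_charge / (c_column + c_downvdd)


if __name__ == "__main__":
    # Hypothetical values: 90 fF column supply line, 10 fF downvdd node.
    v = shared_column_supply(vdd=1.0, c_column=90e-15, c_downvdd=10e-15)
    print(f"column supply during write: {v:.3f} V")
```

A larger “downvdd” capacitance relative to the column supply capacitance gives a deeper droop, which eases writes but reduces the retention margin of unselected cells in the same column.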


Figure 11.5 Write column supply switch off [21] (© IEEE 2006) [labels: memory cell, Vssm, Vdd]

Yet another example [21] uses a power-line-floating write technique to assist write operations. Instead of switching in a separate supply or charge sharing the supply, as in the previous examples, the supply to the write columns is simply switched off, floating the column supply lines at VDD (Figure 11.5).

As the cells are written to, the floating supply line (Vddm) discharges through the “0” bitline, as shown in Figure 11.6a. The decreased supply voltage allows easy writing to the cells. As soon as the cell flips to its intended state, the floating supply line’s discharge path is cut off, preventing the floating supply line from fully discharging (Figure 11.6b).

Figure 11.6 Power-line-floating write [21] (© IEEE 2006) [labels: Iwrite, “L”, “H”]


In all column voltage manipulation schemes, nonselected cells must retain state with the lowered supply.

11.2.1.2 Row Voltage Optimization

Similar to the previous section, designers can apply voltage manipulation in the row direction as well. However, unlike column-based voltage optimization, row-based voltage optimization generally cannot simultaneously optimize for both read and write margins in the same operation, as needed in a column-multiplexed design. Therefore, row-based voltage manipulation tends to be more suitable for non-column-multiplexed designs where all the columns are written to in a write operation.

The most obvious method to apply row-based voltage optimization is to raise the supply for the row of accessed cells in a read operation, or to lower the supply for the row of cells being written to. In addition, the following are some other examples of row-based voltage optimization.

Figure 11.7 Raised source line write [20] (© IEEE 2004) [labels: Word Line, Ld1, Dr1, Tr1, Node A, Node B, Vss, Vdd, PLVC1, Ic1, Ic2, sw1, Vss_mem, cella, cellb x3, “L”, “H”]

In [20], the SRAM cells’ source line (SL) (i.e., the source terminals of the MNs in Figure 11.1) is disconnected from VSS during write operations. The SL is allowed to float until it is clamped by an NFET diode (Figure 11.7). The raised SL (Vss_mem in Figure 11.7) decreases the drive of the PFETs, which allows easy overwriting of the cell. (In this specific example, the floating SL is shared among all the cells in the array, not just the cells in a row. However, designers can apply the same technique on a row-by-row basis at the cost of area overhead.) A variation of this technique would disconnect the SL during both write and standby operations to achieve power savings, and connect the SL to VSS only during read operations, when the extra stability margin is needed. The drawback to this variation is the additional delay needed to restore SL to VSS before a read operation can begin.

A similar example [13] also floats SL during write operations. In addition, the SL is driven to a negative voltage during read operations. This allows for faster bitline development, as well as more stable cells during read operations.

Figure 11.8 Supply line coupling [3] (© IEEE 2004) [labels: VDD, VGND, BLT, BLC, PL0, PL1, PL2, PL3, WL0, WL1, WL2, SRAM cell, Subarray]

If a separate supply is not available, another way to boost the internal supply of SRAM cells during a read access, and thereby achieve higher stability, is through coupling. In [3], wordline wires are routed next to the row’s supply lines. As seen in Figure 11.8, as the wordline rises, it disconnects the supply lines from VDD and couples the voltages of the supply lines higher than VDD. Assuming insignificant current is sourced from the supply line during a read access, the bootstrapped supply increases the drive on the MNs and improves the cell’s stability. However, for cell designs with low MN/MA ratios, the “0” storage node may rise higher than MN’s threshold voltage, causing the floating supply lines to discharge.
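To first order, the boost obtained from wordline coupling is set by a capacitive divider. The following sketch assumes a floating supply line with a lumped coupling capacitance to the wordline and a lumped capacitance to everything else, and it ignores any current drawn by the cells; the names and values are hypothetical, not figures from [3].

```python
def coupled_supply(vdd, v_wl_swing, c_couple, c_other):
    """Voltage of a floating row supply line after the adjacent wordline
    rises by v_wl_swing.

    c_couple: wordline-to-supply-line coupling capacitance
    c_other:  remaining capacitance on the supply line (to ground, cells)
    """
    boost = v_wl_swing * c_couple / (c_couple + c_other)
    return vdd + boost


if __name__ == "__main__":
    # Hypothetical values: 5 fF of coupling against 45 fF of other loading.
    v = coupled_supply(vdd=1.0, v_wl_swing=1.0, c_couple=5e-15, c_other=45e-15)
    print(f"bootstrapped row supply: {v:.3f} V")
```

Any current sourced from the floating line during the access erodes this boost, which is why the text's assumption of insignificant current matters.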


Figure 11.9 Wordline driver using RATs [14] (© IEEE 2007) [labels: n_arvdd, Replica Access Tr, Read Assist Circuit, WL, P, N]

In [14], instead of increasing the SRAM cell’s supply to improve stability, the WL voltage is reduced slightly. Reduced wordline voltage degrades the drive of MA, which essentially improves the MN/MA ratio. This implementation makes additional efforts to account for global threshold voltage variations. Figure 11.9 illustrates the scheme, using “replica access transistors” (RATs) that have almost the same physical topology as MA to lower the WL voltage. In general, lower VTN causes SRAM cells to be less stable. Therefore, the RATs lower WL more when VTN is low, and less when VTN is high, to achieve a balance between read margin and read speed.

11.2.2 Timing Control

Aside from voltage manipulation, designers can also improve cell stability by decreasing the amount of time the cell is under stress during a read operation. For example, in a design that uses differential sensing, a small bitline voltage drop could be sufficient for sensing the bitcell value. Leaving the wordline on longer than necessary would allow the bitlines to continue to disturb the “0” storage node, leading marginal SRAM cells to flip their values.

In typical designs, the wordline shutoff is triggered on phase or cycle boundaries. If the optimal wordline shutoff time does not align with phase or cycle boundaries, or if the designer prefers to have the wordline high time independent of the frequency, then the designer could employ a pulsed wordline scheme, such as the one used in [11]. The challenge is to design a pulse width that is just long enough for reads to complete successfully across different process corners and operating conditions.
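The pulse-width trade-off can be made concrete with a first-order estimate. The sketch below assumes the bitline develops linearly (time = C × ΔV / I) and that the pulse must cover the slowest corner plus a margin; the corner names, currents, and the 10% margin are hypothetical placeholders rather than values from [11].

```python
def read_time(c_bitline, v_sense, i_cell):
    """First-order time for a cell to develop v_sense on the bitline."""
    return c_bitline * v_sense / i_cell


def required_pulse_width(cell_currents, c_bitline, v_sense, margin=0.1):
    """Shortest wordline pulse that covers the slowest corner, with margin.

    cell_currents: mapping of corner name -> cell read current (A)
    """
    worst = max(read_time(c_bitline, v_sense, i) for i in cell_currents.values())
    return worst * (1.0 + margin)


if __name__ == "__main__":
    # Hypothetical read currents per corner.
    corners = {"fast": 40e-6, "typical": 25e-6, "slow": 15e-6}
    pw = required_pulse_width(corners, c_bitline=100e-15, v_sense=0.1)
    print(f"required pulse width: {pw * 1e12:.0f} ps")
```

A fixed pulse sized this way holds the wordline high longer than necessary at the fast corner, where the extra time only adds read disturb.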

Figure 11.10 Read and write replica circuits [21] (© IEEE 2006) [labels: RC, WL, WDR, WOFF, WEN, MPC, RB, MWR]

In [15], a read replica path, which uses 12 dummy SRAM cells, was used to generate the shutoff edge for the wordlines. The dummy SRAM cells, which resemble real SRAM cells but have their internal values hardwired, help the replica path track the variation in normal read paths. In [21], a write replica circuit was added in addition to the read replica circuits. In general, read operations take more time to complete than write operations. Therefore, it is advantageous to shut off the wordline during a write operation as soon as the write has completed successfully; this prevents unselected columns in a column-multiplexed design from continuing to discharge their bitlines and wasting power. Figure 11.10 illustrates the read and write replica paths together. The replica bitline (RB) is precharged to VDD through MPC before read or write operations begin. For a read operation, REN activates to “0”, causing the read-replica wordline (RW) to turn on the wordline of the read dummy cells (RC). The RCs discharge RB, which turns off the wordlines through the WOFF signal. In a write operation, RB is discharged through MWR, which also triggers WOFF. In general, a higher VTN requires a longer write time. Therefore, dies with higher VTN discharge RB more slowly through MWR, giving the write operation more time to complete.
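Why a replica path tracks process variation can be seen in a first-order model: if the dummy cells and the real cells share the same read current, both delays scale as 1/I and their ratio stays constant. The sketch below assumes ideal current matching and linear bitline discharge; the capacitances, trip voltage, and sense voltage are hypothetical, and only the count of 12 dummy cells comes from the text.

```python
def real_read_time(i_cell, c_bitline=100e-15, v_sense=0.04):
    """Time for one real cell to develop v_sense on a real bitline."""
    return c_bitline * v_sense / i_cell


def replica_shutoff_time(i_cell, n_dummy=12, c_replica=100e-15, v_trip=0.6):
    """Time for n_dummy cells in parallel to pull the replica bitline (RB)
    down by v_trip, the assumed switching point of the WOFF logic."""
    return c_replica * v_trip / (n_dummy * i_cell)


if __name__ == "__main__":
    # Sweep a global cell-current variation (slow, typical, fast dies).
    for i_cell in (15e-6, 25e-6, 40e-6):
        ratio = replica_shutoff_time(i_cell) / real_read_time(i_cell)
        print(f"I_cell = {i_cell * 1e6:.0f} uA, shutoff/read ratio = {ratio:.2f}")
```

In this idealized model the ratio is identical at every current. Real replicas deviate from it because local mismatch affects individual cells differently from the averaged dummy cells.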


The above illustration is just one example of a design using replica circuits. The danger of replica circuits, of course, is that no replica can perfectly track real paths through all process and operating corners. For example, the write replica circuit above does not track PFET variations, which also impact write margin. However, tracking some variation usually yields a better design than no tracking at all.

11.3 Array Power Reduction

With power-per-performance becoming an important parameter, engineers pay increasing attention to reducing the power of embedded SRAM arrays, which often occupy a large percentage of the total die area. Since the activity factor is generally low for large caches, leakage power represents a significant, if not the dominant, portion of the overall cache power. Devices in an SRAM cell typically have channel lengths much greater than the process minimum for variation control; thus, subthreshold leakage has traditionally been limited. However, subthreshold leakage has worsened with recent technology nodes and, more importantly, gate leakage (and in some cases, junction leakage) is getting significantly worse with oxide scaling. As a result, SRAM leakage power now requires careful attention. Because leakage power has a strong dependence on voltage, many designers have experimented with, or implemented, “sleeping” the cache’s supply.

11.3.1 Sleep Types

In general, cache “sleep” involves providing inactive SRAM cells, which do not experience read-disturb, with a lowered supply to achieve power savings. The lowered supply must be high enough to allow the inactive cells to maintain their data. Then, before the cells are accessed, they are “woken up” by providing a higher supply that can fulfill both read-disturb and access speed requirements.

The most straightforward implementation of cache sleep involves providing the cache with two separate, external supplies. However, a second supply is an expensive solution, so realistic implementations often choose to generate and regulate the second supply locally. In general, these implementations fall into two categories: active and passive. “Active sleep” schemes try to actively maintain the reduced voltage at a certain level, while “passive sleep” schemes rely on voltage division or threshold voltage to determine the reduced voltage.


11.3.1.1 Active Sleep

Khellah et al. [10] used an op-amp to help control the reduced supply; Figure 11.11 illustrates the general concept. When the arrays are active, “wake” causes SramVSS to be connected to VSS through the strong NFET. During idle mode, the strong NFET is turned off, allowing SramVSS to float. SramVSS will rise due to array leakage, but the op-amp will prevent SramVSS from rising above VREF. Of course, VDD – VREF must be greater than the SRAM cells’ standby VccMin, which is the minimum voltage at which cells are stable, to maintain cell data. In this implementation, VREF is externally supplied for ease of controllability. Also, an “early wake” signal is provided ahead of “wake” to reduce the ground-bounce noise caused by the sudden discharge of SramVSS.

Jumel et al. [8] used a concept similar to the previous example, but took it a step further. As shown in Figure 11.12, an on-chip bandgap reference generates a reference voltage that is stable across PVT. In addition, the voltage regulator is designed to track VDD, so a higher VDD would also allow SramVSS to rise, maintaining VDD – SramVSS close to VccMin. Finally, the output of this regulator is trimmed on a die-by-die basis at wafer probe to account for process variations.
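The benefit of VDD tracking over a fixed reference can be illustrated numerically. The sketch below compares the cell voltage (VDD – SramVSS) under a fixed VREF with that under an idealized tracking regulator; the guardband parameter and all voltage values are hypothetical, and neither function models the actual circuits of [10] or [8].

```python
def cell_voltage_fixed_ref(vdd, vref):
    """Cell voltage when SramVSS is held at a fixed reference VREF."""
    return vdd - vref


def sram_vss_tracking(vdd, vccmin, guardband):
    """SramVSS target for an ideal regulator that tracks VDD so that
    VDD - SramVSS stays at vccmin + guardband (never below 0 V)."""
    return max(0.0, vdd - (vccmin + guardband))


def compare(vdds, vref, vccmin, guardband):
    """Return (vdd, fixed-ref cell voltage, tracking cell voltage) rows."""
    rows = []
    for vdd in vdds:
        fixed = cell_voltage_fixed_ref(vdd, vref)
        tracked = vdd - sram_vss_tracking(vdd, vccmin, guardband)
        rows.append((vdd, fixed, tracked))
    return rows


if __name__ == "__main__":
    for vdd, fixed, tracked in compare((1.0, 1.1, 1.2), vref=0.35,
                                       vccmin=0.6, guardband=0.05):
        print(f"VDD={vdd:.2f} V  fixed-ref cell={fixed:.2f} V  "
              f"tracking cell={tracked:.2f} V")
```

With a fixed VREF, the cell voltage grows with VDD and gives back some of the leakage savings; the tracking target keeps the cell voltage pinned just above the retention limit.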


Figure 11.12 Active sleep control with bandgap reference and VDD tracking [8] [labels: Startup Circuit, Reference Bandgap, Error Amplifier, Analog Supply, Logic Supply, GND, SramVSS (to SRAM)]

11.3.1.2 Passive Sleep

One straightforward way to generate a reduced supply is to use a diode, such as in [1] and illustrated in Figure 11.13. When SramVSS rises to the diode’s threshold voltage, the diode would clamp SramVSS. The downside to this scheme is its inflexibility, as the clamping voltage is determined primarily by just the threshold voltage and cannot be optimized for different supply voltages.

Figure 11.13 Diode clamping sleep voltage [labels: wake, SramVSS, SRAM array]


The example shown in Figure 11.14 aims to remove the SRAM supply’s dependency on VDD [18]. Rather than setting the array supply to VDD – VT, which can vary depending on VDD, the array supply depends only on transistor threshold voltages, as specified in Equation (11.1) [equation not preserved in this text].

In this implementation, the array supply voltage specified in Equation (11.1) is assumed to be sufficient for satisfying VccMin requirements. To adapt to different PVT conditions, the bias generator is built using replica transistors. The two replica load PFETs drop A1’s voltage to [expression not preserved in this text]. Similarly, the two replica driver NFETs drop A2’s voltage to [expression not preserved in this text]. Finally, the matching P1 and P1’ FETs clamp SramVSS at A1, while the matching P2 and P2’ FETs clamp SramVSS at A2. The resulting SramVSS is the lower of A1 and A2, producing Equation (11.1).
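The expressions for A1, A2, and Equation (11.1) did not survive in this text. The block below is only one reading that is consistent with the surrounding description. It assumes that each of the two stacked replica devices drops roughly one threshold voltage below VDD; that assumption is not taken from [18], and the result should not be read as the book's Equation (11.1).

```latex
% Assumed: each replica device drops about one threshold voltage.
A_1 \approx V_{DD} - 2\,|V_{TP}|, \qquad
A_2 \approx V_{DD} - 2\,V_{TN}

% SramVSS is clamped at the lower of the two nodes:
\mathrm{SramVSS} = \min(A_1, A_2)
                 \approx V_{DD} - \max\bigl(2\,|V_{TP}|,\; 2\,V_{TN}\bigr)

% The array supply is then independent of V_DD:
V_{DD} - \mathrm{SramVSS} \approx \max\bigl(2\,|V_{TP}|,\; 2\,V_{TN}\bigr)
```

Under this reading, VDD cancels out of the array supply, which matches the stated goal of a supply that depends only on transistor threshold voltages.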


[Figure: sleep biasing control (labels: SramVSS, shutoff, MND, MPB); caption not preserved]

In yet another example of passive cache sleep [4], a group of NFETs of different sizes is placed in parallel between SramVSS and VSS, as shown in Figure 11.15. In this implementation, VSS is gated by a shut-off FET to support cache power-down. During silicon characterization, the optimal combination of these NFETs is determined to maximize leakage power savings while maintaining cell stability. To provide better immunity to temperature variation, MND and MPB were added to the bias generator. In high-temperature regions, the increased cell leakage would cause SramVSS to rise, reducing the supply to the memory cells and compromising stability. In such regions, the reduced VTs of MND and MPB at high temperature strengthen the pull-down and reduce the amount by which SramVSS rises.
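The selection of the NFET combination can be pictured as a small constrained search. The sketch below is a toy stand-in for silicon characterization: it assumes SramVSS is inversely proportional to the total enabled NFET width (a crude resistive model), and picks the combination giving the highest SramVSS that still leaves VccMin plus a guardband across the cells. The widths, leakage current, and unit resistance are hypothetical; in [4] the choice is made from measurements, not from a model like this.

```python
from itertools import combinations


def sram_vss(total_width_um, i_leak_a, r_unit_ohm_um):
    """Toy model: pull-down resistance scales as r_unit / total width,
    so SramVSS = I_leak * r_unit / W."""
    return i_leak_a * r_unit_ohm_um / total_width_um


def best_combination(widths_um, vdd, vccmin, guardband, i_leak_a, r_unit_ohm_um):
    """Return (combo, sram_vss) with the highest SramVSS (most leakage
    savings) that keeps VDD - SramVSS >= vccmin + guardband, or None."""
    best = None
    for k in range(1, len(widths_um) + 1):
        for combo in combinations(widths_um, k):
            v = sram_vss(sum(combo), i_leak_a, r_unit_ohm_um)
            if vdd - v < vccmin + guardband:
                continue  # cells would lose retention margin
            if best is None or v > best[1]:
                best = (combo, v)
    return best


if __name__ == "__main__":
    combo, v = best_combination((1, 2, 4, 8), vdd=1.0, vccmin=0.6,
                                guardband=0.05, i_leak_a=1e-3,
                                r_unit_ohm_um=1000.0)
    print(f"enable widths {combo} um, SramVSS = {v:.3f} V")
```

Binary-weighted widths, as assumed here, let a few devices cover a wide range of pull-down strengths.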

11.3.2 P Versus N Sleep

All the examples shown above use N-sleep, which provides the SRAM cells with true VDD and regulates SramVSS. Before accessing the SRAM cells, NFETs are used to restore SramVSS to VSS. Of course, designers can also implement the complementary P-sleep. In P-sleep designs, SRAM cells are provided with true VSS and a regulated SramVDD. Before accessing the P-sleep SRAM cells, PFETs are used to restore SramVDD to VDD.


Figure 11.16 Junction leakage paths in SRAM cell [label: “1”]

At first glance, N-sleep seems the obvious favorite because the superior current-driving capability of NFETs allows for smaller wake-up transistors, thus producing more efficient designs. However, designers must consider additional factors to make the appropriate choice. For example, the VSS net in an SRAM array often has more capacitance than the VDD net, so the larger SramVSS capacitance that must be discharged may negate the NFET’s drive-strength advantage. Also, P-sleep could provide additional power savings in processes that have non-negligible junction leakage. Figure 11.16 shows the junction leakage components in a typical SRAM cell, which include four N-diffusion-to-body paths (solid arrows) and one P-diffusion-to-N-well path (dotted arrow). Because of the greater number of N-diffusion-to-body paths, and because junction leakage from N-diffusion is usually worse than junction leakage from P-diffusion, lowering VDD reduces the junction leakage more than raising VSS would. This is especially important for designs that leverage the bias circuitry to help shut off portions of the cache, such as in [4]. Shutting off VSS would cause SramVSS to rise, but the rise would eventually be halted by the increase in N-diffusion junction leakage as more N-diffusions are no longer at VSS. Shutting off VDD, on the other hand, could allow SramVDD to drop more significantly, as P-diffusion leakage is less severe than N-diffusion leakage. Therefore, the choice between P-sleep and N-sleep should be evaluated based on the specific process and SRAM cell design.

11.3.3 Entering and Exiting Sleep

The goal for sleep mode is to reduce power consumption. However, each time the cache enters or exits sleep mode, some active power is dissipated.
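One way to reason about this overhead is a break-even estimate: sleep pays off only if the idle period is long enough for the leakage savings to exceed the energy spent entering and exiting sleep. The sketch below is an illustration under that assumption, with hypothetical energy and power values; it is not a method stated in the text.

```python
def break_even_time(e_transition_j, p_awake_w, p_sleep_w):
    """Minimum idle time (s) for which sleeping saves net energy.

    e_transition_j: energy spent on one enter + exit sleep cycle
    p_awake_w:      leakage power with the full supply
    p_sleep_w:      leakage power in sleep mode
    """
    saving = p_awake_w - p_sleep_w
    if saving <= 0:
        raise ValueError("sleep mode must reduce leakage power")
    return e_transition_j / saving


if __name__ == "__main__":
    # Hypothetical values: 2 nJ per sleep cycle, 10 mW awake, 4 mW asleep.
    t = break_even_time(e_transition_j=2e-9, p_awake_w=10e-3, p_sleep_w=4e-3)
    print(f"break-even idle time: {t * 1e9:.0f} ns")
```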
