Figure 11.4 Charge sharing for supply reduction [14] (© 2007 IEEE)
Since extra supplies are not always available in product design, another example [14] uses charge sharing to lower the supply to the columns being written to. As shown in Figure 11.4, “downvdd” is precharged to VSS. For a write operation, supplies to the selected columns are disconnected from VDD and shorted to “downvdd”. The charge sharing lowers the supply’s voltage to a level determined by the ratio of the capacitances, allowing writes to occur easily.
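The charge-sharing level can be estimated from charge conservation. The sketch below is illustrative only: it assumes ideal capacitors and a lossless switch, and the function name, parameter names, and capacitance values are assumptions rather than figures from [14].

```python
def shared_supply_voltage(vdd, c_col, c_down, v_down_init=0.0):
    """Voltage after shorting a column supply line (initially at vdd) to the
    'downvdd' node (initially at v_down_init, i.e. VSS).

    Charge is conserved, so V = (C_col*VDD + C_down*V_down) / (C_col + C_down).
    """
    if c_col <= 0 or c_down < 0:
        raise ValueError("c_col must be positive and c_down non-negative")
    total_charge = c_col * vdd + c_down * v_down_init
    return total_charge / (c_col + c_down)


# Hypothetical values: 1.0 V supply, 90 fF column supply line, 10 fF downvdd.
v_write = shared_supply_voltage(vdd=1.0, c_col=90e-15, c_down=10e-15)
print(f"column supply during write: {v_write:.2f} V")
```

A larger “downvdd” capacitance relative to the column supply capacitance gives a deeper droop, which eases writes but leaves less margin for the unselected cells on the same column.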
Figure 11.5 Write column supply switch off [21] (© IEEE 2006)
Yet another example [21] uses a power-line-floating write technique to assist write operations. Instead of switching in a separate supply or charge sharing the supply, as in previous examples, the supply to the write columns is simply switched off, floating the column supply lines at VDD (Figure 11.5).
As the cells are written to, the floating supply line (Vddm) discharges through the “0” bitline, as shown in Figure 11.6a. The decreased supply voltage allows easy writing to the cells. As soon as the cell flips to its intended state, the floating supply line’s discharge path is cut off, preventing the floating supply line from fully discharging (Figure 11.6b).
Figure 11.6 Power-line-floating write [21] (© IEEE 2006)
In all column voltage manipulation schemes, nonselected cells must retain state with the lowered supply.
11.2.1.2 Row Voltage Optimization
As in the previous section, designers can apply voltage manipulation in the row direction as well. However, unlike column-based voltage optimization, row-based voltage optimization generally cannot simultaneously optimize for both read and write margins in the same operation, as needed in a column-multiplexed design. Therefore, row-based voltage manipulation tends to be more suitable for non-column-multiplexed designs where all the columns are written to in a write operation.
The most obvious method to apply row-based voltage optimization is to raise the supply for the row of accessed cells in a read operation, or to lower the supply for the row of cells being written to. In addition, the following are some other examples of row-based voltage optimization.
Figure 11.7 Raised source line write [20] (© IEEE 2004)
In [20], the SRAM cells’ source line (SL) (i.e., source terminals of MNs in Figure 11.1) is disconnected from VSS during write operations. The SL is allowed to float until it is clamped by an NFET diode (Figure 11.7). The raised SL (Vss_mem in Figure 11.7) decreases the drive of the PFETs, which allows easy overwriting of the cell. (In this specific example, the floating SL is shared among all the cells in the array, not just the cells in a row. However, designers can apply the same technique on a row-by-row basis at the cost of area overhead.) A variation of this technique would disconnect the SL during both write and standby operations to achieve power savings, and connect the SL to VSS only during read operations when the extra stability margin is needed. The drawback to this variation is the additional delay needed to restore SL to VSS before a read operation can begin.
A similar example [13] also floats SL during write operations. In addition, the SL is driven to a negative voltage during read operations. This allows for faster bitline development, as well as more stable cells during read operations.
Figure 11.8 Supply line coupling [3] (© IEEE 2004)
If a separate supply is not available, another way to boost the internal supply of SRAM cells during a read access to achieve higher stability is through coupling. In [3], wordline wires are routed next to the row’s supply lines. As seen in Figure 11.8, as the wordline rises, it disconnects the supply lines from VDD, and couples the voltages of the supply lines higher than VDD. Assuming insignificant current is sourced from the supply line during a read access, the bootstrapped supply increases the drive on MNs and improves the cell’s stability. However, for cell designs with low MN/MA ratios, the “0” storage node may rise higher than MN’s threshold voltage, causing the floating supply lines to discharge.
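The size of the coupled boost can be approximated with a capacitive divider. The sketch below is an idealized estimate, not the analysis from [3]: it assumes the supply line floats for the whole wordline swing and that no current is drawn from it, and all names and values are hypothetical.

```python
def coupled_supply_voltage(vdd, wl_swing, c_couple, c_other):
    """Floating supply-line voltage after the adjacent wordline rises.

    c_couple: coupling capacitance between wordline and supply line.
    c_other:  all remaining capacitance on the supply line.
    The boost is wl_swing * c_couple / (c_couple + c_other).
    """
    if c_couple < 0 or c_other < 0 or (c_couple + c_other) == 0:
        raise ValueError("capacitances must be non-negative and not both zero")
    boost = wl_swing * c_couple / (c_couple + c_other)
    return vdd + boost


# Hypothetical values: 1 fF of coupling against 9 fF of other capacitance.
v_boosted = coupled_supply_voltage(vdd=1.0, wl_swing=1.0,
                                   c_couple=1e-15, c_other=9e-15)
print(f"bootstrapped row supply: {v_boosted:.2f} V")
```

Any current drawn from the floating line during the access reduces this boost, which is why the discharge path through a weak MN matters.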
Figure 11.9 Wordline driver using RATs [14] (© IEEE 2007)
In [14], instead of increasing the SRAM cell’s supply to improve stability, the WL voltage is reduced slightly. Reduced wordline voltage degrades the drive of MA, which essentially improves the MN/MA ratio. This implementation makes additional efforts to account for global threshold voltage variations. Figure 11.9 illustrates the scheme, using “replica access transistors” (RATs) that have almost the same physical topology as MA to lower the WL voltage. In general, lower VTN causes SRAM cells to be less stable. Therefore, the RATs lower WL more when VTN is low, and less when VTN is high, to achieve balance between read margin and read speed.
11.2.2 Timing Control
Aside from voltage manipulation, designers can also improve cell stability by decreasing the amount of time the cell is under stress during a read operation. For example, in a design that uses differential sensing, a small bitline voltage drop could be sufficient for sensing the bitcell value. Leaving on the wordline longer than necessary would allow the bitlines to continue to disturb the “0” storage node, leading marginal SRAM cells to flip their values.
In typical designs, the wordline shutoff is triggered on phase or cycle boundaries. If the optimal wordline shutoff time does not align with phase or cycle boundaries, or if the designer prefers to have the wordline high time independent of the frequency, then the designer could employ a pulsed wordline scheme, such as the one used in [11]. The challenge is to design the appropriate pulse width that is just long enough for reads to complete successfully across different process corners and operating conditions.
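The pulse-width tradeoff can be illustrated with a small calculation. This is a sketch under stated assumptions: the corner names, read times, and guard band are made up for illustration, and a real design would take them from simulation or silicon data.

```python
def wordline_pulse_width(read_times_ps, margin_fraction=0.1):
    """Smallest fixed pulse width that covers the slowest corner plus a guard band."""
    if not read_times_ps:
        raise ValueError("at least one corner is required")
    worst_case = max(read_times_ps.values())
    return worst_case * (1.0 + margin_fraction)


# Hypothetical required read times (ps) at three corners.
corner_read_times_ps = {"slow": 220.0, "typical": 160.0, "fast": 120.0}
pulse_ps = wordline_pulse_width(corner_read_times_ps, margin_fraction=0.1)

# Extra time the wordline stays high beyond what each corner needs.
excess_ps = {name: pulse_ps - t for name, t in corner_read_times_ps.items()}

print(f"pulse width: {pulse_ps:.0f} ps")
for name, extra in excess_ps.items():
    print(f"{name}: wordline high {extra:.0f} ps longer than needed")
```

A fixed pulse sized for the slowest corner leaves the fastest corner under read stress for the longest unnecessary time, which is the motivation for a pulse that tracks the actual read path.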
Figure 11.10 Read and write replica circuits [21] (© IEEE 2006)
In [15], a read replica path, which uses 12 dummy SRAM cells, was used for generating the shutoff edge for wordlines. The dummy SRAM cells, which resemble real SRAM cells but have internal values hardwired, help the replica path to track the variation in normal read paths. In addition to the read replica circuits [21], a write replica circuit was also added. In general, read operations take more time to complete than write operations. Therefore, it is advantageous to shut off the wordline during a write operation as soon as the write is completed successfully, which prevents unselected columns in a column-multiplexed design from continuing to discharge the bitlines and wasting power. Figure 11.10 is an example illustrating the read and write replica paths together. The replica bitline (RB) is precharged to VDD through MPC before read or write operations begin. For a read operation, REN activates to “0”, causing the read-replica wordline (RW) to turn on the read dummy cells’ (RC) wordline. The RCs discharge RB, which turns off the wordlines through the WOFF signal. In a write operation, RB is discharged through MWR, which also triggers WOFF. In general, higher VTN requires the write time to be longer. Therefore, dies with higher VTN would have a slower discharge through MWR, providing the write operation more time to complete.
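The tracking behavior of a replica bitline can be sketched with a constant-current approximation. This is a simplified model, not the circuit analysis from [15] or [21]: it assumes each dummy cell sinks a fixed current, and the capacitance, trip point, and current values are hypothetical.

```python
def replica_shutoff_delay(c_rb, v_trip_drop, n_dummy, i_cell):
    """Time for n_dummy cells, each sinking i_cell, to pull the replica bitline
    down by v_trip_drop (constant-current approximation: t = C * dV / I)."""
    if n_dummy <= 0 or i_cell <= 0:
        raise ValueError("n_dummy and i_cell must be positive")
    return c_rb * v_trip_drop / (n_dummy * i_cell)


# Hypothetical die with strong cells versus a die with half the cell current.
fast_die = replica_shutoff_delay(c_rb=60e-15, v_trip_drop=0.5,
                                 n_dummy=12, i_cell=25e-6)
slow_die = replica_shutoff_delay(c_rb=60e-15, v_trip_drop=0.5,
                                 n_dummy=12, i_cell=12.5e-6)

print(f"fast die shutoff delay: {fast_die * 1e12:.0f} ps")
print(f"slow die shutoff delay: {slow_die * 1e12:.0f} ps")
```

Because the delay scales inversely with cell current, a die with weaker cells automatically gets a later shutoff edge, which is the property the replica path is meant to provide.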
The above illustration is just one example of designs using replica circuits. The danger of replica circuits, of course, is that no replica can perfectly track real paths through all process and operating corners. For example, the write replica circuit above does not track PFET variations, which also impact write margin. However, tracking some variation usually yields better designs than no tracking at all.
11.3 Array Power Reduction
With power-per-performance becoming an important parameter, engineers pay increasing attention to reducing the power of embedded SRAM arrays, which often occupy a large percentage of the total die area. Since activity factor is generally low for large caches, leakage power represents a significant, if not the dominant, portion of the overall cache power. Devices in a SRAM cell typically have channel lengths much greater than the process minimum for variation control; thus, subthreshold leakage has traditionally been limited. However, subthreshold leakage has worsened with recent technology nodes and, more importantly, gate leakage (and in some cases, junction leakage) is getting significantly worse with oxide scaling. As a result, SRAM leakage power now requires careful attention. Because leakage power has a strong dependence on voltage, many designers have experimented with, or implemented, “sleeping” the cache’s supply.
11.3.1 Sleep Types
In general, cache “sleep” involves providing inactive SRAM cells, which do not experience read-disturb, with a lowered supply to achieve power savings. The lowered supply must be high enough to allow the inactive cells to maintain their data. Then, before the cells are accessed, they are “woken up” by providing a higher supply that can fulfill both read-disturb and access speed requirements.
The most straightforward implementation of cache sleep involves providing the cache with two separate, external supplies. However, a second supply is an expensive solution, so realistic implementations often choose to generate and regulate the second supply locally. In general, these implementations fall into two categories: active and passive. “Active sleep” schemes try to actively maintain the reduced voltage at a certain level, while “passive sleep” schemes rely on voltage division or threshold voltage to determine the reduced voltage.
11.3.1.1 Active Sleep
Khellah et al. [10] used an op-amp to help control the reduced supply; Figure 11.11 illustrates its general concept. When the arrays are active, “wake” causes SramVSS to be connected to VSS through the strong NFET. During idle mode, the strong NFET is turned off, allowing SramVSS to float. SramVSS will rise due to array leakage, but the op-amp will prevent SramVSS from rising above VREF. Of course, VDD – VREF must be greater than the SRAM cells’ standby VccMin, which is the minimum voltage at which cells are stable, to maintain cell data. In this implementation, VREF is externally supplied for ease of controllability. Also, an “early wake” signal is provided ahead of “wake”, to reduce the ground-bounce noise due to sudden discharge of SramVSS.
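The retention constraint on VREF can be written as a simple check. The sketch below only restates the inequality VDD – VREF > VccMin; the guard band parameter and the voltage values are assumptions added for illustration and do not come from [10].

```python
def max_vref(vdd, standby_vccmin, guard_band=0.0):
    """Largest VREF that keeps VDD - VREF at or above standby VccMin plus a guard band."""
    limit = vdd - standby_vccmin - guard_band
    if limit < 0:
        raise ValueError("VDD is below standby VccMin plus guard band")
    return limit


def retains_data(vdd, sram_vss, standby_vccmin):
    """True if the voltage across a sleeping cell exceeds standby VccMin."""
    return (vdd - sram_vss) > standby_vccmin


# Hypothetical values: 1.1 V supply, 0.6 V standby VccMin, 50 mV guard band.
vref_limit = max_vref(vdd=1.1, standby_vccmin=0.6, guard_band=0.05)
print(f"VREF must not exceed {vref_limit:.2f} V")
print("data retained at limit:", retains_data(1.1, vref_limit, 0.6))
```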
Jumel et al. [8] used a similar concept as the previous example, but took it a step further. As shown in Figure 11.12, an on-chip bandgap reference generates a reference voltage that is stable across PVT. In addition, the voltage regulator is designed to track VDD, so a higher VDD would also allow SramVSS to rise, maintaining VDD – SramVSS close to VccMin. Finally, the output of this regulator is trimmed on a die-by-die basis at wafer probe to account for process variations.
Figure 11.11 Active sleep control [10] (© IEEE 2006)
Figure 11.12 Active sleep control with bandgap reference and VDD tracking [8] (© IEEE 2006). Courtesy of Philippe Royannez, Texas Instruments, Inc.
11.3.1.2 Passive Sleep
One straightforward way to generate a reduced supply is to use a diode, such as in [1] and illustrated in Figure 11.13. When SramVSS rises to the diode’s threshold voltage, the diode would clamp SramVSS. The downside to this scheme is its inflexibility, as the clamping voltage is determined primarily by just the threshold voltage, and cannot be optimized for different supply voltages.
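The inflexibility of a fixed clamp shows up when VDD changes. The sketch below assumes an ideal clamp at exactly one threshold voltage; the threshold, VccMin, and supply values are hypothetical and not taken from [1].

```python
def diode_sleep_cell_supply(vdd, vt_diode):
    """Voltage across a sleeping cell when SramVSS is clamped one diode
    threshold above VSS (ideal clamp assumed)."""
    return vdd - vt_diode


VT_DIODE = 0.35        # hypothetical diode threshold (V)
STANDBY_VCCMIN = 0.60  # hypothetical standby VccMin (V)

# Retention margin (cell supply minus VccMin) at three supply settings.
margin_by_vdd = {}
for vdd in (1.2, 1.0, 0.8):
    cell_supply = diode_sleep_cell_supply(vdd, VT_DIODE)
    margin_by_vdd[vdd] = cell_supply - STANDBY_VCCMIN
    print(f"VDD={vdd:.1f} V: cell supply {cell_supply:.2f} V, "
          f"margin {margin_by_vdd[vdd]:+.2f} V")
```

With these numbers the clamp leaves unused margin at the high supply and fails to retain data at the low supply, because the clamp level cannot follow VDD.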
Figure 11.13 Diode clamping sleep voltage
Figure 11.14 Bias generator with replica transistors [18] (© IEEE 2006)
The example shown in Figure 11.14 aims to remove the SRAM supply’s dependency on VDD [18]. Rather than setting the array supply to VDD – VT, which can vary depending on VDD, the array supply depends only on transistor threshold voltages, as specified in Equation (11.1).
In this implementation, the array supply voltage specified in Equation (11.1) is assumed to be sufficient for satisfying VccMin requirements. To adapt to different PVT conditions, the bias generator is built using replica transistors. The two replica load PFETs drop A1’s voltage to
Similarly, the two replica driver NFETs drop A2’s voltage to
Finally, the matching P1 and P1’ FETs clamp SramVSS at A1, while the matching P2 and P2’ FETs clamp SramVSS at A2. The resulting SramVSS is the lower of A1 and A2, producing Equation (11.1).
Figure 11.15 Passive sleep with parallel pull-down transistors [4] (© IEEE 2007)
In yet another example of passive cache sleep [4], a group of NFETs of different sizes were built in parallel between SramVSS and VSS, as shown in Figure 11.15. In this implementation, VSS is gated by a shut-off FET to support cache power-down. During silicon characterization, the optimal combination of these NFETs is determined to maximize leakage power savings while maintaining cell stability. To provide better immunity from temperature variation, MND and MPB were added to the bias generator. In high temperature regions, the increased cell leakage would cause SramVSS to rise, reducing the supply to the memory cells and compromising stability. In such regions, the reduced VTs for MND and MPB due to the high temperature would strengthen the pull-down, and reduce the amount that SramVSS rises.
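The selection of a pull-down combination can be pictured as a small search. The sketch below is a deliberately crude stand-in for silicon characterization, not the procedure used in [4]: it assumes a constant array leakage current and a linear pull-down (SramVSS = I_leak / G_total), and every name and value is hypothetical.

```python
from itertools import combinations


def best_pulldown_combo(conductances_us, leak_ua, vdd, vccmin):
    """Return (enabled_indices, sram_vss) giving the highest SramVSS that still
    leaves VDD - SramVSS >= vccmin, or None if no combination qualifies.

    Crude model: SramVSS = leak_ua / total conductance (uA / uS = V).
    """
    best = None
    n = len(conductances_us)
    for k in range(1, n + 1):
        for combo in combinations(range(n), k):
            g_total = sum(conductances_us[i] for i in combo)
            sram_vss = leak_ua / g_total
            stable = (vdd - sram_vss) >= vccmin
            if stable and (best is None or sram_vss > best[1]):
                best = (combo, sram_vss)
    return best


# Hypothetical binary-weighted NFETs (uS) and a 20 uA array leakage.
combo, sram_vss = best_pulldown_combo([10.0, 20.0, 40.0, 80.0],
                                      leak_ua=20.0, vdd=1.0, vccmin=0.55)
print(f"enable NFETs {combo}: SramVSS = {sram_vss:.2f} V")
```

A higher SramVSS saves more leakage, so the search keeps the weakest pull-down that still satisfies the stability limit.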
11.3.2 P Versus N Sleep
All the examples shown above use N-sleep, which provides the SRAM cells with true VDD and regulates SramVSS. Before accessing the SRAM cells, NFETs are used to restore SramVSS to VSS. Of course, designers can also implement the complementary P-sleep. In P-sleep designs, SRAM cells are provided with true VSS and a regulated SramVDD. Before accessing the P-sleep SRAM cells, PFETs are used to restore SramVDD to VDD.
Figure 11.16 Junction leakage paths in SRAM cell
At first glance, N-sleep seems the obvious favorite because the superior current driving capability of NFETs allows for smaller wake-up transistors, thus producing more efficient designs. However, designers must consider additional factors to make the appropriate choice. For example, the VSS net in a SRAM array often has more capacitance than the VDD net, so the larger SramVSS capacitance that must be discharged may negate the increase in the NFET’s drive strength. Also, P-sleep could provide additional power savings to processes that have non-negligible junction leakage. Figure 11.16 shows the junction leakage components in a typical SRAM cell, which include 4 N-diffusion to body paths (solid arrows) and 1 P-diffusion to N-well path (dotted line arrow). Because of the greater number of N-diffusion to body paths, and because junction leakage from N-diffusion is usually worse than the junction leakage from P-diffusion, lowering VDD reduces the junction leakage more than raising VSS would. This is especially important for designs that leverage the bias circuitry to help shut off portions of the cache, such as in [4]. Shutting off VSS would cause SramVSS to rise, but the rise would eventually be halted by the increase in N-diffusion junction leakage as more N-diffusions are no longer at VSS. Shutting off VDD, on the other hand, could allow SramVDD to drop more significantly as P-diffusion leakage is less severe than N-diffusion leakage. Therefore, the proper choice between P and N sleep should be evaluated based on the specific process and SRAM cell design.
11.3.3 Entering and Exiting Sleep
The goal for sleep mode is to reduce power consumption. However, each time the cache enters or exits sleep mode, some active power is dissipated