
2D-PPC: A single-correction multiple-detection method for Through-Silicon-Via Faults

Khanh N Dang∗¶, Michael Conrad Meyer§, Akram Ben Ahmed†, Abderazek Ben Abdallah‡ and Xuan-Tu Tran¶

¶SISLAB, University of Engineering and Technology, Vietnam National University Hanoi, Hanoi, 123106, Vietnam

§G.S of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan

†National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, 305–8568, Japan

‡Adaptive Systems Laboratory, The University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan

Email:∗khanh.n.dang@ieee.org

Abstract—Through-Silicon-Via (TSV) is one of the most promising technologies to realize 3D Integrated Circuits (3D-ICs). However, reliability issues caused by low yield rates, sensitivity to thermal hotspots, and stress due to temperature differences between layers are preventing TSV-based 3D-ICs from being widely and efficiently used. Due to defect clustering, 3D-ICs could have multiple defects in the same region, which cannot be detected by error correction codes, while dedicated testing could take a significant number of testing cycles. This paper presents a 2D Parity Product Code (2D-PPC) with the ability to correct one fault and detect at least two faults. With the extension using Orthogonal Latin Square, 2D-PPC could detect multiple defects with a reasonable increase in area cost and latency.

Index Terms—3D-ICs, Through Silicon Via, Fault Tolerance, Error Correction Code, Orthogonal Latin Square

I. INTRODUCTION

Through-Silicon-Vias (TSVs) serve as vertical wires between two adjacent layers in Three-Dimensional Integrated Circuits (3D-ICs). Thanks to their extremely short lengths, their latencies are low, which could lead to extremely high communication speeds [1]. Moreover, as a 3D-IC technology, TSV-based ICs can have smaller footprints despite the TSVs' overheads [2], and lower power consumption thanks to the shorter wires [1].

Despite the aforementioned advantages, reliability has been a major concern of Through-Silicon-Vias due to their low yield rates, vulnerability to thermal effects and stress, and the crosstalk issues of parallel TSVs [3]. Defects on TSVs can occur in both random and cluster distributions [4], which raises concerns about their fault-tolerance capabilities. Because of their naturally parallel structure, TSVs also face crosstalk challenges [5]. Furthermore, the difference in thermal expansion coefficients of materials and the temperature variation between two layers, which has been reported to reach up to 10°C [6], could lead to stress issues. To enhance the reliability of TSVs, there are three main approaches: (i) hardware fault-tolerance such as correction circuits [7], redundancies [4], and reliability mapping [8]; (ii) information redundancy such as coding techniques [9]–[11] or re-transmission requests [12]; and (iii) algorithm-based fault-tolerance [13]. Built-in self-test (BIST) [14] and online testing [15], [16] techniques have also been proposed to help the system determine whether a TSV has a defect.

Although numerous methods have been proposed to solve the reliability issues of TSVs, several problems remain a challenge for designers. First, to ensure the reliability of TSVs, testing and defect awareness must be provided. However, a testing process using BIST [14] or external testing [17] usually interrupts the system's operation and may take an enormous number of cycles. Therefore, the system must be aware of when a possible fault has occurred. Second, ECCs can detect TSV defects immediately; however, they are limited to a certain number of detectable faults. For instance, Hamming and SECDED (Single Error Correction Double Error Detection) codes can detect only one and two faults, respectively, which may lead to silent faults if multiple TSVs fail. The exception is the Orthogonal Latin Square Code (OLSC) [18], which provides a low-latency and modular design. However, OLSC does not provide extra detectability.

Therefore, in this paper, we propose a coding method named Two Dimensional Parity Product Code (2D-PPC) and its matrix-switching method, which is specially designed for correcting and detecting faults in TSV-based links. The contributions of this paper are as follows:

• 2D Parity Product Code (2D-PPC) offers one-bit correction and at least two-bit detection. A Monte-Carlo simulation shows that 2D-PPC could detect more than two defects.

• By using Orthogonal Latin Square matrices as alternative row and column coding, the detectability of 2D-PPC can be significantly improved.

• A light-weight design of the proposed 2D-PPC's encoder and decoder. The 2D-PPC design shows lower delay values than Hamming and SECDED.

The organization of this paper is as follows: Section II presents the proposed 2D-PPC. Section III provides the evaluation environment and results. Finally, Section IV concludes the paper.

II. 2D PARITY PRODUCT CODE

This section presents the proposed 2D Parity Product Code (2D-PPC). The following parts demonstrate the encoding and decoding processes with equivalent circuits. Finally, we discuss the correctability and detectability of the coding technique.

Figure 1: Switching 2D-PPC using orthogonal Latin square. (Panels: Matrix-0, Matrix-1, Matrix-2 and Matrix-3 over data bits 0–15 and parity bits r0–r3, c0–c3 and u; the figure labels alternative matrices for the c and r bits.)

A. Fault Consideration

Regarding behavior, we model the possible faults as stuck-at faults and inverted logic behavior. In other words, the output logic value of a TSV is stuck at ‘0’ or ‘1’, or is the opposite of the true value. These behaviors are generally applied to soft errors such as single event upsets, but they can also model permanent defects or crosstalk, depending on the frequency of use. The distribution of faults is defined as random, and multiple faults could happen in the same TSV group.

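B. Encoding

For each transmission, a TSV group sends a coded flit $F_k$ as follows:

$$
F_k =
\begin{bmatrix}
b_{0,0} & b_{0,1} & b_{0,2} & \cdots & b_{0,N-1} & r_0 \\
b_{1,0} & b_{1,1} & b_{1,2} & \cdots & b_{1,N-1} & r_1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
b_{M-1,0} & b_{M-1,1} & b_{M-1,2} & \cdots & b_{M-1,N-1} & r_{M-1} \\
c_0 & c_1 & c_2 & \cdots & c_{N-1} & u
\end{bmatrix}
$$

where $b_{i,j}$ is a data bit and

$$
\begin{aligned}
r_i &= b_{i,0} \oplus b_{i,1} \oplus \cdots \oplus b_{i,N-1} \\
c_j &= b_{0,j} \oplus b_{1,j} \oplus \cdots \oplus b_{M-1,j} \\
u_r &= r_0 \oplus r_1 \oplus \cdots \oplus r_{M-1} \\
u_c &= c_0 \oplus c_1 \oplus \cdots \oplus c_{N-1} \\
u &= u_r = u_c = \bigoplus_{i=0}^{M-1} \bigoplus_{j=0}^{N-1} b_{i,j}
\end{aligned}
\tag{1}
$$

Note that the symbol $\oplus$ stands for the XOR function.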
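The encoding of Eq. (1) can be sketched in a few lines of Python. This is a minimal software illustration, not the authors' Verilog encoder; the function name `encode` and the list-of-rows representation of the flit are assumptions made here for readability.

```python
from functools import reduce
from operator import xor


def encode(data):
    """Encode an M x N list of data bits into an (M+1) x (N+1) coded flit.

    Hypothetical helper: the layout (row parities in the last column,
    column parities and u in the last row) follows Eq. (1).
    """
    m, n = len(data), len(data[0])
    r = [reduce(xor, row) for row in data]                            # r_i
    c = [reduce(xor, (data[i][j] for i in range(m))) for j in range(n)]  # c_j
    u = reduce(xor, r)                                                # u = u_r
    flit = [list(data[i]) + [r[i]] for i in range(m)]
    flit.append(c + [u])
    return flit


data = [[1, 0, 1],
        [0, 1, 1]]
flit = encode(data)
u_c = reduce(xor, flit[-1][:-1])   # XOR of the column parities
```

In this example the last entry of the flit equals both the XOR of the row parities and the XOR of the column parities, which is the property u = u_r = u_c stated in Eq. (1).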

C. Decoding

By using parity checking, the decoder can find the column and row indexes of the flipped bit. The parity equations are as follows:

$$
\begin{aligned}
sr_i &= b_{i,0} \oplus b_{i,1} \oplus \cdots \oplus b_{i,N-1} \oplus r_i \\
sc_j &= b_{0,j} \oplus b_{1,j} \oplus \cdots \oplus b_{M-1,j} \oplus c_j \\
sc_N &= r_0 \oplus r_1 \oplus \cdots \oplus r_{M-1} \oplus u \\
sr_M &= c_0 \oplus c_1 \oplus \cdots \oplus c_{N-1} \oplus u
\end{aligned}
\tag{2}
$$

The outputs of Eq. 2 are two arrays: the column parities ($sc$) and the row parities ($sr$). If there is one or no flipped bit, the decoder can correct it using a mask, $Mask$, where

$$
Mask(i, j) =
\begin{cases}
1 & \text{if } sr_i = 1 \text{ and } sc_j = 1 \\
0 & \text{otherwise}
\end{cases}
$$

For each received flit $\hat{F}_k$, the corrected flit $F_k$ is obtained by:

$$
F_k = \hat{F}_k \oplus Mask
$$

The decoder fails to correct when there are two or more faults. In this case, the decoder sends a NACK signal and a hybrid automatic retransmission request (HARQ) is used to perform correction:

$$
NACK = \left( \sum_{i=0}^{M} sr_i \geq 2 \right) \text{ OR } \left( \sum_{j=0}^{N} sc_j \geq 2 \right)
$$
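The decoding rules above (syndromes, mask and NACK) can be sketched as follows. This is an illustrative software model under the assumption that the flit is stored as a list of rows with the parity bits in the last column and last row; the names `syndromes` and `decode` are hypothetical and do not come from the paper's implementation.

```python
def syndromes(flit):
    """Return (sr, sc): the parity of every row and every column of the flit."""
    sr = [sum(row) % 2 for row in flit]
    sc = [sum(col) % 2 for col in zip(*flit)]
    return sr, sc


def decode(flit):
    """Return (flit_out, nack) following the mask and NACK rules."""
    sr, sc = syndromes(flit)
    nack = sum(sr) >= 2 or sum(sc) >= 2
    if nack:
        # Two or more faults suspected: leave the flit untouched and
        # let the retransmission mechanism handle it.
        return [list(row) for row in flit], True
    # Mask(i, j) = sr_i AND sc_j; XOR it onto the received flit.
    corrected = [[bit ^ (sr[i] & sc[j]) for j, bit in enumerate(row)]
                 for i, row in enumerate(flit)]
    return corrected, False


valid = [[1, 0, 1, 0],
         [0, 1, 1, 0],
         [1, 1, 0, 0]]          # a valid 2 x 3 data block with its parities

one_fault = [list(row) for row in valid]
one_fault[1][2] ^= 1            # single flipped bit
fixed, nack_one = decode(one_fault)

two_faults = [list(row) for row in valid]
two_faults[0][0] ^= 1
two_faults[0][1] ^= 1           # two flips in the same row
_, nack_two = decode(two_faults)
```

With a single flipped bit the mask selects exactly that bit and the original flit is restored; with two flipped bits at least one of the two syndrome sums reaches 2, so the sketch raises NACK instead of correcting.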

D. Correctability and Detectability

In general, 2D-PPC guarantees the ability to correct one flipped bit and detect two flipped bits. If there are more than two flipped bits, 2D-PPC still has a chance to detect them. However, there is a weak point in its detection approach that prevents it from detecting certain three-fault patterns. For instance, if the bits with indexes (i, j), (i, k) and (l, j) are flipped, both $sr_i$ and $sc_j$ are ‘0’, so the decoder fails to detect these faults, while both $sc_k$ and $sr_l$ are ‘1’. This symptom makes the decoder conclude that there is a single fault, and it wrongly corrects the bit $b_{l,k}$.

Figure 2: 2D-PPC detection ability evaluation
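The three-fault pattern described above can be reproduced with a small numeric example. The sketch assumes a 3 × 3 flit (M = N = 2), uses the all-zero flit as the transmitted code word, and picks i = 0, j = 0, k = 1, l = 1; these concrete values are chosen here only for illustration.

```python
def syndromes(flit):
    """Row and column parities (sr, sc) of a flit given as a list of rows."""
    sr = [sum(row) % 2 for row in flit]
    sc = [sum(col) % 2 for col in zip(*flit)]
    return sr, sc


side = 3                                    # (M+1) x (N+1) flit for M = N = 2
flit = [[0] * side for _ in range(side)]    # the all-zero flit is a valid code word
for row, col in [(0, 0), (0, 1), (1, 0)]:   # faults at (i, j), (i, k), (l, j)
    flit[row][col] ^= 1

sr, sc = syndromes(flit)
nack = sum(sr) >= 2 or sum(sc) >= 2         # NACK rule of the decoder
suspect = (sr.index(1), sc.index(1))        # the only position where the mask is 1
flit[suspect[0]][suspect[1]] ^= 1           # the "correction" adds a fourth error
after = syndromes(flit)
```

Here only one row syndrome and one column syndrome are set, so no NACK is raised and the decoder flips the bit at (l, k) = (1, 1). The result has four wrong bits at the corners of a rectangle, and all syndromes are zero, so the corrupted flit is accepted as valid.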

E. 2D-PPC using switching Orthogonal Latin Square based matrix

To detect more faults, we could extend 2D-PPC based on Orthogonal Latin Square. Note that this limits the shape of 2D-PPC to a square (M = N). Here, 2D-PPC could be considered as an extended version of the Orthogonal Latin Square code. This extension provides two features that could break the undetectable pattern. We could observe that the design for Matrix-2 and Matrix-3 can be shared with the original matrices of 2D-PPC. Because we target 2D-PPC for TSVs, adding extra parity bits is not desirable; therefore, switching between matrices is the optimal solution. In the first cycle, 2D-PPC runs with its original matrices; then it could run the alternative matrices in the following cycles.

While the original matrix is limited by the undetectable pattern, simply switching between the different matrices could break this pattern. The extra cost and latency are only M × N multiplexers and one 2:1 MUX delay, respectively.
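The idea that an alternative grouping can expose a pattern hidden from the row and column parities can be illustrated with a toy model. Everything specific in this sketch is an assumption and not the paper's actual Matrix-2 and Matrix-3: it uses a 3 × 3 data block (N = 3, a prime), considers data bits only, and takes the two cyclic Latin squares (i + j) mod N and (i + 2j) mod N as the alternative parity groups.

```python
N = 3  # prime, so the two cyclic squares used below are orthogonal


def groups(cell, scheme):
    """Return the two parity groups that a data cell (i, j) belongs to."""
    i, j = cell
    if scheme == "row_col":                    # original 2D-PPC grouping
        return i, j
    return (i + j) % N, (i + 2 * j) % N        # assumed Latin-square grouping


def syndrome_weights(faults, scheme):
    """Count the odd-parity groups in each of the two group families."""
    first, second = [0] * N, [0] * N
    for cell in faults:
        a, b = groups(cell, scheme)
        first[a] ^= 1
        second[b] ^= 1
    return sum(first), sum(second)


def detected(faults, scheme):
    """Apply the NACK rule: any family with two or more odd groups."""
    w1, w2 = syndrome_weights(faults, scheme)
    return w1 >= 2 or w2 >= 2


pattern = [(0, 0), (0, 1), (1, 0)]             # faults at (i, j), (i, k), (l, j)
hidden_in_original = not detected(pattern, "row_col")
caught_by_alternative = detected(pattern, "latin")
```

In this toy model the pattern looks like a single fault under the row and column grouping, but it sets three groups of the second Latin square, so the NACK rule fires when the alternative grouping is active. In hardware, selecting between groupings corresponds to the per-bit 2:1 multiplexers mentioned above.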

III. EVALUATION

The 2D-PPC circuit is designed in Verilog-HDL and implemented in a 45 nm process technology using EDA tools by Synopsys. We first evaluate the detectability of 2D-PPC. Then, the implementation results are presented and compared.

A. Detection performance

In order to study the detection ability of 2D-PPC, we perform a Monte-Carlo simulation with 10,000 cases, whose results are presented in Fig. 2 (see footnote 1). With 2D-PPC (2 × 2), the results show that it can detect a high percentage of even numbers of faults. However, with odd numbers of faults, the undetectable patterns in Section II-D occur, which reduces the detection rate. This could be explained by the fact that the hidden pattern occurs with three faults and is likely replicated with larger odd numbers of faults. Even when we switch the matrix, this pattern could be replicated in the alternative matrices, which reduces the detectability of the method. Even so, 2D-PPC provides excellent performance with larger data bit-widths because there is a lower chance for the worst cases of 2D-PPC to happen.

By using an additional matrix and simply switching between them periodically, we can improve the accuracy of multiple fault detection. With the addition of Matrix-2, we can observe a significant improvement where most cases reach over 90%. With Matrix-2 and Matrix-3, 2D-PPC breaks the undetectable patterns 99+% or 100% of the time. However, there is a special case of having 5 faults when M = N = 2 where switching matrices cannot help in tackling the undetectable patterns. This is acceptable because having 5 faults out of 9 TSVs makes detection infeasible.

In summary, the proposed 2D-PPC provides a reasonable detection rate. Using extra matrices helps detect multiple faults without adding overwhelming extra area cost (M × N 2:1 multiplexers and a 2-bit counter).
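A Monte-Carlo experiment of this kind can be sketched as follows. This is not the authors' simulation and does not reproduce Fig. 2: it assumes that every fault is a bit flip on a distinct TSV, that only the original row and column parities are used (no matrix switching), and that a case counts as detected when the NACK condition holds. Because the code is linear, the syndromes depend only on the fault positions, so the transmitted data does not need to be simulated. The function name `detection_rate` and its parameters are hypothetical.

```python
import random


def detection_rate(m, n, faults, trials=10_000, seed=0):
    """Estimate how often `faults` random bit flips raise NACK in an
    (m+1) x (n+1) 2D-PPC flit using only row and column parities."""
    rng = random.Random(seed)
    rows, cols = m + 1, n + 1
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    detected = 0
    for _ in range(trials):
        sr = [0] * rows
        sc = [0] * cols
        for i, j in rng.sample(cells, faults):   # distinct faulty TSVs
            sr[i] ^= 1                           # each flip toggles its row parity
            sc[j] ^= 1                           # and its column parity
        if sum(sr) >= 2 or sum(sc) >= 2:         # NACK condition
            detected += 1
    return detected / trials


rate_two_small = detection_rate(2, 2, faults=2)
rate_three_small = detection_rate(2, 2, faults=3)
rate_three_large = detection_rate(8, 8, faults=3)
```

Under these assumptions two faults are always detected, three faults are missed whenever they form the pattern of Section II-D, and the miss rate for three faults is lower for the 8 × 8 configuration than for the 2 × 2 one, which is in line with the observation that larger data bit-widths make the worst cases less likely.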

1 Extra test results: http://dangnamkhanh.com/share/2D-PPC extra all.csv

B. Hardware Implementation

The hardware implementations of 2D-PPC are presented in Table I. Besides the works in [19] and [20], we also compare with results obtained from [12], which are implemented in 65 nm technology. Even when scaling to 45 nm, the area cost of the Hamming Product Code (HP-HARQ-II) in [12] is ≈ 8× higher than that of 2D-PPC. The BCH code [12] provides multi-bit correction; however, its complexity is 50× more than the proposed one. The HP-HARQ-II encoder's and decoder's latencies are 6.82% lower and 9.26% higher than those of 2D-PPC, respectively, while using older technology. However, 2D-PPC's latency is still extremely low (0.44 ns and 0.54 ns). Meanwhile, its area cost is similar to that of Hamming and SECDED, which are two simple coding techniques. It is important to mention that the area cost results do not take into account the area of the TSVs. With the same 64-bit data width, 2D-PPC uses an 81-bit code word (or 81 TSVs), while Hamming, SECDED and BCH use 71, 72 and 85 code-word bits (or TSVs), respectively. To support the switching technique with different matrices, additional circuits are needed, which increases the area cost and the latency. Utilizing one or two additional matrices maintains a latency below 1 ns and increases the area cost by factors of 1.5× and 1.7×, respectively, but greatly improves the detection rate compared to using a single matrix.

Table I: Hardware implementation results: “AO” and “DO” are Area Optimization and Delay Optimization, respectively. Cells with two values list AO / DO.

Scheme | Tech (nm) | k (bit) | n (bit) | Encoder area (µm²) | Decoder area (µm²) | Encoder latency (ns) | Decoder latency (ns)
--- | --- | --- | --- | --- | --- | --- | ---
Hamming [9] | 45 | 64 | 71 | 193.1200 | 463.1060 | 0.69 | 1.58
SECDED [10] | 45 | 64 | 72 | 234.6120 | 487.0460 | 0.75 | 1.62
HP + HARQ-II [12] | 65 | 64 | 72 | 9792.5 * | * | 0.41 | 0.59
SEC-DAEC [19] | 45 | 64 | 72 | 678 / 812 | 3106 / 4227 | 0.61 / 0.33 | 1.75 / 0.61
TAEC-64 [20] | 45 | 64 | 72 | 566 / 695 | 5279 / 7165 | 0.58 / 0.30 | 1.81 / 0.62
2D-PPC(8 × 8) | 45 | 64 | 81 | 201.8940 | 442.8900 | 0.44 | 0.54
2D-PPC(8 × 8)+Matrix-2 | 45 | 64 | 81 | 341.2780 | 628.0260 | 0.53 | 0.91
2D-PPC(8 × 8)+Matrix-2&3 | 45 | 64 | 81 | 404.5860 | 691.3340 | 0.55 | 0.97

* A single area value is listed for this scheme.
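The percentages and factors quoted in this section can be re-derived from the Table I entries. The short check below is only an arithmetic sketch; it assumes that the 1.5× and 1.7× area factors refer to the sum of encoder and decoder area, which is an interpretation made here and is not stated explicitly in the text.

```python
def percent_change(value, reference):
    """Relative difference of `value` with respect to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# Encoder + decoder area in um^2, taken from Table I (assumed to be summed).
ppc_area = 201.8940 + 442.8900          # 2D-PPC (8 x 8)
m2_area = 341.2780 + 628.0260           # with Matrix-2
m23_area = 404.5860 + 691.3340          # with Matrix-2 and Matrix-3

enc_delta = percent_change(0.41, 0.44)  # HP + HARQ-II encoder vs 2D-PPC encoder
dec_delta = percent_change(0.59, 0.54)  # HP + HARQ-II decoder vs 2D-PPC decoder
factor_m2 = m2_area / ppc_area
factor_m23 = m23_area / ppc_area
code_rate = 64 / 81                     # data bits per transmitted bit for 2D-PPC
```

Rounded to the precision used in the text, these expressions give a 6.82% lower encoder latency and a 9.26% higher decoder latency for HP + HARQ-II, and area factors of about 1.5 and 1.7 for one and two additional matrices.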

IV. CONCLUSION

This paper presents the 2D Parity Product Code (2D-PPC) to enhance the reliability of TSV-based 3D-IC designs. By exploiting the inherent 2D array organization of TSVs, the proposed approach can efficiently represent the fault manifestation in TSV-based systems, allowing it to correct one fault and detect at least two faults in a set of TSVs.

From the conducted experiments, and in contrast to conventional coding schemes that are limited to detecting a certain number of faults, the proposed 2D-PPC has demonstrated its ability to detect several defects while keeping a reasonable area cost and latency. Thanks to the matrix-switching technique, 2D-PPC significantly improves the detection rate, allowing it to detect most of the fault cases.

As future work, we plan to apply 2D-PPC to a dedicated 3D-IC architecture (e.g., 3D-RAM, 3D-NoCs) to investigate the impact on the overall system. Extending the technique with adaptive coding and different base coding methods is another possible direction.

V. ACKNOWLEDGMENT

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2018.312.

REFERENCES

[1] W. R. Davis et al., “Demystifying 3D ICs: The pros and cons of going vertical,” IEEE Des. Test Comput., vol. 22, no. 6, pp. 498–510, 2005.

[2] X. Dong and Y. Xie, “System-level cost analysis and design exploration for three-dimensional integrated circuits (3D ICs),” in Proc. of the 2009 Asia and South Pacific Des. Automation Conf., 2009, pp. 234–241.

[3] G. Van der Plas et al., “Design issues and considerations for low-cost 3-D TSV IC technology,” IEEE J. Solid-State Circuits, vol. 46, no. 1, pp. 293–307, 2011.

[4] L. Jiang et al., “On effective through-silicon via repair for 3-D-stacked ICs,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 4, pp. 559–571, 2013.

[5] A. Eghbal et al., “Analytical fault tolerance assessment and metrics for TSV-based 3D network-on-chip,” IEEE Trans. Comput., vol. 64, no. 12, pp. 3591–3604, 2015.

[6] Y. J. Park et al., “Thermal analysis for 3D multi-core processors with dynamic frequency scaling,” in 2010 IEEE/ACIS 9th Int. Conf. on Comput. and Inform. Sci. (ICIS). IEEE, 2010, pp. 69–74.

[7] M. Cho et al., “Design method and test structure to characterize and repair TSV defect induced signal degradation in 3D system,” in Proc. Int. Conf. on Comput.-Aided Des., 2010, pp. 694–697.

[8] F. Ye and K. Chakrabarty, “TSV open defects in 3D integrated circuits: Characterization, test, and optimal spare allocation,” in Proc. of the 49th Annu. Des. Automation Conf. ACM, 2012, pp. 1024–1030.

[9] R. W. Hamming, “Error detecting and error correcting codes,” Bell Labs Tech. J., vol. 29, no. 2, pp. 147–160, 1950.

[10] M.-Y. Hsiao, “A class of optimal minimum odd-weight-column SEC-DED codes,” IBM J. Res. Dev., vol. 14, no. 4, pp. 395–401, 1970.

[11] R. Kumar and S. P. Khatri, “Crosstalk avoidance codes for 3D VLSI,” in Automation and Test in Europe. EDA Consortium, 2013, pp. 1673–1678.

[12] B. Fu and P. Ampadu, “On hamming product codes with type-ii hybrid ARQ for on-chip interconnects,” IEEE Trans. Circuits Syst. I, vol. 56, no. 9, pp. 2042–2054, 2009.

[13] K. N. Dang et al., “Scalable design methodology and online algorithm for TSV-cluster defects recovery in highly reliable 3D-NoC systems,” IEEE Trans. Emerg. Topics Comput., in press.

[14] Y. Lou et al., “Comparing through-silicon-via (TSV) void/pinhole defect self-test methods,” Journal of Electronic Testing, vol. 28, no. 1, pp. 27–38, 2012.

[15] K. N. Dang et al., “TSV-IaS: Analytic analysis and low-cost non-preemptive on-line detection and correction method for TSV defects,” in The IEEE Symposium on VLSI (ISVLSI) 2019, 2019.

[16] Y. Zhao et al., “Online Fault Tolerance Technique for TSV-Based 3-D-IC,” IEEE Trans. VLSI Syst., vol. 23, no. 8, pp. 1567–1571, 2015.

[17] B. Noia et al., “Pre-bond probing of TSVs in 3D stacked ICs,” in 2011 IEEE Int. Test Conf. (ITC). IEEE, 2011, pp. 1–10.

[18] M. Hsiao, D. Bossen, and R. Chien, “Orthogonal latin square codes,” IBM Journal of Research and Development, vol. 14, no. 4, pp. 390–394, 1970.

[19] A. Dutta and N. A. Touba, “Multiple bit upset tolerant memory using a selective cycle avoidance based SEC-DED-DAEC code,” in 25th IEEE VLSI Test Symp. IEEE, 2007, pp. 349–354.

[20] L.-J. Saiz-Adalid et al., “MCU tolerance in SRAMs through low-redundancy triple adjacent error correction,” IEEE Trans. VLSI Syst., vol. 23, no. 10, pp. 2332–2336, 2015.
