Fault Tolerant Computer Architecture-P1 pptx

Fault Tolerant Computer Architecture

Synthesis Lectures on Computer Architecture

Editor
Mark D. Hill, University of Wisconsin, Madison

Synthesis Lectures on Computer Architecture publishes 50 to 150 page publications on topics pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals.

Fault Tolerant Computer Architecture
Daniel Sorin
2009

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Luiz André Barroso and Urs Hölzle
2009

Computer Architecture Techniques for Power-Efficiency
Stefanos Kaxiras and Margaret Martonosi
2008

Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency
Kunle Olukotun, Lance Hammond, James Laudon
2007

Transactional Memory
James R. Larus, Ravi Rajwar
2007

Quantum Computing for Computer Architects
Tzvetan S. Metodi, Frederic T. Chong
2006


Copyright © 2009 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other), except for brief quotations in printed reviews, without the prior permission of the publisher.

Fault Tolerant Computer Architecture

Daniel Sorin

www.morganclaypool.com

ISBN: 9781598299533 paperback

ISBN: 9781598299540 ebook

DOI: 10.2200/S00192ED1V01Y200904CAC005

A Publication in the Morgan & Claypool Publishers series

SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE

Lecture #5

Series Editor: Mark D. Hill, University of Wisconsin, Madison

Series ISSN

ISSN 1935-3235 print

ISSN 1935-3243 electronic


Fault Tolerant Computer Architecture

Daniel J. Sorin

Duke University

SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #5

ABSTRACT

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore’s law into remarkable increases in performance. Recently, however, the bounty provided by Moore’s law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes of this book are to explore the key ideas in fault-tolerant computer architecture and to present the current state of the art (over approximately the past 10 years) in academia and industry.

KEYWORDS

fault tolerance (or fault tolerant), reliability, dependability, computer architecture, error detection, error recovery, fault diagnosis, self-repair, autonomous, dynamic verification

Dedication

“To Deborah, Jason, and Julie”

Acknowledgments

I would like to thank my family for their support while I was writing this lecture. I would also like to thank Mark Hill for inviting me to write this lecture and Mike Morgan for organizing the production of the lecture. Valuable feedback on early drafts of the lecture was provided by Babak Falsafi, Jude Rivers, and Mark Hill. I would also like to thank Lihao Xu for helping me with a question about error coding.

Contents

1 Introduction
  1.1 Goals of this Book
  1.2 Faults, Errors, and Failures
    1.2.1 Masking
    1.2.2 Duration of Faults and Errors
    1.2.3 Underlying Physical Phenomena
  1.3 Trends Leading to Increased Fault Rates
    1.3.1 Smaller Devices and Hotter Chips
    1.3.2 More Devices per Processor
    1.3.3 More Complicated Designs
  1.4 Error Models
    1.4.1 Error Type
    1.4.2 Error Duration
    1.4.3 Number of Simultaneous Errors
  1.5 Fault Tolerance Metrics
    1.5.1 Availability
    1.5.2 Reliability
    1.5.3 Mean Time to Failure
    1.5.4 Mean Time Between Failures
    1.5.5 Failures in Time
    1.5.6 Architectural Vulnerability Factor
  1.6 The Rest of This Book
  1.7 References

2 Error Detection
  2.1 General Concepts
    2.1.1 Physical Redundancy
    2.1.2 Temporal Redundancy
    2.1.3 Information Redundancy
    2.1.4 The End-to-End Argument
  2.2 Microprocessor Cores
    2.2.1 Functional Units
    2.2.2 Register Files
    2.2.3 Tightly Lockstepped Redundant Cores
    2.2.4 Redundant Multithreading Without Lockstepping
    2.2.5 Dynamic Verification of Invariants
    2.2.6 High-Level Anomaly Detection
    2.2.7 Using Software to Detect Hardware Errors
    2.2.8 Error Detection Tailored to Specific Fault Models
  2.3 Caches and Memory
    2.3.1 Error Code Implementation
    2.3.2 Beyond EDCs
    2.3.3 Detecting Errors in Content Addressable Memories
    2.3.4 Detecting Errors in Addressing
  2.4 Multiprocessor Memory Systems
    2.4.1 Dynamic Verification of Cache Coherence
    2.4.2 Dynamic Verification of Memory Consistency
    2.4.3 Interconnection Networks
  2.5 Conclusions
  2.6 References

3 Error Recovery
  3.1 General Concepts
    3.1.1 Forward Error Recovery
    3.1.2 Backward Error Recovery
    3.1.3 Comparing the Performance of FER and BER
  3.2 Microprocessor Cores
    3.2.1 FER for Cores
    3.2.2 BER for Cores
  3.3 Single-Core Memory Systems
    3.3.1 FER for Caches and Memory
    3.3.2 BER for Caches and Memory
  3.4 Issues Unique to Multiprocessors

Posted: 03/07/2014, 19:20