Texts in Computer Science
A Practical Introduction to Computer Architecture
Dr Daniel Page
University of Bristol
Dept Computer Science
Room 3.10, Merchant Venturers Building
Woodland Road, Bristol
United Kingdom, BS8 1UB
Cornell University, Ithaca, NY 14853-7501, USA
ISBN 978-1-84882-255-9 e-ISBN 978-1-84882-256-6
DOI 10.1007/978-1-84882-256-6
Springer Dordrecht Heidelberg London New York
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Control Number: 2009922086
© Springer-Verlag London Limited 2009
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
It is a great pleasure to write a preface to this book. In my view, the content is unique in that it blends traditional teaching approaches with the use of mathematics and a mainstream Hardware Design Language (HDL) as formalisms to describe key concepts. The book keeps the “machine” separate from the “application” by strictly following a bottom-up approach: it starts with transistors and logic gates and only introduces assembly language programs once their execution by a processor is clearly defined.
Using a HDL, Verilog in this case, rather than static circuit diagrams is a big deviation from traditional books on computer architecture. Static circuit diagrams cannot be explored in a hands-on way like the corresponding Verilog model can. In order to understand why I consider this shift so important, one must consider how computer architecture, a subject that has been studied for more than 50 years, has evolved.
In the pioneering days computers were constructed by hand. An entire computer could (just about) be described by drawing a circuit diagram. Initially, such diagrams consisted mostly of analogue components before later moving toward digital logic gates. The advent of digital electronics led to more complex cells, such as half-adders, flip-flops, and decoders being recognised as useful building blocks. However, miniaturisation of devices and hence computers has led to the design of single circuits containing millions or even billions of components. As a result, hand layout is only used for specific modules, and circuit diagrams are less useful as a mechanism for describing functionality for real circuits.
Instead, two formalisms are used in industry: HDLs and mathematics. A HDL allows us to tell the component-layout and simulation tools how we would like to implement our circuit; mathematics tells us what the circuit ought to do. In order to verify whether the circuit does what we want it to, we can (partly mechanically) compare the mathematical description with the HDL description. This represents increased use of abstraction to cope with complexity, and an engineer can now be productive by simply understanding and using high-level circuit design (e.g., multiplier design or pipelined processors) and formalisms (e.g., HDLs and mathematics). Circuit diagrams are still used in the design flow, but mostly to sketch the physical layout, in order to predict whether a circuit can be laid out sensibly.
Dealing with gaps in understanding between such a wide range of concepts and techniques is often off-putting for people new to the subject. The best way to approach the problem is by placing it within a practical context that enables students to experiment with ideas and discover for themselves the advantages and disadvantages of a particular technique.
In this book, Dan does just that by giving an excellent overview of key concepts and an introduction to formalisms with which they can be explored. I hope this book will inspire many readers to follow a career in this fascinating subject.
Henk Muller
Principal Technologist, XMOS
1. Such modules are often regarded as unpopular and irrelevant by students who have not been exposed to the subject before, and who view a computer system from the applications level. This is compounded by the prevalence of technologies such as Java which place a further layer between the student and actual computer hardware. In short, and no matter how one tries to persuade them otherwise, students often see no point in learning about the internals of a computer system because they cannot see the benefit.
2. Conventional textbooks teach the subject in a different way than in other modules students are exposed to at the same time. For example, conventional wisdom says that one cannot “teach” programming, one has to “do” programming in order to learn. This is in stark contrast to textbooks on computer architecture where students are often forced to learn in a more theoretical way, learning by taking facts for granted rather than experimenting to arrive at their own conclusions. For example, because of the difficulty in working with large logic designs on paper, any practical work is often limited and hence detached from the more challenging content.
I would argue that this is a shame: computer architecture represents a broad spectrum of fundamental and exciting topics that underpin computer science in general. Aside from the technical challenges and sense of achievement that stem from understanding exactly how high-level programs are actually executed on devices built from simple building blocks, historical developments in computer architecture neatly capture and explain many design decisions that have shaped a landscape we now take for granted. The representation of strings in C is a great example: the null-terminated ASCIIZ approach was not adopted for any real reason other than the PDP-7 computer included instructions ideal for processing strings in this form, and yet we still live with this decision years after the PDP-7 became obsolete. Seemingly frivolous anecdotes and examples like this are increasingly being consigned to history whereas from an Engineering perspective, one would like to learn and understand previous approaches so as to potentially improve in the future.
International experts regularly debate tools and techniques for delivering University-level modules in computer architecture; the Workshop on Computer Architecture Education (WCAE), currently held in conjunction with the International Symposium on Computer Architecture (ISCA), is the premier research conference in this area. This book represents an attempt at translating my personal philosophy, that theoretical concepts should be accessible for practical experimentation, into a form suitable for use in such modules. Put simply, I see computer architecture as a subject in which “getting things done” is paramount; the ability to understand trade-offs before selecting between and implementing well considered design options is often as important as the study of those options at a more theoretical level. This focus is underlined by the book sub-title: a “practical” approach is the aim throughout. To enable this, a key feature of this book is inclusion and use of a hardware description language (i.e., Verilog), and a concrete processor (i.e., MIPS32) as practical vehicles for modelling and experimenting with digital logic and processor design.
Target Audience
The content is organised into three parts which contain a total of thirteen core chapters. Although some slight disagreement about inclusion of specific topics is inevitable, the chapters represent a compromise between my informal opinion, interests and experience, and more formal curriculum guidelines such as that developed by the UK Quality Assurance Agency (QAA):
http://www.qaa.ac.uk/
Of course, international equivalents exist; examples include those developed jointly by the IEEE and ACM, leading professional bodies within this domain:
http://www.computer.org/curriculum/
The general aim of this book is to cover topics every computer science student should have at least a basic grasp of, and equip said students with enough knowledge to read and understand more advanced textbooks. In this respect, the core target audience is first-year Undergraduate students with a rudimentary knowledge of programming in C. More generally, the book content is pitched at a level which satisfies most of the demands that have resulted from our degree programmes at the University of Bristol. In particular, the more advanced material has proved useful as a bridge toward, or in support of, more specialised textbooks that cover later-year Undergraduate and Postgraduate modules.
Organisation
The book chapters are described briefly below. Very roughly, the three parts of the book can be viewed as somewhat self-contained, representing three layers or levels of abstraction: the digital logic layer, the instruction set and micro-architecture layer, and the hardware/software interface. Part 1 deals with basic tools and techniques which underpin the rest of the book:
• … number representation.
• … basics of digital logic including logic gates and their construction using transistors, combinatorial and clocked circuits and their optimisation.
• … previous chapter, Verilog is presented in an introductory manner; this content is written with a reader who is a proficient C programmer in mind.
Part 2 deals with the broad topic of processor design and implementation. The content takes a step-by-step approach, starting with a functional description of a computer processor and gradually expanding on the details, issues and techniques that have resulted in modern, high-performance processor designs:
• … to the study of general circuits) is to track historical developments and use them as a means to explain central concepts such as the fetch-decode-execute cycle.
• … form of MIPS32 is discussed; this discussion includes details such as addressing modes and instruction encoding.
• … in terms of various metrics which can be used to define quality, focusing on performance in particular.
• … for arithmetic (e.g., addition and multiplication) is introduced and demonstrated using Verilog.
• … the memory hierarchy is introduced and demonstrated using Verilog.
• … investigated including approaches such as superscalar and vector processors.
Finally, Part 3 attempts to bridge the gap between hardware and software by examining the programming tools and operating system concepts that support the development and execution of programs:
• … development tool-chain, starting with linkers and assemblers. This material is supported by and links to Appendix A, which provides a stand-alone tutorial on using SPIM, a MIPS32 simulator. As such, it provides a concrete means of writing programs for the processor design introduced earlier in the book.
• … focusing on their aspects which are most closely tied to the processor they target. For example, register allocation, instruction selection and scheduling are all covered.
• … system layer is introduced in a practical way by using SPIM as a platform for real implementations of concepts such as scheduling and interrupt handling.
• … programming. This is written from the point of view of a programmer who wants (or needs) to capitalise on the behaviour and characteristics of the concepts presented in previous chapters to improve their programs.
Each chapter concludes with a set of example questions which are largely at a level one might encounter in a degree-level examination. These questions are numbered consecutively (rather than relative to the chapter number) and a set of example solutions can be found in Appendix B.
Although this content might clearly contain accidental omissions, some topics have been avoided by design. Optimistically, one might view them as ideal for inclusion in future versions of the book; more realistically they represent topics that are important but not vital at the level the book is aimed at:
• Perhaps the largest omission, at least the one which seems likely to prompt the loudest outcry, is that of floating point arithmetic. Although the book covers floating point representation briefly, the circuits for arithmetic and instruction sets that use them are integer only. The rationale for this decision was purely space: floating point represents a fairly self-contained topic which could be left out without too negative an impact on core topics covered by the rest of the content.
• A similar situation exists with the topic of design verification: although testing the digital logic designs one can produce with Verilog is important, one could easily dedicate an entire book to the art of good design verification.
• The main emphasis of the book is processors and as such, they are examined somewhat in isolation. This contrasts with reality where such processors typically form part of a larger system including peripherals, communication networks and so on. As such, the broad topic of system-level design, which is central to other books, is not covered here.
• At the time of writing, the “hot topic” in computer architecture is the advent of multi-core devices, i.e., many processors on a single chip. Although this has been a research area for many years in an attempt to cope with design complexity and effective use of increased transistor counts, commodity multi-core devices have breathed new life into the subject. Again, because the emphasis is more intra-processor than inter-processor, we omit the topic here although among all the options for future material this is perhaps the most compelling.
• The book covers only traditional logic styles. However, for specific application areas it has become attractive to investigate alternatives; the idea is basically that one should be aware of available alternatives and use the right one in the right place. Specifically, secure logic styles (which help to prevent certain types of passive attack) and asynchronous logic (which avoids the need for global clock signals) represent interesting avenues for future content.
Contact
As people who know me will (too) willingly attest, I am far from perfect. As a result, this book is sure to include problems in the shape of minor errors and mistakes. If you find such a problem, or have a more general comment, I would be glad to hear about it; you can contact me via
http://www.cs.bris.ac.uk/home/page/
Acknowledgements
This book was typeset with LaTeX, originally developed by Leslie Lamport and based on TeX by Donald Knuth; the listings package by Carsten Heinz, the algorithm2e package by Christophe Fiorio and karnaugh by Andreas Wieland were all used in addition to the basic system. The various figures and source code listings were generated with the help of ModelSim, a Verilog simulator by Mentor Graphics; GTKWave, a VCD waveform viewer by Anthony Bybell; xfig, a general vector drawing package originally written by Supoj Sutanthavibul and maintained by many others; xcircuit, a circuit vector drawing package by Tim Edwards; asymptote, a scripted vector drawing package by Andy Hammerlindl, John Bowman, and Tom Prince; SPIM, a MIPS32 simulator written by James Larus; Dinero, a cache simulator written by Jan Edler and Mark Hill; and finally gcc, the GNU C compiler, originally by Richard Stallman and now maintained by many others.
Throughout the book, images from other sources are reproduced under specific licenses; each image of this type carefully notes the source and the license in question. Specific details of the GNU Free Documentation License and Creative Commons Licenses can be found via
http://www.gnu.org/licenses/
and
Of course, as with any project, numerous people contributed in other ways. I would like to thank the (extended) Page, Symonds, Gould and Hunkin families and my friends Stan and Paul, Gavin and Heather and Fry’s Hockey Club for their support, and providing an escape from work; Maisie, you still owe me a “Busy Bee” for helping with your homework. The book could not have been completed without the help and guidance of Wayne Wheeler, Catherine Brett and Simon Rees at Springer-Verlag. It probably would not have been started at all without the encouragement and tutelage of Nigel Smart, Henk Muller, David May and James Irwin within the Computer Science Department at the University of Bristol. Staff and students in the Cryptography and Information Security Group have provided a constant sounding board for ideas; I would like to thank Andrew Moss, Rob Granger, Philipp Grabher and Johann Großschädl in particular. Like all good Engineers, students at the University of Bristol have never been shy to say when I am talking nonsense or present something in a particularly boring way; my thanks to them and many anonymous reviewers for improving the editorial quality throughout.
Most of all I thank Kate for making it all worthwhile
Contents
Part I Tools and Techniques
1 Mathematical Preliminaries
1.1 Propositions and Predicates
1.1.1 Connectives
1.1.2 Quantifiers
1.1.3 Manipulation
1.2 Sets and Functions
1.2.1 Construction
1.2.2 Operations
1.2.3 Numeric Sets
1.2.4 Functions
1.2.5 Relations
1.3 Boolean Algebra
1.3.1 Boolean Functions
1.3.2 Normal Forms
1.4 Number Systems
1.4.1 Converting Between Bases
1.4.2 Bits, Bytes and Words
1.4.3 Representing Numbers and Characters
1.5 Toward a Digital Logic
1.6 Further Reading
1.7 Example Questions
2 Basics of Digital Logic
2.1 Switches and Transistors
2.1.1 Basic Physics
2.1.2 Building and Packaging Transistors
2.2 Combinatorial Logic
2.2.1 Basic Logic Gates
2.2.2 3-state Logic
2.2.3 Designing Circuits
2.2.4 Simplifying Circuits
2.2.5 Physical Circuit Properties
2.2.6 Basic Building Blocks
2.3 Clocked and Stateful Logic
2.3.1 Clocks
2.3.2 Latches
2.3.3 Flip-Flops
2.3.4 State Machines
2.4 Implementation and Fabrication Technologies
2.4.1 Silicon Fabrication
2.4.2 Programmable Logic Arrays
2.4.3 Field Programmable Gate Arrays
2.5 Further Reading
2.6 Example Questions
3 Hardware Design Using Verilog
3.1 Introduction
3.1.1 The Problem of Design Complexity
3.1.2 Design Automation as a Solution
3.2 Structural Design
3.2.1 Modules
3.2.2 Wires
3.2.3 Values and Constants
3.2.4 Comments
3.2.5 Basic Instantiation
3.2.6 Nested Instantiation
3.2.7 User-Defined Primitives
3.3 Higher-level Constructs
3.3.1 Continuous Assignments
3.3.2 Selection and Concatenation
3.3.3 Reduction
3.3.4 Timing and Delays
3.4 State and Clocked Design
3.4.1 Registers
3.4.2 Processes and Triggers
3.4.3 Procedural Assignments
3.4.4 Timing and Delays
3.4.5 Further Behavioural Statements
3.4.6 Tasks and Functions
3.5 Effective Development
3.5.1 System Tasks
3.5.2 Using the Pre-processor
3.5.3 Parameters
3.5.4 Named Port Lists
3.5.5 Generate Statements
3.5.6 Simulation and Stimuli
3.6 Further Reading
3.7 Example Questions
Part II Processor Design
4 A Historical and Functional Perspective
4.1 Introduction
4.2 Special-Purpose Computers
4.3 General-Purpose Computers
4.4 Stored Program Computers
4.5 Toward Modern Computers
4.5.1 The von Neumann Bottleneck
4.5.2 Data-Dependent Control-Flow
4.5.3 Self-Modifying Programs
4.6 Further Reading
4.7 Example Questions
5 Basic Processor Design
5.1 A Concrete Stored Program Architecture
5.1.1 Major Data-path Components
5.1.2 Describing Instruction Behaviour
5.1.3 The Fetch-Decode-Execute Cycle
5.1.4 Controlling the Data-path
5.2 Buses
5.2.1 Synchronous Buses
5.2.2 Asynchronous Buses
5.3 Addressing Modes
5.3.1 Immediate Addressing
5.3.2 Register Addressing
5.3.3 Memory Addressing
5.4 Instruction Encoding
5.4.1 Instruction Selection
5.4.2 Instruction Formats
5.4.3 Basic Encoding and Decoding
5.4.4 More Complicated Encoding Issues
5.5 Control-Flow
5.5.1 Predicated Execution
5.5.2 Function Calls
5.6 Some Design Philosophy
5.6.1 Moore’s Law
5.6.2 RISC versus CISC
5.7 Putting It All Together
5.8 Further Reading
5.9 Example Questions
6 Measuring Performance
6.1 Measuring Performance
6.1.1 Estimating Execution Time
6.1.2 Measuring Execution Time
6.1.3 Benchmark Programs
6.1.4 Measuring Improvement
6.2 Further Reading
6.3 Example Questions
7 Arithmetic and Logic
7.1 Introduction
7.2 Comparisons
7.2.1 Unsigned Comparisons
7.2.2 Signed Comparisons
7.3 Addition and Subtraction
7.3.1 Addition
7.3.2 Subtraction
7.4 Shift and Rotate
7.4.1 Bit-Serial Shifter
7.4.2 Logarithmic Shifter
7.5 Multiplication
7.5.1 Bit-Serial Multiplier
7.5.2 Tree Multiplier
7.5.3 Digit-Serial Multiplier
7.5.4 Early Termination
7.5.5 Wallace and Dadda Trees
7.5.6 Booth Recoding
7.6 Putting It All Together
7.6.1 Comparison ALU
7.6.2 Arithmetic ALU
7.7 Further Reading
7.8 Example Questions
8 Memory and Storage
8.1 Introduction
8.1.1 Historical Memory and Storage
8.1.2 A Modern Memory Hierarchy
8.1.3 Basic Organisation and Implementation
8.1.4 Memory Banking
8.1.5 Access Locality
8.2 Memory and Storage Specifics
8.2.1 Static RAM (SRAM) and Dynamic RAM (DRAM)
8.2.2 Non-volatile RAM and ROM
8.2.3 Magnetic Disks
8.2.4 Optical Disks
8.2.5 Error Correction
8.3 Basic Cache Memories
8.3.1 Fetch Policy
8.3.2 Write Policy
8.3.3 Direct-Mapped Caches
8.3.4 Fully-Associative Caches
8.3.5 Set-Associative Caches
8.3.6 Cache Organisation
8.4 Advanced Cache Memories
8.4.1 Victim Caches
8.4.2 Gated and Drowsy Caches
8.5 Putting It All Together
8.5.1 Register File
8.5.2 Main Memory
8.5.3 Cache Memory
8.6 Further Reading
8.7 Example Questions
9 Advanced Processor Design
9.1 Introduction
9.1.1 A Taxonomy of Parallelism
9.1.2 Instruction-Level Parallelism (ILP)
9.2 Pipelined Processors
9.2.1 Pipelined Circuits
9.2.2 Pipelined Processors
9.2.3 Pipeline Hazards
9.2.4 Stalls and Hazard Resolution
9.3 Superscalar Processors
9.3.1 Basic Concept
9.3.2 Step 1: Scoreboard-based Design
9.3.3 Step 2: Reservation Station-based Design
9.3.4 Further Improvements
9.4 Vector Processors
9.4.1 Basic Concept
9.4.2 A Dedicated Vector Processor
9.4.3 SIMD Within A Register (SWAR)
9.4.4 Issues of Vectorisation
9.5 VLIW Processors
9.5.1 Basic Concept
9.6 Further Reading
9.7 Example Questions
Part III The Hardware/Software Interface
10 Linkers and Assemblers
10.1 Introduction
10.2 The Memory Model
10.2.1 Stack Section
10.2.2 Static Data Section
10.2.3 Dynamic Data Section
10.3 Executable Versus Object Files
10.4 Linkers
10.4.1 Static and Dynamic Linkage
10.4.2 Boot-strap Functions
10.4.3 Symbol Relocation
10.4.4 Symbol Resolution
10.5 Assemblers
10.5.1 Basic Assembly Language Statements
10.5.2 Using Machine Instructions
10.5.3 Using Assembler Aliases
10.5.4 Using Assembler Directives
10.5.5 Peephole Optimisation
10.5.6 Some Short Example Programs
10.5.7 The Forward Referencing Problem
10.5.8 An Example Assembler
10.6 Further Reading
10.7 Example Questions
11 Compilers
11.1 Introduction
11.2 Compiler Bootstrapping and Re-Hosting
11.3 Intermediate Representation
11.4 Register Allocation
11.4.1 An Example Allocation
11.4.2 Uses for Pre-colouring
11.4.3 Avoiding Error Cases via Spilling
11.5 Instruction Selection and Scheduling
11.5.1 Instruction Selection
11.5.2 Instruction Scheduling
11.5.3 Scheduling Basic Blocks
11.5.4 Scheduling Instructions
11.6 “Template” Code Generation for High-Level Statements
11.6.1 Conditional Statements
11.6.2 Loop Statements
11.6.3 Multi-way Branch Statements
11.7 “Template” Code Generation for High-Level Function Calls
11.7.1 Basic Stack Frames
11.7.2 Advanced Stack Frames
11.8 Further Reading
11.9 Example Questions
12 Operating Systems
12.1 Introduction
12.2 The Hardware/Software Interface
12.2.1 MIPS32 Co-processor Registers
12.2.2 MIPS32 Processor Modes
12.2.3 MIPS32 Assembly Language
12.3 Boot-Strapping
12.4 Event Management
12.4.1 Handling Interrupts
12.4.2 Handling Exceptions
12.4.3 Handling Traps
12.4.4 An Example Exception Handler
12.5 Memory Management
12.5.1 Basic Concept
12.5.2 Pages and Frames
12.5.3 Address Translation and Memory Access
12.5.4 Page Eviction and Replacement
12.5.5 Translation Look-aside Buffer (TLB)
12.6 Process Management
12.6.1 Storing and Switching Process Context
12.6.2 Process Scheduling
12.6.3 An Example Scheduler
12.7 Further Reading
12.8 Example Questions
13 Efficient Programming
13.1 Introduction
13.2 “Space” Conscious Programming
13.2.1 Reducing Register Pressure
13.2.2 Reducing Memory Allocation
13.3 “Time” Conscious Programming
13.3.1 Effective Short-circuiting
13.3.2 Branch Elimination
13.3.3 Loop Fusion and Fission
13.3.4 Loop Unrolling
13.3.5 Loop Hoisting
13.3.6 Loop Interchange
13.3.7 Loop Blocking
13.3.8 Function Inlining
13.3.9 Software Pipelining
13.4 Example Questions
Part IV Appendices
A SPIM: A MIPS32 Simulator
A.1 Introduction
A.2 Configuring SPIM
A.3 Controlling SPIM
A.4 Example Program Execution
A.5 Using System Calls
B Example Solutions
References
Index
Part I Tools and Techniques
1 Mathematical Preliminaries
In mathematics you don’t understand things You just get used to them.
– J von Neumann
Abstract The goal of this chapter is to give a fairly comprehensive overview of the theory that underpins the rest of the book. On first reading, it may seem a little dry and is often excluded in other similar books. However, without a solid understanding of logic and representation of numbers it seems clear that constructing digital circuits to put this theory into practice would be much harder. The theory here will present an introduction to propositional logic, sets and functions, number systems and Boolean algebra. These four main areas combine to produce a basis for formal methods to describe, manipulate and implement digital systems such as computer processors. Those with a background in mathematics or computer science might skip this material and use it simply for reference; those approaching the subject from another background would be advised to read the material in more detail.
1.1 Propositions and Predicates
Definition 1 A proposition is a statement whose meaning, termed the truth value, is either true or false. Less formally, we say the statement is true if it has a truth value of true and false if it has a truth value of false.
A predicate is a proposition which contains one or more variables; only when concrete values are assigned to each of the variables can the predicate be called a proposition.
Since we use them so naturally, it almost seems too formal to define what a proposition is. However, by doing so we can start to use them as a building block to describe what logic is and how it works. The statement
“the temperature is 90 °C”
is a proposition since it is definitely either true or false. When we take a proposition and decide whether it is true or false, we say we have evaluated it. However, there are clearly a lot of statements that are not propositions because they do not state any proposal. For example,
“turn off the heat”
is a command or request of some kind; it does not evaluate to a truth value. Propositions must also be well defined in the sense that they are definitely either true or false, i.e., there are no “gray areas” in between. Consider the statement
“a man says that he is lying, is what he says true or false?”
although a clearer version is the more commonly referenced
“this statement is false”
If the man is telling the truth, everything he says must be true, which means he is lying and hence everything he says is false. Conversely, if the man is lying, everything he says is false, so he cannot be lying since he said he was! In terms of the statement, we cannot be sure of the truth value, so this is not normally classed as a proposition.
As stated above, a predicate is just a proposition that contains variables. By assigning the variable a value we can turn the predicate into a proposition and evaluate the corresponding truth value. For example, consider the predicate
“x °C equals 90 °C”
where x is a variable. By assigning x a value we get a proposition; setting x = 10, for example, gives
“10 °C equals 90 °C”
which is a proposition that evaluates to false.
1.1.1 Connectives
Definition 2 A connective is a statement which binds single propositions into a compound proposition. For brevity, we use symbols to denote common connectives:
• “not x” is denoted ¬x.
• “x and y” is denoted x ∧ y.
• “x or y” is denoted x ∨ y; this is usually called an inclusive-or.
• “x or y but not x and y” is denoted x ⊕ y; this is usually called an exclusive-or.
• “x implies y” is denoted x → y, which is sometimes written as “if x then y”.
• “x is equivalent to y” is denoted x ↔ y, which is sometimes written as “x if and only if y” or further shortened to “x iff y”.
Note that we group statements using parentheses when there could be some confusion about the order they are applied; hence (x ∧ y) is the same as x ∧ y.
A proposition or predicate involving connectives is built from terms; the connective joins together these terms into an expression. For example, the expression
“the temperature is less than 90 °C ∧ the temperature is greater than 10 °C”
contains two terms that propose
“the temperature is less than 90 °C”
and
“the temperature is greater than 10 °C”
These terms are joined together using the ∧ connective so that the whole expression evaluates to true if both of the terms are true, otherwise it evaluates to false. In a similar way we might write a compound predicate
“the temperature is less than x °C ∧ the temperature is greater than y °C”
which can only be evaluated when we assign values to the variables x and y.
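As a rough illustration, a predicate can be modelled as a function of its variables: only once concrete values are supplied does it yield a proposition with a definite truth value. The sketch below uses Python purely for convenience; the function names, and the choice to pass the temperature as an explicit argument, are assumptions made for the example rather than anything taken from the text.

```python
def equals_90(x):
    """Predicate "x °C equals 90 °C": needs a value for x to be evaluated."""
    return x == 90


def between(temperature, x, y):
    """Compound predicate: temperature is less than x AND greater than y."""
    return (temperature < x) and (temperature > y)


# Assigning values to the variables turns each predicate into a proposition.
p1 = equals_90(10)         # "10 °C equals 90 °C"
p2 = equals_90(90)         # "90 °C equals 90 °C"
p3 = between(50, 90, 10)   # 50 is less than 90 and greater than 10
p4 = between(95, 90, 10)   # 95 is not less than 90
```

Until the arguments are supplied, `equals_90` and `between` have no truth value of their own, which mirrors the distinction between a predicate and a proposition.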
Definition 3 The meaning of connectives is usually described in a tabular form which enumerates the possible values each term can take and what the resulting truth value is; we call this a truth table.
x      y      ¬x     x ∧ y  x ∨ y  x ⊕ y  x → y  x ↔ y
false  false  true   false  false  false  true   true
false  true   true   false  true   true   true   false
true   false  false  false  true   true   false  false
true   true   false  true   true   false  true   true
The ¬ connective negates the truth value of an expression so considering
¬(x > 10)
we find that the expression ¬(x > 10) is true if the term x > 10 is false and the expression is false if x > 10 is true. If we assign x = 9, x > 10 is false and hence the expression ¬(x > 10) is true. If we assign x = 91, x > 10 is true and hence the expression ¬(x > 10) is false.
The meaning of the ∧ connective is also as one would expect; the expression
(x > 10) ∧ (x < 90)
is true if both the expressions x > 10 and x < 90 are true, otherwise it is false. So if
x = 20, the expression is true. But if x = 9 or x = 91, then it is false: even though
one or other of the terms is true, they are not both true.
The inclusive-or and exclusive-or connectives are fairly similar. The expression
Inference is a bit more tricky. If we write x → y, we usually call x the hypothesis
and y the conclusion. To justify the truth table for inference in words, consider the
be that x ≡ 1 (mod 2) even when x is not prime, and we do not know anything better
from the expression, we assume it is true when this case occurs.
Equivalence is fairly simple. The expression x ↔ y is only true if x and y evaluate
to the same value. This matches the idea of equality of numbers. As an example, consider
it must be odd (apart from the corner case of x = 2). So the equivalence works in
one direction but not the other and hence the expression is false.
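The truth table for the connectives can be reproduced mechanically. The following is a minimal Python sketch, not part of the original text: the function names NOT, AND, OR, XOR, IMPLIES and IFF are chosen here for illustration, and each is written to match the corresponding column of the table.

```python
from itertools import product

# Each connective written as a small function over Python bool values.
# The names are illustrative assumptions, not notation from the text.
def NOT(x): return not x
def AND(x, y): return x and y
def OR(x, y): return x or y
def XOR(x, y): return x != y
def IMPLIES(x, y): return (not x) or y   # false only when x is true and y is false
def IFF(x, y): return x == y

def truth_table():
    """Return one row per assignment: (x, y, ¬x, x∧y, x∨y, x⊕y, x→y, x↔y)."""
    rows = []
    for x, y in product([False, True], repeat=2):
        rows.append((x, y, NOT(x), AND(x, y), OR(x, y),
                     XOR(x, y), IMPLIES(x, y), IFF(x, y)))
    return rows

for row in truth_table():
    print(*row)
```

The rows are produced in the same order as the table in Definition 3, so the two can be compared line by line.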
Definition 4 We call two expressions logically equivalent if they are composed of
the same variables and have the same truth value for every possible assignment to those variables.
An expression which is equivalent to true, no matter what values are assigned
to any variables, is called a tautology; an expression which is equivalent to false is called a contradiction.
Some subtleties emerge when trying to prove two expressions are logically equivalent. However, we will skirt around these. For our purposes it suffices to simply enumerate all possible values each variable can take, and check the two expressions
produce identical truth values in all cases. In practice this can be hard since with n
variables there will be 2^n possible assignments, an amount which grows quickly as
n grows! More formally, two expressions x and y are only equivalent if x ↔ y can
be proved a tautology.
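The enumeration approach described above can be sketched in Python as follows. This is an illustration under stated assumptions: the helper names equivalent and is_tautology are hypothetical, expressions are modelled as Python functions of n Boolean arguments, and the example pair (a De Morgan style identity) is chosen here rather than taken from the text.

```python
from itertools import product

def equivalent(f, g, n):
    """True if f and g agree on all 2**n assignments of n Boolean variables."""
    return all(f(*vals) == g(*vals)
               for vals in product([False, True], repeat=n))

def is_tautology(f, n):
    """True if f evaluates to true for every one of the 2**n assignments."""
    return all(f(*vals) for vals in product([False, True], repeat=n))

# Illustrative pair of expressions: ¬(x ∧ y) and ¬x ∨ ¬y.
lhs = lambda x, y: not (x and y)
rhs = lambda x, y: (not x) or (not y)

same = equivalent(lhs, rhs, 2)
# The more formal view: lhs ↔ rhs is a tautology.
iff_is_tautology = is_tautology(lambda x, y: lhs(x, y) == rhs(x, y), 2)
# x ∧ ¬x is false for every assignment, i.e., a contradiction.
never_true = not any(x and not x for x in [False, True])
```

Both checks loop over every assignment, so their cost doubles with each extra variable, which is the growth the text warns about.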
1.1.2 Quantifiers
Definition 5 A free variable in a given expression is one which has not yet been
assigned a value. Roughly speaking, a quantifier is a statement which allows a free
variable to take one of many values:
• the universal quantifier “for all x, y is true” is denoted ∀ x [y].
• the existential quantifier “there exists an x such that y is true” is denoted ∃ x [y].
We say that applying a quantifier to a variable quantifies it; after it has been
quantified we say it has been bound.
Put more simply, when we encounter an expression such as
∃ x [y]
we are essentially assigning x all possible values; to make the expression true just
one of these values needs to make the expression y true. Likewise, when we
“there exists an x such that x ≡ 0 (mod 2)”
which we can re-write symbolically as
∃ x [x ≡ 0 (mod 2)].
In this case, x is bound by the ∃ quantifier; we are asserting that for some value of
x it is true that x ≡ 0 (mod 2). To make the expression true just one of these values
needs to make the term x ≡ 0 (mod 2) true. The assignment x = 2 satisfies this so
the expression is true. As another example, consider the expression
“for all x, x ≡ 0 (mod 2)”
which we re-write
∀ x [x ≡ 0 (mod 2)].
Here we are making a more general assertion about x by saying that for all x, it is true that x ≡ 0 (mod 2). To decide if this particular expression is false, we need
simply to find an x such that x ≢ 0 (mod 2). This is easy since any odd value of x
is good enough. Therefore the expression is false.
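Over a finite collection of values, the two quantifiers correspond closely to Python's built-in any and all. The sketch below assumes a small finite domain (the integers 1 to 10) as a stand-in for the values x may take; the names domain, is_even and counterexample are illustrative.

```python
# A finite stand-in for the values x may take (an assumption for illustration;
# the text quantifies over all possible values of x).
domain = range(1, 11)

def is_even(x):
    """The term x ≡ 0 (mod 2)."""
    return x % 2 == 0

exists_even = any(is_even(x) for x in domain)   # ∃ x [x ≡ 0 (mod 2)]
forall_even = all(is_even(x) for x in domain)   # ∀ x [x ≡ 0 (mod 2)]

# One value that makes the term false is enough to refute the universal claim.
counterexample = next(x for x in domain if not is_even(x))
```

Here the existential expression holds because x = 2 is in the domain, while the universal one fails as soon as a single odd value is found.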
Definition 6 Informally, a predicate function is just a shorthand way of writing
predicates; we give the function a name and a list of free variables. So for example the function
A common reason to manipulate a logic expression is to simplify it in some way
so that it contains fewer terms or fewer connectives. A simplified expression might be more opaque, in the sense that it is not as clear what it means, but looking at things computationally it will generally be “cheaper” to evaluate. As a concrete example
of simplification in action, consider the exclusive-or connective x ⊕ y which we can
write as the more complicated expression
To answer this, we simply start with one expression and apply our axiomatic laws
to move toward the other. So starting with the first alternative, we try to apply the axioms until we get the second: think of the axioms like rules that allow one to re-write the expression in a different way. To start with, we can manipulate each term
which looks like p ∧ ¬q as follows
(p ∧ ¬q) = (p ∧ ¬q) ∨ false (identity)
= (p ∧ ¬q) ∨ (p ∧ ¬p) (null + inverse)
= p ∧ (¬p ∨ ¬q) (distribution)
Using this new identity, we can re-write the whole expression as
(y ∧ ¬x) ∨ (x ∧ ¬y) = (x ∧ (¬x ∨ ¬y)) ∨ (y ∧ (¬x ∨ ¬y))
which gives us the second alternative we are looking for. A later chapter shows how to construct an algorithm to do this sort of simplification mechanically so as to reduce our workload and reduce the chance of error.
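Each re-writing step can be sanity-checked by exhaustive evaluation, since only four assignments of two variables exist. The following Python sketch is an illustration only: the function names are hypothetical, and the check confirms that both sides agree with each other and with exclusive-or; it does not replace the axiomatic derivation.

```python
from itertools import product

def xor_first(x, y):
    # (y ∧ ¬x) ∨ (x ∧ ¬y)
    return (y and not x) or (x and not y)

def xor_rewritten(x, y):
    # (x ∧ (¬x ∨ ¬y)) ∨ (y ∧ (¬x ∨ ¬y))
    return (x and (not x or not y)) or (y and (not x or not y))

assignments = list(product([False, True], repeat=2))

# The intermediate identity: p ∧ ¬q = p ∧ (¬p ∨ ¬q).
step_holds = all((p and not q) == (p and (not p or not q))
                 for p, q in assignments)

# Both forms agree with each other and with x ⊕ y on every assignment.
forms_agree = all(xor_first(x, y) == xor_rewritten(x, y) == (x != y)
                  for x, y in assignments)
```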
1.2 Sets and Functions
The concept of a set and the theory behind such objects is a fundamental part of mathematics. Informally, a set is simply a well defined collection of elements. Here
we mainly deal with sets of numbers, but it is important to note that the elements can be anything you want.
We can define a set using one of several methods. Firstly we can enumerate the elements, writing them down between a pair of braces. For example, one might
define the set A of whole numbers between two and eight (inclusive) as
A = {2,3,4,5,6,7,8}.
The cardinality of a finite set is the number of elements it contains. For the set A,
this is denoted by |A| such that from the example above
|A| = 7.
and be safe in the knowledge that A = B. However, note that elements cannot occur
in a set more than once; a set where repetitions are allowed is sometimes called a
bag or multi-set but is beyond the scope of this discussion.
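Python's built-in set type behaves in the same way, which makes these properties easy to try out. In this sketch the set B is an assumed re-ordering of A chosen for illustration, and C shows that writing an element twice does not change the set.

```python
A = {2, 3, 4, 5, 6, 7, 8}
B = {8, 7, 6, 5, 4, 3, 2}        # same elements in a different order (illustrative)
C = {2, 2, 3, 3, 4, 5, 6, 7, 8}  # repeated elements collapse to one occurrence

cardinality = len(A)             # |A|
same_set = (A == B)              # order of enumeration is irrelevant
no_duplicates = (A == C)         # repetition is irrelevant
```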
There are at least two predefined sets which have a clear meaning but are hard to define using any other notation:
Definition 8 The set ∅, called the null set or empty set, is the set which contains
no elements. Note that ∅ is a set not an element; one cannot write the empty set as
{∅} since this is the set with one element, that element being the empty set.
Definition 9 The contents of the set U, called the universal set, depends on the
context. Roughly speaking, it contains every element from the problem being considered.
As a side note, since the elements in a set can be anything we want they can potentially be other sets. Russell’s paradox, discovered by mathematician Bertrand Russell in 1901, is a problem with formal set theory that results from this fact. The paradox is similar to the liar paradox seen earlier and is easily stated by considering
A, the set of all sets which do not contain themselves. The question is, does A
contain itself? If it does, it should not be in A by definition but it is; if it does not, it should be in the set A by definition but it is not.
1.2.1 Construction
The above method of definition is fine for small finite sets but when the set is large or even infinite in size, writing down all the elements quickly becomes an unpleasant task! Where there is a natural sequence to the elements, we write continuation dots
to save time and space. For example, the same set A as above might be defined as
notation. Basically speaking, we generate the elements in the set using f, a predicate
function:
D = {x : f(x)}.
One should read this as “all elements x ∈ U such that the predicate f(x) is true”.
Using set builder notation we can define sets in a more programmatic manner, for example
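Set builder notation maps quite directly onto a set comprehension in Python. The sketch below is a hedged illustration: the universal set U (the integers 0 to 20) and the predicate f are both assumptions chosen for the example, not definitions from the text.

```python
# U is an assumed, finite universal set; in the text U depends on context.
U = range(0, 21)

def f(x):
    """Hypothetical predicate: x is even and lies between 2 and 8 inclusive."""
    return x % 2 == 0 and 2 <= x <= 8

# D = {x : f(x)}, read as "all x in U such that f(x) is true".
D = {x for x in U if f(x)}
```

With these assumptions D contains exactly the even numbers from 2 to 8.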
Definition 10 A sub-set, say B, of a set A is such that for every x ∈ B we have that
x ∈ A. This is denoted B ⊆ A. Conversely, we can say A is a super-set of B and write
A ⊇ B.
Note that every set is a sub-set and super-set of itself and that A = B only if A ⊆ B
and B ⊆ A. If B ⊆ A and A ≠ B, we use the terms proper sub-set and proper super-set and
write B ⊂ A and A ⊃ B respectively.
Definition 11 For sets A and B, we have that
• The union of A and B is A ∪ B = {x : x ∈ A ∨ x ∈ B}.
• The intersection of A and B is A ∩ B = {x : x ∈ A ∧ x ∈ B}.
• The difference of A and B is A − B = {x : x ∈ A ∧ x ∉ B}.
• The complement of A is Ā = {x : x ∈ U ∧ x ∉ A}.
We say A and B are disjoint or mutually exclusive if A ∩ B = ∅. Note also that the
difference operation can be re-written using the complement as A − B = A ∩ B̄.
Definition 12 The power set of a set A, denoted P(A), is the set of every possible
sub-set of A. Note that ∅ is a member of all power sets.
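A power set can be generated by collecting every combination of 0, 1, ..., |A| elements. The following is a minimal sketch, assuming the helper name power_set and using frozenset for the members because ordinary Python sets cannot be elements of another set.

```python
from itertools import combinations

def power_set(A):
    """Return P(A) as a set of frozensets, one per sub-set of A."""
    elems = list(A)
    return {frozenset(c)
            for r in range(len(elems) + 1)      # sub-set sizes 0 .. |A|
            for c in combinations(elems, r)}

P = power_set({1, 2, 3})
```

For a set with three elements this yields 2^3 = 8 sub-sets, including the empty set and the set itself.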
On first reading, these formal definitions can seem a bit abstract and slightly scary. However, we have another tool at our disposal which describes what they mean
in a visual way. This tool is the Venn diagram, named after mathematician John
Venn who invented the concept in 1881. The basic idea is that sets are represented
by regions inside an enclosure that implicitly represents the universal set U. By
placing these regions inside each other and overlapping their boundaries, we can describe most set-related concepts very easily.
Figure 1.1 details four Venn diagrams which describe how the union, intersection, difference and complement operations work. The shaded areas of each Venn diagram represent the elements which are in the resulting set. For example, in the
diagram for A ∪ B the shaded area covers all of the sets A and B: the result contains
all elements in either A or B or both. As a simple concrete example, consider the
Figure 1.2 shows membership in various settings; recall that those elements within
a given region are members of that set. Firstly, we can take the union of A and B as
A ∪ B = {1,2,3,4,5,6} which contains all the elements which are either members
of A or B or both. Note that elements 3 and 4 do not appear twice in the result. The intersection of A and B can be calculated as A ∩ B = {3,4} since these are the
elements that are members of both A and B. The difference between A and B, that is the elements in A that are not in B, is A − B = {1,2}. Finally, the complement of A
is all numbers which are not in A, that is Ā = {5,6,7,8,9,10}.
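The four operations have direct counterparts in Python's set type. In the sketch below the sets A, B and U are inferred from the results quoted above (they are assumptions, chosen so that every quoted result is reproduced), and the complement is computed as a difference from U.

```python
# Sets inferred from the quoted results; U is assumed to be the integers 1..10.
U = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B              # A ∪ B
intersection = A & B       # A ∩ B
difference = A - B         # A − B
complement_A = U - A       # complement of A relative to U

# The difference can also be written as an intersection with a complement.
difference_via_complement = A & (U - B)
```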
The union and intersection operations preserve a law of cardinality called the
principle of inclusion in the sense that we can calculate the cardinality of the output
from the cardinality of the inputs as
|A ∪ B| = |A| + |B| − |A ∩ B|.
This property is intuitively obvious since those elements in both A and B will be
counted twice and hence need subtraction via the last term. We can even check it: in our example above |A| = 4 and |B| = 4. Checking our results we have that |A ∪ B| = 6
and |A ∩ B| = 2 and so by the principle of inclusion we should have 6 = 4 + 4 − 2
which makes sense.
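The same check can be repeated for other pairs of finite sets. This is a small sketch with a hypothetical helper inclusion_holds; apart from the first pair, which matches the cardinalities used above, the example sets are chosen here for illustration.

```python
def inclusion_holds(A, B):
    """Check |A ∪ B| = |A| + |B| − |A ∩ B| for two finite sets."""
    return len(A | B) == len(A) + len(B) - len(A & B)

examples = [
    ({1, 2, 3, 4}, {3, 4, 5, 6}),   # cardinalities as above: 6 = 4 + 4 − 2
    ({1, 2}, {3, 4}),               # disjoint sets: nothing is counted twice
    ({1, 2, 3}, {1, 2, 3}),         # identical sets
    (set(), {7}),                   # one empty set
]
results = [inclusion_holds(A, B) for A, B in examples]
```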
1.2.3 Numeric Sets
Using this basic notation, we can define three important numeric sets which are used extensively later on in this chapter.
Definition 13 The integers are whole numbers which can be positive or negative
and also include zero.
Clearly the set of rational numbers is a super-set of both Z and N since, for example,
we can write p/1 to represent any integer p as a member of Q. However, not all
numbers are rational. Some are irrational in the sense that it is impossible to find
a p and q such that they exactly represent the required result; examples include the
value of π.
1.2.4 Functions
Definition 14 If A and B are sets, a function f from A to B is a process that maps
each element of A to an element of B. We write this as
f : A → B
where A is termed the domain of f and B is the codomain of f. For an element
x ∈ A, which we term the preimage, there is only one y = f(x) ∈ B which is termed
the image of x. Finally, the set
Note that here we write the function signature which defines the domain and
codomain of INV inline with the definition of the function behaviour. In this case
the domain of INV is Z since it takes integers as input; the codomain is Q since it
produces rational numbers as output. If we take an integer and apply the function to get something like INV(2) = 1/2, we have that 1/2 is the image of 2 or conversely
2 is the preimage of 1/2 under INV.
From this definition it might seem as though we can only have functions with one input and one output. However, remember that we are perfectly entitled to have
sets of sets so we can easily define a function f, for example, as
f : A × A → B.
This function takes elements from the Cartesian product of A with itself as input and produces
an element of B as output. So since pairs (x, y) ∈ A × A are used as input, f can in
some sense accept two input values. As a concrete example consider the function
If we take a pair of integers, say (2, 4), and apply the function we get MAX(2, 4) = 4 where we usually
omit the parentheses around the pair of inputs. In this case, the domain of MAX
is Z × Z and the codomain is Z; the integer 4 is the image of the pair (2, 4) under
MAX.
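The two example functions can be sketched in Python as follows. This assumes INV(x) = 1/x, which matches the value INV(2) = 1/2 quoted above, and it models a rational result with the standard fractions.Fraction type; MAX takes a single pair as its input to mirror the domain Z × Z.

```python
from fractions import Fraction

def INV(x):
    """INV : Z → Q, assumed here to be INV(x) = 1/x.

    Fraction raises ZeroDivisionError for x = 0, where 1/x is undefined.
    """
    return Fraction(1, x)

def MAX(pair):
    """MAX : Z × Z → Z, taking one pair (x, y) as its single input."""
    x, y = pair
    return x if x > y else y

image_of_2 = INV(2)          # the image of 2 under INV
image_of_pair = MAX((2, 4))  # the image of the pair (2, 4) under MAX
```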
Definition 15 For a given function f, we say that f is
• surjective if the range equals the codomain, i.e., there are no elements in the
codomain which do not have a preimage in the domain.
• injective if no two elements in the domain have the same image in the range.
• bijective if the function is both surjective and injective, i.e., every element in the
codomain is the image of exactly one element in the domain.
Using the examples above, we clearly have that INV is not surjective but MAX is.
This follows because we can construct a rational 2/3 which does not have an integer
preimage under INV so the function cannot be surjective. Equally, for any integer x
in the range of MAX there is always a pair (x, y) in the domain such that x > y so
MAX is surjective, in fact there are lots of them since Z is infinite in size! In the
same way, we have that INV is injective but MAX is not. Only one preimage x maps
to the value 1/x in the range under INV but there are multiple pairs (x, y) which map
to the same image under MAX, for example 4 is the image of both (1, 4) and (2, 4)
under MAX.
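For functions on finite sets these properties can be tested by brute force. The sketch below is an illustration under clear assumptions: the helper names are hypothetical, and because Z and Q are infinite the checks run on small finite restrictions (the integers 1 to 4 and the fractions built from them), so they illustrate the argument rather than prove it.

```python
from fractions import Fraction
from itertools import product

def image_set(f, domain):
    """The range of f: every value f(x) for x in the domain."""
    return {f(x) for x in domain}

def is_surjective(f, domain, codomain):
    return image_set(f, domain) == set(codomain)

def is_injective(f, domain):
    domain = list(domain)
    return len(image_set(f, domain)) == len(set(domain))

def is_bijective(f, domain, codomain):
    return is_injective(f, domain) and is_surjective(f, domain, codomain)

# Finite restrictions chosen for illustration.
D = range(1, 5)
pairs = list(product(D, D))                        # D × D
Q_small = {Fraction(p, q) for p in D for q in D}   # contains 2/3

max_surjective = is_surjective(lambda p: max(p), pairs, D)
max_injective = is_injective(lambda p: max(p), pairs)
inv_injective = is_injective(lambda x: Fraction(1, x), D)
inv_surjective = is_surjective(lambda x: Fraction(1, x), D, Q_small)
```

On these restrictions the results agree with the text: the maximum function is surjective but not injective, and the inverse function is injective but not surjective.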
and g is denoted
g ◦ f : A → C.
Given some input x ∈ A, this composition is equivalent to applying y = f(x) and
then z = g(y) to get the result z ∈ C. More formally, we have
(g ◦ f )(x) = g( f (x)).
The notation g ◦ f should be read as “apply g to the result of applying f ”.
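Composition is straightforward to express as a higher-order function. In this sketch the helper compose and the two example functions f and g are hypothetical, chosen only to show that the order of composition matters.

```python
def compose(g, f):
    """Return g ∘ f, the function that maps x to g(f(x))."""
    return lambda x: g(f(x))

# Hypothetical example functions, chosen only for illustration.
def f(x):
    return x + 1      # plays the role of f : A → B

def g(y):
    return 2 * y      # plays the role of g : B → C

g_after_f = compose(g, f)   # x ↦ 2 * (x + 1)
f_after_g = compose(f, g)   # x ↦ (2 * x) + 1
```

Applying both to the same input generally gives different results, so g ∘ f and f ∘ g should not be confused.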
I : A → A, I (x) = x
so that it maps all elements to themselves. Given two functions f and g defined by
f : A → B and g : B → A, if g ◦ f is the identity function on set A and f ◦ g is the
identity on set B, then f is the inverse of g and g is the inverse of f. We denote this
by f = g^−1 and g = f^−1. If a function f has an inverse, we hence have f^−1 ◦ f = I.
The inverse of a function maps elements from the codomain back into the domain, essentially reversing the original function. It is easy to see not all functions have an inverse. For example, if the function is not injective then there will be more than one potential preimage for the inverse of any image.
At first glance, it might seem like our example functions both have inverses but
they do not. For example, given some value 1/x, we can certainly find x but we have already said that numbers like 2/3 also exist in the codomain so we cannot invert
all the values we might come across. However, consider the example of a successor function on integers
SUCC : Z → Z, SUCC(x) = x + 1
which takes an integer x as input and produces x + 1 as output. The function is
bijective since the codomain and range are the same and no two integers have the same successor. Thus we have an inverse and it is easy to describe as
PRED : Z → Z, PRED(x) = x − 1
which is the predecessor function: it takes an integer x as input and produces x − 1
as output. To see that SUCC^−1 = PRED and PRED^−1 = SUCC note that
(PRED ◦ SUCC)(x) = (x + 1) − 1 = x
which is the identity function. Conversely,
(SUCC ◦ PRED)(x) = (x − 1) + 1 = x
which is also the identity function.
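The two compositions can be checked numerically on a sample of integers. This is only a sketch: the sample range is an arbitrary finite slice of Z chosen here, so the check illustrates the algebra above rather than proving it for every integer.

```python
def SUCC(x):
    return x + 1

def PRED(x):
    return x - 1

# A finite sample of Z, chosen for illustration.
sample = range(-5, 6)

pred_after_succ = all(PRED(SUCC(x)) == x for x in sample)   # (PRED ∘ SUCC)(x) = x
succ_after_pred = all(SUCC(PRED(x)) == x for x in sample)   # (SUCC ∘ PRED)(x) = x
```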
1.2.5 Relations
Definition 18 We call a sequence of n elements (x_0, x_1, ..., x_{n−1}) an n-tuple or
simply a tuple when the number of elements is irrelevant. In the specific case of n = 2,
we call (x_0, x_1) a pair. The i-th element of the tuple x is denoted x_i, and the number
of elements in a particular tuple x may be written as |x|.
Definition 19 The Cartesian product of n sets, say A_0, A_1, ..., A_{n−1}, is defined as
A_0 × A_1 × ··· × A_{n−1} = {(a_0, a_1, ..., a_{n−1}) : a_0 ∈ A_0, a_1 ∈ A_1, ..., a_{n−1} ∈ A_{n−1}}.
In the most simple case of n = 2, the Cartesian product A_0 × A_1 is the set of all
possible pairs where the first item in the pair is a member of A_0 and the second item
is a member of A_1.
The Cartesian product of a set A with itself n times is denoted A^n. To be complete,
we define A^0 = ∅ and A^1 = A. Finally, by writing A^∗ we mean the Cartesian product
of A with itself a finite number of times.
Cartesian products are useful to us since they allow easy description of sequences,
or vectors, of elements. Firstly, we often use them to describe vectors of elements
from a set. So for example, say we have the set of digits A = {0,1}. The Cartesian
product of A with itself is
A × A = {(0,0),(0,1),(1,0),(1,1)}.
If you think of the pairs as more generally being vectors of these digits, the Cartesian
product A^n is the set of all possible vectors of 0 and 1 which are n elements long.
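The standard itertools.product function generates Cartesian products directly. The sketch below uses a hypothetical helper named power for A^n and restricts it to n ≥ 1, since itertools.product with repeat=0 yields a single empty tuple, which differs from the convention A^0 = ∅ adopted in the text.

```python
from itertools import product

A = {0, 1}

# A × A as a set of pairs.
A_squared = set(product(A, repeat=2))

def power(A, n):
    """A^n for n >= 1: all vectors of n elements drawn from A."""
    if n < 1:
        raise ValueError("this sketch only handles n >= 1")
    return set(product(A, repeat=n))

vectors_of_3 = power(A, 3)   # all 3-element vectors of 0 and 1
```

Each extra position doubles the number of vectors here, so A^n has 2^n members when A = {0,1}.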
Definition 20 Informally, a binary relation f on a set A is like a predicate function
which takes members of the set as input and “filters” them to produce an output. As
a result, for a set A the relation f forms a sub-set of A × A. For a given set A and a
binary relation f, we say f is
• reflexive if f (x,x) = true for all x ∈ A.
• symmetric if f (x,y) = true implies f (y,x) = true for all x,y ∈ A.
• transitive if f (x,y) = true and f (y,z) = true implies f (x,z) = true for all x,y,z ∈
A.
If f is reflexive, symmetric and transitive, then we call it an equivalence relation.
The easiest way to think about this is to consider a concrete example such as the
case where our set is A = {1,2,3,4} such that the Cartesian product is
of A × A (called A_EQU for example) in the sense that we can pick out the pairs (x, y)
where EQU(x, y) = true:
A_EQU = {(1,1),(2,2),(3,3),(4,4)}.
For members of A, say x, y, z ∈ A, we have that EQU(x, x) =
true so the relation is reflexive. If EQU(x, y) = true, then EQU(y, x) = true so the
relation is also symmetric. Finally, if EQU(x, y) = true and EQU(y, z) = true, then
we must have that EQU(x, z) = true so the relation is also transitive and hence an
of all pairs (x, y) with x, y ∈ A such that LTH(x, y) = true. It cannot be that LTH(x, x) =
true, so the relation is not reflexive; we say it is irreflexive. It also cannot be that
if LTH(x, y) = true then LTH(y, x) = true, so the relation is also not symmetric;
it is anti-symmetric. However, if both LTH(x, y) = true and LTH(y, z) = true, then
LTH(x, z) = true so the relation is transitive.
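Since A is small, the three properties can be checked exhaustively. The following sketch assumes EQU is equality and LTH is the less-than comparison, as the surrounding discussion suggests; the helper names are hypothetical.

```python
from itertools import product

A = {1, 2, 3, 4}

def EQU(x, y):
    return x == y

def LTH(x, y):
    return x < y

def relation_set(f, A):
    """The sub-set of A × A picked out by the predicate f."""
    return {(x, y) for x, y in product(A, repeat=2) if f(x, y)}

def is_reflexive(f, A):
    return all(f(x, x) for x in A)

def is_symmetric(f, A):
    return all(f(y, x) for x, y in product(A, repeat=2) if f(x, y))

def is_transitive(f, A):
    return all(f(x, z)
               for x, y, z in product(A, repeat=3)
               if f(x, y) and f(y, z))

def is_equivalence(f, A):
    return is_reflexive(f, A) and is_symmetric(f, A) and is_transitive(f, A)

A_EQU = relation_set(EQU, A)
A_LTH = relation_set(LTH, A)
```

Running the checks gives an equivalence relation for EQU, while LTH is transitive but neither reflexive nor symmetric.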
1.3 Boolean Algebra
Most people are taught about simple numeric algebra in school One has
• a set of values, say Z,
• a list of operations, say −, + and ·,
• and a list of axioms which dictate how −, + and · work.
You might not know the names for these axioms but you probably know how they work; for example you probably know for some x, y, z ∈ Z that x + (y + z) = (x +
• a set of values, {false,true},
• a list of operations ¬, ∨ and ∧,
• and a list of axioms that dictate how ¬, ∨ and ∧ work.
Ironically, at the time his work was viewed as somewhat obscure and indeed Boole himself did not necessarily regard logic directly as a mathematical concept. However, the step of finding common themes within two such concepts and unifying them into one has been a powerful tool in mathematics, allowing more general thinking about seemingly diverse objects. It was not until 1937 that Claude Shannon, then
a student of both electrical engineering and mathematics, saw the potential of using Boolean algebra to represent and manipulate digital information [60].
: A × A → A
while a unary operation on A is a function
: A → A.
and ∨ and a unary operator ¬. The members of B act as identity elements for ∧ and
∨. These operators and identities are governed by a number of axioms which should
look familiar:
We call the ¬, ∧ and ∨ operators NOT, AND and OR respectively.
Notice how this description of a Boolean algebra unifies the concepts of propositional logic and set theory: 0 and false and ∅ are sort of equivalent, as are 1 and true and U. Likewise, x ∧ y and A ∩ B are sort of equivalent, as are x ∨ y and A ∪ B and
¬x and Ā: you can even draw Venn diagrams to illustrate this fact. It might seem
weird to see the statement
Also notice that the ∧ and ∨ operations in a Boolean algebra behave in a similar
way to · and + in a numerical algebra; for example we have that x ∨ 0 = x and
x ∧ 0 = 0. As such, ∧ and ∨ are often termed (and written as) the “product” and
“sum” operations
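The analogy with numeric product and sum can be tried out on the values 0 and 1. The sketch below is illustrative only: it models ∧, ∨ and ¬ with Python's bitwise operators on the integers 0 and 1, and the variable names are chosen here. Note that ∨ matches ordinary addition only once the result is capped at 1.

```python
B = (0, 1)

def AND(x, y):
    return x & y

def OR(x, y):
    return x | y

def NOT(x):
    return 1 - x

or_identity = all(OR(x, 0) == x for x in B)      # x ∨ 0 = x, like x + 0 = x
and_identity = all(AND(x, 1) == x for x in B)    # x ∧ 1 = x, like x · 1 = x
and_null = all(AND(x, 0) == 0 for x in B)        # x ∧ 0 = 0, like x · 0 = 0

# On {0, 1}, ∧ coincides with multiplication and ∨ with addition capped at 1.
and_is_product = all(AND(x, y) == x * y for x in B for y in B)
or_is_capped_sum = all(OR(x, y) == min(x + y, 1) for x in B for y in B)
```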