Iain D. Craig Virtual Machine


Iain D. Craig

Virtual Machines

With 43 Figures


Iain D. Craig, MA, PhD, MBCS, CITP

Printed on acid-free paper

© Springer-Verlag London Limited 2006

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed in the United States of America (EB)

9 8 7 6 5 4 3 2 1

Springer Science+Business Media

springeronline.com


To Dr P W Dale

(Uncle Paul)

Preface

I love virtual machines (VMs) and I have done for a long time. If that makes me "sad" or an "anorak", so be it. I love them because they are so much fun, as well as being so useful. They have an element of original sin (writing assembly programs and being in control of an entire machine), while still being able to claim that one is being a respectable member of the community (being structured, modular, high-level, object-oriented, and so on). They also allow one to design machines of one's own, unencumbered by the restrictions of a particular processor (at least, until one starts optimising it for some physical processor or other).

I have been building virtual machines, on and off, since 1980 or thereabouts. It has always been something of a hobby for me; it has also turned out to be a technique of great power and applicability. I hope to continue working on them, perhaps on some of the ideas outlined in the last chapter (I certainly want to do some more work with register-based VMs and concurrency).

I originally wanted to write the book from a purely semantic viewpoint. I wanted to start with a formal semantics of some language, then show how a virtual machine satisfied the semantics; finally, I would have liked to have shown how to derive an implementation. Unfortunately, there was insufficient time to do all of this (although some parts, the semantics of ALEX and a part proof of correctness, were done but omitted). There wasn't enough time to do all the necessary work and, in addition, Stärk et al. had published their book on Java [47] which does everything I had wanted to do (they do it with Java; I had wanted to define ad hoc languages).

I hope to have made it clear that I believe there to be a considerable amount of work left to be done with virtual machines. The entire last chapter is about this. As I have tried to make clear, some of the ideas included in that chapter are intended to make readers think, even if they consider the ideas stupid!

A word or two is in order concerning the instruction sets of the various virtual machines that appear from Chapter Four onwards. The instructions for the stack machines in Chapter Four seem relatively uncontroversial. The instructions in the chapter on register machines (Chapter Seven) might seem to be open to a little more questioning.

First, why not restrict the instruction set to those instructions required to implement ALEX? This is because I wanted to show (if such a demonstration were really required) that it is possible to define a larger instruction set so that more than one language can be supported.

Next, most of the jump and arithmetic instructions seem sensible enough but there are some strange cases: the jump branching to the address on the top of the stack is one case in point; all these stack indexing operations constitute another case. I decided to add these "exotic" instructions partly because, strange as they might appear to some, they are useful. Somewhere or other, I encountered a virtual machine that employed a jump instruction similar to the one just mentioned (I also tried one out in one of the Harrison Machine's implementations; it was quite useful), so I included it. Similarly, a lot of time is spent in accessing variables on the stack, so I added instructions that would make such accesses quite easy to compile; I was also aware that things like process control blocks and closures might be on stacks. I decided to add these instructions to build up a good repertoire, a repertoire that is not restricted to the instructions required to implement ALEX or one of the extensions described in Chapter Five.

I do admit, though, that the mnemonics for many of the operations could have been chosen with more care (I was actually thinking that an assembler could macro these names out). One reason for this is that I defined the register machine in about a day (the first ALEX machine was designed in about forty-five minutes!). Another (clearly) is that I am not terribly good at creating mnemonics. I thought I'd better point these matters out before someone else does.

I have made every effort to ensure that this text is free of errors. Undoubtedly, they still lurk waiting to be revealed in their full horror and to show that my proof-reading is not perfect. Should errors be found, I apologise for them in advance.


Acknowledgements

Beverley Ford first thought of this book when looking through some notes I had made on abstract machines. I would like to thank her and her staff at Springer, especially Catherine Drury, for making the process of writing this book as smooth as possible.

My brother Adam should be thanked for creating the line drawings that appear as some of the figures (I actually managed to do the rest myself). I would also like to thank all those other people who helped in various ways while I was writing this book (they know who they are).

Iain Craig

Market Square
Atherstone

14 June, 2005

Contents

1 Introduction
  1.1 Introduction
  1.2 Interpreters
  1.3 Landin's SECD Machine
  1.4 The Organisation of this Book
  1.5 Omissions

2 VMs for Portability: BCPL
  2.1 Introduction
  2.2 BCPL the Language
  2.3 VM Operations
  2.4 The OCODE Machine
  2.5 OCODE Instructions and their Implementation
    2.5.1 Expression Instructions
    2.5.2 Load and Store Instructions
    2.5.3 Instructions Relating to Routines
    2.5.4 Control Instructions
    2.5.5 Directives
  2.6 The Intcode/Cintcode Machine

3 The Java Virtual Machine
  3.1 Introduction
  3.2 JVM Organisation: An Overview
    3.2.1 The stack
    3.2.2 Method areas
    3.2.3 The PC register
    3.2.4 Other structures
  3.3 Class Files
  3.4 Object Representation at Runtime
  3.5 Initialisation
  3.6 Object Deletion
  3.9 Instructions
    3.9.1 Data-manipulation instructions
    3.9.2 Control instructions
    3.9.3 Stack-manipulating instructions
    3.9.4 Support for object orientation
    3.9.5 Synchronisation
  3.10 Concluding Remarks

4 DIY VMs
  4.1 Introduction
  4.2 ALEX
    4.2.1 Language Overview
    4.2.2 What the Virtual Machine Must Support
    4.2.3 Virtual Machine-Storage Structures
    4.2.4 Virtual Machine-Registers
    4.2.5 Virtual Machine-Instruction Set
    4.2.6 An Example
    4.2.7 Implementation
    4.2.8 Extensions
    4.2.9 Alternatives
    4.2.10 Specification
  4.3 Issues
    4.3.1 Indirect and Relative Jumps

  9.3 Typed Instruction Sets and Intermediate Codes
  9.9 Including more Information about Source Code
  9.18 Virtual Machines for more General Portability


Introduction

1.1 Introduction

There are, basically, two ways to implement a programming language: compile it or interpret it. Compilers are usually written for a single target machine; the GNU C compiler is a partial counter-example, containing, as it does, code generators for a number of target architectures (actually, the compiler has to be compiled for a specific target and it is only the full distribution that contains the complete set of code generators). Interpreters are thought to be slow but easy to port.

An interpreter can operate on the source structure of a program (as many LISP interpreters do) or can execute an internal form (for example, Polish notation), while virtual machines combine both compilation and interpretation. Virtual machines consist of a compiler and a target architecture implemented in software. A virtual machine contains a core that deals with the execution of code that has been compiled into the instruction set for the virtual machine's software architecture. The core executes these instructions by implementing the operations defined by the instruction set (which can be seen as a form of emulation or interpretation). Much of the traditional runtime package functionality associated with compiled code is implemented as part of a virtual machine; this clearly serves as an invitation to expand available functionality to provide rich execution environments for programs. It also opens up the possibility that traditional linkage methods (as exemplified by the linkage editor or by dynamic linkage of modules) can be eliminated in favour of more flexible methods.

Virtual machines are used as a method for ensuring portability, as well as for the execution of languages that do not conform well (or at all) to the architecture of the target machine. As noted in the last paragraph, they afford opportunities to enrich the execution environment as well as greater flexibility.
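The split between a compiler and a software core can be illustrated with a very small sketch. It is not taken from any machine described in this book: the instruction names (PUSH, ADD, MUL) and the tuple representation of expressions are assumptions made purely for illustration. The compiler turns an expression into postfix (reverse Polish) stack code, and the core then executes that code one instruction at a time.

```python
# Hypothetical two-stage pipeline: a tiny compiler plus a software "core".
# The instruction names (PUSH, ADD, MUL) are invented for this sketch.

def compile_expr(expr):
    """Translate a nested tuple such as ("+", 5, 3) into postfix stack code."""
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]
    op, left, right = expr
    opcode = {"+": "ADD", "*": "MUL"}[op]
    return compile_expr(left) + compile_expr(right) + [(opcode,)]


def run(code):
    """The core: take each instruction in turn and carry out its operation."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif instr[0] == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {instr[0]!r}")
    return stack.pop()


# 5 + (2 * 3), compiled once and then executed by the core.
program = compile_expr(("+", 5, ("*", 2, 3)))
result = run(program)
```

Only `run` would need to be rewritten to move such a system to a new host, which is the portability argument in miniature.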

It is the case that code in compiled form executes considerably faster than interpreted code, with interpreted code running one or two orders of magnitude slower than the corresponding compiled form. For many, optimising compilers are the sine qua non, even though the output code can bear little resemblance to the source, thus causing verification problems (there is not, and never can be, a viable alternative to the selection of good or, yet better, optimal algorithms) but optimising compilers are highly platform specific. The virtual machine is also a method for increasing the general speed of execution of programs by providing a single site that can be tuned or improved by additional techniques (a combination of native code execution with virtual machine code).

In a real sense, virtual machines constitute an execution method that combines the opportunities for compiler optimisation with the advantages of interpretation.

Although virtual machines in the form of "abstract machines" have been around for a long time (since the mid-1960s), the advent of Java has made them a common (and even fashionable) technique for implementing new languages, particularly those intended for use in heterogeneous environments. As noted above, many languages (Prolog, Curry and Oz, to cite but three) have relied upon virtual machines for a long time.

It is clear that the sense in which the term "virtual machine" is construed when considering execution environments for programs in particular programming languages relates to the other senses of the term. To construct a virtual machine for some programming language or other amounts, basically, to the definition of mechanisms that correspond to the actions of some computational machine (processor) or other.1

In the sense of the term adopted in this book, existing hardware imposes no constraints upon the designer other than the semantics of the programming language to be executed on the virtual machine. This view now seems to underpin ideas on the production of more general "virtual machines" that are able to execute the code of more than one programming language and to provide support to executing programs in other ways.

Virtual machines constitute an active research area. This book is intended as an invitation to engage in and contribute to it. This is manifested in a number of ways:

• The use of transitions as a way of specifying virtual machine instructions. (This leads to the idea of completely formal specifications, although this is not followed up in this book; for a formal description of the JVM, [47] is recommended.)

• The use of register-based virtual machines. Most virtual machines are based on stacks. In the register-based approach, it seems possible to widen the scope of virtual machines by providing more general instruction sets that can be tailored or augmented to suit particular languages.

• The idea of translating ("morphing") code from one virtual machine for execution on another. This raises correctness issues that are partially addressed in this book.

1 This latter sense is the one adopted by the designers of IBM's VM operating system; it implemented the underlying hardware as a software layer.

1.2 Interpreters

Since the 1950s, it has been possible to execute programs in compiled form or in interpreted form. LISP was originally implemented in interpreted form, as was BASIC. The LISP interpreter was only a first stage of the project (since then, extremely high-quality LISP compilers have been built) but BASIC was intended from the outset to be an interpreted language. Since then, interpreters have been implemented for a great many languages.

Gries, in his [23], devotes a single chapter to interpreters. He gives the example of the interpretation of the Polish form of a program and describes the organisation of an interpreter, as well as runtime storage allocation. The techniques involved in interpretation are a subset of those in compilation to native code.

1.3 Landin's SECD Machine

In [30], Landin introduced the SECD machine. This was originally intended as a device for describing the operational semantics of the λ-calculus. Landin showed how the machine could be used to implement a functional programming language called ISWIM ("If you See What I Mean"). Since its introduction, the SECD machine has been adapted in various ways and used to describe the operational semantics of a great many languages, some functional, some not. The machine has shown itself easy to adapt so that features like lazy evaluation, persistence and assignment can easily be accommodated within it.

Since the SECD machine is arguably the first virtual machine (or "abstract machine" as they used to be called), it is useful to sketch its major points. A brief sketch of the machine occupies the remainder of this section.

The SECD machine gets its name from its main components or registers (often erroneously called "stacks"): S, E, C and D.

Each of these components will be described in turn.

The S, state, register is a stack that is used for the evaluation of expressions. It is usually just called the stack. To evaluate an expression such as 5+3, the values are pushed onto the S register (in reverse order) and then the operator + is applied to them. Just prior to the application of the addition operation, the stack would be:

5 · 3 ·

After application of +, the S register becomes:

8 ·
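The evaluation of 5+3 on the S register can be mimicked with an ordinary Python list. This is only a sketch: the list, with its last element as the top of the stack, is an assumed stand-in for the S register, and the `before` and `after` values show the stack top-first to match the pictures above.

```python
# Sketch of the S register as a Python list whose last element is the top.
S = []
S.append(3)                  # operands are pushed in reverse order...
S.append(5)                  # ...so 5 ends up on top
before = list(reversed(S))   # top-first view of the stack: [5, 3]

# Applying + pops both operands and pushes their sum.
left = S.pop()               # 5
right = S.pop()              # 3
S.append(left + right)
after = list(reversed(S))    # top-first view of the stack: [8]
```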

[...] to be looked up when it is required.

For example, consider the unary function f(x). When this function is applied to an argument, say f(4), the binding of 4 to x is recorded somewhere in the E register. Inside f, when the value of x is needed, it is looked up in the environment and the value 4 is obtained. The environment is also used to store the values of local variables. The code to access the environment, both to bind and to look up variable bindings, is stored in the C register and is produced by a compiler generating SECD machine code.

The C register contains a sequence of SECD machine instructions. It is not really a stack but a simple list or vector. A pointer runs down the C register, pointing to each instruction in turn; in other machines, this pointer would be called the instruction pointer or the program counter; in most SECD implementations, the topmost element in the C register is shown.

The instructions used by an implementation of the SECD machine define what is to be done with the S, E and D registers (it is not impossible for them to define changes to the C register but it is rather rare). For example, the addition instruction states that the top two elements are to be popped from S, added and the result pushed onto S.
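How the C register drives the S and E registers can be sketched as a short loop. The instruction names used here (LDC, LD, ADD) and the dictionary used for the environment are assumptions made for illustration; actual SECD instruction sets and environment representations differ between implementations.

```python
# Hypothetical instruction names (LDC, LD, ADD); real SECD instruction sets differ.

def execute(code, env):
    """Run the instruction list `code` (the C register) against S and E."""
    S = []            # evaluation stack
    E = dict(env)     # environment: variable name -> value
    pc = 0            # the pointer that runs down the C register
    while pc < len(code):
        instr = code[pc]
        pc += 1
        if instr[0] == "LDC":      # load a constant onto S
            S.append(instr[1])
        elif instr[0] == "LD":     # look a variable up in E and push its value
            S.append(E[instr[1]])
        elif instr[0] == "ADD":    # pop two elements from S, push their sum
            b, a = S.pop(), S.pop()
            S.append(a + b)
        else:
            raise ValueError(f"unknown instruction {instr[0]!r}")
    return S


# Body of a function f(x) = x + 1, run with x bound to 4, as in the call f(4).
result = execute([("LD", "x"), ("LDC", 1), ("ADD",)], {"x": 4})
```

Here the instructions change only S and read E; the C register itself is never modified, which matches the common case described above.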

The final register is the D register, or the dump. The dump is used when the state of the machine must be stored for some reason. For example, when a routine is called, the caller's local variables and stack must be saved so that the called routine can perform its computations. In the SECD machine, the registers are saved together in the dump when a routine is called. When a routine exits, the dump's topmost element is popped and the machine's registers are restored.

It will be given a more precise interpretation later in this book.

To make this a little clearer, consider an SECD machine. It is described by a 4-tuple $\langle S, E, C, D \rangle$. When a call is made within one routine to another routine, the current instruction in the C register could cause the following state transition:

$$\langle s, e, c, d \rangle \ \text{becomes}\ \langle \emptyset, e', c', (s, e, c, d) \cdot d' \rangle$$

That is, an empty stack is put into the S register and a new environment established in the E register; the code for the called routine is put into the C register. Meanwhile, the dump contains a 4-tuple consisting of the state of the calling routine. That state is suspended until the called routine exits.

On exit, the called routine executes an SECD machine instruction that effects the following transition:

$$\langle s', e', c', (s, e, c, d) \cdot d' \rangle \ \text{becomes}\ \langle s, e, c, d' \rangle$$

I.e., everything is put back where it belongs! (Transitions, more completely formalised, will be used later in this book.)
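The two transitions can be written as a pair of pure functions over an immutable state. This is a sketch under stated assumptions rather than a definitive SECD implementation: the `State` type, the function names `call` and `ret`, and the instruction strings are all hypothetical. It also simplifies in two ways: the dump entry holds only the caller's (s, e, c), with older entries left beneath it, and no return value is passed back.

```python
from typing import NamedTuple


class State(NamedTuple):
    s: tuple   # stack
    e: dict    # environment
    c: tuple   # code still to be executed
    d: tuple   # dump: saved (s, e, c) triples, most recent first


def call(state, new_env, new_code):
    """Enter a routine: empty stack, new environment, new code, caller saved on the dump."""
    saved = (state.s, state.e, state.c)
    return State((), new_env, new_code, (saved,) + state.d)


def ret(state):
    """Leave a routine: pop the topmost dump entry and restore the caller's registers."""
    (s, e, c), rest = state.d[0], state.d[1:]
    return State(s, e, c, rest)


caller = State((1, 2), {"y": 7}, ("ADD",), ())
callee = call(caller, {"x": 4}, ("LD x", "RTN"))
restored = ret(callee)
```

After `ret`, `restored` equals `caller`, which is the "everything is put back where it belongs" property stated above.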

In addition, the SECD machine requires some storage management, typically a heap with a garbage collector. In most implementations, the S, E, C and D registers are implemented as lists. This implies that some form of heap storage is required to manage them. The Lispkit implementation described in [24] implements the three registers in this way and includes the (pseudo-code) specification of a mark and sweep garbage collector.

There are many different publications containing descriptions of the SECD machine. The book by Field and Harrison [18], as well as Henderson's famous book on Lispkit [24], are two, now somewhat old, texts containing excellent descriptions of the SECD machine.
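For readers unfamiliar with the mark and sweep technique mentioned above, a minimal sketch follows. It is not the Lispkit specification from [24]: the `Cell` class, the list used as a heap and the function names are assumptions made for illustration. Cells reachable from the roots (which, in an SECD machine, would be the registers) are marked; everything left unmarked is then swept away.

```python
class Cell:
    """A list cell: `car` holds a value or another Cell, `cdr` the next Cell or None."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr
        self.marked = False


def mark(root):
    """Mark every cell reachable from `root`, using an explicit work list."""
    pending = [root]
    while pending:
        current = pending.pop()
        if isinstance(current, Cell) and not current.marked:
            current.marked = True
            pending.append(current.car)
            pending.append(current.cdr)


def sweep(heap):
    """Keep the marked cells (clearing their marks) and discard the rest."""
    live = [cell for cell in heap if cell.marked]
    for cell in live:
        cell.marked = False
    return live


def collect(heap, roots):
    for root in roots:
        mark(root)
    return sweep(heap)


# A two-cell list that is still reachable, plus one unreachable cell.
tail = Cell(3)
head = Cell(5, tail)
garbage = Cell(99)
heap = collect([head, tail, garbage], roots=[head])
```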

1.4 The Organisation of this Book

The chapter that immediately follows this (Chapter Two) is concerned with the BCPL OCODE and Cintcode/Intcode machines (in older versions, the bootstrap code was called Intcode, while in the newer, C-based, ones it is called Cintcode). BCPL is a relatively old language, although one that still has devotees,5 that was always known for its portability. Portability is achieved through the definition of a virtual machine, the OCODE machine, that executes BCPL programs. The OCODE machine can be implemented from scratch or bootstrapped using Cintcode/Intcode, a process that involves the construction of a simple virtual machine on each new processor that is used to implement the full OCODE machine. The OCODE machine and its instruction set are described in that chapter.

5 Such as the author.

Chapter Three contains a relatively short description of the Java Virtual Machine (JVM), possibly the most famous and widely used virtual machine [...]

[...] definition of the instruction set. This machine is then specified using transition rules. A compiler for a large subset of ALEX is specified in Appendix A; the compiler translates source code to the single-stack virtual machine.

The DIY theme continues in Chapter Five. This chapter contains the descriptions of two virtual machines: one for a simple object-oriented language, the other for a language for pseudo-parallelism. The base language in both cases is assumed to be the simple dialect of ALEX with which Chapter Four started. In each case, extensions are considered and discussed (there appears to be more to say about the pseudo-parallel language).

The idea of introducing the DIY virtual machines is that they can be introduced in a simple form and then subjected to extensions that suit the various needs of different programming languages. Thus, the ALEX virtual machine starts with a call-by-value evaluation scheme which is later extended by the addition of call by reference; ALEX first has only vectors but records are added at a later stage. In addition, the DIY approach allows the extension and optimisation of the instruction set to be discussed without reference to an existing (and, hence, fixed) language and associated virtual machine.

By way of relief, an event-based language is considered in Chapter Six. This language is somewhat different and has a semantics that is not entirely procedural (although it contains procedural elements) and is not a straight pseudo-parallel language (although it can be related to one); the system was designed (and implemented) as part of the author's work on computational reflection. The virtual machine is a mixture of fairly conventional instructions, instructions for handling events and event queues and, finally, instructions to support (part of) the reflective behaviour that was desired. In order to make the virtual machine's definition clearer, a more mathematical approach has been adopted; transitions specify the instructions executed by the virtual machine. A compiler for the language executed by this virtual machine is specified in Appendix B.

6 For readers not familiar with the term, "DIY" stands for "Do It Yourself". It usually refers to home "improvements", often in kitchens and bathrooms. The result is often reminiscent of the detonation of a medium-calibre artillery shell (or so it seems from TV programmes on the subject). The author explicitly and publicly denies all and any knowledge of home improvements.

[...] a motivating example for code translation between virtual machines, an issue referred to as "code morphing" and discussed in Chapter 9). The correctness of this translation is considered in a semi-formal way. Finally, a more natural translation from ALEX to register-based code is considered before more extensions are discussed.

Register-based virtual machines are discussed because they appear to be an effective alternative to the more usual method of using stack (or zero-address) machines. The author experimented with such a virtual machine as part of the work on the Harrison Machine, the system described in Chapter Six (although not discussed there). The discovery that the Parrot group was using a similar approach for Perl6 appeared a strong basis for the inclusion of the topic in this book.

The implementation of virtual machines is considered in Chapter Eight. Implementation is important for virtual machines: they can either be considered theoretical devices for implementing new constructs and languages or practical ways to implement languages on a great many platforms.

In Chapter Eight, a number of implementation techniques are considered, both for stack- and register-based virtual machines. They include the direct translation to a language such as C and to other virtual machines. The use of different underlying organisations, such as threaded code, is also discussed.

The last chapter, Chapter Nine, is concerned with what are considered to be open issues for those interested in pushing forward the virtual machine approach. This chapter is, basically, a somewhat loosely organised list (a brainstorming session) of ideas, some definitely worth investigating, some possibly dead ends, that are intended to stimulate interest in further work.

1.5 Omissions

Virtual machines are extremely popular for the implementation of languages of all kinds. It is impossible in a book of this length to discuss them all; it is also impossible, realistically, to discuss a representative sample.

Prolog is a good example of a language that has been closely associated with a virtual (or abstract) machine for a long time. The standard virtual machine is that of Warren [52] (the Warren Abstract Machine or WAM). A description of the WAM was considered and then rejected, mostly because of the excellent book by Aït-Kaci [3] on the WAM. Readers interested in logic programming languages would be well advised to read and completely digest [3]; readers just interested in virtual machines will also find it a pleasure to read.

The Scheme language (a greatly tidied-up LISP dialect with static scope) [28] has been associated with compilers since its inception. However, there is a virtual machine for it; it is described in [1] (the chapter on register machines). The implementation there can be used as the basis for a working implementation (indeed, many years ago, the author used it as a stage in the development of a compiled system for experimenting with reflection). Although intended for undergraduates, [1] is highly informative about Scheme (and is also a good read).

Pascal was distributed from ETH, Zurich, in the form of an abstract machine (VM) that could be ported with relative ease. The UCSD Pascal system was also based on an abstract machine. The notion of using a virtual machine to enhance portability is covered below in the chapter on BCPL (Chapter 2). BCPL is simpler in some ways than Pascal: it only has one primitive type (the machine word) and a few derived types (tables and vectors). BCPL's machine is a little earlier than that of Pascal, so it was decided to describe it (BCPL will also be less familiar to many readers7 and was a major influence on the design of C).

Smalltalk [21] also has a virtual machine, which is defined in [21] in Smalltalk. The Smalltalk VM inspired the pseudo-parallel virtual machine described in Chapter 5; it was also influential in the design of the Harrison Machine (Chapter 6). A full description of the Smalltalk VM would have taken a considerable amount of space, so it was decided to omit it.

The Poplog system [42], a system for AI programming that supports Common LISP, Prolog, Pop11 and Standard ML, uses a common virtual machine. Pop11 is used for all systems programming, so the virtual machine is tailored to that language. However, the Lisp, Prolog and ML compilers are written in Pop11 and generate virtual machine code. The Prolog compiler is based on a continuation-passing model, not on the Warren Abstract Machine, so the Poplog instruction set can be utilised directly. The Pop11 language is, in the author's opinion, worth studying in its own right; the virtual machine and the compilation mechanisms are also worth study. The Poplog system distribution contains on-line documentation about itself.

There are many other virtual machines that could not be included in this book. They include VMs for:

• Functional languages (e.g., the G-machine [25] and derivatives [39]; the FPM [7]);

• Functional-logic programming languages;

• Constraint languages (the Oz language is an interesting example).

7 The author hopes it brings a smile to the lips of British readers, as well as fond and not-so-fond memories.

Some readers will also ask why no attention has been paid to Just-In-Time (JIT) compilers, particularly for Java. One reason is that this is a technique for optimising code rather than a pure virtual machine method. Secondly, JIT compilers are a method for integrating native code (compiled on the fly) with a virtual machine. As such, it requires an interface to the virtual machine on which other code runs. In the treatment of the Java virtual machine, the native code mechanism is outlined; this is one method by which native code methods can be integrated.

Given the plethora of virtual machines, the reader might ask why it was decided to describe only three mainstream ones (BCPL, Java and Parrot) and to rely on (probably not very good) home-grown ones. The reasons are as follows:

• If the book had been composed only of descriptions of existing virtual machines, it would be open to the accusation that it omits the X virtual machine for language L. This was to be avoided.

• Home-grown ones could be developed from scratch, thus making clear the principles that underpin the development of a virtual machine.

• In the mainstream, only the Java virtual machine combines both objects and concurrency. It was decided to present new, independent virtual machines so that differences in language could be introduced in various ways. The home-grown approach allows language and virtual machine features to be included (or excluded) ad libitum (even so, an attempt has been made to be as comprehensive as possible within the confines of a book of this length; hence the various sections and subsections on extensions and alternatives).

• At the time of writing, the Parrot virtual machine appears to be the only generally available one based on the register-transfer model. The author independently came to conclusions similar to those of the designers of Parrot as to the merits of register-based machines (and on treating virtual machines as data structures) and wanted to argue for this alternative model. As a consequence, the mapping between stack- and register-based models was of importance (as are some of the suggestions for further work in Chapter 9).

• The derivation of transitions specifying many virtual machines would not have been possible in the time available for the writing of this book. Furthermore, an existing virtual machine is an entity, so the introduction of new instructions (e.g., branches or absolute jumps) would have been less convincing; the ad hoc virtual machines described below can be augmented as much as one wishes.8

8 Interested readers are actively encouraged to implement the virtual machines in this book and augment them as they see fit, as well as introducing new instructions by defining new transitions.


VMs for Portability: BCPL

2.1 Introduction

BCPL is a high-level language for systems programming that is intended to be as portable as possible. It is now a relatively old language but it contains most syntactic constructs found in contemporary languages. Indeed, C was designed as a BCPL derivative (C can be considered as a mixture of BCPL and Algol68 plus some sui generis features). BCPL is not conventionally typed. It has one basic data type, the machine word. It is possible to extract bytes from words but this is a derived operation. All entities in BCPL are considered either to be machine words or to require a machine word or a number of machine words. BCPL supports addresses and assumes that they can fit into a single word. Similarly, it supports vectors (one-dimensional arrays) which are sequences of words (multi-dimensional arrays must be explicitly programmed in terms of vectors of pointers to vectors). Routines (procedures and functions) can be defined in BCPL and are represented as pointers to their entry points. Equally, labels are addresses of sequences of instructions.

BCPL stands for "Basic CP L" , a subset of t he CP L lan guage CP L was

an ambit ious lexically scoped, imperative procedural programming languagedesigned by Str achey and oth ers in t he mid-1960s as a joint effort involvingCambridge and London Universit ies CP L cont ained all of t he most advancedlanguage const ructs of t he day, includin g polymorphism There is a st ory th atthe compiler was too large to run on even th e biggest machines available in

th e Universit y of London! Even th ough it strictly prefigures th e structuredprogramming movement , BCPL contains st ructured cont rol const ructs (com-mand s) including two-br anch conditionals, switch commands, st ructured loopswith st ruct ured exits It also supports statement forrnulee similar to t hose inFORTRAN and t he original BASIC Recursive routin es can be defined BCPLdoes support a goto command Separ ate compilat ion is support ed in part by

t he provision of a "global vector", a vect or of words t hat contains ers t o exte rnally defined routines BCPL is lexically scoped It implementscall-by-value semantics for routine par amet ers It also permits higher-order


bootstrap; an abstract machine for OCODE is relatively straightforward.

In the book on BCPL [45], Richards and Whitby-Strevens define a second low-level intermediate language called Intcode. Intcode is an extremely simple language that can be used to bootstrap OCODE. More recently, Richards has defined a new low-level bootstrap code called Cintcode. The idea is that a fundamental system is first written for Intcode/Cintcode. This is then used to bootstrap the OCODE evaluator. The definition of the Intcode and Cintcode machines is given in the BCPL documentation. The BCPL system was distributed in OCODE form (more recent versions distribute executables for standard architectures like the PC under Linux). At the time the book was published, an Intcode version of the system was required to bootstrap a new implementation.

The virtual machines described below are intended, therefore, as an aid to portability. The definitions of the machines used to implement OCODE and Intcode/Cintcode instructions include definitions of the storage structures and layout required by the virtual machine, as well as the instruction formats and state transitions.

The organisation of this chapter is as follows. We will focus first on BCPL and its intermediate languages OCODE and Intcode/Cintcode (Cintcode is part of the current BCPL release and access to the documentation is relatively easy). We will begin with a description of the OCODE machine. This description will start with the machine's organisation and then move on to the instruction set. The relationship between OCODE instructions and BCPL's semantics will also be considered. Then, we will examine Cintcode and its abstract machine. Finally, we explain how BCPL can be ported to a completely new architecture.

2.2 BCPL the Language

In this section, the BCPL language is briefly described.

BCPL is what we would now see as a relatively straightforward procedural language. As such, it is based around the concept of the procedure. BCPL provides three types of procedural abstraction:

• Routines that update the state and return no value;

• Routines that can update the state and return a single value;

• Routines that just compute a value.

The first category refers to procedures proper, while the second corresponds to the usual concept of function in procedural languages. The third category corresponds to the single-line functions in FORTRAN and in many BASIC dialects. Each category permits the programmer to pass parameters, which are called by value.

BCPL also supports a variety of function that is akin to the so-called "formula function" of FORTRAN and BASIC. This can be considered a variety of macro or open procedure because it declares no local variables.

BCPL supports a variety of state-modifying constructs. As an imperative language, it should be obvious that it contains an assignment statement. Assignment in BCPL can be simple or multiple, so the following are both legal:

Newline, in BCPL, can also be used to terminate a statement. This is a nice feature, one found in only a few other languages (Eiffel and Imp, a language used in the 1970s at Edinburgh University).

Aside from this syntactic feature, the multiple assignment gives a clue that the underlying semantics of BCPL are based on a stack.

In addition, it contains a number of branching constructs:

• IF ... DO.¹ This is a simple test. If the test is true, the code following the DO is executed. If the test is false, the entire statement is a no-operation.

• UNLESS ... DO. This is syntactic sugar for IF NOT ... DO. That is, the code following the DO is executed if the test fails.

• TEST ... THEN ... ELSE. This corresponds to the usual if then else in most programming languages.

• SWITCHON. This is directly analogous to the case statement in Pascal and its descendants and to the switch statement in C and its derivatives. Cases are marked using the CASE keyword. Cases run into each other unless explicitly broken. There is also an optional default case denoted by a keyword. Each case is implicitly a block.

In general, the syntax word do can be interchanged with then. In the above list, we have followed the conventions of BCPL style.

BCPL contains a number of iterative statements. The iterative statements are accompanied by structured ways to exit loops.

¹ Keywords must be in uppercase, so the convention is followed here.

BCPL has a goto, as befits its age.

BCPL statements can be made to return values. This is done using the pair of commands VALOF and RESULTIS. The VALOF command introduces a block from which a value is returned using the RESULTIS command; there can be more than one RESULTIS command in a VALOF block. The combination of VALOF and RESULTIS is used to return values from functions. The following

The following is a BCPL functional routine:

LET Global Added.Val (x) =

From this small example, it can be seen that the body of a procedure is marked by the BE keyword, while functional routines are signalled by the equals sign and the use of VALOF and RESULTIS (BCPL is case-sensitive).

BCPL is not conventionally typed. It has only one data type, the machine word, whose size can change from machine to machine. The language also contains operators that access the bytes within a machine word. Storage is allocated by the BCPL compiler in units of one machine word. The language contains an operator that returns the address of a word and an operator that, given an address, returns the contents of the word at that address (dereferencing).

BCPL supports structured types to a limited extent. It permits the definition of vectors (single-dimension arrays of words). It also has a table type. Tables are vectors of words that are indexed by symbolic constants, not by numerical values. In addition, it is possible to take the address of a routine (procedure or function); such addresses are the entry points of the routines (as in C). The passing of routine addresses is the method by which BCPL supports higher-order routines (much as C does).

It also permits the definition of symbolic constants. Each constant is one machine word in length.

BCPL introduces entities using the LET syntax derived from ISWIM. For example, the following introduces a new variable that is initialised to zero:

LET x := 0 IN

The following introduces a constant:

LET x = 0 IN

Multiple definitions are separated by the AND keyword (logical conjunction is represented by the "&" symbol) as in:

LET x := 0
AND y = 0
IN

Routines are also introduced by the LET construct.

Variables and constants can be introduced at the head of any block.

In order to support separate compilation and to ease the handling of the runtime library, a global vector is supported. This is a globally accessible vector of words, in which the first few dozen entries are initialised by the runtime system (they are initialised to library routine entry points and to globally useful values). The programmer can also assign to the global vector at higher locations (care must be taken not to assign to locations used by the system).

These are the primary semantic constructs of BCPL. Given this summary, we can now make some observations about the support required by the virtual machine (the OCODE machine).

2.3 VM Operations

The summary of BCPL above was intended to expose the major constructs. The identification of major constructs is important for the design of a virtual machine, which must respect the semantics of the language as well as providing the storage structures required to support the language.

At this stage, it should be clear that a BCPL machine should provide support for the primitive operations needed for the manipulation of data of all primitive types. The virtual machine support for them will be in the form of instructions that the machine will directly implement. In BCPL, this implies that the virtual machine must support operations on the word type: arithmetic operations, comparisons and addressing. Byte-based operations can either be provided by runtime library operations or by instructions in the virtual machine; BCPL employs the latter for the reason that it is faster and reduces the size of the library. In addition, BCPL supports vectors on the stack; they must also be addressed when designing an appropriate virtual machine.

The values manipulated by these operations must be stored somewhere: a storage area, particularly for temporary and non-global values, must be provided. Operations are required for manipulating this storage area. Operations are also required to load values from other locations and to store them as results. More than one load operation might be required (in a more richly typed language, this might be a necessity) and more than one store operation might be required. It is necessary to look at the cases to determine what is required.

BCPL employs static scoping. The compiler can be relied upon to verify that variables, etc., are not required. Static scoping requires a stack-like mechanism for the storage of variables. The virtual machine is, therefore, built around a stack. Operations are required to allocate and free regions of stack at routine entry and exit; the return of results can also be implemented by means of stack allocation and addressing. The compiler generates instructions that allocate and free the right amount of stack space; it also generates instructions to handle returned values and the adjustment of the stack when routines return. Evaluation of expressions can be performed on the stack, so we now are in a position to define the instructions for data manipulation.

With expressions out of the way, the following families of construct must be handled by the compiler and OCODE instructions generated to implement them:

• Control constructs, in particular, conditionals, iteration, jumps;

• Assignment;

• Routine call and return;

• Parameter passing and value return from routines and valof.

Note that we assume that sequencing is handled implicitly by the compiler.

Control structure is handled, basically, by means of labels and jumps. There are clear translations between most of the control structures and label-jump combinations. The problem cases are FOR and SWITCHON. The former is problematic because it requires counters to be maintained and updated in the right order; the latter because the best implementation requires a jump table.

Assignment is a relatively straightforward matter (essentially, push a value onto the stack and pop it off to some address or other). Multiple assignment is also easy with a stack machine. The values are pushed onto the stack in some order (say left to right) and popped in the reverse order. Thus, the command:

p,q := 1, 2

has the intention of assigning 1 to p and 2 to q. This can be done by pushing 1, then 2 onto the stack and assigning them in reverse order. An interesting example of multiple assignment is:

p,q := q, p

Swap! It can be handled in exactly the manner just described.
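The push-then-pop scheme just described can be sketched in a few lines of Python. This is only an illustration of the idea, not OCODE: the function name multiple_assign and the use of a dictionary to stand for the variables are assumptions made for the example.

```python
def multiple_assign(env, targets, values):
    """Assign values[i] to targets[i] by way of a single evaluation stack.

    `env` is a hypothetical stand-in for the variables of a program.
    """
    stack = []
    # Push every right-hand-side value, left to right, before any store.
    for value in values:
        stack.append(value)
    # Pop in reverse order: the last target receives the last value pushed.
    for name in reversed(targets):
        env[name] = stack.pop()
    return env


env = {"p": 10, "q": 20}

# p,q := 1, 2
multiple_assign(env, ["p", "q"], [1, 2])
after_simple = dict(env)

# p,q := q, p  (both right-hand sides are evaluated before either store)
multiple_assign(env, ["p", "q"], [env["q"], env["p"]])
after_swap = dict(env)
```

Because all of the right-hand sides are on the stack before the first store takes place, the swap needs no temporary variable.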

Finally, we have routine calls and VALOF. There are many ways to implement routine calls. For software virtual machines, relatively high-level instructions can be used (although low-level instructions can also be employed). The OCODE machine provides special instructions for handling routine entry and exit, as will be seen.

BCPL is a call-by-value language, so the runtime stack can be directly employed to hold parameter values that are to be passed into the routine.

The VALOF ... RESULTIS combination can be handled in a variety of ways. One is to perform a source-to-source transformation. Another is to use the stack at runtime by introducing a new scope level. Variables local to the VALOF can be allocated on the runtime stack with the stack then being used for local values until the RESULTIS is encountered. An implementation for RESULTIS would be to collapse the stack to the point where the VALOF was encountered and then push the value to be returned onto the stack.

2.4 The OCODE Machine

In this section, the organisation of the OCODE machine is presented. BCPL is a procedural programming language that supports recursion. It requires a globally accessible vector of words to support separate compilation. It also requires a pool of space to represent global variables. The language also permits the use of (one-dimensional) vectors and tables (essentially vectors of words whose elements are indexed by symbolic identifiers, much like tables in assembly language). As a consequence, the OCODE machine must reserve space for a stack to support lexical scope and for recursion. The OCODE machine also needs space to hold the global vector and also needs a space to hold program instructions.

Fig. 2.1. The OCODE machine organisation.

The OCODE machine has three memory regions:

• The Global vector;

• The Stack (this is a framed stack);

• Storage for program code and static data.

The organisation of the OCODE machine is shown in Figure 2.1.

The global vector is used to store all variables declared global in the program. The global vector is a vector of words containing global variables; it also contains the entry points of routines declared in one module that are to be made visible in another. It is pointed to by the G register. The current stack frame is pointed to by the P register. The size of the current stack frame is always known at compilation time, so it need not be represented in code by a register.

There is also a special A register which is used to hold values returned by functions (see below).

Static variables, tables and string constants are stored in the program area. They are referenced by labels which are usually represented by the letter L followed by one or more digits.

The stack holds all dynamic (local) variables.

All variables are of the same size. That is, all variables are allocated the same amount of space in the store. For most modern machines they are 32 or 64 bits in length.
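One way to picture this organisation is as a small record holding the three memory regions and the G, P and A registers. The Python sketch below is an assumption-laden illustration: the class name OcodeMachine, the field names, the region sizes and the helper local are all invented for the example and are not taken from the BCPL distribution.

```python
from dataclasses import dataclass, field


@dataclass
class OcodeMachine:
    # The three memory regions; the sizes are arbitrary choices for the sketch.
    global_vector: list = field(default_factory=lambda: [0] * 64)
    stack: list = field(default_factory=lambda: [0] * 256)
    code: list = field(default_factory=list)  # program code and static data

    # Registers. G and P are modelled as indices into their regions.
    G: int = 0  # points to the global vector
    P: int = 0  # points to the current stack frame
    A: int = 0  # holds the value returned by a function

    def local(self, n):
        """Return word n of the current stack frame, i.e. P[n]."""
        return self.stack[self.P + n]


machine = OcodeMachine()
machine.P = 8
machine.stack[machine.P + 3] = 7  # a word in the current frame
```

Every cell is a single Python value, which mirrors the statement that all variables occupy the same amount of space.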

2.5 OCODE Instructions and their Implementation

In OCODE, instructions are represented as integers. Here, we will use only the mnemonic names in the interests of readability. It is important to note that the mnemonic form for instructions and labels must be converted into more fundamental representations when code is emitted by the compiler.

The size of the current stack frame is always known at compile time. When specifying instructions, a variable, S, is used to denote an offset from the start of the current stack frame. This is done only to show how much space is left in the current stack frame by the individual instructions.

When defining abstract machine instructions, an array notation will be employed. Thus, P is considered as a one-dimensional vector. S will still be a constant denoting the size of the current stack frame. Similarly, G will also be considered as an array.

The notation P[S-1] denotes the first free element on the stack.

2.5.1 Expression Instructions

The OCODE instructions that implement expressions do not alter the stack frame size. In the case of unary instructions, the operand is replaced on the top of the stack by the result of the instruction. In the case of binary operations, the stack element immediately beneath the top one is replaced by the result.

The instructions are mostly quite clear. Rather than enter into unnecessary detail, these instructions are summarised in Table 2.1. The table's middle column is a short English equivalent for the opcode.

Only the first instruction deserves any real comment. It is an instruction that considers the current top-of-stack element as a pointer into memory. It replaces the top-of-stack element by the object that it points to. This is the operation of dereferencing a pointer to yield an r-value.

Table 2.1. OCODE expression instructions.

Opcode    Description        Definition
RV        r-value            P[S-1] := cts(P[S-1])
ABS       absolute value     P[S-1] := abs(P[S-1])
NEG       unary minus        P[S-1] := -P[S-1]
NOT       logical negation   P[S-1] := ¬(P[S-1])
GETBYTE   extract byte       P[S-2] := P[S-2] gtb P[S-1]
LSHIFT    left shift         P[S-2] := P[S-2] << P[S-1]
RSHIFT    right shift        P[S-2] := P[S-2] >> P[S-1]
LOGAND    logical and        P[S-2] := P[S-2] and P[S-1]
LOGOR     logical or         P[S-2] := P[S-2] or P[S-1]
EQV       bitwise equal      P[S-2] := P[S-2] leq P[S-1]
NEQV      xor                P[S-2] := P[S-2] xor P[S-1]

Table 2.1 employs a notational convention that needs explanation:

• cts is the contents operation (dereferences its argument).

• abs is the absolute value of its argument.

• gtb is the getbyte operator.

• rem is integer remainder after division.

• and is logical and (conjunction).

• or is logical or (disjunction).

• leq is bitwise equivalence.

• xor is bitwise exclusive or (logical not-equivalence).

• e1 << e2 is left shift e1 by e2 bits.

• e1 >> e2 is right shift e1 by e2 bits.

Other than this, the "description" of each instruction is just an operation on the OCODE stack. In this and the following cases, the code equivalent is included in the table; when defining virtual machines later in this book, this method will be used to indicate both "descriptions" and implementations of virtual machine instructions.
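To show how the definitions in Table 2.1 can be read as code, here is a minimal Python sketch that transcribes a few rows literally. It rests on several assumptions: P is modelled as a Python list for the current frame, words are taken to be 32 bits wide (WORD_MASK), and and/or are treated as bitwise operations on words. As in the table, S is left unchanged; any adjustment of S after a binary operation is not modelled.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit machine word

# Unary instructions rewrite the top element, P[S-1].
UNARY = {
    "ABS": abs,
    "NEG": lambda x: -x,
}

# Binary instructions rewrite the element beneath the top, P[S-2].
BINARY = {
    "LSHIFT": lambda a, b: (a << b) & WORD_MASK,
    "RSHIFT": lambda a, b: a >> b,
    "LOGAND": lambda a, b: a & b,
    "LOGOR": lambda a, b: a | b,
    "EQV": lambda a, b: ~(a ^ b) & WORD_MASK,  # bitwise equivalence
    "NEQV": lambda a, b: a ^ b,                # bitwise exclusive or
}


def execute(opcode, P, S):
    """Apply one expression instruction to frame P with frame size S."""
    if opcode in UNARY:
        P[S - 1] = UNARY[opcode](P[S - 1])
    elif opcode in BINARY:
        P[S - 2] = BINARY[opcode](P[S - 2], P[S - 1])
    else:
        raise ValueError("unknown opcode: " + opcode)
    return P


frame = [6, 3]
execute("LOGAND", frame, 2)  # frame[0] becomes 6 & 3
```

Keeping the operations in dispatch tables means a further row of the table can be added as a single dictionary entry.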

2.5.2 Load and Store Instructions

The load and store instructions, like those for expressions, should be fairly clear. The code equivalents are included in the right-hand column of Table 2.2. Each instruction is described (middle column of the table).

Table 2.2. OCODE load and store instructions.

Opcode             Description        Definition
LP n               load from P        P[S] := P[n]; S := S+1
LG n               load global        P[S] := G[n]; S := S+1
LL Ln              load label         P[S] := Ln; S := S+1
LLP n              load address       P[S] := P[n]; S := S+1
LLG n              load global addr   P[S] := G[n]; S := S+1
LLL Ln             load label addr    P[S] := Ln; S := S+1
LSTR n C1 ... Cn   load string        P[S] := "C1 ... Cn"; S := S+1
STIND              store index        cts(P[S-1]) := P[S-2]; S := S-2
PUTBYTE            put byte           setbyte(P[S-2], P[S-1]) := P[S-3]; S := S-3
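Three of the rows in Table 2.2 (LP, LG and STIND) can be turned into a small Python sketch. The model is an assumption made for illustration: a single flat memory list holds both the global vector and the stack, G and P are base indices into it, and S is the offset within the current frame. The class name LoadStoreMachine and the memory sizes are invented for the example.

```python
class LoadStoreMachine:
    """A hypothetical flat-memory model of the G, P and S registers."""

    def __init__(self, size=64, G=0, P=32):
        self.mem = [0] * size
        self.G = G  # base of the global vector
        self.P = P  # base of the current stack frame
        self.S = 0  # offset of the next free cell in the frame

    def lp(self, n):
        # LP n:  P[S] := P[n]; S := S+1
        self.mem[self.P + self.S] = self.mem[self.P + n]
        self.S += 1

    def lg(self, n):
        # LG n:  P[S] := G[n]; S := S+1
        self.mem[self.P + self.S] = self.mem[self.G + n]
        self.S += 1

    def stind(self):
        # STIND:  cts(P[S-1]) := P[S-2]; S := S-2
        address = self.mem[self.P + self.S - 1]
        self.mem[address] = self.mem[self.P + self.S - 2]
        self.S -= 2


m = LoadStoreMachine()
m.mem[m.G + 5] = 99      # a global at offset 5
m.mem[m.P + 0] = 7       # a local at offset 0
m.S = 1
m.lg(5)                  # push G[5]
m.lp(0)                  # push P[0]
m.mem[m.P + m.S] = 10    # push the address 10 by hand (no opcode modelled)
m.S += 1
m.stind()                # store P[S-2] (the 7) at address 10
```

After the sequence runs, S is back to 2 and the cell at address 10 holds the value that was loaded from P[0].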

There is an instruction not included in Table 2.2 that appears in the OCODE machine specification in [44]. It is the QUERY instruction. It is defined as:

Unfortunately, [44] does not contain a description of it. The remaining instructions have an interpretation that is fairly clear and is included in the table. It is hoped that the relatively brief description is adequate.

2.5.3 Instructions Relating to Routines

This class of instruction deals with routine entry (call) and return. When it compiles a routine, the OCODE compiler generates code of the following form:

ENTRY Li n C1 ... Cn
SAVE s
<body of routine>
ENDPROC

Here, Li is the label of the routine's entry point. For debugging purposes, the length of the routine's identifier is recorded in the code (this is n in the code fragment); the characters comprising the name are the elements denoted C1 to Cn. The instructions in this category are shown in Table 2.3.

The SAVE instruction specifies the initial setting of the S register. The value of this is the save space size (3) plus the number of formal parameters. The save space is used to hold the previous value of P, the return address and the routine entry address. The first argument to a routine is always at the location denoted by 3 relative to the pointer P (some versions of BCPL have a different save space size, so the standard account is followed above).

The end of each routine is denoted by ENDPROC. This is a no-op which allows the code generator to keep track of nested procedure definitions.

The BCPL standard requires that arguments are allocated in consecutive locations on the stack. There is no a priori limit to the number of arguments that can be supplied. A typical call of the form:

E(E1, ..., En)

is compiled as follows (see Table 2.3). First, S is incremented to allocate space for the save space in the new stack frame. The arguments E1 to En are compiled and then the code for E. Finally, either an FNAP k or an RTAP k instruction is generated, the actual one depending upon whether a function or routine call is being compiled. The value k is the distance between the old and new stack frames (i.e., the number of words or bytes between the start of the newly compiled stack frame and the start of the previous one on the stack).

Table 2.3. OCODE instructions for routines.

Opcode    Meaning
ENTRY     enter routine
SAVE      save locals
ENDPROC   end routine
FNAP k    apply function
RTAP k    apply procedure
RTRN      return from procedure
FNRN      return from function

Return from a routine is performed by the RTRN instruction. This restores the previous value of P and resumes execution from the return address. If the return is from a function, the FNRN instruction is planted just after the result has been evaluated (this is always placed on the top of the stack). The FNRN instruction is identical to RTRN after it has stored the result in the A register ready for the FNAP instruction to store it at the required location in the previous stack frame.
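The frame discipline described in this section can be sketched as follows. This is a hedged illustration rather than the OCODE implementation: the class CallStack, its method names and the flat memory list are assumptions, and the order of the three save-space words (previous P, return address, entry address) simply follows the order in which the text lists them.

```python
SAVE_SPACE = 3  # previous P, return address, routine entry address


class CallStack:
    """A hypothetical model of routine entry and return on a framed stack."""

    def __init__(self, size=64):
        self.mem = [0] * size
        self.P = 0   # current stack frame
        self.pc = 0  # next instruction to execute
        self.A = 0   # function result register

    def apply(self, k, entry, return_address):
        """FNAP k / RTAP k: open a new frame k words above the current one."""
        new_p = self.P + k
        self.mem[new_p] = self.P                # previous value of P
        self.mem[new_p + 1] = return_address
        self.mem[new_p + 2] = entry
        self.P = new_p
        self.pc = entry

    def rtrn(self):
        """RTRN: restore the previous P and resume at the return address."""
        self.pc = self.mem[self.P + 1]
        self.P = self.mem[self.P]

    def fnrn(self, S):
        """FNRN: store the result found at P[S-1] in A, then act as RTRN."""
        self.A = self.mem[self.P + S - 1]
        self.rtrn()


cs = CallStack()
cs.P = 4
k = 10
cs.mem[cs.P + k + SAVE_SPACE] = 21      # first argument, placed by the caller
cs.apply(k, entry=100, return_address=55)
first_argument = cs.mem[cs.P + 3]       # the callee sees it at P[3]
cs.mem[cs.P + 4] = first_argument * 2   # the body leaves its result on top
cs.fnrn(S=5)
```

After fnrn, P is back at the caller's frame, execution resumes at the return address and A carries the result for the caller to store.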

2.5.4 Control Instructions

Control instructions are to be found in most virtual machines. Their function is centred around the transfer of control from one point to another in the code. Included in this set are instructions to create labels in code. The OCODE control instructions are shown in Table 2.4.

Table 2.4. OCODE control instructions.

SWITCHON n Ld K1 L1 ... Kn Ln    jump table for a SWITCHON

The JUMP Ln instruction transfers control unconditionally to the label Ln.

The instructions JT and JF transfer control to their labels if the top of the stack (implemented as P!(S-1)) is true or false, respectively. Instructions like this are often found in the instruction sets of virtual machines. The conditional jumps are used, inter alia, in the implementation of selection and iteration commands.

Although they are particular to OCODE, the other instructions also represent typical operations in a virtual machine. The LAB instruction (really a pseudo-operation) declares its operand as a label (thus associating the address at which it occurs with the label).

The GOTO instruction is used to generate code for SWITCHON commands. It takes the form GOTO E, where E is an expression. In the generated code, the code for E is compiled and immediately followed by the GOTO instruction. At runtime, the expression is evaluated, leaving an address on the top of the stack. The GOTO instruction then transfers control to that address.

The RES and RSTACK instructions are used to compile RESULTIS commands. If the argument to a RESULTIS is immediately returned as the result of a function, the FNRN instruction is selected. In all other contexts, RESULTIS e compiles to the code for e followed by the RES Ln instruction. The execution of this instruction places the result in the A register and then jumps to the label Ln. The label addresses an RSTACK k instruction, which takes the result and stores it at location P!k and sets S to k+1.

The OCODE SWITCHON instruction performs a jump based on the value on the top of the stack. It is used to implement switches (SWITCHON commands, otherwise known as case statements). It has the form shown in Table 2.4, where n is the number of cases to which to switch and Ld is the label of the default case. The Ki are the case constants and the Li are the corresponding code labels.

Finally, the FINISH instruction implements the BCPL FINISH command. It compiles to stop(0) in code and causes execution to terminate.
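The behaviour of LAB, JUMP, JT, JF, SWITCHON and FINISH can be illustrated with a toy interpreter. The sketch below is not the OCODE evaluator. Instructions are written as Python tuples, PUSH and EMIT are hypothetical helper instructions added so that the example has something to test and to observe, and it is assumed that JT, JF and SWITCHON pop the value they inspect.

```python
def run(program):
    """Interpret a list of instruction tuples and return the EMIT output."""
    # LAB associates a label with the position at which it occurs.
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "LAB"}
    stack, output, pc = [], [], 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "LAB":
            continue                      # pseudo-operation: nothing to do
        elif op == "PUSH":                # hypothetical: push a constant
            stack.append(args[0])
        elif op == "EMIT":                # hypothetical: record a marker
            output.append(args[0])
        elif op == "JUMP":
            pc = labels[args[0]]
        elif op == "JT":
            if stack.pop():
                pc = labels[args[0]]
        elif op == "JF":
            if not stack.pop():
                pc = labels[args[0]]
        elif op == "SWITCHON":
            default, cases = args         # cases maps each Ki to its Li
            pc = labels[cases.get(stack.pop(), default)]
        elif op == "FINISH":
            break
        else:
            raise ValueError("unknown instruction: " + op)
    return output


def demo(selector):
    """Run a two-case switch followed by a false conditional jump."""
    return run([
        ("PUSH", selector),
        ("SWITCHON", "L9", {1: "L1", 2: "L2"}),
        ("LAB", "L1"), ("EMIT", "one"), ("JUMP", "L0"),
        ("LAB", "L2"), ("EMIT", "two"), ("JUMP", "L0"),
        ("LAB", "L9"), ("EMIT", "default"),
        ("LAB", "L0"), ("PUSH", 0), ("JF", "L3"), ("EMIT", "skipped"),
        ("LAB", "L3"), ("FINISH",),
    ])
```

A dictionary stands in for the jump table here; a compiled implementation would more likely index a table of addresses directly.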

it is to execute. This is the role of the directives.

The BCPL OCODE machine manages a globals area, a stack and a code segment. The runtime system must be told how much space to allocate to each. It must also be told where globals are to be located and where literal pools start and end, so that modules can be linked. The system also needs to know which symbols are exported from a module and where modules start and end.

The BCPL global vector is a case in point. There is no a priori limit on the size of the global vector. In addition, two modules can assign different values to a particular cell in the global vector (with all the ordering problems that are so familiar).

of directives. The directives in the version of BCPL that is current at the time of writing (Summer, 2004) are as shown in Table 2.5. The directives are used in different parts of the system, so are briefly explained in the following few paragraphs.

Table 2.5. OCODE directives.

Directive
STACK s
STORE
ITEMN n
DATALAB Ln
SECTION
NEEDS
GLOBAL n K1 L1 ... Kn Ln

The STACK directive informs the code generator of the current size of the stack. This is required because the size of the current stack frame can be affected by some control structures, for example those that leave a block in which local variables have been declared.

The STORE directive informs the code generator that the point separating the declarations and code in a block has been reached. Any values left on the stack are to be treated as variable initialisations and should be stored in the appropriate places.

Static variables and tables are allocated in the program code area using the ITEMN directive. The parameter to this directive is the initial value of the cell that is reserved by this directive. For a table, the elements are allocated by consecutive ITEMN directives. The DATALAB directive is used to associate a label with a data area reserved by one or more ITEMN directives.

The SECTION and NEEDS directives are direct translations of the SECTION and NEEDS source directives. The latter are used to indicate the start of a BCPL module and the modules upon which the current one depends.

An OCODE module is terminated with the GLOBAL directive. The arguments denote the number of items in the global initialisation list; each of the Ki is an offset into the global vector and Li is the label of the corresponding offset (i.e., Ki Li denotes an offset and the label to be associated with that offset).

Directives are an important class of virtual machine instruction, although little more will be said about them. One reason for this is that, once one becomes aware of their need, there is little else to be said. A second reason is that, although every system is different, there are things that are common to all; in this case, the general nature of directives. It is considered that the directives required by any virtual machine will become clear during its specification.
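As a small illustration of what a loader might do with a GLOBAL directive, the following Python sketch fills global-vector cells from the Ki Li pairs. The function name apply_global, the label-to-address dictionary and the addresses themselves are assumptions made for the example; the text does not prescribe how a particular runtime resolves labels.

```python
def apply_global(global_vector, label_addresses, n, pairs):
    """Process GLOBAL n K1 L1 ... Kn Ln.

    `pairs` is a list of (Ki, Li) tuples; `label_addresses` is a hypothetical
    table mapping each label to the address it was given by LAB or DATALAB.
    """
    if len(pairs) != n:
        raise ValueError("GLOBAL: expected %d pairs, got %d" % (n, len(pairs)))
    for offset, label in pairs:
        # Cell Ki of the global vector receives the address of label Li.
        global_vector[offset] = label_addresses[label]
    return global_vector


G = [0] * 10
addresses = {"L10": 200, "L11": 340}  # invented addresses for the sketch
apply_global(G, addresses, 2, [(1, "L10"), (5, "L11")])
```

If two modules name the same offset, the later call overwrites the earlier one, which is the ordering problem mentioned above.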

2.6 The Intcode/Cintcode Machine

The Intcode/Cintcode machine is used to bootstrap an OCODE machine on a new processor; it can also serve as a target for the BCPL compiler's code generator. The code is designed to be as compact as possible. The Cintcode machine was originally designed as a byte-stream interpretive code to run on small 16-bit machines such as the Z80 and 6502 running under CP/M. More recently, it has been extended to run on 32-bit machines, most notably machines running Linux.

The best descriptions of the Intcode and Cintcode machines are [45] and [44], respectively. Compared with OCODE, (Ci/I)ntcode is an extremely compact representation but is somewhat more complex. The complexity arises because of the desire to make the instruction set as compact as possible; this is reflected in the organisation which is based on bit fields. The organisation of the machine is, on the other hand, easily described. The following description is of the original Intcode machine and follows that in [45] (the account in [44] is far more detailed but is essentially the same in intent).

The Intcode machine is composed of the following components. A memory consisting of equal-sized locations that can be addressed by consecutive integers (a vector of words, for example). It has a number of central registers:

A, B: the accumulator and auxiliary accumulator;

C: the control register. This is the instruction pointer; it points to the next instruction to be executed;

D: the address register, used to store the effective address of an instruction;

P: a pointer that is used to address the current stack frame;

G: a pointer used to access the global vector.

Note that the Intcode machine has a framed stack and a global vector (both necessary to implement OCODE).

Instructions come in two lengths: single and double length. The compiler determines when a double-length instruction should be used.

The operations provided by the Intcode machine are shown in Table 2.6 (the idea is taken from [45], p. 134; the specification has been re-written using mostly C conventions). As in the OCODE instructions, each operation is specified by a code fragment.

Table 2.6. The Intcode machine functions.

Operation           Mnemonic   Specification
Load                L          B := A; A := D
Store               S          *D := A
Jump                J          C := D
Jump if true        T          IF A THEN C := D
Jump if false       F          IF NOT A THEN C := D
Call routine        K          D := P + D
                               *D := P; *(D+1) := C
                               P := D; C := A
Execute operation   X          Various operations, mostly arithmetic or
                               logical operations operating on A
                               and B.
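The specifications in Table 2.6 translate almost line for line into code. The Python sketch below is a transcription of the L, S, J, T, F and K rows under some assumptions: memory is a Python list, the class and method names are invented, and the X function is left out because its operations are not listed in the table.

```python
class IntcodeMachine:
    """A hypothetical model of the Intcode registers and machine functions."""

    def __init__(self, size=64):
        self.mem = [0] * size
        self.A = self.B = 0  # accumulator and auxiliary accumulator
        self.C = 0           # control register (instruction pointer)
        self.D = 0           # address register (effective address)
        self.P = 0           # current stack frame
        self.G = 0           # global vector

    def step(self, function):
        """Perform one machine function using the current value of D."""
        if function == "L":      # B := A; A := D
            self.B = self.A
            self.A = self.D
        elif function == "S":    # *D := A
            self.mem[self.D] = self.A
        elif function == "J":    # C := D
            self.C = self.D
        elif function == "T":    # IF A THEN C := D
            if self.A:
                self.C = self.D
        elif function == "F":    # IF NOT A THEN C := D
            if not self.A:
                self.C = self.D
        elif function == "K":    # call routine
            self.D = self.P + self.D
            self.mem[self.D] = self.P
            self.mem[self.D + 1] = self.C
            self.P = self.D
            self.C = self.A
        else:
            raise ValueError("unsupported function: " + function)


im = IntcodeMachine()
im.D = 5
im.step("L")        # A = 5
im.D = 9
im.step("L")        # B = 5, A = 9
im.D = 20
im.step("S")        # mem[20] = 9

im.P, im.C, im.A, im.D = 30, 7, 100, 4
im.step("K")        # new frame at 34; old P and C saved; control goes to 100
```

In a complete interpreter, D would be computed from the instruction's fields before each call to step.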

Each Intcode instruction is composed of six fields Th ey are as follows:

• Function Par t : T his is a three-bit field It specifies one of th e eight possiblemachine operations described in Tab le 2.6

• Address Field: T his field holds a posit ive integer It represents th e init ialvalue of the Dregister

• D bit : This is a single bit When set, it specifies that t he initial value ofthe Dregister is to be taken from the following word

Trang 38

26 2 VMs for Portability: BCPL

• P bit: This is a single bit. It specifies whether the P register is to be added to the D register at the second stage of an address calculation.

• G bit: This is another single-bit field. It specifies whether the G register is to be added to the D register at the end of the third stage of address calculation.

• I bit: This is the indirection bit. If it is set, it specifies that the D register is to be replaced by the contents of the location addressed by the D register at the last stage of address calculation.
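
The staged address calculation described by these fields can be sketched as a single function. This is a hedged illustration only: the function name `effective_address`, its parameter names, and the decision to pass the fields in already decoded (rather than unpacking them from a machine word, whose bit layout is not given here) are assumptions. It is also assumed that C points at the word following the instruction, so that a set D bit consumes that word.

```python
def effective_address(mem, C, P, G, address, d_bit, p_bit, g_bit, i_bit):
    """Return (D, C) after the staged address calculation.

    Assumption: C addresses the word after the instruction, so when the
    D bit is set that word supplies D and C is advanced past it.
    """
    D = address              # initial value comes from the address field
    if d_bit:                # D bit: take the initial value from the next word
        D = mem[C]
        C += 1
    if p_bit:                # P bit: add the P register
        D += P
    if g_bit:                # G bit: add the G register
        D += G
    if i_bit:                # I bit: replace D by the contents of location D
        D = mem[D]
    return D, C
```

Because the calculation does not look at the function part, the same routine serves every instruction; the interpreter can compute D first and dispatch on the function afterwards.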

The effective address is evaluated in the same way for every instruction and is not dependent upon the way in which the machine function is specified.

Intcode is intended to be a compact representation of a program. It is also intended to be easy to implement, thus promoting BCPL's portability (the BCPL assembler and interpreter for Intcode occupies just under eight and a half pages of BCPL code in [45]).

The Intcode machine also uses indirection (as evidenced by the three-stage address calculation involving addresses in registers), thus making code compact.

This has, of necessity, been only a taster for the Intcode and Cintcode machines. The interested reader is recommended to consult [44] and [45] for more information. The full BCPL distribution contains the source code of the OCODE and Cintcode machines; time spent reading them will be rewarding.

The Java Virtual Machine

3.1 Introduction

It is arguable that Java and its "compile once, run anywhere" slogan started the current interest in virtual machines; indeed, it would appear to have popularised the term "virtual machine".

This chapter is organised as follows. First, the Java language is briefly introduced. Next, the gross organisation of the Java Virtual Machine (the JVM for short) will be described. In that section, the storage organisation used by the JVM and the organisation of the stack are presented, and major concepts such as the Runtime Constant Pool and its contents, including the method areas and class file representation, are introduced. The instruction pointer (or program counter, pc in JVM terms) is also briefly discussed. This is followed by a relatively short description of the "class file", a runtime representation of each class; this description is followed by a brief outline of so-called "class resolution", the process of locating and loading classes at run time.

Section 4 is concerned with the JVM's instruction set. The instruction set can be described in a number of ways. A subset of the instruction set is clearly typed, while another is not. Some instructions are at a relatively high level (e.g., those dealing with locks and exceptions), while others (e.g., jumps and arithmetic) are not. Finally, there are special-purpose instructions directly tailored to Java's needs: those dealing with locks, monitors, and method and data location, for example.

The final section acts as a summary of the main points. It also discusses the relationship between the main components of the JVM and the source structure of Java programs.

It is not possible, given the space available here, to describe the JVM exhaustively. Instead, all that can be done is to give the reader a general impression that is detailed in some places and superficial in others. Readers interested in the exact details should consult [33]. For information about Java itself, the language definition [22] should be consulted. It is, of course, not possible to understand the JVM in complete detail unless the language is completely understood. It is strongly recommended that interested readers should consult both of these texts.

3.2 JVM Organisation: An Overview

This section contains an overview of the JVM's organisation. This description is based upon the published specification [33].

The JVM is a stack-based virtual machine. It contains these basic structures:

• The heap store (main store);

• The stack;

• The method area;

• Runtime Constant Pools;

• The PC register.

The stack and the "class file" objects are stored in the heap, as are the Constant Pools and the method area. In addition, there should be structures to support threads and monitors; they will be considered only later (Section 3.9).

The JVM specification is silent on issues pertaining to the heap's management. A JVM can, therefore, use a mark and scan, stop-and-copy or a generational garbage collector. A pure reference-counting storage management regime cannot be used, however, unless it is supported by some other mechanism. The reason for this is that circular links can exist between entities stored in the heap (Java has an imperative semantics and, therefore, supports assignment).
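
The problem that circular links pose for pure reference counting can be made concrete with a small model. The sketch below is an illustration in Python, not a description of any JVM: `Obj`, `retain`, `release`, `assign` and the `freed` list are hypothetical names, and each object is given a single reference field for brevity.

```python
class Obj:
    """Heap object with an explicit reference count (a model, not a JVM API)."""

    def __init__(self, name):
        self.name = name
        self.count = 0      # number of references currently pointing here
        self.field = None   # a single assignable reference field


freed = []  # names of objects whose count has dropped to zero


def retain(obj):
    obj.count += 1


def release(obj):
    obj.count -= 1
    if obj.count == 0:
        freed.append(obj.name)
        if obj.field is not None:
            release(obj.field)  # a freed object drops what it referenced


def assign(obj, target):
    """Model obj.field = target while keeping the counts up to date."""
    if target is not None:
        retain(target)
    if obj.field is not None:
        release(obj.field)
    obj.field = target


a, b = Obj("a"), Obj("b")
retain(a)        # a reference held by a local variable
retain(b)        # another local variable
assign(a, b)     # a.field = b
assign(b, a)     # b.field = a, completing the cycle
release(a)       # both local variables go out of scope
release(b)
```

After the last two calls neither object can be reached from the program, yet each still has a count of one because of the other's field, so nothing is ever appended to `freed`. A tracing collector, or a cycle detector added to the counting scheme, is the kind of supporting mechanism the text refers to.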

There are, in fact, two stacks in the JVM specification: the "native code" stack (or "C stack") and the "Java stack". The first can be disposed of fairly readily. It is the stack used to implement the JVM itself; it is also the stack used for intermediate storage by all the primitive routines in a JVM and by all the code implementing JVM instructions. Additional primitives, as well as native methods, are implemented using the "native code" stack. In most implementations, this stack will be the one used by the C runtime system. This stack will not be of further interest because it is beyond the control of the JVM specification.

The other stack is the JVM stack proper. It is a framed stack. A stack frame is allocated when control enters a method. It is deallocated when control returns from the method that caused its allocation. There are two cases of return from a method:

• Normal return. This is performed by the execution of a return instruction.

• Abnormal return. This is performed when an exception of some kind is caught by a handler that is not inside the method invocation associated with the stack frame.
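
The two kinds of return can be illustrated with a toy framed stack. This is a sketch under simplifying assumptions rather than the JVM's actual mechanism: `Frame`, `JavaStack`, `invoke`, `normal_return` and `throw` are hypothetical names, exceptions are represented by plain strings, and each frame simply lists the exceptions its method can handle.

```python
class Frame:
    """One stack frame; `handles` names the exceptions its method can catch."""

    def __init__(self, method, handles=()):
        self.method = method
        self.handles = set(handles)


class JavaStack:
    def __init__(self):
        self.frames = []

    def invoke(self, method, handles=()):
        # A frame is allocated when control enters a method.
        self.frames.append(Frame(method, handles))

    def normal_return(self):
        # Executing a return instruction deallocates the current frame.
        return self.frames.pop().method

    def throw(self, exception):
        """Unwind until some frame handles `exception`.

        Returns the methods whose frames were discarded (abnormal returns).
        The frame containing the handler stays on the stack.
        """
        discarded = []
        while self.frames and exception not in self.frames[-1].handles:
            discarded.append(self.frames.pop().method)
        return discarded
```

For example, if `main` (which handles `"IOError"`) invokes `read`, which invokes `parse`, then `throw("IOError")` discards the frames of `parse` and `read` and leaves `main` on top of the stack.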

Date posted: 15/02/2016, 10:01

References

1. Abelson, H., and Sussman, G. J., The Structure and Interpretation of Computer Programs, MIT Press, Cambridge MA, 1985
3. Ait-Kaci, H., Warren's Abstract Machine, MIT Press, Cambridge MA, 1991
4. Appel, A. W., Compiling with Continuations, CUP, 1992
5. Appel, A. W., Modern Compiler Implementation in Java, CUP, 1998
6. Baillarguet, C., MVV: langage et systeme, plus qu'un mariage de raison, Journees des Jeunes Chercheurs en Systeme, Rennes, France, June, 1999
7. Bailey, R., FP/M Abstract Syntax Description, Internal Report, Dept. of Computing, Imperial College, London, 1985
8. Barratt, R., Ramsey, A., and Sloman, A., Pop-11: A Practical Language for Artificial Intelligence, Ellis Horwood, Chichester, England, 1985
9. Bell, James R., Threaded Code, Communications of the ACM, Vol. 16, No. 6, pp. 370-72, 1973
10. Blaschek, G., Object-Oriented Programming with Prototypes, Springer-Verlag, Heidelberg, 1994
11. Bobrow, D. G., and Stefik, M., The LOOPS Manual, Xerox PARC, Palo Alto, CA, 1983
12. Brinch Hansen, P., Structured Multiprogramming, CACM, Vol. 15, No. 7, pp. 574-578, 1972
13. Craig, I. D., Reflecting on Time, Proc. Intl. Congress on Cybernetics and Systems, International Cybernetics Society, 1999
14. Craig, I. D., HM paper from Freiburg, 2003. Event-based Introspection and Communication, ESSCS Annual Conference, Freiburg, Germany, August, 2003
15. Craig, I. D., The Interpretation of Object-Oriented Programming Languages, 2nd edn., Springer-Verlag, London, 2002
16. Diehl, Stephen, Semantics-Directed Generation of Compilers and Abstract Machines, Ph.D. Dissertation, University of Saarbrücken, Germany, 1996
17. Diehl, Stephen, A generative methodology for the design of abstract machines, Science of Computer Programming, Vol. 38, pp. 125-142, 2000
18. Field, A. J., and Harrison, P. G., Functional Programming, Addison-Wesley, Wokingham, England, 1988
19. Folliot, B., Virtual Virtual Machine Project, Invited Talk, Simposio Brasileiro de Arquitetura de Computadores e Processamento de Alto Desempenho (SBAC '2000), 2000
20. Friedman, D. P., Wand, M., and Haynes, C. T., Essentials of Programming Languages, 2nd edn, MIT Press, Cambridge, MA, 2001
21. Goldberg, A., and Robson, D., Smalltalk-80: The Language and Its Implementation, Addison-Wesley, Reading, MA, 1983
