
Modern programming languages: ByteCode and Virtual Machines


Trang 1

Modern programming languages:

ByteCode and Virtual Machines

CSE 6329, Spring 2011 Christoph Csallner, UTA

Trang 3

Old days: No Virtual Machine

You write: Program in source language: MyProg.cpp

Source language specification: Book: “The C++ programming language”

Source-to-machine compiler: (Old) Microsoft Visual Studio

Program in machine code: MyProg.exe in MS Windows x86 binary

Machine instruction set

Program in Intermediate Representation

Trang 4

Today: Virtual Machines popular

You write: Program in source language: MyProg.java

Source language spec: Java Specification

Source-to-bytecode compiler: javac MyProg.java

Program in bytecode: MyProg.class

Bytecode language spec: JVM Specification

Virtual machine: java MyProg

Trang 5

Program Analysis today

• Many programs compiled to bytecode

– Virtual machine executes bytecode

• Bytecode has advantages over source language

• Many Program Analyses analyze bytecode

– Results translated back to your original Java/C#/… source program

• Example program analyses that are very easy to use:

– For Java: FindBugs: http://findbugs.sourceforge.net/

– For C#: Pex for fun: http://www.pexforfun.com/

Trang 6

Big picture

You write: MyProg.java

Source to bytecode compiler

E.g.: javac, MS Visual Studio

Program analysis E.g.: FindBugs, Pex

Bytecode: MyProg.class

Virtual machine, e.g.: JVM, .NET runtime

Trang 7

Why is bytecode good for Program Analysis?

Trang 9

Simple yet powerful

• Bytecode is simpler than source language

– Similar to compiler IR

– Simplifies analysis

– Java, C#, VB, F#, etc. are far more complex

• Retains most information of source language

– Similar to compiler IR

– Enables meaningful analysis

Trang 10

• Fewer language elements = less “syntactic sugar”

• Example: Explicit loop constructs in Java

– Sourcecode: 4

• Which ones?

Trang 11

• Fewer language elements = less “syntactic sugar”

• Example: Explicit loop constructs in Java

– Sourcecode: 4

• while, do (“until”), basic for, enhanced for

– Bytecode: 0

• ?

Trang 12

• Fewer language elements = less “syntactic sugar”

• Example: Explicit loop constructs in Java

– Sourcecode: 4

• while, do (“until”), basic for, enhanced for

– Bytecode: 0

• All 4 are mapped to jumps

– Makes program analysis easier to implement
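To make the loop example concrete, here is a minimal sketch of the four source-level loop forms computing the same sum. The class and method names are illustrative assumptions, not taken from the slides.

```java
// Illustrative only: class and method names are assumptions of this sketch.
final class LoopForms {
    static int withWhile(int[] xs) {
        int sum = 0, i = 0;
        while (i < xs.length) { sum += xs[i]; i++; }
        return sum;
    }

    static int withDo(int[] xs) {
        if (xs.length == 0) return 0;   // a do loop always runs its body once
        int sum = 0, i = 0;
        do { sum += xs[i]; i++; } while (i < xs.length);
        return sum;
    }

    static int withBasicFor(int[] xs) {
        int sum = 0;
        for (int i = 0; i < xs.length; i++) sum += xs[i];
        return sum;
    }

    static int withEnhancedFor(int[] xs) {
        int sum = 0;
        for (int x : xs) sum += x;
        return sum;
    }

    public static void main(String[] args) {
        int[] xs = {1, 2, 3, 4};
        System.out.println(withWhile(xs) + " " + withDo(xs) + " "
                + withBasicFor(xs) + " " + withEnhancedFor(xs));
    }
}
```

After compiling with javac, running javap -c LoopForms should show that each method body contains only conditional branch instructions (the if_icmp family) and, in most cases, goto. No loop-specific instruction appears, so an analysis that understands jumps handles all four forms.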

Trang 13

• Still a non-trivial, Turing-complete language

– At least as expressive as Java source language

– Supports all legal Java source programs (and more)

• Bytecode retains most information of original source program

– Allows automatic reconstruction of source from bytecode

– “Dis-assembler” fast, powerful, and convenient

Trang 14

• Several “dis-assembler” libraries provide a nice API to retrieve and even change bytecode

– Beyond capability of Java or C# built-in reflection

– BCEL and ASM for Java bytecode

– ExtendedReflection (part of Pex) for .NET bytecode

Trang 15

Documented Standard

• Carefully designed and specified

– Better than most compiler IR

• Java Virtual Machine specification

Trang 16

Shared Standard

• Shared standard among different languages

– Java, C#, VB, F#, etc all compiled to same bytecode

– Programs in many source languages can be checked with single Program Analysis tool

• Shared standard among different operating systems

– Cell phones, mainframe, etc all run same bytecode

– Programs on many OS can be checked with single tool

Trang 17

Old days: Typically no shared intermediate language

You write: MyProg.cpp → Visual Studio → MyProg.exe in Windows x86 → Windows

You write: MyProg.ada → MyProg in Linux x86 → Linux

Trang 18

Bytecode: Shared intermediate language

You write: MyProg.java You write: MyProg.cs You write: MyProg.ada

Trang 19

Many software engineering papers focus on combination of Java source with Java bytecode

• Probably easiest to understand

• Other combinations work similarly

• Well documented, many research papers

• Industrial-strength, but still relatively simple

– C# started with Java-like features

– But C# grew faster, so it is more complex now

– C++ more complex than Java

– Other combinations more obscure

Trang 20

javac compiler implements our source-bytecode combination

Bytecode: MyProg.class

Java virtual machine

JVM spec

Trang 21

• Following overview gives a flavor

– Slightly simplified: Details may differ from JVM

– Omits several parts: Exceptions, floating point, …

• May be intimidating

– But remember that you can typically use a powerful disassembler to help with bytecode

• Following mostly copied from Java virtual machine specification 2nd edition:

http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html

Trang 23

Structure of the Java Virtual Machine

= Sections of chapter 3 of JVM Spec

1 The class file format

2 Data types

3 Primitive types and values

4 Reference types and values

5 Runtime data areas

6 Frames

7 …

Trang 24

Class file format

• Standard format for Java bytecode

• JVM accepts bytecode only in class file format

• JVM Spec, Section 4, defines class file format

– Contents

– Order

– Representation

– Verification [Section 4.9]

Trang 25

Class file format

• Binary format

• Independent of hardware and OS

– Fixes byte order (“endianness”), regardless of byte order of current machine

• Independent of actual files, despite the name

• Class may arrive at runtime as a byte array from elsewhere

– From a class generator

– From the web

Trang 26

Class/interface ↔ class file

• 1:1 mapping between (class or interface) and class file

– Class file can define a class or an interface

– Each class is defined in its own class file

– Each interface is defined in its own class file

• Applies to top-level types and nested types

– Java compiler creates a separate class file for each nested class

Trang 27

Basic organization

• Class file = stream of bytes, 1 byte = 8 bits

• Multibyte items stored in big-endian order = High byte first

• Read consecutive bytes

• Interpret consecutive bytes as unsigned number

– 8 bit item = 1 byte [0 .. 255]

– 16 bit item = 2 consecutive bytes [0 .. 65,535]

– 32 bit item = 4 consecutive bytes [0 .. 4,294,967,295]

– 64 bit item = 8 consecutive bytes [0 .. 18,446,744,073,709,551,615]

Trang 28

Class file data types

• Own simple data types

– Different from Java data types

– Different from JVM data types

– Neither “byte” nor “int” nor “long”

• Just three types

– u1 = unsigned byte

– u2 = unsigned 2 consecutive bytes: (high, low)

– u4 = unsigned 4 consecutive bytes
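The big-endian, unsigned reading rules and the u1/u2/u4 types can be sketched in a few lines of Java. The class name ClassFileInts is a hypothetical helper for illustration. Since Java has no unsigned int, u4 is returned as a long.

```java
// Hypothetical helper: reads class-file style unsigned values from a byte array.
final class ClassFileInts {
    // u1: one byte interpreted as an unsigned value in 0..255.
    static int u1(byte[] b, int off) {
        return b[off] & 0xFF;
    }

    // u2: two consecutive bytes, high byte first (big-endian).
    static int u2(byte[] b, int off) {
        return (u1(b, off) << 8) | u1(b, off + 1);
    }

    // u4: four consecutive bytes, high byte first; a long holds the full range.
    static long u4(byte[] b, int off) {
        return ((long) u2(b, off) << 16) | u2(b, off + 2);
    }
}
```

The masking with 0xFF matters because Java's byte type is signed. Without it, a byte such as 0xFF would be read as -1.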

Trang 29

Class file structure

Trang 30

Class/Interface

Header

Trang 32

• Magic number

• First four bytes of a Java class file

• Each class file has the same magic number

• Helps OS recognize this file as a Java class file

• Value is 3405691582 = CAFEBABE in hex

• More on CafeBabe:

– http://www.artima.com/insidejvm/whyCAFEBABE.html
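A magic-number check is short enough to sketch. The class name MagicCheck is an assumption for illustration. DataInputStream.readInt reads four bytes in big-endian order, which matches the class file format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper: tests whether a byte array starts like a Java class file.
final class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;   // 3405691582 when read as unsigned

    static boolean hasClassFileMagic(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readInt() == MAGIC;  // readInt is big-endian
        } catch (IOException e) {
            return false;                  // fewer than four bytes available
        }
    }
}
```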

Trang 33

minor_version, major_version

• Together define the version of the class file format used in the class file

• Tells JVM if it understands the format of the class file

– An older JVM can refuse to load a class file, if the class file is in a class file format that was defined after the JVM was released
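The version check can be sketched as follows. In the class file, minor_version is stored before major_version, directly after the magic number. The acceptance rule below (major version not greater than a maximum) is a deliberate simplification and an assumption of this sketch. Real JVMs apply more detailed rules. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper: reads the version fields from a class file header.
final class VersionCheck {
    // Returns {major, minor} from the first 8 bytes of a class file.
    static int[] readVersion(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            in.readInt();                        // skip the magic number
            int minor = in.readUnsignedShort();  // u2 minor_version comes first
            int major = in.readUnsignedShort();  // u2 major_version follows
            return new int[] {major, minor};
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated class file header", e);
        }
    }

    // Simplified rule (assumption): accept any class file up to maxMajor.
    static boolean wouldLoad(byte[] classFile, int maxMajor) {
        return readVersion(classFile)[0] <= maxMajor;
    }
}
```

For example, class files produced for Java 8 carry major version 52, so under this simplified rule a JVM that supports at most version 51 would reject them.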

Trang 34

Constant Pool

of this Class/Interface

Trang 36

Constant Pool

• Constants from user source program

– Constant String objects, int, float, long, double

• Internal String values

– Unicode character sequences

• Names and signatures of

– Classes, interfaces, methods, fields

Trang 38

Constant Pool

• constant_pool_count = Number of entries in the constant_pool, plus 1

• constant_pool = Sequence of cp_info items

• cp_info = {u1 tag; u1 info[]; }

• Tag byte defines the kind of cp_info, e.g.:

– 3 indicates a CONSTANT_Integer_info

• Info array holds the actual data, e.g.:

– Info array of CONSTANT_Integer_info is one u4

Trang 39

Index into Constant Pool

• u2 value

– Greater than zero

– Less than constant_pool_count

• Example

– constant_pool_count = 7

– 1 = Index of first element

– 6 = Index of last element

Trang 40

Constant String Objects

• Declared in the user program as constant objects of the type String, e.g.:

– String s = "CSE 6329 rocks";

• CONSTANT_String_info = { u2 string_index; } // index into cp

• cp at string_index must be a CONSTANT_Utf8_info
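The constant pool layout described so far (a count, then tagged cp_info items indexed from 1) can be sketched as a tiny parser. This is a simplified illustration, not a full implementation. It handles only three tags (1 = CONSTANT_Utf8, 3 = CONSTANT_Integer, 8 = CONSTANT_String) and rejects all others. The class name and the textual form used for String entries are assumptions of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical, partial constant pool reader (three tags only).
final class ConstantPoolSketch {
    static final int CONSTANT_Utf8 = 1, CONSTANT_Integer = 3, CONSTANT_String = 8;

    // Input: u2 constant_pool_count followed by count-1 cp_info items.
    // Output: array of length count; slot 0 is unused, as in the class file.
    static Object[] parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readUnsignedShort();
            Object[] pool = new Object[count];
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();               // u1 tag
                switch (tag) {
                    case CONSTANT_Utf8:                        // u2 length + bytes
                        pool[i] = in.readUTF();
                        break;
                    case CONSTANT_Integer:                     // one u4
                        pool[i] = in.readInt();
                        break;
                    case CONSTANT_String:                      // u2 string_index
                        pool[i] = "String -> #" + in.readUnsignedShort();
                        break;
                    default:
                        throw new IllegalArgumentException("tag not handled here: " + tag);
                }
            }
            return pool;
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated constant pool", e);
        }
    }
}
```

With constant_pool_count = 4, the valid indices are 1 to 3. A CONSTANT_String_info at index 3 that stores string_index 1 refers to the CONSTANT_Utf8_info at index 1.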

Trang 41

Internal String Values

• Holds a character sequence

– Each character is a Unicode character

– Each character represented by 1, 2, or 3 bytes

• Used for both user program constant objects and internal Strings (method signatures, etc.)
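The 1, 2, or 3 byte encoding can be observed with the JDK itself: DataOutputStream.writeUTF uses the same modified UTF-8 encoding, preceded by a 2-byte length. The sketch below measures the encoded length of a string. The class name Utf8Lengths is an assumption for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper: counts the modified UTF-8 bytes used for a string.
final class Utf8Lengths {
    static int encodedLength(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);               // writes u2 length, then the bytes
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buf.size() - 2;             // subtract the 2-byte length prefix
    }
}
```

Under this encoding, "A" takes 1 byte, "\u00E9" takes 2 bytes, and "\u20AC" takes 3 bytes. The null character "\u0000" is a special case that takes 2 bytes, so encoded strings never contain a zero byte.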

Trang 42

Access Rights

of this Class/Interface

Trang 44

Class/Interface Access Rights: access_flags

• Bit mask – each bit represents a flag

• Each flag represents an access permission or a property of this class or interface

– Flag = (class/interface) was declared …

Trang 45

Class/Interface Access Rights: Public or Default

• Class/interface either has public flag set or not

– No “private” or “protected” flags

• Public flag set

– Access from within or outside its package

• Default access rights, if public flag not set

– Access only from within its package
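Testing a bit mask such as access_flags comes down to a bitwise AND. The sketch below uses the flag values 0x0001 (public) and 0x0200 (interface) from the JVM specification. The class and method names are assumptions for illustration.

```java
// Hypothetical helper: decodes two class-level access flags.
final class ClassAccess {
    static final int ACC_PUBLIC    = 0x0001;
    static final int ACC_INTERFACE = 0x0200;

    static boolean isPublic(int flags) {
        return (flags & ACC_PUBLIC) != 0;
    }

    static boolean isInterface(int flags) {
        return (flags & ACC_INTERFACE) != 0;
    }

    // "Default" access simply means the public bit is absent.
    static String describe(int flags) {
        return (isPublic(flags) ? "public" : "package-private")
                + (isInterface(flags) ? " interface" : " class");
    }
}
```

Reflection exposes similar bits through Class.getModifiers(), so describe(String.class.getModifiers()) yields "public class".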

Trang 46

Direct Subclass Relation

Trang 47

Name and Direct Super-Class

of this Class/Interface

Trang 48

– Name of class or interface

– In “internal” notation: Replace “.” with “/”

– Example: “java/lang/Object”
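The conversion to internal notation is a single character replacement on the name returned by Class.getName(). The helper class below is a hypothetical illustration.

```java
// Hypothetical helper: converts Java class names to class-file internal names.
final class InternalNames {
    static String toInternal(String binaryName) {
        return binaryName.replace('.', '/');
    }

    static String toInternal(Class<?> c) {
        return toInternal(c.getName());
    }
}
```

Nested classes keep the "$" separator that getName() already produces, for example "java/util/Map$Entry".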

Trang 49

• If this class file defines a class,

– super_class must be zero or an index into the cp

• If super_class is zero

– This class file must represent java.lang.Object – the root class of the Java class hierarchy

• If super_class is non-zero,

– cp at super_class must be a CONSTANT_Class_info representing the direct super class

Trang 50

• If this class file defines an interface

– super_class must be an index into the cp

– cp at super_class must be a CONSTANT_Class_info for java.lang.Object

• This is a bit confusing

– An interface does not have a super class

– E.g., the instance method getSuperclass() of java.lang.Class returns null if invoked on an interface
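The difference between the class file and the reflection view can be checked directly. The helper below is a hypothetical illustration that prints the super class as reflection reports it.

```java
// Hypothetical helper: reports the super class as seen through reflection.
final class SuperClassDemo {
    static String superName(Class<?> c) {
        Class<?> s = c.getSuperclass();
        return s == null ? "(none)" : s.getName();
    }

    public static void main(String[] args) {
        System.out.println(superName(String.class));    // an ordinary class
        System.out.println(superName(Object.class));    // the root class
        System.out.println(superName(Runnable.class));  // an interface
    }
}
```

Reflection reports no super class for both java.lang.Object and any interface, even though an interface's class file names java/lang/Object in its super_class entry.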

Trang 51

Direct Interfaces

of this Class/Interface

Trang 53

• interfaces_count = Number of direct super interfaces

• Interfaces = Array of indices into cp

• Cp at each index must be a CONSTANT_Class_info that represents a direct super interface
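Only direct super interfaces are listed, which Class.getInterfaces() mirrors. The types Shape, Polygon, and Square below are made up for this sketch.

```java
// Hypothetical types: Square implements Polygon, which extends Shape.
final class DirectInterfaces {
    interface Shape {}
    interface Polygon extends Shape {}
    static class Square implements Polygon {}

    // Number of direct super interfaces, as reflection reports them.
    static int count(Class<?> c) {
        return c.getInterfaces().length;
    }
}
```

Square has one direct super interface (Polygon). Shape is reachable only indirectly, through Polygon's own list of direct super interfaces.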

Trang 55

of this Class/Interface

Trang 56

• fields_count = Number of fields declared by this class or interface

– Includes static fields and instance fields

– Does not include any inherited fields

• fields = Sequence of field_info items

– Each field_info represents one field declared by this class or interface
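Reflection offers a matching view: Class.getDeclaredFields() returns static and instance fields declared by the class itself, without inherited ones. The classes Base and Derived below are made up for this sketch. Synthetic fields are skipped so that compiler- or tool-generated fields do not affect the count.

```java
import java.lang.reflect.Field;

// Hypothetical types: Derived declares two fields and inherits one.
final class DeclaredFields {
    static class Base { int inherited; }
    static class Derived extends Base { static int counter; int value; }

    static int declaredCount(Class<?> c) {
        int n = 0;
        for (Field f : c.getDeclaredFields()) {
            if (!f.isSynthetic()) n++;   // ignore generated fields
        }
        return n;
    }
}
```

Derived counts 2 fields (one static, one instance). The field named inherited belongs to Base only.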

Trang 57

• field_info = {

– u2 access_flags; // Access rights

– u2 name_index; // Simple name

– u2 descriptor_index; // Type

– u2 attributes_count; // Attributes

– attribute_info attributes[attributes_count]; }

Trang 58

Field Access Rights field_info access_flags

• Flag = Field was declared …

– The field is accessible …

Trang 59

Field Access Rights

• Only one of the access flags (public, private, protected) may be set

• “Default” access, if no access flag is set

– Only within its package

• Reminder from Java Spec: Class X can access a field C.f only if it can access class C.

– Public field f may not be accessible for class X

Trang 60

More Field Access Rights field_info access_flags

• Flag = Field was declared …

• 0x0008 = static

– Class field (one per class)

– Not an instance field (one per instance)

• 0x0010 = final

– No further assignment after initialization

• 0x0040 = volatile

• 0x0080 = transient
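The flag values above can be observed through reflection, since Field.getModifiers() returns the same bits. The class Sample and its fields are made up for this sketch.

```java
// Hypothetical sample class with static final, volatile, and transient fields.
final class FieldFlags {
    static class Sample {
        static final int LIMIT = 10;
        volatile boolean done;
        transient Object cache;
    }

    // Returns the modifier bits of the named field of Sample.
    static int flagsOf(String fieldName) {
        try {
            return Sample.class.getDeclaredField(fieldName).getModifiers();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

LIMIT combines static and final (0x0008 | 0x0010 = 0x0018). All three fields have default access, so none of the public, private, or protected bits is set.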

Trang 61

Field Signature

• cp[name_index] is a CONSTANT_Utf8_info

– Simple name of field, e.g.:

– double[] foo; // “foo”

– static Object bar; // “bar”

Trang 62

Descriptor Notation

• Cryptic type notation used in Java bytecode

– Notation Java type interpretation

– C char Unicode character

– L<name>; reference instance of <name>

– [ reference one array dimension

Trang 63

Descriptor Notation

– Notation Java type interpretation

– B byte 8 bit signed integer
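As a sketch of how the descriptor notation composes, the helper below maps a java.lang.Class to its descriptor. Beyond the B, C, L, and [ entries shown above, it relies on the remaining letters from the JVM specification (D double, F float, I int, J long, S short, Z boolean, V void). The class name Descriptors is an assumption for illustration.

```java
// Hypothetical helper: builds the descriptor string for a Java type.
final class Descriptors {
    static String of(Class<?> c) {
        if (c.isArray())        return "[" + of(c.getComponentType()); // one "[" per dimension
        if (c == byte.class)    return "B";
        if (c == char.class)    return "C";
        if (c == double.class)  return "D";
        if (c == float.class)   return "F";
        if (c == int.class)     return "I";
        if (c == long.class)    return "J";
        if (c == short.class)   return "S";
        if (c == boolean.class) return "Z";
        if (c == void.class)    return "V";
        // Reference type: L<internal name>;
        return "L" + c.getName().replace('.', '/') + ";";
    }
}
```

For the field examples on the earlier slide, "double[] foo" has descriptor "[D" and "static Object bar" has descriptor "Ljava/lang/Object;".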

Trang 64

Field Attributes:

attributes[attributes_count]

• attributes_count = Number of attributes for this field

• attributes = Sequence of attribute_info items

– Each attribute_info represents one attribute

• Examples:

– @Deprecated int myDeprecatedField = 0;

– @Deprecated @MyAttribute int otherField = 1;

Trang 65

of this Class/Interface

Trang 67

• methods_count = Number of methods declared by this class or interface

– Includes static methods and instance methods

– Includes constructors and static initializers

– Does not include inherited methods

• methods = Sequence of method_info items

– Each method_info represents one method declared by this class or interface

Trang 68

• method_info {

– u2 access_flags; // Method access rights

– u2 name_index; // Simple name

– u2 descriptor_index; // Signature

– u2 attributes_count; // Attributes

– attribute_info attributes[attributes_count]; }

Trang 69

Method Access Rights method_info access_flags

• Next two slides identical to field access rights

– Public, private, protected, default

• Fields, constructors, methods are all “members” of a class or interface

– Similar access right rules

Trang 70

Method Access Rights method_info access_flags

– The method is accessible …

Trang 71

Method Access Rights

• Only one of the access flags (public, private, protected) may be set

• “Default” access, if no access flag is set

– Only within its package

• Reminder from Java Spec: Class X can access a method C.m only if it can access class C.

– Public method m may not be accessible for class X

Trang 72

More Method Access Rights method_info access_flags

• Flag = Method was declared …

• 0x0008 = static

– Class method (called independent of instance)

– Not an instance method (which needs an instance as a “receiver instance” or “this parameter”)

• instance.method(p2, p3, …)

• 0x0010 = final

– May not be overridden by sub-classes

Trang 73

More Method Access Rights method_info access_flags

• Flag = Method was declared …

Trang 75

Method Signature

• cp[descriptor_index] is CONSTANT_Utf8_info

– (Parameter types) Return type

– In same cryptic notation as field types

– “V” = void is also a legal return type

– Never includes a “receiver type”

• Examples

– public int foo() { } // “()I” instance method

– MyClass(long p) {} // “(J)V” constructor

– static { bar = 5; } // “()V”
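Method descriptors in this notation can be produced with the JDK's java.lang.invoke.MethodType, whose toMethodDescriptorString() method returns the "(parameter types) return type" form. The wrapper class below is a hypothetical convenience for illustration.

```java
import java.lang.invoke.MethodType;

// Hypothetical wrapper around MethodType for building method descriptors.
final class MethodDescriptors {
    static String of(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(of(int.class));               // like: int foo()
        System.out.println(of(void.class, long.class));  // like: MyClass(long p)
    }
}
```

The receiver is never part of the descriptor, so an instance method and a static method with the same parameters and return type share one descriptor string.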

Trang 76

Method Attributes:

attributes[attributes_count]

• attributes_count = Number of attributes for this method

• attributes = Sequence of attribute_info items

– Each attribute_info represents one attribute

– Code attribute, present iff the method is neither abstract nor native

– Exceptions attribute, lists declared exceptions

– @Deprecated attribute

Trang 77

Code of a method/constructor/clinit:

In a Code Attribute

• Code_attribute {

– { u2 start_pc; u2 end_pc; u2 handler_pc; u2 catch_type; } exception_table[exception_table_length];

– attribute_info attributes[attributes_count]; }

Trang 78

of this Class/Interface

Trang 80

Class/Interface Attributes:

attributes[attributes_count]

• attributes_count = Number of attributes for this class or interface

• attributes = Sequence of attribute_info items

– Each attribute_info represents one attribute

Trang 81

Referring to fields/methods

in other classes

Trang 82

• So far: How to define the elements of a class

– Class name

– Access rights of the class

– Fields of the class

Trang 83

CONSTANT_Fieldref_info CONSTANT_Methodref_info

• Reference to a field/method/constructor

• CONSTANT_Fieldref_info { // similar for all

– u1 tag;

– u2 class_index; // type declaring this member, a CONSTANT_Class_info

– u2 name_and_type_index; // simple name and descriptor, a CONSTANT_NameAndType_info

}

Uploaded: 15/02/2016, 10:02
