Finding round-off errors for Java programs using Symbolic PathFinder: M.A. Thesis, Information Technology: 60 48 01


VIETNAM NATIONAL UNIVERSITY, HANOI

UNIVERSITY OF ENGINEERING AND TECHNOLOGY

NGUYEN BAO NGOC

FINDING ROUND-OFF ERROR FOR JAVA PROGRAMS USING SYMBOLIC PATHFINDER

MASTER THESIS OF INFORMATION TECHNOLOGY


Major: Computer science

Code: 60 48 01


SUPERVISOR: PhD Truong Anh Hoang

CO-SUPERVISOR: PhD Le Trong Vinh


Originality Statement

I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgment is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Date:

Signed:


Nowadays, with the explosion of mobile and embedded devices, the need for applications for these devices has been increasing non-stop. The development process of these programs commonly comprises two phases: a development phase carried out on personal computers and a deployment phase carried out on the actual devices. The latter phase includes porting, which requires changing floating-point numbers and operations to fixed-point ones. This is where round-off errors between the two versions of the program often occur.

In this thesis, we present a novel approach to produce a precise representation of the round-off error using symbolic computation. With this representation, we can analyze various aspects of the error, such as finding the largest round-off error using optimization tools like Mathematica, or doing error bound-checking using SMT solvers. We have implemented a tool using Symbolic PathFinder to generate the symbolic output expression of a Java program. Using that symbolic output expression, we symbolically execute the round-off error part to get the precise symbolic round-off error of the original program. Experiments with simple Java programs demonstrate the effectiveness of our symbolic round-off error.

Publications:

* Anh-Hoang Truong, Huy-Vu Tran, Bao-Ngoc Nguyen. Finding Round-Off Error Using Symbolic Execution. In Proceedings of the Fifth International Conference on Knowledge and Systems Engineering (KSE 2013).


First and foremost, I would like to express my greatest appreciation to my supervisor, Dr. Truong Anh Hoang. There is no doubt that without his patient guidance, continuous contributive involvement and unceasing efforts to inspire me throughout the research process, this thesis would never have been accomplished.

I am also grateful to my co-supervisor, Dr. Le Trong Vinh, for his collaborative advice. I thank my lecturers in the Software Engineering Department for their supportive assistance, and my colleagues from the Information Technology Faculty for their unconditional help, to which I am indebted.

Finally, it would be remiss of me not to mention the support of my beloved family and friends. They have always encouraged me a lot. Thank you!


Table of Contents

2.1 Number representation

2.1.1 Floating-point numbers

2.1.2 Fixed-point numbers

2.2 Symbolic execution with Symbolic PathFinder

2.2.1 Symbolic execution

2.2.2 Symbolic PathFinder

2.3 Related works

3 Symbolic round-off error

3.1 Symbolic round-off error

3.2 Constraints

3.3 Symbolic round-off error for expressions

3.4 Symbolic round-off error for programs

3.5 Applications of symbolic round-off error

4 Experiments

4.1 Implementation

4.1.1 Our extended Symbolic PathFinder

4.2 Experimental results

4.2.1 Experiment with simple program

4.2.2 Experiment with a polynomial of degree 5

4.2.3 Experiment with Taylor series of sine function

4.3 Discussion


List of Figures

2.1 A symbolic execution tree

2.2 Symbolic PathFinder overview [16]

2.3 CANA system

3.1 An example program

4.1 System architecture overview

4.2 Java PathFinder architecture

4.3 Listener architecture

4.4 Mathematica problem for example in Figure 3.4


List of Tables

2.1 32-bit binary representation of float number 118.625

2.2 16-bit fixed-point binary representation of 118.625

4.1 Top round-off errors in 100,000,000 tests with round to the nearest

4.2 Top round-off errors in 100,000,000 tests with round towards −∞


List of Abbreviations

AI Affine Interval

ALU Arithmetic Logic Unit

CI Classical Interval

EAI Extended Affine Interval

FPU Floating-Point Unit

JPF Java PathFinder

SIM Subscriber Identity Module

SMT Satisfiability Modulo Theories

SPF Symbolic PathFinder


Chapter 1

Introduction

In recent years, software has become ubiquitous. It comes bundled in nearly everything, from large, easily recognizable personal computers to pocket-sized phones, and even unexpected things like a Subscriber Identity Module (SIM) card¹. As a result of rapid technological development, these devices are becoming smaller and smaller, which leads to easier and smoother adoption in every aspect of life. Because of that, the demand for software on those devices has also been gaining significant traction.

Nevertheless, the development process of embedded software has not changed much. The software is often written on personal computers, while it runs on embedded devices.

As those devices may or may not be equipped with Floating-Point Units (FPUs), they may use different number representations: floating-point or fixed-point, respectively. This can cause round-off errors, also called rounding errors. Although round-off errors are often quite small, they can accumulate over time and result in unrecoverable system failure. Historically, round-off error has had severe consequences, such as those encountered in the Patriot missile failure [8]. The traditional round-off error is the difference between the exact (correct) result and the approximate result that a computer generates. In this thesis, however, we only address the difference between the results a program produces from the same inputs when executed with floating-point and with fixed-point numbers.

Indeed, there are three common types of round-off errors: real numbers versus floating-point numbers, real numbers versus fixed-point numbers, and floating-point numbers versus fixed-point numbers. This thesis is based on our previous work [20], where we focus on the last type of round-off errors for two main reasons. First, with the widespread use of mobile and embedded devices, many applications developed for personal computers are now ported to run on these platforms. Second, even for new applications, it is impractical and time-consuming to develop complex algorithms directly on embedded devices. Hence, many complex algorithms are developed and tested on personal computers that use floating-point numbers before they are ported to embedded devices that use fixed-point numbers. Our work was inspired by recent approaches to round-off error analysis [12, 13] that use various kinds of intervals to approximate round-off errors. Instead of approximating, we build symbolic representations of round-off errors based on the idea of symbolic execution [9]. The symbolic representation, called 'symbolic round-off error', is a function over program parameters that precisely represents the round-off error of the program.

1 A common embedded device used for identification purposes with mobile network carriers.

The symbolic round-off error allows us to analyze various aspects of round-off error. First, to find the maximum round-off error, we only need to find the optima of the symbolic round-off error in the (floating-point) input domain. We usually rely on an external tool such as Mathematica [21] for this task. Second, to check whether there is an error above a threshold, or to guarantee that the round-off error is always under a given bound, we can construct a numerical constraint and use Satisfiability Modulo Theories (SMT) solvers to find the answers. We can also generate test cases that are optimal in terms of producing the largest round-off error.
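It helps to see the quantity being analyzed in concrete form first. The following minimal Java sketch is not part of our tool. It evaluates a floating-point version f_l and a fixed-point version f_i of the same expression on the same input, then samples a grid for the largest difference. The function f(x) = x*x - 3*x, the 8 fractional bits, and rounding toward −∞ for both conversion and multiplication are all illustrative assumptions.

```java
/**
 * Hypothetical example (not the thesis tool): concrete round-off error between
 * a double version f_l and a fixed-point version f_i of f(x) = x*x - 3*x.
 * Assumptions: 8 fractional bits, values held in a 32-bit int, and both
 * conversion and multiplication round toward negative infinity.
 */
final class ConcreteRoundOff {
    static final int FRAC_BITS = 8;              // assumed fixed-point precision
    static final double SCALE = 1 << FRAC_BITS;  // 2^FRAC_BITS = 256

    // Convert a double to fixed-point, rounding toward negative infinity.
    static int toFixed(double x) {
        return (int) Math.floor(x * SCALE);
    }

    static double toDouble(int q) {
        return q / SCALE;
    }

    // Fixed-point multiply: the 64-bit product has 2*FRAC_BITS fraction bits;
    // the arithmetic shift drops FRAC_BITS of them (rounding toward -infinity).
    static int mulFixed(int a, int b) {
        return (int) (((long) a * b) >> FRAC_BITS);
    }

    // Floating-point version f_l.
    static double fl(double x) {
        return x * x - 3 * x;
    }

    // Fixed-point version f_i: the same expression with fixed-point operations.
    static int fi(int x) {
        return mulFixed(x, x) - mulFixed(toFixed(3), x);
    }

    // Round-off error for one input: f_l(x) - f_i(x''), where x'' is x rounded to fixed-point.
    static double error(double x) {
        return fl(x) - toDouble(fi(toFixed(x)));
    }

    // Brute-force search over a grid; the result is only a lower bound on the true maximum.
    static double maxErrorOnGrid(double lo, double hi, int steps) {
        double best = 0;
        for (int k = 0; k <= steps; k++) {
            double x = lo + (hi - lo) * k / steps;
            best = Math.max(best, Math.abs(error(x)));
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(error(1.5));  // 1.5 is exact in both formats
        System.out.println(error(0.1));  // 0.1 is not
        System.out.println(maxErrorOnGrid(-4, 4, 10_000));
    }
}
```

Sampling like this cannot certify a maximum, because the worst input may fall between grid points. The symbolic round-off error avoids this by handing the exact error function to an optimizer or an SMT solver.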

The rest of the thesis is structured as follows. The next chapter provides some background. In Chapter 3 we extend traditional symbolic execution to include round-off error information, so that we can build a precise representation of the round-off error of a program. Then in Chapter 4 we present our implementation to find the maximum round-off error and provide some experimental results. Finally, Chapter 5 concludes the thesis.


Unlike in the real world, real numbers in computers only have finite representations, due to the restricted number of bits used to store them. In this section we discuss two types of real number representation: floating-point and fixed-point numbers.

2.1.1 Floating-point numbers

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) [7, 17] defines binary representations for 32-bit single-precision floating-point numbers with three parts: the sign bit, the exponent, and the mantissa or fractional part. The sign bit is 0 if the number is positive and 1 if the number is negative. The exponent is an 8-bit number that ranges in value from -126 to 127 for normalized numbers and is stored with a bias of 127. The mantissa is the normalized binary representation of the number, to be multiplied by 2 raised to the power defined by the exponent.


Example 2.1. Consider the representation of the float number 118.625.

The number 118.625 is positive, so the sign bit is 0. To find the exponent and mantissa, first write the number in binary, which is 1110110.101. Next, normalize the number to $1.110110101 \times 2^6$, which is the binary equivalent of scientific notation. The exponent is 6 and the mantissa is 1.110110101. Since the leading 1 of a normalized number is implicit, only the fraction bits 110110101 are stored, padded with zeros to 23 bits. The exponent must be biased, which gives 6 + 127 = 133. The binary representation of 133 is 10000101.

Table 2.1: 32-bit binary representation of float number 118.625

Sign (1 bit) Exponent (8 bits) Mantissa (23 bits)

0 10000101 11011010100000000000000
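The field extraction in Example 2.1 can be checked mechanically. Below is a minimal Java sketch that uses the standard `Float.floatToIntBits` to split a `float` into its sign, its biased exponent, and its 23 stored fraction bits. The class and method names are ours and purely illustrative.

```java
/** Splits a float into the three IEEE 754 single-precision fields (illustrative helper). */
final class Ieee754Fields {
    // Bit 31: sign (0 = positive, 1 = negative).
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Bits 30..23: exponent stored with a bias of 127.
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // Bits 22..0: fraction bits after the implicit leading 1.
    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    // Binary string of value, left-padded with zeros to the given width.
    static String bits(int value, int width) {
        return String.format("%" + width + "s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    public static void main(String[] args) {
        float x = 118.625f;
        System.out.println(bits(sign(x), 1) + " " + bits(biasedExponent(x), 8) + " " + bits(fraction(x), 23));
        System.out.println("unbiased exponent = " + (biasedExponent(x) - 127));
    }
}
```

For 118.625 it prints `0 10000101 11011010100000000000000` and `unbiased exponent = 6`. These match the fields derived by hand in Example 2.1.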

In Example 2.2, assume a 16-bit fractional number with 8 magnitude (integer) bits and 8 radix (fractional) bits, which is typically written as the 8.8 representation. Like most signed integers, fixed-point numbers are represented in two's complement binary. Using a positive number keeps this example simple.

Example 2.2. 16-bit fixed-point representation of 118.625.


To encode 118.625, we first need to find the value of the integer bits. The binary representation of 118 is 01110110, so these are the upper 8 bits of the 16-bit number. The fractional part of the number is represented as $0.625 \times 2^n$, where $n$ is the number of fractional bits. Because $0.625 \times 256 = 160$, you can use the binary representation of 160, which is 10100000, to determine the fractional bits. Thus, the binary representation of 118.625 is 0111011010100000. The value is typically referred to by its hexadecimal equivalent, 76A0.

Table 2.2: 16-bit fixed-point binary representation of 118.625

Integer part (8 bits) Fraction (8 bits)

01110110 10100000
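The same encoding can be reproduced in a few lines. Below is a hedged Java sketch of the 8.8 format with illustrative names. It scales by 2^8, rounds to the nearest integer, and keeps the low 16 bits as a two's complement pattern. Rounding to nearest is our choice; 118.625 is exactly representable, so the choice makes no difference here.

```java
/** Encodes and decodes real numbers in the signed 8.8 fixed-point format (illustrative). */
final class FixedPoint88 {
    static final int FRAC_BITS = 8;

    // Scale by 2^8, round to the nearest integer, and keep the low 16 bits.
    static int encode(double x) {
        return (int) Math.round(x * (1 << FRAC_BITS)) & 0xFFFF;
    }

    // Reinterpret the 16-bit pattern as two's complement and undo the scaling.
    static double decode(int bits) {
        return (short) bits / (double) (1 << FRAC_BITS);
    }

    public static void main(String[] args) {
        int q = encode(118.625);
        System.out.printf("%04X%n", q);  // hexadecimal form of the encoding
        System.out.println(decode(q));
    }
}
```

Running `main` prints `76A0` and then `118.625`, matching Table 2.2.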

The major advantage of using fixed-point representation for real numbers is that fixed-point arithmetic adheres to the same basic principles as integer arithmetic. Therefore, fixed-point numbers can take advantage of the general optimizations made to the Arithmetic Logic Unit (ALU) of most microprocessors, and they do not require any additional libraries or hardware logic. On processors without an FPU, such as the Analog Devices Blackfin processor, fixed-point representation can result in much more efficient embedded code when performing mathematically heavy operations.

In general, the disadvantage of fixed-point numbers is that they can represent only a limited range of values, which makes them susceptible to common numeric computational inaccuracies. For example, the range of values that can be represented in the 8.8 notation is -128.0 to +127.99609375. If you add 100 + 100, you exceed the valid range of the data type, which is called overflow. In most cases, the values that overflow are saturated, or truncated, so that the result is the largest representable value.
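The following is a minimal sketch of the overflow case. It assumes 8.8 values stored in Java `short`s and saturation to the limits of the format, and the helper names are hypothetical. It contrasts saturation with the plain wraparound that a bare 16-bit addition would produce.

```java
/** Saturating vs. wrapping addition of signed 8.8 fixed-point values held in shorts (illustrative). */
final class Saturating88 {
    static final double SCALE = 256.0;  // 2^8 fractional bits

    // Add in 32 bits (exact), then clamp to the 8.8 range.
    static short add(short a, short b) {
        int sum = a + b;
        if (sum > Short.MAX_VALUE) return Short.MAX_VALUE;  // +127.99609375
        if (sum < Short.MIN_VALUE) return Short.MIN_VALUE;  // -128.0
        return (short) sum;
    }

    // No range check: valid only for inputs inside the 8.8 range.
    static short fromDouble(double x) { return (short) Math.round(x * SCALE); }

    static double toDouble(short q) { return q / SCALE; }

    public static void main(String[] args) {
        short hundred = fromDouble(100);
        System.out.println(toDouble(add(hundred, hundred)));        // saturated
        System.out.println(toDouble((short) (hundred + hundred)));  // wrapped around
    }
}
```

With saturation, 100 + 100 yields 127.99609375. With wraparound, the same 16-bit result reads as -56.0.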

Assume we use the fixed-point format (2, 11, 4) and we have the floating-point number 1001.010111 (in binary). Then the corresponding fixed-point number is 1001.0101 and the round-off error is 0.000011 (also in binary).
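The arithmetic of this example can be replayed in Java. This sketch reads the format (2, 11, 4) as radix 2 with 4 fractional bits. That reading is our interpretation, and only the fraction length matters here. The sketch drops the extra bits by rounding toward −∞, which is the rounding choice that reproduces the numbers above.

```java
/** Rounds a value to a given number of fractional bits by truncation toward -infinity (illustrative). */
final class TruncateFraction {
    // Keep only fracBits fractional bits, rounding toward negative infinity.
    static double truncate(double x, int fracBits) {
        double scale = Math.scalb(1.0, fracBits);  // 2^fracBits, computed exactly
        return Math.floor(x * scale) / scale;
    }

    public static void main(String[] args) {
        double x = 0b1001010111 / 64.0;   // 1001.010111 in binary = 9.359375
        double fixed = truncate(x, 4);    // 1001.0101 in binary = 9.3125
        double err = x - fixed;           // 0.000011 in binary = 0.046875
        System.out.println(fixed + " " + err);
    }
}
```

Rounding to the nearest would instead give 1001.0110 (9.375). The round-off error therefore depends on the rounding mode as well as on the format.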

Note that there are two types of lost bits in fixed-point computation: overflow errors and round-off errors. We only consider the latter in this work, as they are more difficult to detect.


of them open-source: NASA's SPF¹ for Java, UIUC's CUTE and jCUTE², Stanford's KLEE³, and UC Berkeley's CREST⁴, etc. Symbolic execution tools are now used in industrial practice at Microsoft (Pex, SAGE, YOGI and PREfix), IBM (Apollo), NASA and Fujitsu (SPF) [4].

The most crucial idea behind symbolic execution [9] is to use symbolic values instead of actual data as input values, and to represent the values of program variables as symbolic expressions. As a result, the outputs computed by a program are expressed as a function of the symbolic inputs. Unlike concrete execution, symbolic execution lets a program take any feasible path, because its inputs are symbolic. In software testing, symbolic execution is used to generate a test input for each execution path of a program. All the execution paths of a program can be represented using a tree, called the execution tree [3] (see Figure 2.1 for an example).
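To make the execution tree concrete, here is a toy symbolic executor in Java. It is an illustrative sketch, not how SPF works internally. Variables hold symbolic expressions as strings and every `if` forks the state. Unlike a real tool, it never asks a constraint solver whether a path condition is feasible. The small program inside `examplePaths` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy symbolic executor (illustrative only): forks on every branch and records each path. */
final class ToySymbolicExecution {
    interface Stmt {}

    static final class Assign implements Stmt {
        final String var, expr;  // expr tokens are separated by single spaces, e.g. "x + y"
        Assign(String var, String expr) { this.var = var; this.expr = expr; }
    }

    static final class If implements Stmt {
        final String cond;
        final List<Stmt> thenBranch, elseBranch;
        If(String cond, List<Stmt> thenBranch, List<Stmt> elseBranch) {
            this.cond = cond; this.thenBranch = thenBranch; this.elseBranch = elseBranch;
        }
    }

    static List<Stmt> block(Stmt... stmts) { return Arrays.asList(stmts); }

    // Rewrites a source expression in terms of the current symbolic values.
    static String substitute(String expr, Map<String, String> env) {
        List<String> out = new ArrayList<>();
        for (String tok : expr.split(" ")) {
            String v = env.get(tok);
            if (v == null) out.add(tok);                          // operator or constant
            else out.add(v.contains(" ") ? "(" + v + ")" : v);    // symbolic value
        }
        return String.join(" ", out);
    }

    // Explores every path; each finished path is recorded as "pathCondition ==> var = value".
    static void explore(List<Stmt> prog, Map<String, String> env, List<String> pathCond,
                        String resultVar, List<String> paths) {
        if (prog.isEmpty()) {
            paths.add(String.join(" && ", pathCond) + " ==> " + resultVar + " = " + env.get(resultVar));
            return;
        }
        Stmt s = prog.get(0);
        List<Stmt> rest = prog.subList(1, prog.size());
        if (s instanceof Assign) {
            Assign a = (Assign) s;
            Map<String, String> next = new TreeMap<>(env);
            next.put(a.var, substitute(a.expr, env));
            explore(rest, next, pathCond, resultVar, paths);
        } else {
            If branch = (If) s;
            String c = substitute(branch.cond, env);
            fork(branch.thenBranch, rest, env, pathCond, c, resultVar, paths);
            fork(branch.elseBranch, rest, env, pathCond, "!(" + c + ")", resultVar, paths);
        }
    }

    // Continues down one side of a branch with its condition added to the path condition.
    private static void fork(List<Stmt> side, List<Stmt> rest, Map<String, String> env,
                             List<String> pathCond, String cond, String resultVar, List<String> paths) {
        List<Stmt> cont = new ArrayList<>(side);
        cont.addAll(rest);
        List<String> pc = new ArrayList<>(pathCond);
        pc.add(cond);
        explore(cont, env, pc, resultVar, paths);
    }

    // z = x + y; if (z > 10) z = z - y; else z = z * 2; if (x < 0) z = 0 - z;
    static List<String> examplePaths() {
        List<Stmt> prog = block(
            new Assign("z", "x + y"),
            new If("z > 10", block(new Assign("z", "z - y")), block(new Assign("z", "z * 2"))),
            new If("x < 0", block(new Assign("z", "0 - z")), block()));
        Map<String, String> env = new TreeMap<>();
        env.put("x", "X");  // symbolic input X
        env.put("y", "Y");  // symbolic input Y
        List<String> paths = new ArrayList<>();
        explore(prog, env, new ArrayList<>(), "z", paths);
        return paths;
    }

    public static void main(String[] args) {
        examplePaths().forEach(System.out::println);
    }
}
```

The four printed lines are the leaves of the execution tree. Each one pairs a path condition over the symbolic inputs X and Y with the output expressed as a function of those inputs.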

Figure 2.1: A symbolic execution tree

1 http://babelfish.arc.nasa.gov/trac/jpf/wiki/projects/jpf-symbc

2 http://osl.cs.uiuc.edu/~ksen/cute/

3 http://klee.llvm.org/


• Constraint complexity: This is one of the main reasons that symbolic execution fails to scale to large programs, because solvers cannot find solutions for overly complex queries.

2.2.2 Symbolic PathFinder

Symbolic PathFinder is an extension of Java PathFinder (JPF) that performs symbolic execution of Java programs. An overview of the extension is shown in Figure 2.2. It combines symbolic execution with model checking and constraint solving for automated generation of test inputs that guarantee high code coverage and error detection in programs with unspecified inputs [14].

Figure 2.2: Symbolic PathFinder overview [16]

In this thesis, an extended version of SPF plays an important role in the first step: generating the symbolic round-off error from Java programs.


2.3 Related works

Overflow and round-off error analysis has been studied since the early days of computer science, because both fixed-point and floating-point number representations and computations have their own problems. Most work addresses both overflow and round-off error, for example [7, 17]. Because round-off error is more subtle and sophisticated, we focus on it in this work, but our idea can be extended to overflow error.

As we mentioned, there are three kinds of overflow and round-off errors: real numbers versus floating-point, real numbers versus fixed-point, and floating-point numbers versus fixed-point numbers. Many previous works focus on round-off error with respect to real results, cf. [10]. Here we focus on the last type of round-off error. The most recent work that we are aware of is that of Ngoc and Ogawa [12, 13]. The authors develop a tool called C ANAlyzer (CANA) (see Figure 2.3) for analyzing overflows and round-off errors. CANA outputs round-off error ranges of variables at each point of the program and warns about overflow errors if they occur. They propose a new interval, the extended affine interval (EAI), to estimate round-off error ranges, in place of the Classical Interval (CI) [1] and the Affine Interval (AI) [18].


Figure 2.3: CANA system

• Classical Interval was first introduced in the 1960s by Moore [11] as a method for putting bounds on round-off errors in mathematical computations. CI is simple but imprecise.

• Affine Interval provides higher precision because it introduces symbolic manipulations on noise symbols to handle correlations between variables, which CI lacks.
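The correlation problem that separates CI from AI already shows up in x - x. The Java sketch below is a simplified illustration under our own assumptions. The affine form carries a single noise symbol e in [-1, 1] shared by both operands, whereas real affine arithmetic keeps one coefficient per noise symbol. EAI is not modelled.

```java
/** Classical interval vs. a one-noise-symbol affine form on x - x, with x in [1, 3] (illustrative). */
final class IntervalVsAffine {
    /** Classical interval [lo, hi]. */
    static final class Interval {
        final double lo, hi;
        Interval(double lo, double hi) { this.lo = lo; this.hi = hi; }

        // Treats the operands as independent, so correlation is lost.
        Interval minus(Interval o) { return new Interval(lo - o.hi, hi - o.lo); }

        @Override
        public String toString() { return "[" + lo + ", " + hi + "]"; }
    }

    /** Affine form center + coeff * e, with one shared noise symbol e in [-1, 1]. */
    static final class Affine {
        final double center, coeff;
        Affine(double center, double coeff) { this.center = center; this.coeff = coeff; }

        // The shared noise symbol cancels when its coefficients are subtracted.
        Affine minus(Affine o) { return new Affine(center - o.center, coeff - o.coeff); }

        // The range follows from e taking values in [-1, 1].
        Interval range() {
            double r = Math.abs(coeff);
            return new Interval(center - r, center + r);
        }
    }

    public static void main(String[] args) {
        Interval xi = new Interval(1, 3);
        Affine xa = new Affine(2, 1);              // 2 + 1*e covers [1, 3]
        System.out.println(xi.minus(xi));          // CI result
        System.out.println(xa.minus(xa).range());  // affine result
    }
}
```

CI returns [-2.0, 2.0] because it treats the two occurrences of x as independent. The affine form cancels the shared noise symbol and returns [0.0, 0.0].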

EAI has several advantages over CI and AI. First, it avoids AI's problem of introducing new noise symbols, so its form is more compact than that of AI. Second, it is more precise than CI because it can store information about uncertainty [12]. However, it is still not as precise as our approach.

Chapter 3

Symbolic round-off error

In the last chapter, we presented background material that is crucial for the thesis. In this chapter, we first present a symbolic computation that takes round-off errors into account, inspired by [13, 9]. Then we extend the discussion to example programs, which will be simplified to a set of arithmetic expressions with constraints.

3.1 Symbolic round-off error

Let $\mathbb{R}$, $\mathbb{L}$, and $\mathbb{I}$ be the sets of all real numbers, all floating-point numbers, and all fixed-point numbers, respectively. $\mathbb{L}$ and $\mathbb{I}$ are finite because a fixed number of bits is used for their representation. For practicality, we assume that the number of bits in the fixed-point format is not more than the number of significant bits in the floating-point representation, which means we assume $\mathbb{I} \subset \mathbb{L} \subset \mathbb{R}$.

Let us assume that we are working with a real arithmetic function $y = f(x_1, \dots, x_n)$, where $x_1, \dots, x_n$ and $y$ are in $\mathbb{R}$ and $f$ is an arithmetic expression over $x_1, \dots, x_n$. For simplicity, we denote by $x' \in \mathbb{L}$ the rounded value of $x$ and by $x'' \in \mathbb{I}$ the rounded value of $x'$.

As arithmetic operations on floating-point and fixed-point numbers may also be different (in precision), we denote by $f_l$ and $f_i$ the floating-point and fixed-point versions of $f$, respectively, where real arithmetic operations are replaced by the corresponding
