VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
NGUYEN BAO NGOC
FINDING ROUND-OFF ERROR FOR JAVA PROGRAMS USING SYMBOLIC
PATHFINDER
MASTER THESIS OF INFORMATION TECHNOLOGY
Major: Computer science
Code: 60 48 01
SUPERVISOR: PhD Truong Anh Hoang
CO-SUPERVISOR: PhD Le Trong Vinh
Originality Statement
I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgment is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.
Date:
Signed:
Abstract
Nowadays, with the explosion of mobile and embedded devices, the need for applications for these devices has been increasing non-stop. The development process of these programs commonly comprises two phases: a development phase carried out on personal computers and a deployment phase carried out on the actual devices. The latter phase includes porting, which requires changing floating-point numbers and operations to fixed-point ones, and this is where round-off errors between the two versions of the program often occur.
In this thesis, we present a novel approach to produce a precise representation of the round-off error using symbolic computation. With this representation, we can analyze various aspects of the error, such as finding the largest round-off error using optimization tools like Mathematica, or checking error bounds using SMT solvers. We have implemented a tool using Symbolic PathFinder that generates the symbolic output expression of a Java program. From that symbolic output expression, we symbolically execute the round-off error part to obtain the precise symbolic round-off error of the original program. Experiments with simple Java programs demonstrate the effectiveness of our symbolic round-off error.
Publications:
* Anh-Hoang Truong, Huy-Vu Tran, Bao-Ngoc Nguyen. Finding Round-Off Error Using Symbolic Execution. In Proceedings of the Fifth International Conference on Knowledge and Systems Engineering (KSE 2013).
Acknowledgements
First and foremost, I would like to express my greatest appreciation to my supervisor, Dr Truong Anh Hoang. There is no doubt that without his patient guidance, continuous involvement and unceasing efforts to inspire me throughout the research process, this thesis would never have been accomplished.
I am also grateful to my co-supervisor, Dr Le Trong Vinh, for his collaborative advice, to my lecturers in the Software Engineering Department for their supportive assistance, and to my colleagues from the Information Technology Faculty for their unconditional help, to which I am indebted.
Finally, it would be remiss of me not to mention the support of my beloved family and friends. They have always encouraged me. Thank you!
Table of Contents
2.1 Number representation
2.1.1 Floating-point numbers
2.1.2 Fixed-point numbers
2.2 Symbolic execution with Symbolic PathFinder
2.2.1 Symbolic execution
2.2.2 Symbolic PathFinder
2.3 Related works
3 Symbolic round-off error
3.1 Symbolic round-off error
3.2 Constraints
3.3 Symbolic round-off error for expressions
3.4 Symbolic round-off error for programs
3.5 Applications of symbolic round-off error
4 Experiments
4.1 Implementation
4.1.1 Our extended Symbolic PathFinder
4.2 Experimental results
4.2.1 Experiment with simple program
4.2.2 Experiment with a polynomial of degree 5
4.2.3 Experiment with Taylor series of sine function
4.3 Discussion
List of Figures
2.1 A symbolic execution tree
2.2 Symbolic PathFinder overview [16]
2.3 CANA system
3.1 An example program
4.1 System architecture overview
4.2 Java PathFinder architecture
4.3 Listener architecture
4.4 Mathematica problem for example in Figure 3.4
List of Tables
2.1 32-bit binary representation of float number 118.625
2.2 16-bit fixed-point binary representation of 118.625
4.1 Top round-off errors in 100.000.000 tests with round to the nearest
4.2 Top round-off errors in 100.000.000 tests with round towards −∞
List of Abbreviations
AI Affine Interval
ALU Arithmetic Logic Unit
CI Classical Interval
EAI Extended Affine Interval
FPU Floating-Point Unit
JPF Java PathFinder
SIM Subscriber Identity Module
SMT Satisfiability Modulo Theories
SPF Symbolic PathFinder
Chapter 1
Introduction
In recent years, software has become ubiquitous. It comes bundled in nearly everything: from large, easily recognizable personal computers to pocket-sized phones, and even in unexpected things like a Subscriber Identity Module (SIM) card¹. As a result of rapid technological development, these devices are becoming smaller and smaller, which leads to easier and smoother adoption in every aspect of life. The demand for software on those devices is, as a consequence, also gaining significant traction.
1 A common embedded device used for identification purposes with mobile network carriers.
Nevertheless, the development process of embedded software has not changed much. The software is usually written on personal computers, while it runs on embedded devices. As those devices may or may not be equipped with Floating-Point Units (FPUs), they may use different number representations: floating-point or fixed-point, respectively. This might cause round-off errors, also called rounding errors. Although round-off errors are often quite small, they can accumulate over time and result in unrecoverable system failure. Historically, round-off error has had severe consequences, such as those encountered in the Patriot Missile Failure [8]. While traditional round-off error is the difference between the exact (correct) result and the approximate result that a computer generates, in this thesis we only address the difference that a program may produce from the same inputs when executed with floating-point or fixed-point arithmetic.
Indeed, there are three common types of round-off errors: real numbers versus floating-point numbers, real numbers versus fixed-point numbers, and floating-point numbers versus fixed-point numbers. This thesis is based on our previous work [20], where we focus on the last type of round-off error for two main reasons. First, with the widespread use of mobile and embedded devices, many applications developed for personal computers are now ported to run on these platforms. Second, even for new applications, it is impractical and time-consuming to develop complex algorithms directly on embedded devices. Hence, many complex algorithms are developed and tested on personal computers that use floating-point numbers before they are ported to embedded devices that use fixed-point numbers.
Our work was inspired by recent approaches to round-off error analysis [12, 13] that use various kinds of intervals to approximate round-off errors. Instead of approximating, we try to build symbolic representations of round-off errors based on the idea of symbolic execution [9]. The symbolic representation, called 'symbolic round-off error', is a function over program parameters that precisely represents the round-off error of the program.
The symbolic round-off error allows us to analyze various aspects of round-off error. First, to find the maximum round-off error, we only need to find the optima of the symbolic round-off error in the (floating-point) input domain. We usually rely on an external tool such as Mathematica [21] for this task. Second, to check if there is an error above a threshold or to guarantee that the round-off error is always under a given bound, we can construct a numerical constraint and use Satisfiability Modulo Theories (SMT) solvers to find the answers. We can also generate test cases that are optimal in terms of producing the largest round-off error.
The rest of the thesis is structured as follows. The next chapter presents some background. In Chapter 3 we extend traditional symbolic execution to include round-off error information so that we can build a precise representation of the round-off error for a program. Then in Chapter 4 we present our implementation to find the maximum round-off error and provide some experimental results. Finally, Chapter 5 concludes the thesis.
Chapter 2
Background
2.1 Number representation
Unlike in the real world, real numbers in computers only have finite representations, due to the restricted number of bits used to store them. In this section we discuss two types of real number representation: floating-point and fixed-point numbers.
2.1.1 Floating-point numbers
IEEE Standard for Floating-Point Arithmetic (IEEE 754) [7, 17] defines binary representations for 32-bit single-precision floating-point numbers with three parts: the sign bit, the exponent, and the mantissa or fractional part. The sign bit is 0 if the number is positive and 1 if the number is negative. The exponent is an 8-bit number that ranges in value from -126 to 127. The mantissa is the normalized binary representation of the number to be multiplied by 2 raised to the power defined by the exponent.
Example 2.1 Consider the representation of the float number 118.625.
The number 118.625 is positive, so the sign bit is 0. To find the exponent and mantissa, first write the number in binary, which is 1110110.101. Next, normalize the number to 1.110110101 × 2^6, which is the binary equivalent of scientific notation. The exponent is 6 and the mantissa is 1.110110101. The exponent must be biased, which gives 6 + 127 = 133. The binary representation of 133 is 10000101.
Table 2.1: 32-bit binary representation of float number 118.625
Sign (1 bit) Exponent (8 bits) Mantissa (23 bits)
0 10000101 11011010100000000000000
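The three fields of Example 2.1 can be checked directly in Java. The following is a minimal sketch, not part of the thesis tool: the class and method names are hypothetical, and it relies only on the standard `Float.floatToIntBits` call to obtain the IEEE 754 bit pattern.

```java
final class FloatBitsSketch {
    /** Sign bit (0 or 1) of an IEEE 754 single-precision value. */
    static int sign(float v) {
        return Float.floatToIntBits(v) >>> 31;
    }

    /** Biased 8-bit exponent field (true exponent + 127). */
    static int biasedExponent(float v) {
        return (Float.floatToIntBits(v) >>> 23) & 0xFF;
    }

    /** 23-bit mantissa field, without the implicit leading 1. */
    static int mantissa(float v) {
        return Float.floatToIntBits(v) & 0x7FFFFF;
    }

    /** The three fields as zero-padded binary strings separated by spaces. */
    static String fields(float v) {
        return sign(v)
                + " " + pad(Integer.toBinaryString(biasedExponent(v)), 8)
                + " " + pad(Integer.toBinaryString(mantissa(v)), 23);
    }

    // Left-pads a binary string with zeros up to the given width.
    private static String pad(String bits, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = bits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(bits).toString();
    }
}
```

For 118.625f, `biasedExponent` yields 133 and `fields` yields the sign, exponent and mantissa bit strings derived by hand in the example.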
2.1.2 Fixed-point numbers
In Example 2.2, assume a 16-bit fractional number with 8 magnitude bits and 8 radix bits, which is typically referred to as the 8.8 representation. Like most signed integers, fixed-point numbers are represented in two's complement binary. Using a positive number keeps this example simple.
Example 2.2 16-bit fixed-point representation of 118.625
To encode 118.625, first we need to find the value of the integer bits. The binary representation of 118 is 01110110, so this is the upper 8 bits of the 16-bit number. The fractional part of the number is represented as 0.625 × 2^n, where n is the number of fractional bits. Because 0.625 × 256 = 160, you can use the binary representation of 160, which is 10100000, to determine the fractional bits. Thus, the binary representation for 118.625 is 0111011010100000. The value is typically referred to using the hexadecimal equivalent, which is 76A0.
Table 2.2: 16-bit fixed-point binary representation of 118.625
Integer part (8 bits) Fraction (8 bits)
01110110 10100000
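The encoding in Example 2.2 amounts to scaling by 2^8 and storing the result as a 16-bit two's complement integer. A minimal Java sketch follows; the class and method names are hypothetical, and it assumes the value lies in the representable range and that rounding to the nearest step is wanted.

```java
final class Fixed88Sketch {
    static final int FRACTION_BITS = 8; // the "8.8" format of Example 2.2

    /** Encodes a value as a signed 16-bit 8.8 pattern (assumes it is in range). */
    static short encode(double value) {
        return (short) Math.round(value * (1 << FRACTION_BITS));
    }

    /** Decodes a signed 16-bit 8.8 pattern back to a double. */
    static double decode(short raw) {
        return raw / (double) (1 << FRACTION_BITS);
    }

    /** Four-digit hexadecimal form of the raw 16-bit pattern. */
    static String hex(short raw) {
        return String.format("%04X", raw & 0xFFFF);
    }
}
```

Encoding 118.625 gives the raw integer 118 × 256 + 160 = 30368, whose hexadecimal form is 76A0, as in Table 2.2.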
The major advantage of using fixed-point representation for real numbers is that fixed-point adheres to the same basic arithmetic principles as integers. Therefore, fixed-point numbers can take advantage of the general optimizations made to the Arithmetic Logic Unit (ALU) of most microprocessors, and do not require any additional libraries or any additional hardware logic. On processors without an FPU, such as the Analog Devices Blackfin Processor, fixed-point representation can result in much more efficient embedded code when performing mathematically heavy operations.
In general, the disadvantage of using fixed-point numbers is that they can represent only a limited range of values, so fixed-point numbers are susceptible to common numeric computational inaccuracies. For example, the range of possible values that can be represented in the 8.8 notation is +127.99609375 to -128.0. If you add 100 + 100, you exceed the valid range of the data type, which is called overflow. In most cases, the values that overflow are saturated, or truncated, so that the result is the largest representable value.
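To make the overflow case concrete, the sketch below adds 100 + 100 in the 8.8 format, once with saturation and once with plain two's complement wrap-around. It is an illustration only: the class and method names are hypothetical, and real fixed-point hardware may offer either behaviour.

```java
final class SaturatingAddSketch {
    static final int MAX_RAW = Short.MAX_VALUE; // 0x7FFF, i.e. +127.99609375
    static final int MIN_RAW = Short.MIN_VALUE; // 0x8000, i.e. -128.0

    /** Converts an in-range value to its raw 8.8 pattern. */
    static short toRaw(double value) {
        return (short) Math.round(value * 256);
    }

    /** Converts a raw 8.8 pattern back to a double. */
    static double toDouble(short raw) {
        return raw / 256.0;
    }

    /** Adds two 8.8 numbers, clamping the result to the representable range. */
    static short addSaturating(short a, short b) {
        int wide = a + b; // operands are promoted to int, so this cannot overflow
        return (short) Math.max(MIN_RAW, Math.min(MAX_RAW, wide));
    }

    /** Adds two 8.8 numbers, silently dropping the bits that do not fit. */
    static short addWrapping(short a, short b) {
        return (short) (a + b);
    }
}
```

With saturation, 100 + 100 yields 127.99609375, the largest value of the format; with wrap-around the same sum becomes -56.0, which is why saturation is usually the safer default.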
Assume we use fixed-point format (2, 11, 4) and we have the floating-point number 1001.010111. Then the corresponding fixed-point number is 1001.0101 and the round-off error is 0.000011.
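The arithmetic of this example can be reproduced in a few lines of Java. This is a sketch under the assumption that the conversion simply drops the bits beyond the fourth fractional position (rounding towards −∞); the class and method names are hypothetical.

```java
final class TruncationErrorSketch {
    /** Keeps only fractionBits binary digits after the point (rounds towards -infinity). */
    static double truncate(double value, int fractionBits) {
        double scale = 1L << fractionBits; // 2^fractionBits
        return Math.floor(value * scale) / scale;
    }

    /** Round-off error introduced by the truncation above. */
    static double roundOffError(double value, int fractionBits) {
        return value - truncate(value, fractionBits);
    }
}
```

The binary number 1001.010111 equals 599/64 = 9.359375 in decimal. Truncating it to 4 fractional bits gives 9.3125 (binary 1001.0101), and the error is 0.046875 = 3/64, which is binary 0.000011.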
Note that there are two types of lost bits in fixed-point computation, overflow errors and round-off errors, and we only consider the latter in this work, as they are more difficult to detect.
2.2 Symbolic execution with Symbolic PathFinder
2.2.1 Symbolic execution
... of them open-source: NASA's SPF¹ for Java, UIUC's CUTE and jCUTE², Stanford's KLEE³, and UC Berkeley's CREST⁴, etc. Symbolic execution tools are now used in industrial practice at Microsoft (Pex, SAGE, YOGI and PREx), IBM (Apollo), NASA and Fujitsu (SPF) [4].
The most crucial idea behind symbolic execution [9] is to use symbolic values instead of actual data as input values, and to represent the values of program variables as symbolic expressions. As a result, the outputs computed by a program are expressed as a function of the symbolic inputs. Unlike in concrete execution, a program can take any feasible path in symbolic execution because its inputs are symbolic. In software testing, symbolic execution is used to generate a test input for each execution path of a program. All the execution paths of a program can be represented using a tree, called the execution tree [3] (see Figure 2.1 for an example).
Figure 2.1: A symbolic execution tree
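As a toy illustration of an execution tree, the sketch below enumerates the two paths of a hypothetical method `absDiff(x, y)` that returns `x - y` when `x > y` and `y - x` otherwise. It is hand-written for this one program and is not how Symbolic PathFinder works internally: every name here is an assumption, and path conditions and outputs are kept as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

final class SymbolicPathsSketch {
    /** One leaf of the execution tree: the path condition and the symbolic output. */
    static final class Path {
        final String condition;
        final String output;

        Path(String condition, String output) {
            this.condition = condition;
            this.output = output;
        }
    }

    /**
     * Explores the hypothetical program
     *   if (x > y) return x - y; else return y - x;
     * with symbolic inputs named x and y, forking once at the branch.
     */
    static List<Path> explore(String x, String y) {
        List<Path> paths = new ArrayList<>();
        String guard = x + " > " + y;
        // True branch: the guard is added to the path condition.
        paths.add(new Path(guard, x + " - " + y));
        // False branch: the negated guard is added instead.
        paths.add(new Path("!(" + guard + ")", y + " - " + x));
        return paths;
    }
}
```

Calling `explore("X", "Y")` returns two leaves, each pairing a constraint over the symbolic inputs with the output expression computed along that path; a constraint solver would then pick concrete inputs satisfying each condition.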
1 http://babelfish.arc.nasa.gov/trac/jpf/wiki/projects/jpf-symbc
2 http://osl.cs.uiuc.edu/~ksen/cute/
3 http://klee.llvm.org/
• Constraint complexity: This is one of the main reasons why symbolic execution fails to scale to large programs, because solvers cannot find solutions for overly complex queries.
2.2.2 Symbolic PathFinder
Symbolic PathFinder is an extension of Java PathFinder (JPF) that performs symbolic execution of Java programs. An overview of the extension is shown in Figure 2.2. It combines symbolic execution with model checking and constraint solving for automated generation of test inputs that guarantee high code coverage and error detection in programs with unspecified inputs [14].
Figure 2.2: Symbolic PathFinder overview [16]
In this thesis, an extended version of SPF plays an important role in the first step, which generates the symbolic round-off error from Java programs.
2.3 Related works
Overflow and round-off error analysis has been studied since the early days of computer science, because both fixed-point and floating-point number representations and computations have their own problems. Most work addresses both overflow and round-off error, for example [7, 17]. Because round-off error is more subtle and sophisticated, we focus on it in this work, but our idea can be extended to overflow error.
As mentioned, there are three kinds of overflow and round-off errors: real numbers versus floating-point numbers, real numbers versus fixed-point numbers, and floating-point numbers versus fixed-point numbers. Many previous works focus on round-off error with respect to real results, cf. [10]. Here we focus on the last type of round-off error. The most recent work that we are aware of is that of Ngoc and Ogawa [12, 13]. The authors develop a tool called C ANAlyzer (CANA) (see Figure 2.3) for analyzing overflows and round-off errors. CANA outputs the round-off error ranges of variables at each point of the program and warns about overflow errors if they occur. They propose a new interval, the extended affine interval (EAI), to estimate round-off error ranges instead of the Classical Interval (CI) [1] and the Affine Interval (AI) [18].
Figure 2.3: CANA system
• Classical Interval was first introduced in the 1960s by Moore [11] as a method of putting bounds on round-off errors in mathematical computations. CI is simple but imprecise.
• Affine Interval provides higher precision because it introduces symbolic manipulations on noise symbols to handle correlations between variables, which CI lacks.
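The imprecision of CI comes from treating every operand as independent. A minimal Java sketch of classical interval subtraction shows the effect; the class is hypothetical and, for brevity, omits the outward rounding of bounds that a real interval library would apply.

```java
final class IntervalSketch {
    final double lo;
    final double hi;

    IntervalSketch(double lo, double hi) {
        this.lo = lo;
        this.hi = hi;
    }

    /** Classical interval subtraction: [a, b] - [c, d] = [a - d, b - c]. */
    IntervalSketch minus(IntervalSketch other) {
        return new IntervalSketch(lo - other.hi, hi - other.lo);
    }

    /** Width of the interval, a simple measure of its uncertainty. */
    double width() {
        return hi - lo;
    }
}
```

For x in [0, 1], the expression x - x is always exactly 0, yet `x.minus(x)` evaluates to [-1, 1] because the two occurrences of x are not known to be the same variable. Affine forms avoid this by sharing a noise symbol between correlated operands.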
EAI has several advantages over CI and AI. First, it avoids AI's problem of introducing new noise symbols, therefore its form is more compact than AI's form. Second, it is more precise than CI because it can store information about uncertainty [12]. But it is still not as precise as our approach.
Chapter 3
Symbolic round-off error
In the last chapter, we presented background material that is crucial for the thesis. In this chapter, we first present a symbolic computation that takes round-off errors into account, inspired by [13, 9]. Then we extend the discussion to example programs, which are simplified to a set of arithmetic expressions with constraints.
3.1 Symbolic round-off error
Let $R$, $L$, and $I$ be the sets of all real numbers, all floating-point numbers, and all fixed-point numbers, respectively. $L$ and $I$ are finite because a fixed number of bits is used for their representation. For practicality, we assume that the number of bits in the fixed-point format is not more than the number of significant bits in the floating-point representation, which means we assume $I \subset L \subset R$.
Let's assume that we are working with a real arithmetic function $y = f(x_1, \ldots, x_n)$, where $x_1, \ldots, x_n$ and $y$ are in $R$ and $f$ is an arithmetic expression over $x_1, \ldots, x_n$. For simplicity, we denote by $x' \in L$ the rounded value of $x$ and by $x'' \in I$ the rounded value of $x'$.
As arithmetic operations on floating-point and fixed-point numbers may also differ (in precision), we denote by $f_l$ and $f_i$ the floating-point and fixed-point versions of $f$, respectively, where real arithmetic operations are replaced by the corresponding