A Parameterized Unit Test Framework Based on Symbolic Java PathFinder
Anh-Hoang Truong and Thanh-Nhan Vu
College of Technology, Vietnam National University
144 Xuan Thuy, Hanoi, Vietnam
Email: {hoangta, nhanvt.mcs07}@vnu.edu.vn
Abstract – Parameterized unit testing has recently gained a lot of attention because it reduces testing cost and achieves code coverage more efficiently. We present a framework for running parameterized unit tests (PUTs) based on Java PathFinder (JPF) and JUnit. Our approach builds on the model checking and symbolic execution capabilities of JPF to generate standard unit tests. As a result, we achieve high structural and path coverage. The generated unit tests are automatically executed by JUnit, so programmers are immediately notified of any assertion failures. Currently, our approach mainly works with numeric and boolean data types, but the framework can be extended to other data types such as strings.
Keywords – Testing, Parameterized Unit Test
I. INTRODUCTION
There are many examples of the damage caused by software errors, especially now that software is ubiquitous. According to a report by the US National Institute of Standards and Technology, software failures cost the US economy $60 billion per annum, and improvements in software testing infrastructure, although still limited, could save one-third of this cost [21].
Software testing [1], the most commonly used technique for validating software quality, accounts for about half of the total cost of software development. One reason is that testing is still performed manually in many phases. Automated testing helps developers reduce the cost of producing software and increase its reliability. Numerous methods have been proposed to support and automate parts of the software testing process.
To produce test data, random testing randomly generates a stream of bits and sends it to the program as input parameters. The main advantage of this method is its simplicity. It also does not require special machines or computing resources. However, this approach is inefficient, because one path of a program may be executed many times while other paths are rarely executed. In other words, it is difficult to achieve the adequacy criterion [5]. Symbolic execution can overcome these drawbacks of random testing. In this method, the variables that normally contain concrete values are replaced by symbolic counterparts, which express a range of possible values using symbolic expressions. Based on these symbolic data, the model checker generates an input data set that covers all possible executions of the program. It usually requires an external solver [3] to find solutions for the symbolic expressions.
Model checking has been a popular topic for the last two decades and has recently become widely used to analyse software programs. However, model checking is hard because of the complexity of the code, and it often cannot completely analyse the program’s state space because of the large amount of memory it requires and the path explosion problem. For these reasons, many popular model checkers rely on abstractions to reduce the size of the state space [20]. However, these techniques are not well suited to code that manipulates complex data, as they introduce too many predicates, making the abstraction process inefficient.
Java PathFinder [2] uses model checking [4] without relying on abstraction (which cannot always achieve good code coverage) and instead augments it with symbolic execution. JPF can be extended by listening to the events raised whenever a bytecode instruction is executed. We use this feature to add an extension that runs parameterized unit tests and creates standard JUnit unit tests from them. We then rely on the JUnit test framework to run the generated unit tests. As a result, programmers do not have to write unit tests by hand, and the generated unit tests ensure high path coverage. This is the main contribution of this paper.
The rest of the paper is organized as follows. Section 2 discusses the symbolic execution of JPF and other background used in later sections. Section 3 reviews related work. Section 4 details our approach. We show some experimental results in Section 5 and conclude in Section 6.
II. BACKGROUND
A. Unit Test and Parameterized Unit Test
A unit test (UT) [11, 12] is a concept from traditional testing techniques. A unit test is a method that has no parameters and returns void. A UT is used to test a single unit of code. Each UT contains three parts: input values, a sequence of instructions, and assertions. A UT fails when any of its assertions is violated or an exception is thrown. The disadvantage of a UT is that it can check only some specific execution paths of a program.
A parameterized unit test (PUT) [7, 11, 12] is an improvement on the unit test. A PUT is a unit test with parameters, so it accepts different input values passed via those parameters. Usually these input values are generated automatically by a tool.
2009 International Conference on Knowledge and Systems Engineering
The relationship between UT and PUT is shown in Figure 1. Traditional UTs can be generalized to PUTs, and PUTs can be instantiated back into UTs.
Figure 1: The relationship between UT and PUT
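The distinction between a UT and a PUT can be illustrated with a small sketch in plain Java that needs no JUnit dependency. All names here (UtVersusPut, abs, unitTestAbs, putAbs, check) are hypothetical and are not part of the framework described in this paper.

```java
// Hypothetical example: a unit under test, one classic UT, and one PUT.
final class UtVersusPut {
    // Unit under test (illustrative only).
    static int abs(int v) {
        return v < 0 ? -v : v;
    }

    // Unit test: no parameters, returns void, fixed input, one assertion.
    static void unitTestAbs() {
        int input = -5;
        int result = abs(input);
        check(result == 5, "abs(-5) should be 5");
    }

    // Parameterized unit test: the input arrives as a parameter and the
    // assertions state properties that should hold for every admitted input.
    static void putAbs(int input) {
        if (input == Integer.MIN_VALUE) return; // assumption: skip the overflow case
        int result = abs(input);
        check(result >= 0, "abs must be non-negative");
        check(result == input || result == -input, "abs must preserve magnitude");
    }

    // Minimal stand-in for an assertion library.
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        unitTestAbs();
        // Instantiating the PUT with concrete values yields ordinary UTs.
        for (int input : new int[] {-7, 0, 42}) {
            putAbs(input);
        }
        System.out.println("all checks passed");
    }
}
```

In this sketch the concrete values -7, 0 and 42 are chosen by hand; in a PUT framework they would be produced by a test input generator.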
B. Symbolic Execution
Symbolic execution [6] is a way of executing a program in which the program variables that contain concrete values are replaced by symbolic counterparts that express a range of possible values using symbolic expressions. In symbolic execution, the values of variables and the return values of programs are symbolic expressions over symbolic inputs. During the execution of a program P, if the value of a variable depends on the input parameters, the machine calculates a symbolic value to replace the concrete value of the variable. Given a variable x, the symbolic value of x can take one of the following forms:
(a) An input symbol
(b) A formula consisting of symbolic values and operators
(c) A formula consisting of symbolic values, concrete values and operators
Operators in symbolic execution include addition (+), subtraction (-), multiplication (*), division (/), etc. When a program is executed in symbolic mode, concrete types are replaced with the corresponding symbolic types, and concrete operations are replaced with calls to methods that implement the corresponding operations on symbolic expressions.
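The three forms of symbolic value listed above can be sketched as a tiny expression tree. This is a minimal illustration under our own assumptions; the type names (Expr, Symbol, Constant, BinOp) are hypothetical and are not the classes used inside Symbolic JPF.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini expression language; not the classes used by Symbolic JPF.
final class SymbolicValueSketch {
    interface Expr {
        int eval(Map<String, Integer> inputs); // plug in concrete inputs
    }

    // Form (a): an input symbol.
    static final class Symbol implements Expr {
        final String name;
        Symbol(String name) { this.name = name; }
        public int eval(Map<String, Integer> inputs) { return inputs.get(name); }
        @Override public String toString() { return name; }
    }

    // A concrete value that may appear inside a formula, as in form (c).
    static final class Constant implements Expr {
        final int value;
        Constant(int value) { this.value = value; }
        public int eval(Map<String, Integer> inputs) { return value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    // Forms (b) and (c): an operator applied to two sub-expressions.
    static final class BinOp implements Expr {
        final char op;
        final Expr left;
        final Expr right;
        BinOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
        public int eval(Map<String, Integer> inputs) {
            int l = left.eval(inputs);
            int r = right.eval(inputs);
            switch (op) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                case '/': return l / r;
                default: throw new IllegalArgumentException("unknown operator " + op);
            }
        }
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    public static void main(String[] args) {
        Expr x = new Symbol("X");
        Expr y = new Symbol("Y");
        // Symbolic counterparts of: x = x + y; y = x - y; x = x - y;
        Expr x1 = new BinOp('+', x, y);
        Expr y1 = new BinOp('-', x1, y);
        Expr x2 = new BinOp('-', x1, y1);
        Expr scaled = new BinOp('*', x2, new Constant(2)); // form (c)

        Map<String, Integer> inputs = new HashMap<>();
        inputs.put("X", 5);
        inputs.put("Y", 3);
        System.out.println(y1 + " = " + y1.eval(inputs)); // ((X + Y) - Y) = 5
        System.out.println(x2 + " = " + x2.eval(inputs)); // ((X + Y) - ((X + Y) - Y)) = 3
        System.out.println(scaled + " = " + scaled.eval(inputs));
    }
}
```

Each concrete operation such as x + y becomes the construction of a BinOp node, which mirrors the replacement of concrete operations by calls on symbolic expressions.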
Figure 2 shows an example of symbolic execution. The lower part is the symbolic execution tree of the program above it. In the tree, the numbers outside the boxes are the line numbers of the statements in the program.
In symbolic execution, the state of a single-threaded program consists of three parts: the symbolic values of the expressions; a path condition (PC), which is a set of constraints that the input values must satisfy for execution to follow that path; and a program counter, which indicates the next statement to be executed.
The PC is a boolean formula over the input variables that describes which conditions must hold in the state. It accumulates the constraints that the inputs must satisfy for an execution to follow a particular path. When working with variables of an array type, a condition must be added to the PC to ensure that there is no out-of-bounds array access.
A symbolic execution tree (SET) can be used to characterize all execution paths of a program. A symbolic execution tree of a program is a (possibly infinite) tree whose nodes are program states during symbolic execution and whose arcs are possible transitions between states. The leaf nodes of a SET whose PC is satisfiable represent the final states of the program, while the paths from the root to these nodes represent the different execution paths. Moreover, all feasible execution paths of the program are represented in the SET. Every satisfying valuation of a PC (in a leaf node) gives a concrete input; all inputs that satisfy the same PC drive the program along the same execution path, and the number of such concrete executions may be infinite.
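For the Swap program of Figure 2, the leaves of the SET correspond to three path conditions over the inputs X and Y. The following sketch writes them down and checks their satisfiability with a brute-force search over a small range, which merely stands in for a real constraint solver. The class and field names are hypothetical, and the reasoning treats X and Y as mathematical integers (integer overflow is ignored).

```java
import java.util.Arrays;
import java.util.function.BiPredicate;

// Hypothetical sketch: path conditions of the Swap example, checked by brute force.
final class SwapPathConditions {
    // Stand-in for a constraint solver: searches a small box of integers for
    // values that satisfy the path condition; returns null if none is found.
    static int[] solve(BiPredicate<Integer, Integer> pathCondition) {
        for (int x = -5; x <= 5; x++) {
            for (int y = -5; y <= 5; y++) {
                if (pathCondition.test(x, y)) return new int[] {x, y};
            }
        }
        return null;
    }

    // After lines 2-4 the variables hold x = Y and y = X, so the test on
    // line 5 (x - y > 0) becomes Y - X > 0 over the inputs.
    static final BiPredicate<Integer, Integer> NO_SWAP =
            (x, y) -> !(x > y);
    static final BiPredicate<Integer, Integer> SWAP_THEN_RETURN =
            (x, y) -> x > y && !(y - x > 0);
    static final BiPredicate<Integer, Integer> SWAP_THEN_ASSERT =
            (x, y) -> x > y && (y - x > 0);

    public static void main(String[] args) {
        System.out.println("no swap:          " + Arrays.toString(solve(NO_SWAP)));
        System.out.println("swap, return:     " + Arrays.toString(solve(SWAP_THEN_RETURN)));
        System.out.println("swap, assert hit: " + Arrays.toString(solve(SWAP_THEN_ASSERT)));
    }
}
```

The first two path conditions have solutions, each of which is a usable test input, while the third has none in the searched range: X > Y and Y - X > 0 contradict each other, so the path to assert(false) is infeasible. A bounded search can only suggest unsatisfiability; a real solver would establish it.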
public void Swap(int x, int y) {
1:   if (x > y) {
2:     x = x + y;
3:     y = x - y;
4:     x = x - y;
5:     if (x - y > 0)
6:       assert(false);
     }
}
Figure 2: A simple program and its symbolic execution tree
There are two types of symbolic execution:
• Static symbolic execution: at every branching point, the PC is updated and a constraint solver [3] determines whether the corresponding path is feasible. If a path is not feasible, the execution backtracks to the previous node, so only feasible paths are executed.
• Dynamic symbolic execution: a symbolic execution technique based on dynamic program analysis, in which a program may be executed many times with different values of the input parameters. First, each input parameter is given a random value and the program is executed with these values; constraints are collected during the execution, and new constraints are generated automatically from the collected ones. Test input generators use an algorithm to generate a set of input data so that all executable paths are examined. The algorithm is sketched as follows:
The program is executed with random values of the parameters and a given depth bound on the SET (which is useful when the program contains recursion or infinite loops). First, the SET is initialized. Then the symbolic values of variables are calculated based on the given values of the input parameters, and path constraints are generated accordingly. When a new constraint is generated, a new node is added to the SET. When the execution finishes, the executed path of the SET is marked as examined. With concrete values, one concrete path of the SET is examined; for each branch not taken, a new node is created and marked as non-examined. The new node stores the corresponding path constraints.
After a path is completed, a non-examined node is chosen and a new constraint is built by collecting all the constraints in the nodes on the path from the root of the SET to the chosen node. The new constraint is sent to a constraint solver, which determines whether it is satisfiable:
• If the constraint is not satisfiable, another non-examined node is chosen and the above step is repeated.
• If the constraint is satisfiable, the constraint solver generates concrete values for the input parameters. These values are used for the next concrete execution.
The algorithm terminates when all possible paths of the SET have been examined.
All concrete values of the input parameters, together with the analysis information of the corresponding concrete executions (the method summary), are used for generating UTs and for reporting.
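The exploration loop described above can be sketched in a few dozen lines. This is a simplified illustration under several assumptions of our own: the program under test is a hypothetical two-input toy method that is instrumented by hand, the starting input is passed in rather than drawn at random, the depth bound is omitted, and a brute-force search over [0, 20] x [0, 20] stands in for the constraint solver.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch; a bounded brute-force search stands in for the constraint solver.
final class DynamicSymbolicSketch {
    // One constraint on a path: a branch condition over the inputs and its outcome.
    static final class Branch {
        final BiPredicate<Integer, Integer> cond;
        final boolean taken;
        Branch(BiPredicate<Integer, Integer> cond, boolean taken) { this.cond = cond; this.taken = taken; }
        Branch negated() { return new Branch(cond, !taken); }
        boolean holds(int x, int y) { return cond.test(x, y) == taken; }
    }

    static final BiPredicate<Integer, Integer> X_GT_Y = (x, y) -> x > y;
    static final BiPredicate<Integer, Integer> X_GT_10 = (x, y) -> x > 10;
    static final BiPredicate<Integer, Integer> Y_EQ_7 = (x, y) -> y == 7;

    // Instrumented toy program: executes concretely and records each branch decision.
    static List<Branch> run(int x, int y) {
        List<Branch> trace = new ArrayList<>();
        if (record(trace, X_GT_Y, x, y)) {
            record(trace, X_GT_10, x, y);
        } else {
            record(trace, Y_EQ_7, x, y);
        }
        return trace;
    }

    static boolean record(List<Branch> trace, BiPredicate<Integer, Integer> cond, int x, int y) {
        boolean taken = cond.test(x, y);
        trace.add(new Branch(cond, taken));
        return taken;
    }

    // Path signature: the sequence of branch outcomes from the root of the SET.
    static String signature(List<Branch> path) {
        StringBuilder sb = new StringBuilder();
        for (Branch b : path) sb.append(b.taken ? 'T' : 'F');
        return sb.toString();
    }

    // Solver stand-in: first (x, y) in [0, 20] x [0, 20] satisfying every constraint, or null.
    static int[] solve(List<Branch> constraints) {
        for (int x = 0; x <= 20; x++) {
            for (int y = 0; y <= 20; y++) {
                boolean ok = true;
                for (Branch b : constraints) ok &= b.holds(x, y);
                if (ok) return new int[] {x, y};
            }
        }
        return null;
    }

    static List<int[]> explore(int x0, int y0) {
        List<int[]> tests = new ArrayList<>();
        Set<String> known = new HashSet<>();               // nodes already in the SET
        Deque<List<Branch>> nonExamined = new ArrayDeque<>();
        int[] input = {x0, y0};
        while (input != null) {
            tests.add(input);
            List<Branch> trace = run(input[0], input[1]);  // concrete execution
            for (int i = 1; i <= trace.size(); i++) known.add(signature(trace.subList(0, i)));
            for (int i = 0; i < trace.size(); i++) {       // sibling of each branch taken
                List<Branch> node = new ArrayList<>(trace.subList(0, i));
                node.add(trace.get(i).negated());
                if (known.add(signature(node))) nonExamined.add(node);
            }
            input = null;
            while (input == null && !nonExamined.isEmpty()) {
                input = solve(nonExamined.poll());         // null: unsatisfiable, try the next node
            }
        }
        return tests;
    }

    public static void main(String[] args) {
        for (int[] t : explore(0, 0)) System.out.println(t[0] + ", " + t[1]);
    }
}
```

Starting from (0, 0), the sketch produces one input per feasible path of the toy program, so each generated input corresponds to one unit test and no two tests follow the same path.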
Figure 3: Testing system structure
C. Lazy Initialization
Lazy initialization is an algorithm that generalizes traditional symbolic execution to support advanced constructs of modern programming languages such as Java and C++. With lazy initialization, uninitialized variables are initialized when they are first accessed during the execution of the program [19].
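As a rough illustration, the sketch below enumerates the values that an uninitialized reference field might take when it is first read. The three-way choice (null, a fresh object, or an alias to a previously created object) follows our understanding of generalized symbolic execution [19] and is an assumption here, since this paper does not spell it out; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the choices made when an uninitialized reference is first read.
final class LazyInitSketch {
    static final class Node {
        final String name;
        Node next; // reference field whose value is decided lazily
        Node(String name) { this.name = name; }
    }

    // Candidate values for an uninitialized Node reference: null, a fresh
    // object, or an alias to any Node created earlier on this path.
    static List<Node> candidates(List<Node> createdSoFar) {
        List<Node> options = new ArrayList<>();
        options.add(null);
        options.add(new Node("fresh" + createdSoFar.size()));
        options.addAll(createdSoFar);
        return options;
    }

    public static void main(String[] args) {
        List<Node> heap = new ArrayList<>();
        Node root = new Node("root");
        heap.add(root);
        // First read of root.next: a model checker would explore every option.
        for (Node option : candidates(heap)) {
            root.next = option;
            System.out.println("root.next = " + (option == null ? "null" : option.name));
        }
    }
}
```

A model checker would treat the list returned by candidates as a non-deterministic choice and explore one branch per option.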
III. RELATED WORK
Several approaches and tools have been proposed and developed to generate test cases based on symbolic execution [8, 10, 11, 13, 14, 16]. Microsoft Pex (Program EXploration), a model checker under active development, can execute parameterized unit tests (PUTs) [7, 16, 18] and inspired this work.
Pex can generate a traditional unit test suite with high code coverage. It supports writing and executing parameterized unit tests in the .NET environment. A parameterized unit test is simply a method that takes parameters, calls the code under test, and states assertions [15]. Given a parameterized unit test written in a .NET language, Pex automatically produces a small unit test suite with high code and assertion coverage. It performs a systematic white-box program analysis: it learns the program behaviour by monitoring execution traces and uses a constraint solver to generate new test cases with different behaviours. However, the main drawbacks of Pex are that it can analyse .NET applications only and that only one PUT can be executed in each run.
Some work supports symbolic execution in the Java environment by instrumenting Java bytecode [9, 19, 20]. In tools such as jCUTE and CUTE, the original Java programs are converted to programs in a simpler intermediate language (such as SCIL or Jimple) [17] and augmented with code that supports symbolic execution.
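TABLE I. EXAMPLES OF CONVERTING A PROGRAM FROM JAVA TO SCIL

Java                      SCIL
------------------------  ------------------------
p[i] = q[j];              t1 = p + i;
                          t2 = q + j;
                          *t1 = *t2;
------------------------  ------------------------
assert (x > 0);           if (x > 0) goto L;
x = 10;                   ERROR;
                          L: x = 10;
------------------------  ------------------------
if (x > 0)                if (x <= 0) goto L1;
  x = 1;                  x = 1;
else                      if (true) goto L2;
  x = -1;                 L1: x = -1;
y = x + y;                L2: y = x + y;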
Table I shows examples of converting a program from Java to SCIL. As we can see, the array reference is changed to pointer operations in the first row, and the assert command is converted to an if statement in the second row.
The instrumented programs are then tested using the model in Figure 3. The original program is instrumented with additional code so that the Test Executor can communicate with the Test Input Selector to get test data. The data is then sent to the Test Generator to produce unit tests.
IV. OUR APPROACH
A. Java PathFinder Structure
JPF is a special Java Virtual Machine that executes a program not just once (like a normal VM), but theoretically in all possible ways. It helps check for property violations such as deadlocks or unhandled exceptions along all potential execution paths. If it finds an error, JPF reports the whole execution path that leads to it. Unlike a normal debugger, JPF keeps track of every step on the way to the defect, because it is augmented with the capability of storing states, matching states and backtracking during its execution.
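The interplay of state storing, state matching and backtracking can be sketched as a depth-first search over a toy transition system. This is a generic illustration rather than JPF code: the integer states and the successors function are hypothetical stand-ins for JVM states and the transitions between them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of explicit-state search with state matching and backtracking.
final class StateSearchSketch {
    // Toy transition system: each state has two successors.
    static List<Integer> successors(int state) {
        return Arrays.asList((state + 1) % 6, (state * 2) % 6);
    }

    // Returns the states in the order in which they are first executed.
    static List<Integer> explore(int initial) {
        Set<Integer> stored = new HashSet<>();      // state storage used for matching
        List<Integer> order = new ArrayList<>();
        Deque<Iterator<Integer>> stack = new ArrayDeque<>();
        stored.add(initial);
        order.add(initial);
        stack.push(successors(initial).iterator());
        while (!stack.isEmpty()) {
            Iterator<Integer> pending = stack.peek();
            if (!pending.hasNext()) {               // nothing left here: backtrack
                stack.pop();
                continue;
            }
            int next = pending.next();
            if (stored.add(next)) {                 // new state: execute it
                order.add(next);
                stack.push(successors(next).iterator());
            }                                       // matched a stored state: skip it
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(explore(0)); // [0, 1, 2, 3, 4, 5]
    }
}
```

Every state is executed exactly once even though it is reachable along several transitions, which is the effect that state matching is meant to achieve.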
The capability of storing and matching states lets JPF check every state before executing it. A state is executed only if it is new (it has never been executed before); otherwise JPF goes back to the nearest state that has not been fully explored (the backtracking capability).
The key mechanisms of Symbolic JPF are:
Bytecode instruction factory: By default, a standard JVM interprets bytecode instructions according to concrete execution semantics. In Symbolic JPF, the custom JVM uses an instruction factory to replace or extend the standard concrete execution semantics of bytecodes with non-standard symbolic execution semantics. The BCEL library (Byte Code Engineering Library) is used in the interpreting process. The replacement is possible because JPF allows the standard execution semantics to be swapped out through a configurable InstructionFactory.
Attributes associated with the program state: JPF manages program states in a way similar to a standard JVM. Each state consists of a call stack, a heap that stores the values of the fields, and scheduling information. The call stack consists of stack frames corresponding to the methods that are being executed. Each stack frame stores symbolic information in its local variables and operand stack.
Attributes that are associated with program values are called slot attributes. These attributes store the symbolic values and expressions that are created during symbolic execution.
The combination of the above mechanisms allows dynamic modification of execution semantics, i.e., changing mid-stream from system-level, concrete execution semantics to symbolic execution semantics, thus providing the integrated test generation capability described later in this paper.
Symbolic JPF also uses other mechanisms, such as:
Choice generators (CG): These handle branching conditions during symbolic execution. A CG creates a non-deterministic choice in JPF’s search and adds the condition (or its negation) to the corresponding path condition. A constraint solver is used to check whether the path condition is satisfiable. Symbolic JPF uses one of two constraint solvers: Choco or IASolver.
Listeners: There are two types of listener, the search listener and the VM listener, which are used for printing the results of the symbolic analysis and for enabling dynamic change of execution semantics, respectively.
Native peers: The Model Java Interface (MJI) lets code executing in JPF interact with the underlying virtual machine. Symbolic JPF uses MJI for modelling native libraries, e.g., to capture java.lang.Math library calls and send them to the constraint solver.
B. Combining Concrete Execution and Symbolic Execution
Unit-level testing is well supported in Symbolic JPF. Because Symbolic JPF can combine concrete and symbolic execution, a method can be executed with a mix of concrete and symbolic values for its input parameters.
For symbolic execution, we need to configure JPF to use the SymbolicInstructionFactory class, which interprets bytecode according to symbolic execution semantics [2]. We also need to indicate which method is to be executed symbolically, and which input parameters and global fields are concrete and which are symbolic.
In the first stage, the program is executed concretely, since all the symbolic bytecode instruction classes delegate execution to the “concrete” super-class if there are no symbolic attributes associated with the data.
A listener monitors the concrete execution of the program within JPF’s VM. The first time the method with the specified name is invoked, the listener starts symbolic execution: it creates new symbolic values and stores them in the attributes of the specified symbolic inputs (parameters and global fields).
From that point on, the methods invoked by the designated method continue to process the symbolic information stored in the attributes. Once the method returns, JPF prints out the method summary.
We can write a special listener that monitors conditions on concrete variables during concrete execution. When the conditions are met, it starts symbolic execution of the corresponding method. It is also possible to switch back to concrete execution: JPF solves the constraints in the current path condition, computes the corresponding concrete values of the program variables and continues executing in concrete mode.
The capability of analysing a unit with mixed concrete and symbolic inputs lets us use concrete execution of a program to set up different concrete global contexts for the unit-level symbolic analysis. It also helps reduce the complexity of path conditions in the symbolic analysis. This is useful when we want to test a complex program for which full path coverage is impossible: the state space can be split into several sub-spaces by mixing some concrete parameters with symbolic ones.
C. PUT Generation in Symbolic JPF
The idea is to use the combination of dynamic symbolic execution and lazy initialization in Symbolic JPF to build a system that accepts a PUT and generates appropriate UTs, so that they form a test suite with high path coverage.
Symbolic JPF already supports symbolic execution with input parameters of numeric types (int, float, long, double). Based on this feature, we developed a testing framework that allows us to write PUTs for testing Java classes whose methods take input parameters of numeric types. Methods are executed on symbolic inputs that represent all possible concrete inputs. Values of variables are represented as symbolic expressions. A constraint solver processes the constraints over these symbolic expressions to generate concrete test inputs that make the execution follow the chosen path.
To run the generated UTs we need JUnit, a testing framework for the Java programming language that allows us to write and execute unit tests. After writing PUTs, we pass them to Symbolic JPF to generate concrete UTs, which are then executed automatically in JUnit.
The structure of a UT in JUnit is as follows:
import junit.framework.TestCase;

public class ClassName extends TestCase {
    public ClassName(String name) {
        super(name);
    }
    protected void setUp() throws Exception {
        super.setUp();
    }
    protected void tearDown() throws Exception {
        super.tearDown();
    }
    // JUnit test methods
}
Our PUTs follow the same structure, so we do not need to rewrite the assertion library. First, we must configure Symbolic JPF so that it can execute a PUT. The configuration is set as follows:
// tell JPF to use symbolic bytecode
+vm.insn_factory.class=gov.nasa.jpf.symbc.SymbolicInstructionFactory
// use a listener for result reporting
+jpf.listener=gov.nasa.jpf.symbc.SymbolicListener
// use the Math libraries
+vm.peer_packages=gov.nasa.jpf.symbc:gov.nasa.jpf.jvm
// choose a constraint solver
+symbolic.dp=iasolver
// name of the symbolic method and type of each parameter: concrete (con) or symbolic (sym)
+symbolic.method=UnitUnderTest(sym#sym#con)
// name of the class that contains the symbolic method
Main
Symbolic JPF is a virtual machine, so the class under analysis must have a main method to start symbolic execution. In the main method, the method that is to be executed symbolically is called with concrete values.
Testing classes have no main method, so to allow these classes to be executed in JPF directly we add a PUTDriver component. The PUTListener class allows the system to control the execution process, report results and generate UTs.
Figure 6: Architecture of the PUT framework
The architecture of our proposed system is shown in Figure 6. The main component of the system is PUTRunner, which uses a PUTDriver to generate data for the PUTRunner. PUTListener listens to events generated by PUTRunner in order to generate unit tests, which are then executed as standard unit tests.
V. EXPERIMENT
We show here a simple method, Euclid_GCD (Euclid’s algorithm for finding the greatest common divisor of two integers), in the class NeedTest that we need to test.
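public class NeedTest {
    // Subtraction-based Euclid's algorithm; expects x > 0 and y > 0.
    public int Euclid_GCD(int x, int y) {
        while (x != y) {
            if (x > y) x = x - y;
            if (x < y) y = y - x;
        }
        return x;
    }
}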
We write a PUT method that tests the above method as follows.
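public class PUT {
    void testEuclid(int x, int y) {
        int z;
        NeedTest ut = new NeedTest();
        // Restrict the inputs to 0 < x < 100 and 0 < y < 100.
        Assume.isTrue(x > 0 && x < 100 && y > 0 && y < 100);
        z = ut.Euclid_GCD(x, y);
        // Both sides reduce to z when z divides x and y.
        Assert.assertEquals(x / (x / z), y / (y / z));
    }
}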
In this class, we need the Assume statement to limit the range of the input values. With this limitation, we stay within the working range of Euclid's GCD algorithm (all pairs of positive integers: x > 0 && y > 0) and avoid path explosion in symbolic execution (x < 100 && y < 100).
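The behaviour of this PUT on a few concrete inputs can be reproduced with the following self-contained sketch, which does not depend on JUnit or JPF. The helper names (assume, AssumptionViolated, runOne) are hypothetical, and treating a violated assumption as a skipped test rather than a failure is our own assumption about how Assume is meant to behave.

```java
// Hypothetical, dependency-free re-creation of the Euclid PUT.
final class EuclidPutSketch {
    // Signals that an input lies outside the assumed range.
    static final class AssumptionViolated extends RuntimeException { }

    // Subtraction-based Euclid's algorithm; terminates only for x > 0 and y > 0.
    static int euclidGcd(int x, int y) {
        while (x != y) {
            if (x > y) x = x - y;
            if (x < y) y = y - x;
        }
        return x;
    }

    static void assume(boolean condition) {
        if (!condition) throw new AssumptionViolated();
    }

    // The PUT: z must divide both inputs, so both quotients reduce to z.
    static void testEuclid(int x, int y) {
        assume(x > 0 && x < 100 && y > 0 && y < 100);
        int z = euclidGcd(x, y);
        if (x / (x / z) != y / (y / z)) {
            throw new AssertionError("gcd check failed for " + x + ", " + y);
        }
    }

    // Runs the PUT on one concrete input and reports the outcome.
    static String runOne(int x, int y) {
        try {
            testEuclid(x, y);
            return "passed";
        } catch (AssumptionViolated e) {
            return "skipped";
        }
    }

    public static void main(String[] args) {
        int[][] inputs = { {1, 1}, {3, 2}, {8, 5}, {21, 13}, {55, 34}, {1, 100} };
        for (int[] in : inputs) {
            System.out.println("(" + in[0] + ", " + in[1] + "): " + runOne(in[0], in[1]));
        }
    }
}
```

Because the assumption is checked before Euclid_GCD is called, inputs such as 0 or negative numbers never reach the loop, which would otherwise not terminate.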
Executing this PUT produces the following generated unit tests.
public class GeneratedJUnits_by_PUT extends TestCase {
    private PUT cls_PUT = new PUT();
    public GeneratedJUnits_by_PUT(String name) { super(name); }
    protected void setUp() throws Exception { super.setUp(); }
    protected void tearDown() throws Exception { super.tearDown(); }
    public void testNeedTest_1_100() { cls_PUT.testEuclid(1, 100); }
    public void testNeedTest_1_1() { cls_PUT.testEuclid(1, 1); }
    public void testNeedTest_3_2() { cls_PUT.testEuclid(3, 2); }
    public void testNeedTest_8_5() { cls_PUT.testEuclid(8, 5); }
    public void testNeedTest_21_13() { cls_PUT.testEuclid(21, 13); }
    public void testNeedTest_55_34() { cls_PUT.testEuclid(55, 34); }
}
JUnit automatically executes these unit tests, so we immediately see any assertion errors. This is very useful because the generated unit tests cover all paths and save a great deal of manual work compared with writing them all by hand, especially for large functions and projects.
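How a listener might turn solved inputs into JUnit source text can be sketched as a simple string emitter. The paper does not describe how PUTListener produces its output, so the class below (UnitTestEmitterSketch, with its emit and namePart methods) is purely hypothetical; it only reproduces the naming pattern visible in the generated tests.

```java
// Hypothetical emitter; not the PUTListener implementation.
final class UnitTestEmitterSketch {
    // Turns a value into a legal part of a Java method name (e.g. -3 becomes m3).
    static String namePart(int value) {
        return Integer.toString(value).replace('-', 'm');
    }

    // Builds the source of a JUnit 3 style test class with one test per input pair.
    static String emit(String className, String putClass, String putMethod, int[][] inputs) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" extends TestCase {\n");
        src.append("    private ").append(putClass).append(" cls_PUT = new ")
           .append(putClass).append("();\n");
        src.append("    public ").append(className).append("(String name) { super(name); }\n");
        for (int[] in : inputs) {
            src.append("    public void testNeedTest_").append(namePart(in[0]))
               .append('_').append(namePart(in[1])).append("() {\n");
            src.append("        cls_PUT.").append(putMethod).append('(')
               .append(in[0]).append(", ").append(in[1]).append(");\n");
            src.append("    }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        int[][] inputs = { {1, 1}, {3, 2}, {8, 5} };
        System.out.print(emit("GeneratedJUnits_by_PUT", "PUT", "testEuclid", inputs));
    }
}
```

Emitting one method per solved path condition keeps the suite compact, since each input pair corresponds to a distinct execution path.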
VI. CONCLUSION
Based on Symbolic JPF and JUnit, we have proposed a parameterized testing framework for Java and implemented it as a tool. Our experiments show a working tool that can save the manual work of creating test cases and increase coverage while maintaining a compact test suite, in the sense that no two test cases drive the program along the same path. More importantly, our approach can be extended to handle branching conditions over other data types such as strings. We plan to add this functionality in the next release of the tool.
ACKNOWLEDGEMENT
This research was partly supported by Vietnam National University, Hanoi under the project QGTD.09.02. We thank Binh-Duong Tran (K50) for the initial implementation of the tool.
REFERENCES
[1] B. Beizer, “Software Testing Techniques,” 2nd edition, Van Nostrand Reinhold Co., New York, NY, USA, 1990.
[2] Corina Pasareanu, “Combining Unit-level Symbolic Execution and System-level Concrete Execution for Testing NASA Software,” ISSTA’08.
[3] Daniel Kroening and Ofer Strichman, “Decision Procedures: An Algorithmic Point of View,” Springer-Verlag Berlin Heidelberg, 2008.
[4] Edmund M. Clarke, Orna Grumberg, and Doron A. Peled, “Model Checking,” The MIT Press, January 2000.
[5] H. Zhu, P. A. V. Hall, and J. H. R. May, “Software Unit Test Coverage and Adequacy,” ACM Computing Surveys, 29(4), ISSN 0360-0300, December 1997, pp. 366–427.
[6] J. C. King, “Symbolic execution and program testing,” Communications of the ACM, 19(7):385–394, 1976.
[7] J. de Halleux and N. Tillmann, “Parameterized unit testing with Pex (tutorial),” In Proc. of Tests and Proofs (TAP’08), volume 4966 of LNCS, pages 171–181, Prato, Italy, April 2008. Springer.
[8] Jon Edvardsson, “Techniques for Automatic Generation of Tests from Programs and Specifications,” Department of Computer and Information Science, Linkoping, SE-581, Sweden, 2006.
[9] Kari Kähkönen, “Evaluation of Java PathFinder Symbolic Execution Extension,” June 2007.
[10] C. C. Michael, G. E. McGraw, M. A. Schatz, and C. C. Walton, “Genetic algorithms for dynamic test data generation,” In Proceedings of the 12th IEEE International Conference on Automated Software Engineering, 1–5 Nov. 1997, pages 307–308.
[11] N. Tillmann and W. Schulte, “Unit tests reloaded: Parameterized unit testing with symbolic execution,” IEEE Software, 23(4):38–47, 2006.
[12] N. Tillmann and W. Schulte, “Parameterized unit tests,” In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 253–262. ACM, 2005.
[13] Patrice Godefroid, “Compositional dynamic test generation,” In Proceedings of the 34th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 47–54. ACM, 2007.
[14] Patrice Godefroid, Nils Klarlund, and Koushik Sen, “DART: Directed automated random testing,” In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 213–223. ACM, 2005.
[15] Peli de Halleux and Nikolai Tillmann, “Parameterized Test Patterns for Effective Testing with Pex,” Microsoft Corporation, October 21, 2008.
[16] Petri Ihantola, “Automatic test data generation for programming exercises with symbolic execution and Java PathFinder,” Master's thesis, Helsinki University of Technology, Department of Theoretical Computer Science, 2006.
[17] Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie J. Hendren, Patrick Lam, and Vijay Sundaresan, “Soot: A Java bytecode optimization framework,” In Proceedings of the 1999 Conference of the Centre for Advanced Studies on Collaborative Research (CASCON), page 13. IBM, 1999.
[18] S. Anand, P. Godefroid, and N. Tillmann, “Demand-driven compositional symbolic execution,” In Proc. of TACAS’08, volume 4963 of LNCS, pages 367–381. Springer, 2008.
[19] Sarfraz Khurshid, Corina S. Pasareanu, and Willem Visser, “Generalized symbolic execution for model checking and testing,” In 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 2619 of Lecture Notes in Computer Science, pages 553–568. Springer, 2003.
[20] Saswat Anand, C. S. Pasareanu, and W. Visser, “JPF-SE: A symbolic execution extension to Java PathFinder,” In Proc. of the 13th TACAS Conference, 2007.
[21] Willem Visser, Corina S. Pasareanu, and Sarfraz Khurshid, “Test input generation with Java PathFinder,” In Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), pages 97–107. ACM, 2004.