Object-Oriented Programs
by Phung Hua Nguyen
B.Eng., HCMC University of Technology, 1996
M.Eng., HCMC University of Technology, 1999
A thesis submitted in fulfillment
of the requirements for the degree of
Doctor of Philosophy
School of Computer Science and Engineering
University of New South Wales
August, 2005
I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis.
I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.
[...] analysis is problematic when parts of the analysed program are not available to participate in analysis. In this case, a whole-program analysis has to make conservative assumptions to be able to produce safe analysis results at the expense of some possible precision loss.
To improve analysis precision, an analysis can exploit the access control mechanism provided by the underlying programming language. This thesis introduces a points-to analysis technique for incomplete object-oriented programs, called completeness analysis, which exploits the access and modification properties of classes, methods and fields to enhance the analysis precision. Two variations of the technique, compositional and sequential completeness analysis, are described. This thesis also presents a mutability analysis (MA) and MA-based side-effect analysis, which are based on the output of completeness analysis, to determine whether a variable is potentially modified by the execution of a program statement. The results of experiments carried out on a set of Java library packages are presented to demonstrate the improvement in analysis precision.
To my wife and my children.
[...] to the developers of the Soot project for providing the optimisation framework in which this research was implemented.
I am deeply thankful to my parents for their support and encouragement. Finally, I would like to thank my dearest wife, Thuy Thi Nguyen, for her unconditional encouragement and her belief in my abilities.
Author’s Publications
• Phung Hua Nguyen and Jingling Xue. Strength Reduction for Loop-Invariant Types. In 27th Australasian Computer Science Conference, Dunedin, New Zealand, January 2004 (Best Student Paper).
• Phung Hua Nguyen and Jingling Xue. Interprocedural Side-effect Analysis and Optimisation in the Presence of Dynamic Class Loading. In 28th Australasian Computer Science Conference, Newcastle, Australia, January 2005 (Best Paper).
• Jingling Xue and Phung Hua Nguyen. Completeness Analysis for Incomplete Object-Oriented Programs. In 14th International Conference on Compiler Construction, Edinburgh, Scotland, April 2005.
Chapter
  1.1 Contributions
    1.1.1 Completeness Analysis
    1.1.2 MA-based Side-effect analysis
  1.2 Thesis organisation
2 Background
  2.1 Model Language
  2.2 Call Graph Construction Techniques
  2.3 Whole-program points-to analysis
    2.3.1 Points-to analysis: accuracy and efficiency
    2.3.2 Rules for a whole-program points-to analysis
  2.4 Incomplete program
  2.5 Accessibility of classes, methods and fields
  2.6 Readable and writable fields
3 Completeness Analysis
  3.1 Completeness and Detectability
  3.2 Completeness analysis
    3.2.1 Compositional completeness analysis (CA)
    3.2.2 Sequential completeness analysis (SCA)
  3.3 Example
  3.4 Limitations of Completeness Analysis
4 Side-effect Analysis
  4.1 TBAA-based side-effect analysis
  4.2 PA-based side-effect analysis
  4.3 MA-based side-effect analysis
    4.3.1 Inadequacies of completeness analysis for side-effect analysis
    4.3.2 Classifications of references
5 Experimental Results
  5.1 Experiment Setup
  5.2 Completeness analysis
    5.2.1 Analysis Precision
    5.2.2 Analysis Costs
  5.3 Side-Effect Analysis
    5.3.1 Analysis precision
    5.3.2 Analysis Cost
6 Related work
  6.1 Points-to analysis
  6.2 Access and Modification Properties of Classes, Methods and Fields
  6.3 Side-effect analysis
7 Conclusion and Future work
  7.1 Completeness analysis
  7.2 MA-based side-effect analysis
  7.3 Future work
Bibliography
Appendix
List of Tables
Table
2.1 Instruction set for the IR
2.2 The accessibility of a class or a field when UF is in the same package
2.3 The accessibility of a class or a field when UF is outside the package
2.4 The accessibility of a method when UF is in the same package
2.5 The accessibility of a method when UF is outside the package
3.1 The solutions when F and W = F ∪ UF are analysed
3.2 Solutions of PA, CA-F, CA and SCA for the program in Figure 3.4
4.1 Lattice value
5.1 Benchmarks
5.2 The accessibility of classes, methods and fields
5.3 Non-external points-to sets by CA
5.4 Non-external points-to sets by SCA
5.5 Detectable objects using CA
5.6 Detectable objects using SCA
5.7 Complete call sites using CA
5.8 Complete call sites using SCA
5.9 Monomorphic call sites using CA
5.10 Monomorphic call sites using SCA
5.11 CSE for field accesses
List of Figures
Figure
3.1 Three kinds of missing caller-callee relations in CF
3.2 An example program
3.3 An example program
3.4 An example program
4.1 Rules for computing MOD
4.2 PA-based side-effect checker for whole programs
4.3 Rules for computing MMOD
4.4 MA-based side-effect checker for programs
5.1 Non-external points-to set ratios using CA and SCA
5.2 Detectable object ratios using CA and SCA
5.3 Complete call site ratios using CA and SCA
5.4 Monomorphic call site ratios using CA and SCA
5.5 Analysis time ratios of the CA variations over PA
5.6 Memory allocation ratios of the CA variations over PA
5.7 Analysis time ratios of the SCA variations over PA
5.8 Memory allocation ratios of the SCA variations over PA
5.9 Common subexpression elimination
5.10 Analysis times of CA and SCA relative to that of TBAA
Static analysis such as points-to or side-effect analysis is significant since the information it provides has many applications in compiler optimisations and software engineering. These optimisation techniques include devirtualisation, inlining, specialisation, partial redundancy elimination and array bound check elimination, to name just a few. In addition, the analysis information has a wide variety of uses in software engineering applications. For example, the points-to information can be used to construct the call graph of an object-oriented program, which is essential for the understanding of the program. The side-effect information can also be used during software maintenance to assist in performing and evaluating software changes. Furthermore, the points-to information can be used to identify security vulnerabilities.
Interprocedural points-to analysis is a form of static analysis that analyses multiple procedures and captures the interactions among these procedures. The analysis is traditionally designed as whole-program analysis, which processes a complete program. The advantage of processing the entire program is that the analysis can collect sufficient information about the program. This analysis model has been used widely in the previous work on points-to and side-effect analyses.
However, such whole-program analysis is inadequate when it processes an incomplete program. Indeed, a whole-program points-to analysis cannot compute sufficient points-to information for a reusable component in the absence of the component’s clients. In addition, such analysis is not practical when some program components are unknown until they are loaded at run time by a dynamic class loading mechanism. Furthermore, some parts of an analysed program, whose code is in binary form (e.g., native code), can render the whole-program analysis inapplicable.
One approach to processing an incomplete program is for the analysis to take account of the reciprocal interactions between the incomplete program (i.e., the analysed code) and the unknown (i.e., unanalysed) code. However, some features of object-oriented languages complicate these interactions. The type hierarchy allows a reference variable to point to objects of different types, and thus the variable may point to an object of unknown type, i.e., a type declared in the unknown code. The dynamic binding mechanism provides a high degree of flexibility by supporting polymorphic call sites, which may have different targets. A call site in the analysed code may thus invoke a method in the unknown code. Therefore, the analysis has to make conservative assumptions to be able to produce safe analysis results at the expense of some possible precision loss.
To improve the analysis precision, the interactions between an incomplete program being analysed and the unknown code must be taken into account as accurately as possible. Indeed, the access control provided by the underlying programming language of the analysed program can limit the interactions. For example, a programming language such as Java provides basic means to restrict the visibility of classes and their members, and consequently, the ability to modify their states. By exploiting the access control mechanism, an analysis technique can improve its precision. In this thesis, we propose an analysis, called completeness analysis, to compute the set of objects pointed to by a given reference in an incomplete object-oriented program. The technique exploits the access and modification properties of classes, methods and fields to enhance the analysis precision. In addition, we also introduce one of its applications, called MA-based side-effect analysis, for an incomplete object-oriented program.
[...] to be in the analysed program from those whose targets can be methods in unknown code. Some optimisation techniques such as inlining and devirtualising can take advantage of this information. In addition, the analysis also identifies whether an object can be pointed to by a reference outside the analysed program or not. The classification can be used by software testing tools.
We describe two variations of the completeness analysis: compositional completeness analysis and sequential completeness analysis. The compositional completeness analysis works together with a whole-program points-to analysis. By using $\omega$ to represent any object created outside the analysed program, the compositional completeness analysis can compute the points-to set of a reference completely. In contrast, the sequential completeness analysis works on the output of a whole-program points-to analysis. Due to the incompleteness of the analysed program, the points-to analysis can produce some points-to sets that are incomplete, i.e., may not contain all pointed-to objects at run time. One of the goals of the sequential completeness analysis is to detect such points-to sets. Although the precision of the sequential completeness analysis may be less than that of the compositional one, the sequential analysis can be implemented independently of the implementation of the points-to analysis used.
We evaluate the precision and the performance of our analyses on a large set of Java library packages. Our experiments show that the precision is improved significantly when exploiting the properties of classes, methods and fields.
1.1.2 MA-based Side-effect analysis
The second contribution of this thesis is the MA-based side-effect analysis for incomplete object-oriented programs. The analysis relies on the output of the completeness analysis and a new technique, called mutability analysis, to determine whether a variable is potentially modified by the execution of a program statement. While the information the completeness analysis provides is sufficient to determine the side effects of an assignment, it is not sufficient for a call site. The completeness analysis can detect a call site that may invoke an unknown method outside the analysed program but it cannot identify whether an object may be modified by such a call site or not. We present the mutability analysis, which detects objects that may be mutated by an unknown method. We also show how to use the information provided by the completeness analysis and the mutability analysis to determine the side effects of a statement on a given variable. We measure the precision of the side-effect analysis by the number of redundant load statements that can be removed. We compare our technique against a side-effect analysis based on type-based alias analysis, which can also be used for an incomplete program. Our experimental results show that our technique can increase the analysis precision at a small analysis cost.
1.2 Thesis organisation
The remainder of this thesis is organised as follows. Chapter 2 reviews the background of our work. Chapter 3 presents our completeness analysis. Chapter 4 introduces our MA-based side-effect analysis. Chapter 5 shows some experiments to demonstrate the benefit of our analyses. Related work is discussed in Chapter 6. Finally, Chapter 7 summarises the thesis and presents possible directions for future work.
This chapter presents some fundamental knowledge that is related to our work. Section 2.1 introduces an object-oriented language model used in our analyses. Some well-known call graph construction techniques are presented in Section 2.2. Section 2.3 provides an overview of points-to analysis techniques for whole programs. In Section 2.4, we define the class of incomplete object-oriented programs that can be handled by our techniques. Section 2.5 summarises the access control of classes, methods and fields of Java [22], which is used to determine the accessibility of classes, methods and fields by the code outside the analysed program. Finally, Section 2.6 defines the sets of readable and writable fields, which are exploited in our techniques.
For simplicity and without loss of generality, we describe our approach for a simple object-oriented language, a subset of Java, with the most relevant features for points-to analysis. We use the term class to mean a class or an interface. The term method means an abstract method, an interface method, a native method or a normal method. The term reference is used to denote all kinds of accesses whose type is non-primitive. These accesses are non-field variable accesses, array accesses, static field accesses and instance field accesses.
A program that is subjected to a whole-program analysis is defined below.
Definition 1 A whole program, $W$, is a pair $W = \langle L_W, \eta \rangle$, where $L_W$ is the set of classes in $W$ and $\eta$ is the set of entry methods such that
1. all classes in $L_W$ have all their superclasses in $L_W$ except the root class java.lang.Object,
2. all classes loaded during program execution are in $L_W$, and
3. all of the code in $W$ is available to participate in the analysis.
Let $M_W$ and $F_W$ be the sets of methods and fields declared in $L_W$, respectively. Let $S_W$ be the set of statements and $V_W$ be the set of references appearing in $M_W$.
The points-to analysis for a program is carried out on an intermediate representation (IR) of the program. As our approach is flow- and context-insensitive, the instruction set for our IR is small, as shown in Table 2.1. Of the seven statements, the first and last are explained below and the other five are self-explanatory.
S1   $\ell$ = new $C$                  Object Creation
S3   $\ell = C.f$                      Static Field Load
S4   $C.f = r$                         Static Field Store
S5   $\ell = r.f$                      Instance Field Load
S6   $\ell.f = r$                      Instance Field Store
S7   $op(a_0, \ldots, a_n, \ell)$      Call Site
Table 2.1: Instruction set for the IR, where $\ell$ and $r$ are non-field variables, $f$ a field, $op$ a method name, $a_0, \ldots, a_n$ parameters, and $C$ a class name.
In Java, objects can be created either explicitly via a new statement or implicitly via the Java reflection method newInstance. In the latter case, the object creation statement can be replaced by $\ell$ = new $C$ if $C$ is detected statically to be the class name of this implicitly created object. Otherwise, the object creation statement used is $\ell$ = new Unknown, where Unknown can be any class in the analysed program or any new class that may be dynamically loaded at run time. In this case, the object created by the statement will have the type Unknown.
For notational convenience, each method is denoted $op(p_0, \ldots, p_n, r)$, where $p_0, \ldots, p_n$ are its $n + 1$ formal input parameters of reference type and $r$ is its formal output reference parameter. If $op$ is an instance method, then $p_0$ denotes the input parameter this, which can be accessed inside $op$ without being declared. The return statements in the method are not explicitly represented. Instead, every return statement is realised by an assignment to the formal output parameter $r$. The parameters that are of primitive types are not part of the method signature because they are irrelevant to points-to analysis. However, it is understood that all parameters of the method will need to be used to determine whether a given method is a target of a call site or not.
Correspondingly, a call site has the form $op(a_0, \ldots, a_n, \ell)$, where $a_0, \ldots, a_n$ are the $n + 1$ actual input reference parameters and $\ell$ is the actual output reference parameter. If $op$ is an instance method, then $a_0$ denotes the receiver of the call. Otherwise, $op$ is static and can be conveniently regarded as an instance method if $p_0$ is set to be the name of the class in which $op$ is declared. In Java, parameter passing is call by value.
Accesses to arrays will be handled as if they were instance field accesses by means of introducing a special field, say, $sf$. We do not distinguish accesses to different components of an array. For example, x[i] and x[j] are both represented by x.sf.
The term fixed call site is used to denote (1) an invokestatic, (2) an invokespecial, (3) a call site whose (unique) target is declared to be final or in a final class, or (4) a sealed call site [52]. All other kinds of call sites are defined as non-fixed. In a fixed call site, all target methods that may be invoked are known at compile time. This is obvious in the first three cases. The target methods of a sealed call site are confined to the underlying sealed package. In addition, each object creation statement $\ell$ = new $C$ is associated with a call site. If $C$ is a concrete class in $L_W$, the call site is fixed because it will invoke a constructor in $C$. When $C$ is Unknown, the corresponding call site is assumed to invoke all constructors that may be invoked at the call site according to the language rules of Java [22]. In this case, the call site is non-fixed.
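The four cases of a fixed call site can be sketched as a simple predicate. This is only an illustration: the names InvokeKind and CallSiteKinds.isFixed are hypothetical, and the three boolean inputs are assumed to have been computed by an earlier pass (for example, a sealed-package check in the sense of [52]).

```java
/** Hypothetical invocation kinds; only the distinction needed here is modelled. */
enum InvokeKind { STATIC, SPECIAL, VIRTUAL, INTERFACE }

final class CallSiteKinds {
    private CallSiteKinds() {}

    /**
     * Returns true if the call site is "fixed" in the sense defined above.
     * The boolean arguments are assumed to be supplied by an earlier analysis.
     */
    static boolean isFixed(InvokeKind kind, boolean targetIsFinal,
                           boolean targetClassIsFinal, boolean sealed) {
        if (kind == InvokeKind.STATIC || kind == InvokeKind.SPECIAL) {
            return true; // cases (1) and (2)
        }
        // case (3): final target or final declaring class; case (4): sealed call site
        return targetIsFinal || targetClassIsFinal || sealed;
    }
}
```

Every call site for which the predicate returns false is treated as non-fixed.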
A whole-program analysis will start from $\eta$ and continue to analyse all other methods invocable directly or indirectly from $\eta$. Therefore, an analysis may require a call graph to be constructed in advance or simultaneously during analysis.
2.2 Call Graph Construction Techniques
A call graph for an analysed program $W$ is represented as a relation, $C_W \subseteq S_W \times M_W$, such that $(s, op) \in C_W$ if and only if $s$ is a call site statement and $op$ is a method that may be a target invoked at the call site. In object-oriented languages, some call sites are virtual. This means that the targets of such call sites are resolved dynamically based on the run-time types of objects pointed to by the underlying receivers. Hence, a static whole-program points-to analysis requires an approximation of the call graph of the analysed program, which can be constructed prior to or simultaneously during the analysis.
Several methods have been proposed for constructing pre-built call graphs. Dean, Grove and Chambers [9] introduced class hierarchy analysis (CHA), which determines the targets of a call site based on the type hierarchy in the program. Bacon and Sweeney [4] proposed rapid type analysis (RTA), which performs a one-pass scan of a program and restricts CHA to classes that appear in object creation statements in the analysed program. Tip and Palsberg [45] presented XTA, which is a simple interprocedural type analysis for approximating the set of target methods of a call site. Sundaresan et al. [44] introduced variable type analysis (VTA) to determine the set of run-time types of objects that each reference variable may hold.
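As an illustration of the idea behind CHA, the following minimal sketch resolves a virtual call from the declared type of its receiver on a toy single-inheritance hierarchy. It is not the algorithm of [9] in full: the class Cha and its methods are hypothetical, classes and methods are identified by plain strings, and every class is assumed to be concrete (interfaces and abstract classes are not modelled).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy class hierarchy analysis over a single-inheritance hierarchy. */
final class Cha {
    private final Map<String, String> superOf = new HashMap<>();        // class -> superclass (null for the root)
    private final Map<String, Set<String>> declared = new HashMap<>();  // class -> methods it declares

    void addClass(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    /** True if c is t or a (transitive) subclass of t. */
    private boolean isSubtype(String c, String t) {
        for (String k = c; k != null; k = superOf.get(k)) {
            if (k.equals(t)) return true;
        }
        return false;
    }

    /** Class whose declaration of m runs for a receiver of run-time class c, or null. */
    private String lookup(String c, String m) {
        for (String k = c; k != null; k = superOf.get(k)) {
            Set<String> ms = declared.get(k);
            if (ms != null && ms.contains(m)) return k;
        }
        return null;
    }

    /** Possible targets "Class.m" of a virtual call whose receiver has declared type t. */
    Set<String> targets(String t, String m) {
        Set<String> out = new TreeSet<>();
        for (String c : superOf.keySet()) {
            if (isSubtype(c, t)) {
                String d = lookup(c, m);
                if (d != null) out.add(d + "." + m);
            }
        }
        return out;
    }
}
```

With classes A (declaring foo), B extends A (overriding foo) and C extends A, a call to foo on a receiver declared as A resolves to both A.foo and B.foo, while a receiver declared as B resolves to B.foo only. RTA would further restrict the loop in targets to classes that are instantiated somewhere in the program.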
The call graph of the analysed program can be constructed on-the-fly as the points-to analysis [25, 32, 49] proceeds. The set of targets of a call site is resolved iteratively based on the computed points-to set of its receiver.
2.3 Whole-program points-to analysis
A points-to analysis is a fundamental analysis technique used by optimising compilers and software engineering tools. A points-to analysis determines the set of objects that may be pointed to by a given reference.
Definition 2 Let $W$ be an analysed program. Let $v \in V_W$ be a reference. Let $O_W$ be the set of objects created in $W$. The points-to set of $v$ is defined as follows:
The precision of a points-to analysis is evaluated by the size of the computed points-to sets [27, 38]. There are several dimensions in the design space of a points-to analysis that affect the cost versus precision tradeoffs of the analysis. In the next section, we will discuss the cost and accuracy for each dimension.
2.3.1 Points-to analysis: accuracy and efficiency
Several classic points-to analyses are compared and contrasted below
2.3.1.1 Flow-sensitive versus flow-insensitive analysis
A flow-sensitive analysis [48, 49] takes into account the flow of control between program points inside a method, and computes separate solutions for a reference at these points. A flow-insensitive analysis [25, 32, 42] ignores the flow of control and computes one solution for a reference in the entire program.
Example 2.2 Consider the following code snippet:
(1) if (...) {
(2)     a = new A;   // o2
(3)     ...
(4) }
(5) else {
(6)     a = new A;   // o6
(7)     ...
(8) }
A flow-sensitive analysis gives the solutions for the points-to set of $a$ at different program points. The points-to set of $a$ is $\{o_2\}$ at the point between the statements (2) and (3) and $\{o_6\}$ at the point between the statements (6) and (7). In a flow-insensitive analysis, there is only one points-to set of $a$, which is $\{o_2, o_6\}$.
End of example
A flow-insensitive analysis can be less precise but more efficient than a flow-sensitive analysis.
2.3.1.2 Context-sensitive versus context-insensitive analysis
A context-sensitive analysis [11, 36] distinguishes the contexts under which a method is invoked, and analyses the method separately for each context. A context-insensitive analysis [25, 32, 42] does not.
Example 2.3 Consider the following program:
(1) a = new A;   // o1
(2) foo(a);
(3) ...
(4) b = new A;   // o4
(5) foo(b);

void foo(A p) {
    ...
}
In a context-sensitive analysis, the points-to set of the reference parameter $p$ in the method foo is computed separately for the call sites at (2) and (5): $PT_W(p) = \{o_1\}$ when foo is analysed for the call site (2) but $PT_W(p) = \{o_4\}$ when foo is analysed for the call site (5). In contrast, a context-insensitive analysis merges both call sites and yields $PT_W(p) = \{o_1, o_4\}$.
End of example
A context-insensitive analysis can improve efficiency in memory usage and performance at the expense of some possible precision loss.
2.3.1.3 Equality-based versus subset-based analysis
Moreover, flow-insensitive analyses are either equality-based [36, 41], which treat an assignment as bidirectional, or subset-based [25, 32, 42], which treat an assignment as a unidirectional flow of values.
Example 2.4 Consider the following assignments:
2.3.1.4 Field-based and field-sensitive analysis
A field-based analysis [25, 26] ignores the sets of objects pointed to by the bases in the accesses of an instance field, considering only the field, while a field-sensitive analysis [25, 32, 42] distinguishes each instance field access by means of the points-to set of its base for greater precision.
Example 2.5 Consider the following program:
2.3.1.5 Object naming
It is impossible for a static analysis to determine all objects created during program execution. A static analysis uses compile-time objects to provide a practical approximation. A popular approach [3, 25, 32, 42] names a compile-time object based on its object creation statement. A compile-time object, denoted $o_s$, represents any of the objects created at the statement $s$. Milanova et al. [27] named compile-time objects based on not only object creation statements but also the contexts of the invoked methods that contain those statements.
In our work, we rely on the object creation statements to name the objects created inside the analysed program and use the symbol $\omega$ to represent any object created outside the analysed program.
2.3.2 Rules for a whole-program points-to analysis
Recent research [20, 25, 32, 42] has shown that the subset-based, field-sensitive, flow- and context-insensitive points-to analysis can be efficient and practical even for large object-oriented programs. This section explores this approach in detail.
Given our instruction set listed in Table 2.1, the rules for computing the points-to sets of a whole program $W$ are as follows:
Rule PA1 ($s : [\ell = \text{new } C]$) $\{o_s\} \subseteq PT_W(\ell)$
Rule PA2 ($\ell = r$) $PT_W(r) \subseteq PT_W(\ell)$
Rule PA3 ($\ell = C.f$) $PT_W(C.f) \subseteq PT_W(\ell)$
Rule PA4 ($C.f = r$) $PT_W(r) \subseteq PT_W(C.f)$
Rule PA5 ($\ell = r.f$) $PT_W(r.f) \subseteq PT_W(\ell)$
Rule PA6 ($\ell.f = r$) If $\exists\, \ell'.f \in V_W$ s.t. $PT_W(\ell) \cap PT_W(\ell') \neq \emptyset$, i.e., $\ell$ and $\ell'$ are aliases (with nonempty points-to sets), then $PT_W(r) \subseteq PT_W(\ell'.f)$.
Rule PA7 ($s : [op(a_0, \ldots, a_n, \ell)]$) If $\exists\, op(p_0, \ldots, p_n, r) \in M_W$ s.t. $(s, op) \in C_W$, then $PT_W(a_0) \subseteq PT_W(p_0), \ldots, PT_W(a_n) \subseteq PT_W(p_n)$ and $PT_W(r) \subseteq PT_W(\ell)$.
The points-to analysis for $W$ consists of two steps:
• Step 1 determines the constraints for all the statements contained in $W$.
• Step 2 solves the constraints created in Step 1 to determine the points-to sets of all references in the program $W$.
As some new constraints can be created by Rules PA6 and PA7, the analysis is performed iteratively until a fixed point is reached.
The set intersection operation required in Rule PA6 can be expensive. In practice, the rule is simplified and replaced by the following two rules:
Rule PA6a ($\ell.f$) If $\exists\, o \in PT_W(\ell)$ then $PT_W(o.f) \subseteq PT_W(\ell.f)$
Rule PA6b ($\ell.f = r$) If $\exists\, o \in PT_W(\ell)$ then $PT_W(r) \subseteq PT_W(o.f)$
Theorem 1 Let $\ell.f = r$ and $\ell'.f$ be a store statement and a field access, respectively. The constraint $PT_W(r) \subseteq PT_W(\ell'.f)$ is created by Rule PA6 if and only if it is created by Rules PA6a and PA6b.
Proof
We first prove the “⇒” part. If $PT_W(\ell) \cap PT_W(\ell') \neq \emptyset$, then $\exists\, o \in PT_W(\ell) \cap PT_W(\ell')$. This implies that $o \in PT_W(\ell)$ and $o \in PT_W(\ell')$. By Rule PA6b, $PT_W(r) \subseteq PT_W(o.f)$. By Rule PA6a, $PT_W(o.f) \subseteq PT_W(\ell'.f)$. Hence, $PT_W(r) \subseteq PT_W(\ell'.f)$.
Next, we prove the “⇐” part. As the constraint $PT_W(r) \subseteq PT_W(\ell'.f)$ is created by Rules PA6a and PA6b, $\exists\, o \in PT_W(\ell) \cap PT_W(\ell')$ such that $PT_W(r) \subseteq PT_W(o.f) \subseteq PT_W(\ell'.f)$. As $PT_W(\ell) \cap PT_W(\ell') \neq \emptyset$, Rule PA6 will create the constraint $PT_W(r) \subseteq PT_W(\ell'.f)$.
The concept of the points-to set for a field access such as $o.f$, as introduced in Rules PA6a and PA6b, allows Rule PA6 to be simplified and applied more efficiently. The concept is also used to construct some relations between objects such as reachability [40] and encapsulation [8]. In our work, we aim to determine the points-to sets of references and examine the applications of these sets. Hence, we describe our rules according to Rule PA6 instead of Rules PA6a and PA6b.
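To make the iterative solving step concrete, the following is a minimal sketch of a subset-based, flow- and context-insensitive solver that covers Rules PA1, PA2 and PA5, with instance fields handled through per-object sets in the style of Rules PA6a and PA6b. It is an illustration under simplifying assumptions rather than the implementation used in this work: the class PointsTo and its methods are hypothetical, call sites (Rule PA7) are omitted, static fields can be modelled as ordinary variables named "C.f", and the rules are naively re-applied until nothing changes instead of using a worklist.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy subset-based, flow- and context-insensitive points-to solver (no call sites). */
final class PointsTo {
    /** NEW: l = new C; COPY: l = r; LOAD: l = r.f; STORE: l.f = r. */
    enum Kind { NEW, COPY, LOAD, STORE }

    static final class Stmt {
        final Kind kind;
        final String lhs, rhs, field;
        Stmt(Kind kind, String lhs, String rhs, String field) {
            this.kind = kind; this.lhs = lhs; this.rhs = rhs; this.field = field;
        }
    }

    private final List<Stmt> stmts = new ArrayList<>();
    // Keys are variable names or "o.f" for field f of compile-time object o.
    private final Map<String, Set<String>> sets = new HashMap<>();

    void add(Kind kind, String lhs, String rhs, String field) {
        stmts.add(new Stmt(kind, lhs, rhs, field));
    }

    /** Current points-to set of a variable or of an object field "o.f". */
    Set<String> pt(String key) {
        return sets.computeIfAbsent(key, k -> new TreeSet<>());
    }

    /** Re-applies the rules until no points-to set grows, i.e. a fixed point is reached. */
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < stmts.size(); i++) {
                Stmt s = stmts.get(i);
                switch (s.kind) {
                    case NEW:   // PA1: the object is named after its creation statement
                        changed |= pt(s.lhs).add("o" + (i + 1));
                        break;
                    case COPY:  // PA2: PT(r) is a subset of PT(l)
                        changed |= pt(s.lhs).addAll(pt(s.rhs));
                        break;
                    case LOAD:  // PA5 via PA6a: PT(o.f) flows to l for each o in PT(r)
                        for (String o : new ArrayList<>(pt(s.rhs))) {
                            changed |= pt(s.lhs).addAll(pt(o + "." + s.field));
                        }
                        break;
                    case STORE: // PA6b: PT(r) flows to PT(o.f) for each o in PT(l)
                        for (String o : new ArrayList<>(pt(s.lhs))) {
                            changed |= pt(o + "." + s.field).addAll(pt(s.rhs));
                        }
                        break;
                }
            }
        }
    }
}
```

For the statement sequence a = new A; b = new B; a.f = b; c = a; d = c.f, the sketch names the two objects o1 and o2 after their statement positions and derives that d may point to o2, because c and a are aliases for o1.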
2.4 Incomplete program
The whole-program points-to analysis requires the entire program, i.e., all its classes and methods, to be available at the time of analysis. An incomplete program includes only a subset of these classes. In addition, some methods in a class (e.g., native methods) may be unavailable to participate in the static analysis. Let us define precisely the set of incomplete programs that can be handled by our work.
Definition 3 Let $L_F$ be a set of classes containing a set of methods $M_F$ and a set of fields $F_F$. Let $\eta \subseteq M_F$ be a set of entry methods that may be invoked from outside $L_F$. An incomplete program, $F$, is a pair $F = \langle L_F, \eta \rangle$ such that
1. all classes in $L_F$ except the root class java.lang.Object have all their superclasses in $L_F$,
2. there is no reference type whose corresponding class is not in $L_F$,
3. there is no read or write to a field not in $F_F$, and finally,
4. every call site can be statically resolved to at least one method in $M_F$.
The conditions stated in this definition guarantee that all field accesses and call sites in $F$ can be statically resolved to some declared fields and methods in $F$, respectively. Despite these conditions, our work is applicable to library modules or applications supporting native methods and dynamic class loading.
A method is called externally reachable when it is in $\eta$ or may be invoked directly or indirectly from an externally reachable method. If the code of an externally reachable method is unavailable to participate in the static analysis, we assume conservatively that it may invoke all methods that it can possibly invoke according to the language rules of Java [22]. Let $RM_F \subseteq M_F$ be the set of externally reachable methods in $F$. We use $IM_F \subseteq RM_F$ to denote the set of all analysed methods, whose code is available and analysed by the static analysis, and define $EM_F = RM_F \setminus IM_F$ as the set of unanalysed methods. An unanalysed method can be a native method or a method that is requested explicitly by the user not to be involved in the analysis. Let $U_F$ symbolise the unknown code, i.e., the code in $EM_F$ and the code outside $F$. The methods in $M_F \setminus RM_F$ are ignored because they are unreachable.
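The sets $RM_F$ and $EM_F$ can be computed with a standard worklist traversal, sketched below. The class Reachability and its methods are hypothetical, methods are identified by plain strings, and the may-call relation passed in is assumed to already contain the conservative edges for methods whose code is unavailable.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class Reachability {
    private Reachability() {}

    /** RM_F: methods reachable from the entry methods via the may-call relation. */
    static Set<String> reachable(Set<String> entries, Map<String, Set<String>> mayCall) {
        Set<String> seen = new TreeSet<>(entries);
        Deque<String> work = new ArrayDeque<>(entries);
        while (!work.isEmpty()) {
            String m = work.pop();
            for (String callee : mayCall.getOrDefault(m, Collections.<String>emptySet())) {
                if (seen.add(callee)) {
                    work.push(callee); // first visit: schedule the callee
                }
            }
        }
        return seen;
    }

    /** EM_F = RM_F \ IM_F: reachable methods whose code is not analysed. */
    static Set<String> unanalysed(Set<String> reachable, Set<String> methodsWithCode) {
        Set<String> em = new TreeSet<>(reachable);
        em.removeAll(methodsWithCode);
        return em;
    }
}
```

Methods that never enter the result of reachable correspond to $M_F \setminus RM_F$ and can be ignored by the analysis.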
$S_F$ and $V_F$ denote the set of statements and the set of references in $IM_F$, respectively. Let Fr [...]
2.5 Accessibility of classes, methods and fields
In Java [22], classes are contained in packages and they are accessible by the code outside the containing packages only if the packages are accessible. The accessibility of packages depends on the policies of the platform on which Java programs run. As the policies vary from platform to platform, we assume in our work that all packages are accessible. Hence, classes or their members accessible from outside a containing package are also accessible by the unknown code $U_F$.
There are three access modifiers, i.e., public, protected, and private, to represent the four levels of accessibility: public, protected, package, and private. Each modifier denotes the corresponding level of accessibility and, in the absence of a modifier, the package level is assumed. Another modifier associated with classes, methods and fields is final, with different semantics for each. A final class cannot be extended, i.e., cannot have any subclass, while a final method cannot be overridden. A final field is initialised only once, in the initialiser or constructors declared in the same class as the field.
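The modifiers can be seen together in a small Java example. The classes Account and SavingsAccount are hypothetical and serve only as an illustration; both are declared without a modifier, so the classes themselves have package-level accessibility.

```java
/** Hypothetical class used only to illustrate the modifiers. */
class Account {
    public static final int LIMIT = 100; // public level; final: assigned exactly once
    protected int balance;               // protected level: same package and subclasses
    int visits;                          // no modifier: package level
    private int pin = 1234;              // private level: only inside the top level class Account

    /** A final method cannot be overridden by subclasses. */
    public final boolean withdraw(int amount, int enteredPin) {
        if (enteredPin != pin || amount > balance || amount > LIMIT) {
            return false;
        }
        balance -= amount;
        visits++;
        return true;
    }
}

/** A final class cannot be extended. */
final class SavingsAccount extends Account {
    SavingsAccount(int initial) {
        balance = initial; // allowed: balance is protected and inherited
    }
}
```

Code in the same package can read balance and visits directly, but pin can only be changed by code inside Account, which is the kind of restriction on state modification that the analysis exploits.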
With respect to $U_F$, we define two levels of accessibility: accessible and inaccessible. A class in $L_F$ is accessible if the class type corresponding to the class can appear in $U_F$ and inaccessible otherwise. A method in $RM_F$ is accessible if it can be invoked in $U_F$ and inaccessible otherwise. A field in $F_F$ is accessible if it can be accessed by some statements in $U_F$ and inaccessible otherwise.
As $U_F$ can be inside or outside a package where the analysed classes, methods and fields are declared, we state the conditions for their accessibility in $U_F$ according to whether $U_F$ is inside or outside the package.
A class is called a nested class when it is declared within the body of anotherclass A top level class is a class that is not a nested class A top level class isaccessible if it is public or there is UF inside the package where the class isdeclared
There are three kinds of nested classes: member classes, local classes and anonymous classes. We will discuss the accessibility of a local or anonymous class and its members later in this section. A member of a class, which is a member class, a method, or a field, is associated with one of the four levels of accessibility. As the conditions for the accessibility of member classes and fields are the same, we first discuss the accessibility of member classes and fields together and then methods. In each case, we first present the rules to determine the accessibility of a member with respect to UF that is located inside the package where the member is declared and then the rules when UF is outside the package.
Table 2.2 determines whether a member class or a field m declared in the enclosing class C is accessible or not by the unknown code UF that resides in the same package where m is declared.

Accessibility of    Level of accessibility   The accessibility of class or field m
enclosing class C   of class or field m
accessible          non-private              accessible
accessible          private                  accessible if UF is in the top level class where m is declared
inaccessible        non-private              accessible if there exists an accessible subclass of C that inherits m
inaccessible        private                  inaccessible

Table 2.2: The accessibility of a class or a field in a package with respect to the unknown code UF that resides inside the same package
If the enclosing class C is accessible by UF in the package, then a non-private member m is also accessible. However, a private member m is only accessible if UF is contained anywhere in the top level class that contains m. If C is inaccessible, m is inaccessible if it is private. Meanwhile, when m is non-private, it can be accessible by UF if C has an accessible subclass that inherits m.
Example 2.6 Consider the following program, where the enclosing class, i.e., B, is inaccessible but its non-private member, i.e., f, is accessible by the unknown code that resides in the same package.
// unknown code
A.C.f
}
Indeed, the class B is private, so it can be accessed only in A. The unknown code UF in D cannot access B but it can access C because the level of accessibility of C is package. As C inherits the field f from B, the field access A.C.f, which refers to f in B, may appear in UF. However, the field access A.C.g in UF will cause a compile-time error because g cannot be accessed outside A. End of example
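The situation described in Example 2.6 can be illustrated with the following sketch. It is a hypothetical reconstruction, not the original listing: only the names A, B, C, D, f and g come from the text, while the static modifiers, the field types and values, and the method readF are assumptions made so that the fragment compiles as a single file.

```java
// Hypothetical reconstruction of the scenario in Example 2.6.
// Only the names A, B, C, D, f and g are taken from the text.
class A {
    // B is private, so its name can be used only inside A.
    private static class B {
        static int f = 1;          // non-private (package level) field
        private static int g = 2;  // private field, not inherited by C
    }

    // C has package-level accessibility and inherits f from B.
    static class C extends B { }
}

class D {
    // Stands in for the unknown code UF located in the same package.
    static int readF() {
        return A.C.f;       // legal: resolves to the field f declared in B
        // return A.C.g;    // rejected by the compiler: g is private to B
        // return A.B.f;    // rejected by the compiler: B is private in A
    }

    public static void main(String[] args) {
        System.out.println(readF());
    }
}
```

The access goes through the accessible subclass C, which is why f counts as accessible even though its declaring class B is not.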
Table 2.3 determines whether a member class or a field m in the enclosing class C in a package is accessible or not by UF that resides outside the package.

Accessibility of    Level of accessibility   The accessibility of class or field m
enclosing class C   of class or field m
accessible          public                   accessible
accessible          protected                accessible if C is non-final
accessible          package                  inaccessible
accessible          private                  inaccessible
inaccessible        public                   accessible if there exists an accessible subclass of C that inherits m
inaccessible        protected                accessible if there exists a non-final accessible subclass of C that inherits m
inaccessible        package                  inaccessible
inaccessible        private                  inaccessible

Table 2.3: The accessibility of a member class or a field in a package with respect to the unknown code UF that is located outside the package
The accessibility of   Level of accessibility   The accessibility of method m
enclosing class C      of method m
accessible             non-private              accessible
accessible             private                  accessible if UF is in the top level class where m is declared
inaccessible           non-private              accessible if there exists an accessible subclass of C that inherits m
inaccessible           private                  inaccessible

Table 2.4: The accessibility of a method in a package with respect to the unknown code UF that is contained in the same package
A public member m of an accessible class C is accessible. However, if m is protected, it can be accessed only in the containing package or in a subclass of C. Hence, m is accessible by UF that resides outside the package if C is non-final and accessible. If m has package or private level visibility, it is inaccessible because it can be accessed only in the containing package or the top level class where it is declared. When C is inaccessible, m is generally inaccessible. However, if there is an accessible subclass of C that inherits m, a public or protected member m can be accessed through the subclass; for a protected member, the subclass must also be non-final (Table 2.3).
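The rules of Table 2.3 can be read as a small decision procedure. The following is a minimal sketch under that reading; the class name OutsidePackageMemberAccess, the enum Level and the boolean parameters are hypothetical names introduced for illustration and do not appear in the thesis.

```java
// Hypothetical encoding of Table 2.3 (UF located outside the package).
// All identifiers here are illustrative assumptions.
final class OutsidePackageMemberAccess {
    enum Level { PUBLIC, PROTECTED, PACKAGE, PRIVATE }

    /**
     * @param enclosingAccessible      whether the enclosing class C is accessible by UF
     * @param enclosingFinal           whether C is declared final
     * @param level                    declared level of accessibility of m
     * @param accessibleSubInherits    whether an accessible subclass of C inherits m
     * @param nonFinalAccessibleSubInherits
     *                                 whether a non-final accessible subclass of C inherits m
     */
    static boolean isAccessible(boolean enclosingAccessible,
                                boolean enclosingFinal,
                                Level level,
                                boolean accessibleSubInherits,
                                boolean nonFinalAccessibleSubInherits) {
        switch (level) {
            case PUBLIC:
                // Rows 1 and 5 of the table.
                return enclosingAccessible || accessibleSubInherits;
            case PROTECTED:
                // Rows 2 and 6: UF needs a class it can extend.
                return enclosingAccessible ? !enclosingFinal
                                           : nonFinalAccessibleSubInherits;
            default:
                // Package and private members are never visible outside the package.
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccessible(false, false, Level.PUBLIC, true, false));
    }
}
```

Passing the subclass information as precomputed flags keeps the sketch independent of how the class hierarchy is represented; an analysis would derive those flags from the hierarchy of LF.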
Example 2.7 Consider the analysed program given below:
}
The package unanalysed contains the unknown code UF. The package analysed contains two classes A and B, where B is a subclass of A. Let us consider the case when C is A and m is f. A has package visibility, so it is inaccessible by UF in unanalysed. However, as B is public, it is accessible. As B inherits the field f and f is public, the field access B.f may appear in UF and it refers to the field f declared in class A. Therefore, f is accessible. End of example
Tables 2.4 and 2.5 determine whether a method m declared in the enclosing class C in a package is accessible or not with respect to UF that is located inside and outside the package, respectively.

The accessibility of   Level of accessibility   The accessibility of method m
enclosing class C      of method m
accessible             public                   accessible
accessible             protected                accessible if C is non-final
accessible             package                  inaccessible
accessible             private                  inaccessible
inaccessible           public                   accessible if [...] of C that inherits m
inaccessible           protected                accessible if there exists a non-final accessible subclass of C that inherits m
inaccessible           package                  inaccessible
inaccessible           private                  inaccessible

Table 2.5: The accessibility of a method in a package by the unknown code UF that is contained outside the package
A method is a member of a class, so the accessibility of a method is determined similarly to that of a member class or a field. The only difference occurs when a method m is virtual and overrides another accessible method. In this case, an invocation of the overridden method can target m.
Example 2.8 This example illustrates different scenarios in which a method m declared in a class C in a package can or cannot be invoked by the unknown code in the same package.
First, let us consider the case when C is Inner1 and m is foo0 in Inner1. As Inner1 is accessible and foo0 has package visibility, foo0 is accessible. Therefore, we may have to assume conservatively that UF contains a call site x.foo0, where x has the declared type of Inner1. This call site can invoke foo0 in Inner1 when the receiver x points to an instance of Inner1.
Next, let us examine the case when C is Inner2. Inner2 is inaccessible because it is private. Therefore, no variable in Outer2 may have the type of Inner2. Despite this, if method m is foo0 in Inner2, then the method can be invoked from UF. Indeed, when the receiver x at the call site x.foo0 in UF points to an instance of Inner2, the call site can invoke foo0 in Inner2. Note that foo0 in Inner2 overrides the method with the same signature declared in Inner1. In contrast, when m is foo1 in Inner2, foo1 is inaccessible because it does not override any method in the accessible class Inner1. End of example
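The dispatch behaviour discussed in Example 2.8 can be sketched as follows. This is not the original listing: the names Inner1, Inner2, Outer2, foo0 and foo1 come from the text, whereas the enclosing class Outer1, the factory method make, the method call and the return values are assumptions added to make the fragment self-contained.

```java
// Hypothetical sketch of the scenario in Example 2.8.
// Outer1, make(), call() and the returned strings are illustrative assumptions.
class Outer1 {
    // Accessible nested class with a package-level method.
    static class Inner1 {
        String foo0() { return "Inner1.foo0"; }
    }

    // Inaccessible nested class: private, but it overrides foo0.
    private static class Inner2 extends Inner1 {
        @Override
        String foo0() { return "Inner2.foo0"; }

        // Overrides nothing in Inner1, so no call site outside Outer1 can reach it.
        String foo1() { return "Inner2.foo1"; }
    }

    // Analysed code may hand out an Inner2 instance typed as Inner1.
    static Inner1 make() { return new Inner2(); }
}

class Outer2 {
    // Stands in for UF in the same package: x can only be declared as Inner1.
    static String call(Outer1.Inner1 x) {
        return x.foo0();
    }

    public static void main(String[] args) {
        System.out.println(call(Outer1.make()));
    }
}
```

Although Outer2 cannot name Inner2, the virtual call x.foo0() still reaches foo0 in Inner2 whenever x points to an Inner2 object, which is the reason the overriding method must be treated as accessible.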
The conditions for a class or field in a package to be accessible by UF outside the package also apply to a method m. The only difference is that if m is public and declared in an inaccessible class C in a package, then it can be invoked from outside the package. This case happens when m is virtual and overrides an accessible method that is public. Note that if the overridden method is protected, the public method m is inaccessible.
Example 2.9 Consider the following program:
package analysed;
public void foo() { }
}
public void foo() { }
}
// unknown code
package unanalysed;
x.foo();
}
The analysed program includes only two classes: Super and Sub1 in the package analysed. Note that Sub1 is a subclass of Super and method foo in Sub1 overrides foo in Super. Assume that the unknown code UF is the package unanalysed, containing class Sub2, which is also a subclass of Super. Let us examine the case when C is Sub1 and m is foo in Sub1. Although Sub1 is inaccessible since Sub1 has package visibility, UF can invoke foo in this class. Indeed, as Super is accessible and foo in Super is also accessible, a call site x.foo(), where x has the type of Super, may appear in UF. If x points to an instance of Sub1, the call site can invoke foo in Sub1. As a result, foo in Sub1 is accessible.
However, if foo in Super is declared to be protected, a call site x.foo() in UF must appear in a subclass of Super, i.e., Sub2, and the type of x must be Sub2 or a subclass of Sub2. Under these conditions, x cannot point to an instance of Sub1, so the call site cannot invoke the method foo in Sub1. Therefore, foo in Sub1 is inaccessible. End of example
An anonymous class is a class without a name, so it is inaccessible. All class and field members of an anonymous class and its nested classes are accessible if there exists UF in the class. A method of an anonymous class or its nested classes is accessible in one of the following cases:
• it overrides a public method declared or inherited in an accessible class;
• it overrides a non-private method declared outside the anonymous class, the overridden method is still in the package containing the anonymous class, and there is UF in the package;
• there is UF declared within the body of the anonymous class.
A local class is declared in an immediately enclosing code block and is accessed only in that code block. By replacing "the anonymous class" with "the code block" in the rules above, the accessibility of a member of a local class is determined in a similar way to that of a member of an anonymous class.
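The first case in the list above can be illustrated with a short sketch. The class names AnonymousOwner and UnknownCaller, the method names and the returned strings are hypothetical; java.lang.Object plays the role of the accessible class whose public method is overridden.

```java
// Hypothetical illustration of the first case for anonymous classes.
// AnonymousOwner, UnknownCaller and all method names are assumptions.
class AnonymousOwner {
    static Object make() {
        // The anonymous class has no name, so no other code can mention its type.
        return new Object() {
            // Overrides the public method toString() of the accessible class Object.
            @Override
            public String toString() { return "anonymous toString"; }

            // Overrides nothing and is never called from outside this body.
            String helper() { return "helper"; }
        };
    }
}

class UnknownCaller {
    // Stands in for UF: it only knows the declared type Object.
    static String invoke(Object x) {
        return x.toString();
    }

    public static void main(String[] args) {
        System.out.println(invoke(AnonymousOwner.make()));
    }
}
```

Here toString in the anonymous class is reachable from the caller through dynamic dispatch, whereas helper has no call site outside the anonymous body and would be treated as inaccessible unless UF were declared inside that body.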
2.6 Readable and writable fields
A field in FF can only be read or written by the unknown code UF when it is accessible. In addition, in Java, a final field is initialised once in the static initialiser or constructors declared in the same class where the field is declared. After initialisation, the final field cannot be modified. Without loss of generality, we assume that all static initialisers and constructors in F are in the analysed program.
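One way to read the paragraph above is that an accessible field may be read by UF, and may be written by UF only if it is also non-final. The following sketch encodes that reading; the classes Account and FieldModes, their members, and the use of reflection to test for the final modifier are assumptions made for illustration and are not part of the thesis.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical analysed class with one final and one non-final field.
class Account {
    final int id;   // initialised once, in the constructor
    int balance;    // may be reassigned after construction

    Account(int id, int balance) {
        this.id = id;
        this.balance = balance;
    }
}

// Hypothetical helper; 'accessible' is assumed to come from the rules of Section 2.5.
final class FieldModes {
    static boolean readable(boolean accessible) {
        return accessible;
    }

    static boolean writable(boolean accessible, Class<?> owner, String fieldName) {
        try {
            Field f = owner.getDeclaredField(fieldName);
            // A final field cannot be modified after its initialisation.
            return accessible && !Modifier.isFinal(f.getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writable(true, Account.class, "balance"));
        System.out.println(writable(true, Account.class, "id"));
    }
}
```

The assumption that all constructors and static initialisers are analysed is what allows the final field to be classified as not writable by UF.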
Recall that F includes the set of classes in LF. RMF and FF are defined to be the sets of the reachable methods and fields declared in LF, respectively. RMF includes the set IMF of analysed methods and the set EMF of unanalysed methods. SF and VF are the sets of the statements and references in IMF, respectively. UF includes all methods in EMF or absent in F.
Suppose W is a whole program containing F and UF, and let F and W each be analysed by a points-to analysis technique. Some constraints that are created when W is analysed can be absent when F is analysed. The constraints related to a reference in VF can be missing in the following scenarios:
• The constraints between an input parameter of an unanalysed method op ∈ EMF and a reference accessed inside op are missing because the body of op is not available to participate in the analysis.
• The call graph CF is incomplete when some caller-callee relation (s, op) is missing, where (s, op) ∈ (CW \ CF). Therefore, some constraints introduced by Rule PA7 are missing (i.e., not created when F is analysed). Depending on where s and op are, the following cases are examined:
Figure 3.1: Three kinds of missing caller-callee relations in CF.
∗ s is in SF and op is not in RMF (Case 1 in Figure 3.1). Although s is statically resolved to an overriding method in RMF, s can invoke a method that is not in RMF. This can happen when s is virtual and its target method is overridden by a method not in RMF.
Example 3.1 Consider the example in Figure 3.2, where the analysed program F is listed in Column (a) and some possible unknown code UF in Column (b). The class D in UF is a subclass of class B in F and the method foo in D overrides foo in B. Therefore, the call site a1.foo() at line 8 will invoke foo in D when a1 points to an instance of D created at line 29. End of example
∗ s is in UF and op is in RMF (Case 2 in Figure 3.1). Due to the lack of knowledge about UF, some invocations from UF to an accessible method in RMF are not included in the call graph CF.
Example 3.2 In the example given in Figure 3.2, the invocation from a3.foo() at line 31 in D to method foo in B cannot be represented in the call graph CF. End of example
∗ Both s and op are in the analysed program (Case 3 in Figure 3.1). The caller-callee relation (s, op) can be missing because the assumption of the analysed program being a whole program is violated.
Figure 3.2: An example program where Column (a) gives the analysed program F and Column (b) shows some possible unknown code UF that can be combined with F to form a whole program W.
Example 3.3 Consider the example in Figure 3.2. Assume that RTA is used to construct the call graph of F. RTA determines that the call site a0.foo() in A will invoke foo declared in C because only a single object of type C is created in F. This resolution becomes invalid when the analysed program is not a whole program. Given the unknown code shown in Figure 3.2(b), the call site a0.foo() can invoke foo in B because there is an object of type B created at line