SOFTWARE RELIABILITY MODELING AND RELEASE TIME DETERMINATION
LI XIANG
NATIONAL UNIVERSITY OF SINGAPORE
2011
SOFTWARE RELIABILITY MODELING AND RELEASE TIME DETERMINATION
LI XIANG
(B.Eng., UESTC)
A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF INDUSTRIAL AND SYSTEMS ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2011
Acknowledgements

First of all, I would like to thank my supervisor, Professor Xie Min, for his pertinent supervision and insightful suggestions throughout my research life at the university. This thesis would not have been possible without Prof. Xie's help. It is my great honor to have had the chance to study under his guidance.
Secondly, I am very grateful to Associate Professor Ng Szu Hui. As my co-supervisor, she has always been available for my questions and requests for help. I have also benefited a lot from working as her teaching assistant.
Thirdly, my appreciation goes to Professor Yang Bo from the University of Electronic Science and Technology of China, with whom some of my research work was carried out jointly. I have learned a lot from the cooperation with him.
Thanks also go to the faculty members, staff, seniors and juniors in our ISE department. I greatly appreciate all the help I have received from you. In particular, I would like to thank all the friends in the ISE Computing Lab. I really enjoyed the time spent with all of you!
Finally, I would like to express my unbounded gratitude to my parents for their unconditional love and constant support all along the way of my study.
Table of Contents
Acknowledgements i
Table of Contents ii
Summary vii
List of Tables ix
List of Figures xi
List of Symbols xiii
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 3
1.2.1 Reliability Analysis for Open Source Software 3
1.2.2 Relationship of Software Failures 4
1.2.3 Software Release Policy under Parameter Uncertainty 5
1.2.4 Formulation of Software Release Time Determination Problem 6
1.3 Objective and Scope of Research 7
Chapter 2 Literature Review 9
2.1 Analytical Software Reliability Models 9
2.1.1 The Jelinski-Moranda Model 10
2.1.2 A General Formulation of NHPP Models 11
2.1.3 Recent Advances on ASRMs 13
2.2 Data-Driven Software Reliability Models 16
2.3 Determination of Software Release Time 18
Chapter 3 Reliability Analysis and Optimal Version-Updating for Open Source Software 21
3.1 Basic Problem Description 21
3.2 Modeling Fault Detection Process of Open Source Software 24
3.3 Determination of Optimal Version-Update Time 28
3.3.1 Quantification of Attributes 30
3.3.2 Elicitation of Single Utility Function for Each Attribute 32
3.3.3 Estimation of Scaling Constants 33
3.3.4 Maximization of Multi-Attribute Utility Function 35
3.3.5 Summary of the Procedure 35
3.4 Numerical Examples 36
3.4.1 The Data Sets 37
3.4.2 Reliability Assessment for Open Source Software 39
3.4.3 A Decision Model Application Example 43
3.4.4 Sensitivity Analysis 47
3.5 Conclusion 50
Chapter 4 Performance Improvement for DDSRMs 53
4.1 Basic Problem Description 53
4.2 A Brief Review of SVM for Regression 58
4.3 A Generic DDSRM with a Hybrid GA-Based Algorithm 62
4.4 Numerical Examples 69
4.4.1 Example I 69
4.4.2 Example II 71
4.5 Conclusion 73
Chapter 5 Sensitivity Analysis of Release Time of Software Reliability Models Incorporating Testing Effort with Multiple Change Points 75
5.1 Basic Problem Description 75
5.2 General Model Incorporating Testing Effort 77
5.3 Approaches to Sensitivity Analysis 79
5.3.1 One-Factor-at-a-Time Approach 79
5.3.2 Sensitivity Analysis through DOE 80
5.3.3 Global Sensitivity Analysis 83
5.4 An Illustrative Example 88
5.4.1 Results from One-Factor-at-a-Time Approach 88
5.4.2 Results from Sensitivity Analysis through DOE 90
5.4.3 Results from Global Sensitivity Analysis 92
5.5 Limitations of Different Approaches 94
5.6 Interval Estimation from Global Sensitivity Analysis 97
5.7 Conclusion 99
Chapter 6 A Risk-Based Approach for Software Release Time Determination with Delay Costs Considerations 100
6.1 Quantifying Parameter Uncertainty 101
6.2 Model Formulation 104
6.2.1 Risk Considerations 105
6.2.2 Cost Considerations 107
6.3 The Decision Model Based on MAUT 109
6.3.1 Quantification of Attributes 111
6.3.2 Elicitation of Single Utility Function for Each Attribute 112
6.3.3 Estimation of Scaling Constants 114
6.3.4 Maximization of Multi-Attribute Utility Function 115
6.3.5 Summary of the Procedure 116
6.4 An Illustrative Example 117
6.4.1 The Data Set 117
6.4.2 The Determination of Optimal Risk-Based Release Time 118
6.4.3 Illustration of the Proposed Decision Model 122
6.4.4 Sensitivity Analysis 124
6.5 A Simplification of the Decision Model 127
6.6 Threats to Validity 132
6.7 Conclusion 136
Chapter 7 Multi-Objective Optimization Approaches to Software Release Time Determination 138
7.1 Basic Problem Description 138
7.2 Model Formulation for Release Time Determination 139
7.3 Multi-Objective Optimization Approaches 144
7.3.1 The Trade-Off Analysis 144
7.3.2 Multi-Attribute Utility Theory 145
7.3.3 Physical Programming Method 147
7.4 Numerical Examples 150
7.4.1 Example I 151
7.4.2 Example II 157
7.5 Applicability and Limitations of Different Approaches 161
7.6 Conclusion 163
Chapter 8 Conclusions 164
8.1 Research Results and Contributions 164
8.2 Future Research 167
References 169
Summary
This thesis aims to improve the software reliability modeling of the software failure process and to study the corresponding release time determination problem. These objectives are achieved by extending traditional software reliability models and decision models. The research has been conducted as follows.
Software reliability models can be classified into two categories: analytical software reliability models (ASRMs) and data-driven software reliability models (DDSRMs). Both are studied in this thesis. In particular, an extension of ASRMs is presented in Chapter 3, where the modeling framework for open source software reliability is introduced and the corresponding version-updating problem is studied as well.
Besides the research on ASRMs, an improvement on DDSRMs is also carried out, as shown in Chapter 4. Most existing research on DDSRMs generally assumes that the current failure is correlated with the most recent consecutive failures. However, this assumption restricts the failure data analysis to a special case. In order to relax this unrealistic assumption, a generic DDSRM is developed with a model mining technique. The proposed model can greatly enhance the prediction accuracy.
Developing models is not the ultimate goal of software reliability modeling. It is more important to apply these models to solve the corresponding decision-making problems, and software release time determination is a typical application. In Chapter 5, sensitivity analysis of the release time of software reliability models incorporating testing effort with multiple change points is studied. The sensitivity of the software release time is investigated through various methods, including the one-factor-at-a-time approach, design of experiments and global sensitivity analysis.
Although sensitivity analysis can help to find out which parameters are significant so that more attention can be paid to them, it is also quite possible that no more data or information is available for us to obtain more accurate parameter estimates. Therefore, in Chapter 6, the effect of parameter uncertainty on release time determination is investigated. A risk-based approach is proposed for release time determination with delay cost considerations. It can help management have a broader view of the release time determination problem.
Furthermore, most existing research formulates the software release time determination problem as a single-objective optimization problem. However, such formulations can hardly describe management's attitude accurately. Therefore, a multi-objective optimization model is developed for the release time determination problem in Chapter 7. In order to solve this multi-objective optimization problem, different approaches, including trade-off analysis, multi-attribute utility theory, and physical programming, are used and compared in this chapter. By comparing these approaches, management can apply them more appropriately in practice, considering their own unique properties.
List of Tables
Table 3.1 Detected faults in Apache official public releases 38
Table 3.2 Detected faults in GNOME official public releases 39
Table 3.3 Estimated parameter values and numerical results for Apache 42
Table 3.4 Estimated parameter values and numerical results for GNOME 42
Table 3.5 Results from sensitivity analysis 49
Table 4.1 Sample patterns used in model training process 56
Table 4.2 Software failure data taken from Pham and Pham (2000), Tian and Noore (2005b), Su and Huang (2007) 70
Table 4.3 Model mining result and optimal values of parameters of SVM-based SRM with Gaussian kernel function, using software failure data in Table 4.2 70
Table 4.4 Software failure data reported in Wood (1996) 72
Table 4.5 Model mining result and optimal values of parameters of SVM-based SRM with Gaussian kernel function, using software failure data in Table 4.4 72
Table 5.1 A saturated Resolution III fractional factorial design 81
Table 5.2 Some numerical results from one-factor-at-a-time approach 89
Table 5.3 Fractional factorial design 90
Table 5.4 Main effects of parameters 91
Table 5.5 Results of the first-order sensitivity indices 93
Table 5.6 Comparison of computation resources needed 96
Table 6.1 Sensitivity analysis results given different parameters 126
Table 7.1 Numerical results based on the 5-hour interval estimation 153
Table 7.2 Numerical results from sensitivity analysis under the change of w_R 155
Table 7.3 Boundary points of class functions 156
List of Figures
Figure 3.1 Two choices for the determination of the scaling constant w_i 34
Figure 3.2 The structure of the decision model for the determination of optimal version-update time 36
Figure 3.3 The multi-attribute utility function given different release times 46
Figure 4.1 The processes of using a DDSRM for software reliability modeling and prediction 55
Figure 4.2 The soft margin loss setting for a linear SVM regression (Scholkopf and Smola, 2002) 60
Figure 4.3 Interpretation of a binary code in model mining 64
Figure 4.4 A hybrid GA-based algorithm to determine the time lag terms to be used and the optimal parameters of SVM-based SRM with Gaussian kernel function 65
Figure 4.5 The processes of using the proposed DDSRM 66
Figure 5.1 Main effects of parameters (absolute value) 91
Figure 5.2 Results of first-order sensitivity indices in a descending manner 93
Figure 6.1 An illustrative example of the distribution of the optimal release time T given a reliability requirement 106
Figure 6.2 Two choices for the determination of the scaling constant w1 115
Figure 6.3 The structure of the decision model for the determination of optimal risk-based release time 117
Figure 6.4 Multi-attribute utility function given different release times 122
Figure 6.5 Determination of the optimal risk-based release time under the simplified model
Figure 7.1 Relationship between E[C(t)] and R(x|t) 140
Figure 7.2 Qualitative meaning of Class 1-S and Class 2-S (Messac, 1996) 148
Figure 7.3 Non-dominated points of the consequence space with reliability and cost 152
Figure 7.4 Two choices for the determination of the weighting factor for reliability 154
Figure 7.5 Non-dominated points of the consequence space with reliability, cost and risk 158
List of Symbols
ANN artificial neural network
ASRM analytical software reliability model
CDF cumulative distribution function
DDSRM data-driven software reliability model
DOE design of experiments
GA genetic algorithm
LSE least square estimation
MAUT multi-attribute utility theory
MLE maximum likelihood estimation
MSE mean square error
NHPP non-homogeneous Poisson process
OSS open source software
SRM software reliability model
SRGM software reliability growth model
SVM support vector machine
λ(t) failure intensity function
m(t) mean value function of the fault detection process
c0 the expected set-up cost for software testing
c1 the expected cost per unit testing time
E[C(t)] expected total cost of software development
I Fisher information matrix
L the likelihood function
r risk function which measures the risk that the software cannot meet its reliability requirement due to parameter uncertainty
R(x|t) software reliability at time x after the software has been tested for t units of time
R0 software reliability requirement from customers
t release time of the software
T̂ mean value of the release time based on the reliability requirement R0
Chapter 1 Introduction
With the rapid increase of applications of computer systems in industry as well as in our daily life, the reliability of computer systems has become a crucial issue. Since computer systems are also widely used in safety-critical systems, such as control systems in nuclear power plants or medical instruments, the need for high reliability is even more urgent.
Computer systems are generally composed of hardware and software, and therefore ensuring high reliability of the system involves investigating the reliability of both hardware and software. Unfortunately, unlike hardware reliability assurance, which has been well developed and widely applied in various industries, software reliability is still a relatively new field, and it is generally more difficult to ensure (Xie, 1991). Also, the rapid increase of software size and complexity imposes many challenges to achieving high reliability of software products.
1.1 Background
As a matter of fact, software has become the major source of reported outages, and billions of dollars have been wasted each year (Lyu, 1996). The following are some famous examples from recent years (Charette, 2005): in 2001, software problems with a supply-chain management system contributed to a $100 million (USD) loss at Nike Inc.; in 2002, McDonald's Corp. canceled the Innovate information-purchasing system after $170 million (USD) had been spent; in 2004, Hewlett-Packard Co. lost $160 million (USD) due to software problems with an ERP system, and Ford Motor Co. suffered a loss of approximately $400 million (USD) in deployment cost from abandoning its purchasing system. It is therefore not surprising that software reliability engineering (SRE) has received a lot of attention, and abundant research has been carried out recently.
To ensure the reliability of software, it needs to be tested prior to its release. This testing phase is time-consuming and costly. During this phase, latent software faults are identified, isolated and removed; as a result, software reliability is improved. Based on the failure data obtained from the testing phase, software reliability can be measured and predicted with appropriate software reliability models (SRMs) (Musa et al., 1987; Xie, 1991; Lyu, 1996; Pham, 2000).
The mainstream of software reliability modeling can be classified into two categories: the analytical approach and the data-driven approach (Hu et al., 2007). Analytical software reliability models (ASRMs) are generally based on certain prior assumptions made on the nature of software faults and the stochastic behavior of the software failure process. These assumptions include equal fault sizes, perfect debugging, immediate fault repair, independent software failures, etc. Although these assumptions may not be valid in practice, they are made to facilitate software reliability modeling (Musa et al., 1987; Xie, 1991; Lyu, 1996; Pham, 2000).
As to the data-driven approach to software reliability modeling, the software failure process is viewed as a time series. Data-driven software reliability models (DDSRMs) are constructed to recognize the inherent patterns of the process, which are carried by the recorded failure data. By modeling and analyzing the inherent patterns of the software failure process, software reliability predictions can be made (Hu et al., 2007).
1.2 Motivation
SRMs have been successfully applied in many real-world projects, and more and more companies are adopting software reliability engineering knowledge in practice (Wood, 1996; Musa, 2006). However, for both ASRMs and DDSRMs, there are still some assumptions that can be relaxed to better describe the software failure process. In addition, constructing models is not the end; guiding management on when to release software is a typical application of these models. For this software release time determination problem, it is still an open question how to describe management's attitude more accurately. Due to these considerations, the research in this thesis investigates the following specific topics.
1.2.1 Reliability Analysis for Open Source Software
Recently, a new style of software development process, the open source software (OSS) movement, has received intense interest (Raymond, 2001). The OSS process is a relatively new way of building and deploying large software systems on a global basis, and it differs in many interesting ways from the principles and practices of traditional software engineering (Feller et al., 2005). There is widespread recognition that open source projects can produce high-quality and sustainable software systems (such as the Linux operating system, the Apache web server, and the Mozilla browser) that can be used by thousands to millions of end-users (Mockus et al., 2002). Currently, most OSS systems are developed and maintained by non-commercial communities. However, more and more software companies have switched from a closed source to an open source development model in order to win market share and to improve product quality (Hertel et al., 2003).
Since OSS is usually developed outside companies – mostly by volunteers – and the development method is quite different from the standard methods applied in commercial software development, the quality and reliability of the code need to be investigated (Gyimothy et al., 2005). However, most existing research has focused on fault-proneness detection and defect prediction for OSS, which are essentially indirect reliability measurements without consideration of the time effect. In fact, only in some recent studies by Tamura and Yamada (2008; 2009) is this issue considered. However, in their work, the differences between traditional commercial software and OSS are not highlighted. This motivates us to further investigate this problem by incorporating the special properties of OSS into the analysis.
1.2.2 Relationship of Software Failures
Existing research on the data-driven approach to software/system reliability modeling and prediction generally assumes that a failure is strongly correlated with the most recent several failures; thus the sliding window technique has been adopted to describe this relationship. However, this assumption restricts the general time series analysis to a special case, as the correlation may be quite complicated in a time series (Tsay, 2002).
In fact, it is possible that a failure is correlated with some of the previous failures, not necessarily the most recent ones. For example, a failure x_i could be correlated with, say, x_{i-8}, x_{i-6}, and x_{i-2}. If this is the case, these three time lag terms should be used as model inputs instead of x_{i-3}, x_{i-2}, and x_{i-1}. Obviously, there should be a systematic way to discover the correlation among failures, which would enable the model user to decide the appropriate time lag terms to be used in the model, and hence the model performance can be improved.
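As a hypothetical illustration of this idea (a minimal Python sketch with made-up data; the lag set {8, 6, 2} follows the example above), the model inputs can be built from an arbitrary set of time lags rather than from the most recent consecutive failures:

```python
import numpy as np

def lagged_design(x, lags):
    """Build (inputs, targets) for a DDSRM from a failure series x,
    using an arbitrary set of time lags instead of consecutive ones."""
    max_lag = max(lags)
    X = np.array([[x[i - L] for L in lags] for i in range(max_lag, len(x))])
    y = x[max_lag:]
    return X, y

# Hypothetical cumulative failure times.
x = np.cumsum(np.random.default_rng(1).exponential(10.0, size=30))
X, y = lagged_design(x, lags=[8, 6, 2])   # inputs x_{i-8}, x_{i-6}, x_{i-2}
print(X.shape, y.shape)                   # (22, 3) (22,)
```

Searching over such lag sets systematically (e.g., with a genetic algorithm, as developed in Chapter 4) is what replaces the fixed sliding window.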
1.2.3 Software Release Policy under Parameter Uncertainty
The software release time determination problem is of great importance in software development. Most existing research on this problem has been based on the assumption that the parameters of software reliability models are known or accurately estimated. However, these model parameters are unknown in nature; they are generally estimated based on the limited amount of recorded failure data. Hence, the accuracy of the obtained optimum release time is questionable. It is necessary for management to know what the significant parameters are, and sensitivity analysis is needed.
… analysis. Also, the so-called robust optimization, which considers the uncertainty of parameters, has received a lot of research attention recently (Ben-Tal and Nemirovski, 2002; Sahinidis, 2004). Previous research has demonstrated that parameter uncertainty cannot be discarded in modeling and analysis. This also motivates us to study the optimal software release policy under parameter uncertainty.
1.2.4 Formulation of Software Release Time Determination Problem
For the software release time determination problem, reliability and cost are two important dimensions that are generally considered. In order to determine an optimal software release time, existing research formulates this problem in the following three ways: (1) cost minimization (Boland and Singh, 2003; Morali and Soyer, 2003; Xie and Yang, 2003; Huang and Lyu, 2005a); (2) cost minimization given a reliability constraint (Yamada and Osaki, 1985; Pham, 1996; Pham and Zhang, 1999; Huang, 2005; Boland and Chuiv, 2007); and (3) reliability maximization under a cost budget (Leung, 1992). It can be seen that the software release time determination problem is formulated as a single-objective optimization problem. However, this kind of formulation can hardly describe management's attitude accurately. In reality, maximizing reliability and minimizing cost are expected to be considered simultaneously, and a compromise should be made between these two objectives based on management's preference. This motivates us to develop a new formulation for the software release time determination problem, such that a more reasonable decision can be made.
1.3 Objective and Scope of Research
The objective of this thesis is to develop comprehensive and practical models for software reliability analysis and software release time determination. Both ASRMs and DDSRMs are extended, considering the practical issues involved in software reliability modeling. More specifically, in the framework of ASRMs, a model for open source software (OSS) is developed by incorporating the special properties of OSS; in the framework of DDSRMs, a generic model is proposed by relaxing the basic assumption made in most existing DDSRMs.
Besides the modeling of the software failure process, the software release time determination problem, as a typical application of SRMs, is investigated as well. Sensitivity analysis of the release time is introduced as a way to deal with parameter uncertainty. In particular, the sensitivity of the software release time is investigated through various methods, including the one-factor-at-a-time approach, design of experiments and global sensitivity analysis. By comparing the different approaches, their applicability and limitations will be shown.
However, sensitivity analysis can only identify significant parameters. It is still imperative to investigate the release policy under parameter uncertainty. Theoretically, it can be shown that there is about a 50% risk that the software reliability requirement cannot be met when the mean value is used. This is because model parameters are unknown in nature, and they are estimated based on a limited amount of data. Provided that the 50% risk can be too high to be acceptable for management, the software release policy under parameter uncertainty is studied.
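The 50% figure can be illustrated with a small Monte Carlo sketch. This assumes a Goel-Okumoto model with approximately normally distributed parameter estimates; all numerical values are illustrative, not taken from the thesis:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
a_hat, b_hat = 100.0, 0.05     # illustrative point estimates (MLEs)
sd_a, sd_b = 10.0, 0.005       # illustrative standard errors
x, R0 = 10.0, 0.95             # mission time and reliability requirement

def rel(t, a, b):
    """Goel-Okumoto reliability R(x|t) = exp(-[m(t+x) - m(t)])."""
    return np.exp(-a * (np.exp(-b * t) - np.exp(-b * (t + x))))

# Release time computed from the point estimates: solve R(x|T) = R0.
T_hat = brentq(lambda t: rel(t, a_hat, b_hat) - R0, 1e-6, 1e4)

# Under parameter uncertainty, how often is the requirement violated at T_hat?
draws = rng.normal([a_hat, b_hat], [sd_a, sd_b], size=(100_000, 2))
risk = np.mean(rel(T_hat, draws[:, 0], draws[:, 1]) < R0)
print(f"T_hat = {T_hat:.1f}, estimated risk = {risk:.2f}")   # close to 0.5
```

Because T_hat sits near the median of the required release times induced by the sampling distribution of the parameter estimates, roughly half of the draws violate the requirement.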
Furthermore, for the release time determination problem, most existing research formulates it as a single-objective optimization problem, which can hardly describe the decision process accurately. Therefore, multi-objective optimization models are developed for the software release time determination problem, and different multi-objective optimization approaches are adopted for analysis.
The remainder of this thesis is organized as follows. Chapter 2 provides a general review of software reliability modeling and the corresponding release time determination problem. In Chapter 3, reliability analysis and optimal version-updating for open source software is studied. Chapter 4 discusses the proposed generic data-driven software reliability model with a model mining technique. Chapter 5 discusses the sensitivity analysis of the release time of software reliability models incorporating testing effort with multiple change-points. Chapter 6 highlights the risk that software cannot meet its reliability requirement due to parameter uncertainty; a risk-based approach for release time determination with delay cost considerations is also introduced. Chapter 7 formulates the software release time determination problem as a multi-objective optimization problem, and different multi-objective optimization approaches are compared. Chapter 8 concludes the current research and looks at future research prospects.
Chapter 2 Literature Review
Software reliability modeling is of great importance because it can measure the reliability of software quantitatively by analyzing recorded failure data. Due to this, a large number of software reliability models have been proposed and published in the literature (Musa et al., 1987; Xie, 1991; Lyu, 1996; Pham, 2000). In this chapter, a brief review of software reliability modeling is given, focusing on the two general categories: analytical software reliability models (ASRMs) and data-driven software reliability models (DDSRMs) (Hu et al., 2007). In addition, software release time determination, as a typical application of software reliability models, is briefly reviewed as well.
2.1 Analytical Software Reliability Models
Analytical software reliability models (ASRMs) are generally based on certain prior probabilistic assumptions made on the stochastic behavior of the software failure process, such as the Markov process assumption and the non-homogeneous Poisson process (NHPP) assumption. It is worth noting that most Markov models are times-between-failures models and almost all NHPP models are failure-count models, according to the classification system proposed by Goel (1985). In the following sub-sections, the Jelinski-Moranda model and a general formulation of NHPP models will be briefly introduced. In addition, some recent advances on ASRMs will be discussed as well.
2.1.1 The Jelinski-Moranda Model
From a historical point of view, the Jelinski-Moranda model (Jelinski and Moranda, 1972) has had a paramount influence on software reliability modeling. It is the first published Markov model, and its main assumptions are:
(1) The number of initial software faults is an unknown but fixed constant.
(2) A detected fault is removed immediately and no new faults are introduced.
(3) Times between failures are independent, exponentially distributed random quantities.
(4) Each remaining software fault contributes the same amount to the software failure intensity.
In the Jelinski-Moranda model, let N_0 denote the number of initial faults in the software before the testing starts; the initial failure intensity is then N_0 φ, where φ is a constant denoting the failure intensity contributed by each fault. Let T_i, i = 1, 2, …, N_0, denote the time between the (i−1)th and the ith failures; then the T_i are independent, exponentially distributed random variables with parameter

λ_i = φ[N_0 − (i − 1)], i = 1, 2, …, N_0. (2.1)
It is obvious that the failure intensity is constant between the detection of two consecutive failures. This is quite reasonable, since the software is unchanged between the detection of two consecutive failures and the testing process is random and homogeneous. However, the assumption that software faults are of the same size, contributing the same amount to the software failure intensity, is not realistic. In order to relax this unrealistic assumption, some extensions of the Jelinski-Moranda model were made; see, e.g., Schick and Wolverton (1978), Shanthikumar (1981), Xie (1990), and Chang and Liu (2009). However, due to the complexity of these models, they have not been widely applied in practice compared with the NHPP models, which will be discussed in the following section.
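To make the model concrete, here is a minimal fitting sketch (not code from the thesis; the inter-failure times are made up). For a fixed N_0 the MLE of φ has a closed form, so the log-likelihood can be profiled over integer values of N_0:

```python
import numpy as np

def jm_profile_loglik(times, N0):
    """Profile log-likelihood of the Jelinski-Moranda model for a given N0,
    with lambda_i = phi * (N0 - (i - 1)) for the i-th inter-failure time."""
    n = len(times)
    remaining = N0 - np.arange(n)           # N0, N0-1, ..., N0-n+1
    phi = n / np.sum(remaining * times)     # closed-form MLE of phi given N0
    lam = phi * remaining
    return np.sum(np.log(lam) - lam * times), phi

# Hypothetical inter-failure times showing reliability growth.
times = np.array([3., 5., 4., 9., 12., 10., 21., 17., 30., 42.])
n = len(times)

# Finite search range: for some data sets the J-M likelihood keeps
# increasing in N0, in which case no finite MLE exists.
N0_hat, (loglik, phi_hat) = max(
    ((N0, jm_profile_loglik(times, N0)) for N0 in range(n, 200)),
    key=lambda pair: pair[1][0],
)
print(f"N0_hat = {N0_hat}, phi_hat = {phi_hat:.4f}")
```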
2.1.2 A General Formulation of NHPP Models
NHPP models form a major part of analytical software reliability modeling. In these models, the underlying software fault detection process is assumed to be a non-homogeneous Poisson process. As software faults are detected, isolated and removed, the software being tested becomes more reliable, with a decreasing failure intensity function. In general, an NHPP software reliability growth model (SRGM) can be developed by solving the following differential equation (Pham, 2003):
dm(t)/dt = b(t)[a(t) − m(t)], (2.2)
where m(t) is the mean value function of detected faults, and a(t) and b(t) are the fault content function and the failure detection rate function, respectively. It can be seen that the idea is to relate the failure intensity to the number of remaining faults. Given different expressions and explanations of a(t) and b(t), different NHPP SRGMs can be obtained (Zhang and Pham, 2000).
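For reference, equation (2.2) is a linear first-order ODE; with the boundary condition m(0) = 0 its textbook solution (sketched here for clarity, not quoted from the thesis) is

```latex
m(t) = e^{-B(t)} \int_0^t a(u)\,b(u)\,e^{B(u)}\,du,
\qquad B(t) = \int_0^t b(u)\,du .
```

Substituting specific choices of a(t) and b(t) into this solution yields concrete models, as discussed next.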
Specifically, when a(t) = a and b(t) = b, the important Goel-Okumoto (GO) model is obtained (Goel and Okumoto, 1979). This model has strongly influenced the development of many later models; indeed, many later NHPP models are modifications or generalizations of it. It should be noted that in the Goel-Okumoto model both parameters are positive and have physical meanings: a represents the number of faults to be eventually detected, and b denotes the failure detection rate per fault.
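As an illustrative sketch of one common fitting approach (least squares on cumulative fault counts, assuming a SciPy environment; the weekly counts below are hypothetical):

```python
import numpy as np
from scipy.optimize import curve_fit

def go_mean(t, a, b):
    """Goel-Okumoto mean value function m(t) = a * (1 - exp(-b t))."""
    return a * (1.0 - np.exp(-b * t))

# Hypothetical cumulative numbers of faults detected by the end of each week.
t = np.arange(1, 13, dtype=float)
y = np.array([12, 21, 30, 36, 42, 47, 50, 53, 55, 57, 58, 59], dtype=float)

(a_hat, b_hat), _ = curve_fit(go_mean, t, y, p0=(70.0, 0.1))
print(f"a = {a_hat:.1f} (faults eventually detected), b = {b_hat:.3f} (rate per fault)")
```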
The Goel-Okumoto model was successfully applied in many projects, as reported in Wood (1996). However, it was sometimes observed that the curve of the cumulative number of faults is S-shaped. The reason for the S-shaped behavior is the "learning" effect of the debugging process (Yamada et al., 1984). To consider this issue, the delayed S-shaped NHPP model (Yamada et al., 1983; 1984) and the inflected S-shaped NHPP model (Ohba, 1984) were proposed. For these two models, we still have a(t) = a; the only difference is the failure detection rate function. Specifically, b(t) = b²t/(1 + bt) for the delayed S-shaped model, and b(t) = b/(1 + βe^(−bt)) for the inflected S-shaped model.
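The corresponding mean value functions (standard forms of the two cited models, stated here for completeness) are

```latex
m_{\mathrm{delayed}}(t) = a\left[1 - (1 + bt)\,e^{-bt}\right],
\qquad
m_{\mathrm{inflected}}(t) = \frac{a\,(1 - e^{-bt})}{1 + \beta e^{-bt}},
```

where β > 0 controls the inflection point; both functions grow toward the same asymptote a.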
2.1.3 Recent Advances on ASRMs
Based on the above discussion, it can be seen that different assumptions lead to different descriptions of the software failure process. However, it is worth noting that the underlying software failure process can hardly be described precisely, and assumptions are made to develop a model for the sake of mathematical tractability (Goel, 1985). It is obvious that these assumptions are in most cases not valid and cannot be made in some practical applications. Thus, relaxing these assumptions has drawn a lot of research attention, and most recent advances on ASRMs have focused on a better description of the software failure process and more reasonable software reliability analysis.
Fault Correction Process
Most existing SRGMs assume that faults are immediately removed when failures are detected, i.e., the repair time is ignored. Although this assumption provides simplicity and mathematical tractability for the modeling of the software failure process, it is usually not the case. In reality, the fault removal activity rarely occurs immediately after the observation of a failure, and the time needed to correct the fault cannot be ignored.
Schneidewind (1975) first modeled the fault correction process following the fault detection process with a constant time delay. However, a constant time delay assumption may not be appropriate, since faults cannot be corrected with the same amount of testing effort in reality. For example, based on an empirical study of nearly 200 anomalies from seven NASA spacecraft systems, it was found that some anomalies need multiple corrections (Lutz and Mikulski, 2004). In consideration of this, some extensions have been made under the framework of the model proposed by Schneidewind (1975). Xie and Zhao (1992) substituted the constant time delay with a time-dependent delay function in their model, with the assumption that detected faults become harder to correct as the testing proceeds. Schneidewind (2001) assumed that the time delay is an exponentially distributed random variable. Recently, Xie et al. (2007) carried out a comprehensive study of the time delay issue with different kinds of distributions. Wu et al. (2007) discussed the parameter estimation of the combined model. Moreover, Huang and Lin (2006) incorporated both fault dependency and debugging time lag into the modeling.
The models discussed above incorporate the correction process into the analysis by introducing a time delay function. In fact, there also exist other alternative ways. More specifically, Bustamante and Bustamante (2002) proposed a software reliability model which represents the software failure process with a non-homogeneous Poisson process and the correction process with a multinomial distribution. Gokhale et al. (2004; 2006) modeled both the fault detection process and the correction process with a non-homogeneous Markov chain, where different fault removal policies are studied through different forms of the fault removal rate. Lo and Huang (2006) proposed a general framework for modeling these two processes, with the assumption that the mean number of faults corrected in a very small time interval is proportional to the mean number of detected but not yet corrected faults remaining in the system. Huang and Huang (2008) introduced the use of finite and infinite server queueing models to describe these two processes, and the correction process is studied through the cumulative distribution function of the failure correction time.
However, most of the existing models considering both of these processes assume that the failure rate at the current time is proportional to the number of remaining undetected faults. In fact, since the fault removal activity is considered, this assumption no longer holds, and it is more reasonable to assume that the failure rate at time t is proportional to the number of remaining uncorrected faults (Hwang and Pham, 2009).
Incorporation of Testing Effort
In recent years, incorporating testing effort into software reliability growth models (SRGMs) has received a lot of attention, probably because testing effort is an essential process parameter for management. Huang et al. (2007) showed that the logistic testing effort function can be directly incorporated into both exponential-type and S-type non-homogeneous Poisson process (NHPP) models, and the proposed models were also discussed under both ideal and imperfect debugging situations. Kapur et al. (2007) discussed the optimization problem of allocating testing resources by using the marginal testing effort function (MTEF). Later, Kapur et al. (2008a) studied the testing effort dependent learning process, and faults were classified into two types by the amount of testing effort needed to remove them. In addition, some research incorporated change-point analysis into the models, as the testing effort consumption may not be smooth over time (Huang, 2005; Kapur et al., 2008b; Lin and Huang, 2008). Moreover, as constructing a model is not the end, the optimal release time problem considering testing effort was also discussed (Yamada et al., 1993; Huang and Kuo, 2002; Huang and Lyu, 2005a; Lin and Huang, 2008).
However, most of this research assumes that the parameters of the proposed models are known. In fact, there always exist estimation errors, as parameters in the testing effort function and in SRGMs are generally estimated by the least squares estimation (LSE) and maximum likelihood estimation (MLE) methods, respectively. It is necessary to conduct sensitivity analysis to determine which parameters may have a significant influence on the software release time. This is even more important when there is an increasing number of parameters involved in the model.
2.2 Data-Driven Software Reliability Models
In the data-driven approach, the software failure process is viewed as a time series, and data-driven software reliability models (DDSRMs) are constructed to recognize the inherent patterns of the process, which are carried by the recorded failure data. By modeling and analyzing the inherent patterns of the software failure process, software reliability predictions can be made. The main advantage of DDSRMs is that they do not require restrictive assumptions on software faults or the software failure process; thus they may have better applicability across different software projects compared with traditional ASRMs.
Machine learning techniques like artificial neural networks (ANNs) and support vector machines (SVMs) have been successfully applied to constructing DDSRMs. For ANNs, both feed-forward neural networks and recurrent neural networks were used and compared in software reliability analysis (Karunanithi et al., 1992; Sitte, 1999). Later, Cai et al. (2001) investigated the effectiveness of ANNs in software reliability prediction, and found that ANNs' performance is highly dependent on the 'smooth' trend of the data. Ho et al. (2003) revisited the connectionist model with a modified Elman recurrent neural network, which outperforms both the Jordan model and the feed-forward model. Tian and Noore (2005a) used a genetic algorithm (GA) to optimize the number of delayed input neurons and the number of neurons in the hidden layer of the neural network architecture. Su and Huang (2007) developed a dynamic weighted combinational model for software reliability prediction, and the results showed that the proposed model has a fairly accurate prediction capability. Hu et al. (2007) applied recurrent neural networks to model both the fault detection process and the fault correction process in software testing, and proposed a GA-based network configuration approach.
Besides ANNs, another machine learning technique that has emerged as a promising modeling paradigm is the support vector machine (SVM), which has good generalization capability due to the structural risk minimization principle used (Vapnik, 1995; Vapnik, 1999; Kecman, 2001; Scholkopf and Smola, 2002). SVMs have been successfully applied in many domains, such as pattern recognition, time series forecasting, diagnostics, robotics, and process control. In the software reliability modeling and prediction domain, SVM-based SRMs have been proposed and studied as well. Tian and Noore (2005b) proposed an SVM-based modeling approach for software reliability prediction, and experimental results showed that the proposed approach adapts well across different software projects and has higher next-step prediction accuracy compared with feed-forward ANN and recurrent ANN modeling approaches. Pai and Hong (2006) proposed an SVM-based SRM which uses simulated annealing (SA) algorithms to optimize model parameters.
However, most existing research on the data-driven approach to software reliability modeling and prediction generally assumes that a failure is strongly correlated with the most recent several failures. This assumption restricts the general time series analysis to a special case, as the correlation may be quite complicated in a time series (Tsay, 2002). There should be a systematic way to discover the correlation among failures, which would enable the model user to decide the appropriate time lag terms to be used in the model, and hence the model performance could be improved.
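A minimal sketch of this modeling style (assuming scikit-learn; the data and hyperparameters are illustrative, and the consecutive lags below reflect exactly the common assumption criticized above):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
x = np.cumsum(rng.exponential(10.0, size=60))    # hypothetical cumulative failure times

lags = [3, 2, 1]                                 # the usual "most recent failures" inputs
X = np.array([[x[i - L] for L in lags] for i in range(max(lags), len(x))])
y = x[max(lags):]

# Train on all but the last observation; predict the next failure time.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[:-1], y[:-1])
print("predicted:", model.predict(X[-1:])[0], "actual:", y[-1])
```

Replacing lags = [3, 2, 1] with an arbitrary lag set such as [8, 6, 2] is exactly the relaxation pursued in Chapter 4, where a GA-based algorithm searches over such sets.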
2.3 Determination of Software Release Time
Constructing software reliability models is not the end; it is almost always the case that a model is developed to help management make decisions. A typical purpose is to guide management on when to release or sell the software in the market. Since Okumoto and Goel (1980) first proposed the software release time determination problem, much research has been done over the past several decades.
Koch and Kubat (1983) introduced a penalty cost into the release time determination model. Yamada and Osaki (1985) proposed a decision-making model where both reliability and cost are considered; in particular, their model was developed to minimize the cost subject to a reliability constraint. Dohi (1999) transformed the optimal software release time problem into a time series prediction problem, and an artificial neural network (ANN) was employed. Nishio and Dohi (2003) presented the determination of the optimal software release time based on a proportional hazards software reliability growth model. Huang and Lyu (2005) proposed an optimal release time policy for software systems considering cost, testing effort, and test efficiency, which enriched the decision model. Xie and Yang (2003) and Boland and Chuiv (2007) considered the optimal software release time when repair is imperfect. Chiu (2009) proposed a Bayesian method to determine the optimal release time for software systems based on experts' prior judgments.
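As a hedged sketch of the minimize-cost-subject-to-reliability formulation of Yamada and Osaki (1985), here is one possible implementation under a GO model with a simple linear cost structure; all constants are illustrative, and the penalty term is just one convenient way to impose the constraint:

```python
import numpy as np
from scipy.optimize import minimize_scalar

a, b = 100.0, 0.05            # illustrative GO parameters
c1, c2, c3 = 1.0, 10.0, 0.5   # cost per fault fixed in test / in field / per unit test time
x, R0 = 10.0, 0.95            # operational mission time and reliability requirement

m = lambda t: a * (1.0 - np.exp(-b * t))            # GO mean value function
R = lambda t: np.exp(-(m(t + x) - m(t)))            # NHPP reliability R(x|t)
cost = lambda t: c1 * m(t) + c2 * (a - m(t)) + c3 * t

# Penalize violation of R(x|T) >= R0, then minimize the expected cost over T.
obj = lambda t: cost(t) + 1e6 * max(0.0, R0 - R(t))
res = minimize_scalar(obj, bounds=(0.0, 1000.0), method="bounded")
print(f"T* = {res.x:.1f}, cost = {cost(res.x):.1f}, R(x|T*) = {R(res.x):.3f}")
```

With these numbers the constraint is active, so T* sits essentially at the smallest T satisfying R(x|T) = R0.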
It is worth noting that the uncertainty involved in the determination of the optimal release time has received special attention recently (Yang et al., 2008; Ho et al., 2008). It has been pointed out that the point estimate obtained in the traditional way is not precise, as the software debugging process is essentially random. Yang et al. (2008) introduced a risk-control approach to obtain the optimal release time by quantifying the uncertainty in the actual cost of the project through its variance. Ho et al. (2008) determined the optimal release time by considering the randomness of the mean value function, where the randomness is assumed to stem from the error-detection process. However, an optimal release policy considering parameter uncertainty is still lacking.
Furthermore, for the determination of the optimal software release time, reliability and cost are the two important dimensions that are generally considered. It should be noted that most existing research formulates this decision process as a single-objective optimization problem. Although these formulations can greatly reduce the complexity, they can hardly reflect the nature of the decision process, which is essentially a multi-objective optimization problem. More specifically, maximizing reliability and minimizing cost should be achieved simultaneously.
Chapter 3 Reliability Analysis and Optimal Version-Updating for Open Source Software
3.1 Basic Problem Description
Open source software (OSS) development is a new way of building and deploying large software systems on a global basis, and it differs in many interesting ways from the principles and practices of traditional software engineering (Raymond, 2001). There is widespread recognition across the software industry that open source projects can produce software systems of high quality and functionality, such as the Linux operating system, the Apache web server, the Mozilla browser, and the MySQL database system, that can be used by thousands to millions of end-users (Mockus et al., 2002).
OSS development is based on a relatively simple idea: the original core of the OSS system is developed locally by a single programmer or a team of programmers. Then a prototype system is released on the internet, so that other programmers can freely read, modify and redistribute that system's source code. The evolution process of OSS is much faster than that of closed source projects. The reason is that in the development of OSS, tasks are completed without being assigned by hierarchical management, and there is no explicit system-level design and no well-defined plan or schedule. A central managing group may check the code, but this process is much less rigid than in closed-source projects.
Several OSS systems have come into widespread use with thousands or millions of users, e.g., Mozilla, Apache, OpenOffice, Eclipse, NetBeans, GNOME, and Linux. Due to the success of OSS, more and more software companies have switched from closed source to open source development in order to win market share and to improve product quality (Ven and Mannaert, 2008). Even leading commercial software companies, such as IBM and Sun, have begun to embrace the open source model and are actively taking part in the development of OSS products.
As OSS applications rapidly spread, it is of great importance to assess the reliability of OSS systems to prevent potential financial loss or reputational damage to the company (Gyimothy et al., 2005). Due to this consideration, many studies have been carried out recently on predicting the number of defects in a system. For instance, Eaddy et al. (2008) investigated the relationship between the degree of scattering and the number of defects by stepwise regression and other statistical techniques. Marcus et al. (2008) proposed a new measure named Conceptual Cohesion of Classes (C3) to measure cohesion in object-oriented software, and applied C3 in logistic regression to predict software faults, with comparisons against other object-oriented metrics. Kim et al. (2008) introduced a new technique for predicting latent software bugs in OSS, called change classification. Change classification uses a machine learning classifier to determine whether a new software change is more similar to prior buggy changes or to clean changes; in this manner, it predicts the existence of bugs in software changes.
Although the works above can provide important information for assessing the reliability of OSS, the total number of defects in a software system is an essentially indirect reliability measurement in which the time factor is neglected (Xie, 1991). Only in some recent studies by Tamura and Yamada (2008; 2009) is this issue considered. In particular, Tamura and Yamada (2008) combined neural networks and software reliability growth modeling for the assessment of OSS reliability. In Tamura and Yamada (2009), a stochastic differential equation is introduced for the modeling of OSS reliability, and the optimal version-update time is discussed based on it.
In this chapter, we further investigate the modeling of OSS reliability and the determination of its optimal version-update time. Our model is based on the non-homogeneous Poisson process (NHPP), which has been proven to be a successful model for software reliability (Musa et al., 1987; Xie, 1991; Lyu, 1996; Pham, 2000). However, different from the NHPP models for closed source software and the models proposed in Tamura and Yamada (2008; 2009), our model incorporates the unique patterns of OSS development, such as the multiple-releases property and the hump-shaped fault detection rate function. In addition, because the project cost is no longer a crucial factor in optimal release time determination for most OSS projects, in this study we formulate a new version-update time determination problem for OSS. Specifically, multi-attribute utility theory (MAUT) is adopted for this decision process, where two important considerations are addressed simultaneously: rapid release of the software to keep sufficient volunteers involved, and an acceptable level of OSS reliability.
The rest of this chapter is organized as follows. Section 3.2 describes our proposed NHPP-based model incorporating unique properties of OSS. Section 3.3 formulates the optimal version-update time problem based on MAUT, where the rapid release strategy and the level of reliability are considered simultaneously. Section 3.4 provides numerical examples for validation purposes based on real-world data sets. Conclusions are drawn in Section 3.5.
3.2 Modeling Fault Detection Process of Open Source Software
The underlying software fault detection process is commonly assumed to be a non-homogeneous Poisson process (NHPP) (Musa et al., 1987; Xie, 1991; Lyu, 1996; Pham, 2000). As software faults are detected, isolated and removed, the software being tested becomes more reliable, with a decreasing failure intensity function. In general, an NHPP software reliability growth model (SRGM) can be developed by solving the following differential equation (Pham, 2003):

dm(t)/dt = b(t)[a(t) − m(t)], (3.1)
where m(t), a(t) and b(t) are the mean value function of detected faults, the fault content function and the fault detection rate function, respectively, and the typical boundary condition is m(0) = 0. Given different expressions and explanations of a(t) and b(t), many NHPP SRGMs can be developed (Zhang and Pham, 2000).
The basic assumption illustrated by the above formulation also holds in the context of OSS (Tamura and Yamada, 2009). The reason lies in the fact that in OSS the failure rate at the current time is still determined by the product of the fault detection rate