New Findings Not Assimilated Into Software Cost Estimation Models

Angelis et al. (2001) used recent data collected by the International Software Benchmarking Standards Group to create a software cost estimation model. This data set consisted of historical data from many different types of organizations. They first conducted a regression with the basic effort equation as the model. When effort was predicted from size alone, 44% of the variance in effort was explained. A categorical regression was then conducted with many variables, and of these, maximum team size was found to be significant. With maximum team size added to the model, the explained variance doubled to around 88%.
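The basic effort equation has the form Effort = a × Size^b. Below is a minimal sketch of fitting it with ordinary least squares on the log-transformed equation ln(Effort) = ln(a) + b·ln(Size). This is the common way to fit the equation, but it is only an assumption here, since Angelis et al.'s exact procedure is not described. The function name `fit_effort_equation` and the project data are hypothetical, and R² is computed in log space.

```python
import math

def fit_effort_equation(sizes, efforts):
    """Fit Effort = a * Size**b by least squares on ln(Effort) = ln(a) + b*ln(Size).

    Returns (a, b, r_squared), with r_squared measured in log space.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = size exponent
    ln_a = my - b * mx                 # intercept = ln(a)
    ss_res = sum((y - (ln_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return math.exp(ln_a), b, 1.0 - ss_res / ss_tot

# Hypothetical projects (size in function points, effort in person-hours).
sizes = [120, 250, 400, 610, 900, 1500]
efforts = [900, 2600, 2100, 6100, 5200, 14000]
a, b, r2 = fit_effort_equation(sizes, efforts)
print(f"Effort ~ {a:.2f} * Size^{b:.2f}, R^2 (log space) = {r2:.2f}")
```

Adding a second predictor such as maximum team size would turn this into a multiple regression. That would need a matrix solve rather than the single-slope formula used here.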

Dimensional analysis is commonly used in fields such as physics, chemistry, and mathematics, where units matter. “Dimensional analysis is a method of comparing the dimensions of the physical quantities occurring in a problem to find relationships between the quantities without having to solve the problem completely” (Random House 1998). Equation checking is part of dimensional analysis. In this step, the algebraic derivation of a formula is checked by comparing the units on each side. If the units on both sides of the equation are equal, the equation is said to be commensurable. If they are not equal (for example, apples on one side of the equation and oranges on the other), the equation is said to be incommensurable. After studying the software cost estimation models, Nemecek (2001) concluded that “Conventional software models can not be correct because each is incommensurate”.

Predicting effort from size by regression is therefore not a valid theoretical derivation.
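Equation checking can be sketched by tagging each quantity with unit exponents and comparing the two sides. The helpers `unit_mul`, `unit_pow`, and `commensurable` are hypothetical illustrations and not Nemecek's procedure. The exponent 1.05 (the basic COCOMO organic-mode exponent) is used only as an example of a non-integer size exponent.

```python
from fractions import Fraction

def unit_mul(a, b):
    """Multiply two unit maps by adding exponents; drop units that cancel to zero."""
    out = dict(a)
    for unit, exp in b.items():
        out[unit] = out.get(unit, 0) + exp
    return {unit: exp for unit, exp in out.items() if exp != 0}

def unit_pow(units, power):
    """Raise a unit map to a power by scaling every exponent."""
    return {unit: exp * power for unit, exp in units.items()}

def commensurable(lhs, rhs):
    """An equation is commensurable when both sides carry identical units."""
    return lhs == rhs

EFFORT = {"person_month": 1}
SIZE = {"KSLOC": 1}
B = Fraction(21, 20)  # exponent 1.05, kept exact to avoid floating-point noise

# Effort = a * Size**b with a treated as a pure (dimensionless) number.
rhs_plain = unit_mul({}, unit_pow(SIZE, B))
print(commensurable(EFFORT, rhs_plain))  # False: person-months vs KSLOC**(21/20)

# Balancing the equation requires giving a the unit person_month / KSLOC**1.05.
A_UNITS = {"person_month": 1, "KSLOC": -B}
rhs_patched = unit_mul(A_UNITS, unit_pow(SIZE, B))
print(commensurable(EFFORT, rhs_patched))  # True
```

In this illustration, the coefficient's units exist only to make the equation balance. That is one way to read the claim that a regression fitted from size to effort lacks a theoretical derivation.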

Another study examined the sensitivity of COCOMO II (Musilek, Pedrycz et al. 2002). The authors conducted three types of sensitivity analysis: mathematical analysis of the effort equation, Monte Carlo simulation, and error propagation. They found that the effort estimate is most sensitive to the size variable, followed by the effort multipliers, while the exponential factor has little impact on the error. The authors suggest representing software size as a fuzzy set, thereby giving the project manager a spectrum of effort estimates rather than a single point estimate.
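In COCOMO II.2000 the effort equation is PM = A × Size^E × ∏ EM_i, with E = B + 0.01 × Σ SF_j, A = 2.94, and B = 0.91. Below is a minimal sketch of two things: a one-factor-at-a-time Monte Carlo sensitivity check, and the fuzzy-size idea using a triangular fuzzy number propagated by alpha-cuts. Several values are assumptions: the 100 KSLOC project, the scale-factor sum of 18.97, the nominal effort multipliers, and the 10% lognormal noise. The fuzzy formulation is also our choice. Musilek et al.'s actual setup may differ, and the sketch does not reproduce their results.

```python
import math
import random
import statistics

A, B = 2.94, 0.91   # COCOMO II.2000 calibration constants
SIZE_KSLOC = 100.0  # hypothetical project size
SF_SUM = 18.97      # assumed: roughly the sum of the five nominal scale-factor ratings
EM_PRODUCT = 1.0    # all effort multipliers at their nominal value

def effort_pm(size, sf_sum, em_product):
    """COCOMO II effort: PM = A * Size**E * prod(EM), with E = B + 0.01 * sum(SF)."""
    exponent = B + 0.01 * sf_sum
    return A * size ** exponent * em_product

def log_spread(efforts):
    """Standard deviation of ln(PM): a scale-free measure of estimate uncertainty."""
    return statistics.pstdev([math.log(pm) for pm in efforts])

def monte_carlo_sensitivity(sigma=0.1, n=10_000, seed=1):
    """Perturb one input group at a time by the same lognormal relative noise."""
    rng = random.Random(seed)
    shocks = [math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]
    return {
        "size": log_spread([effort_pm(SIZE_KSLOC * s, SF_SUM, EM_PRODUCT) for s in shocks]),
        "effort multipliers": log_spread([effort_pm(SIZE_KSLOC, SF_SUM, EM_PRODUCT * s) for s in shocks]),
        "scale factors": log_spread([effort_pm(SIZE_KSLOC, SF_SUM * s, EM_PRODUCT) for s in shocks]),
    }

def fuzzy_effort(low, mode, high, alphas=(0.0, 0.5, 1.0)):
    """Propagate a triangular fuzzy size (low, mode, high) through the effort equation.

    Effort increases monotonically with size, so each alpha-cut of the size
    interval maps endpoint-to-endpoint onto an alpha-cut of effort.
    """
    cuts = {}
    for alpha in alphas:
        lo = low + alpha * (mode - low)
        hi = high - alpha * (high - mode)
        cuts[alpha] = (effort_pm(lo, SF_SUM, EM_PRODUCT), effort_pm(hi, SF_SUM, EM_PRODUCT))
    return cuts

for name, spread in sorted(monte_carlo_sensitivity().items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} sd(ln PM) = {spread:.4f}")
for alpha, (lo, hi) in fuzzy_effort(80.0, 100.0, 130.0).items():
    print(f"alpha = {alpha:.1f}: effort between {lo:.0f} and {hi:.0f} person-months")
```

ln PM is linear in ln Size with slope E > 1. Equal relative noise on size therefore moves the estimate more than the same noise on the effort-multiplier product. Noise on the scale-factor sum is damped by the 0.01 coefficient. Under these particular assumptions, the resulting ordering matches the one the study reports. The alpha = 1 cut collapses to the single point estimate, and lower alpha levels widen it into a range of estimates.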

Neural networks have also been used to predict effort. In one study (Idri, Khoshgoftaar et al. 2002), size and four effort multipliers were used as inputs to a neural network. This study used the COCOMO dataset, and the researchers claimed that the “results are acceptable”. However, the resulting neural network was found to be very difficult to understand and interpret.
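The interpretability problem can be made concrete with a minimal sketch: a one-hidden-layer network trained by full-batch gradient descent on five inputs (log size plus four effort multipliers). The data are synthetic, generated from an assumed COCOMO-like equation. The network size, learning rate, and helper names (`make_data`, `standardize`, `TinyMLP`) are hypothetical. This is not the architecture or dataset used by Idri, Khoshgoftaar et al.

```python
import math
import random

def make_data(rng, n=60):
    """Synthetic projects: ln(size) and ln of four effort multipliers -> ln(effort).

    The generating equation (2.94 * size**1.1 * prod(EM) * noise) is an assumption
    chosen only to produce COCOMO-like data.
    """
    rows = []
    for _ in range(n):
        size = rng.uniform(2.0, 500.0)                    # KSLOC
        ems = [rng.uniform(0.75, 1.3) for _ in range(4)]  # hypothetical multipliers
        effort = 2.94 * size ** 1.1 * math.prod(ems) * math.exp(rng.gauss(0.0, 0.2))
        rows.append(([math.log(size)] + [math.log(em) for em in ems], math.log(effort)))
    return rows

def standardize(values):
    """Rescale a list to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

class TinyMLP:
    """One tanh hidden layer and one linear output, trained on mean squared error."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return hidden, sum(v * h for v, h in zip(self.w2, hidden)) + self.b2

    def loss(self, data):
        return sum((self.forward(x)[1] - y) ** 2 for x, y in data) / len(data)

    def train_epoch(self, data, lr):
        """One full-batch gradient-descent step with gradients from backpropagation."""
        g_w1 = [[0.0] * len(row) for row in self.w1]
        g_b1 = [0.0] * len(self.b1)
        g_w2 = [0.0] * len(self.w2)
        g_b2 = 0.0
        for x, y in data:
            hidden, out = self.forward(x)
            d_out = 2.0 * (out - y) / len(data)            # d(MSE)/d(output)
            g_b2 += d_out
            for j, h in enumerate(hidden):
                g_w2[j] += d_out * h
                d_pre = d_out * self.w2[j] * (1.0 - h * h)  # back through tanh
                g_b1[j] += d_pre
                for i, xi in enumerate(x):
                    g_w1[j][i] += d_pre * xi
        self.b2 -= lr * g_b2
        for j in range(len(self.w2)):
            self.w2[j] -= lr * g_w2[j]
            self.b1[j] -= lr * g_b1[j]
            for i in range(len(self.w1[j])):
                self.w1[j][i] -= lr * g_w1[j][i]

rng = random.Random(42)
raw = make_data(rng)
columns = [standardize(list(col)) for col in zip(*[x for x, _ in raw])]
targets = standardize([y for _, y in raw])
data = [(list(row), y) for row, y in zip(zip(*columns), targets)]

net = TinyMLP(n_in=5, n_hidden=3, rng=rng)
initial_loss = net.loss(data)
for _ in range(500):
    net.train_epoch(data, lr=0.05)
final_loss = net.loss(data)
print(f"training MSE (standardized): {initial_loss:.3f} -> {final_loss:.3f}")
for j, row in enumerate(net.w1):
    print(f"hidden unit {j}: weights on [size, EM1..EM4] =", [round(w, 2) for w in row])
```

Even on this toy problem, each hidden unit mixes all five inputs. As a result, no single weight corresponds to size or to a particular effort multiplier.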

Estimating by analogy, or case-based reasoning, is another technique used to predict effort. The use of analogies was suggested more than 20 years ago (Boehm 1981). The effectiveness of case-based reasoning relies heavily on the underlying data set used for the analogies. Case-based reasoning is a type of cluster analysis and inherits the weaknesses of any cluster analysis methodology.
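Below is a minimal sketch of estimation by analogy. It assumes min-max normalized features, Euclidean distance, and an estimate equal to the mean effort of the k most similar past projects. The project history and function names are hypothetical, and real case-based reasoning systems typically add case-adaptation steps that this sketch omits.

```python
import math

# Hypothetical history: (size in KSLOC, team size, actual effort in person-months).
HISTORY = [
    (10, 2, 24.0),
    (25, 4, 60.0),
    (50, 6, 130.0),
    (80, 9, 240.0),
    (120, 12, 400.0),
]

def feature_ranges(history):
    """(min, max) of each feature column, used for min-max normalization."""
    return [(min(col), max(col)) for col in zip(*[p[:2] for p in history])]

def normalize(features, ranges):
    """Scale each feature to [0, 1] so that size does not swamp team size."""
    return [(f - lo) / (hi - lo) if hi > lo else 0.0 for f, (lo, hi) in zip(features, ranges)]

def estimate_by_analogy(target, history, k=2):
    """Average the actual effort of the k past projects closest to target."""
    ranges = feature_ranges(history)
    t = normalize(target, ranges)
    nearest = sorted(history, key=lambda p: math.dist(t, normalize(p[:2], ranges)))[:k]
    return sum(p[2] for p in nearest) / k

print(estimate_by_analogy((60, 7), HISTORY))  # 185.0, the mean of the two closest analogues
```

As the text notes, the estimate is only as good as the stored cases: a new project unlike anything in `HISTORY` still receives the effort of whichever cases happen to be nearest.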

“Cluster analysis is the name for a group of multivariate techniques whose primary purpose is to group objects based on the characteristics they possess. Cluster analysis classifies objects (e.g., respondents, products, or other entities) so that each object is very similar to others in the cluster with respect to some predetermined selection criterion. The resulting clusters of objects should then exhibit high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity. Thus, if the classification is successful, the objects within clusters will be close together when plotted geometrically, and different clusters will be far apart” (Hair, Anderson et al. 1998 p.473).
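Cluster analysis covers many algorithms. k-means (Lloyd's algorithm) is one common choice and is used here only for illustration. The sketch assigns each point to its nearest centroid and recomputes the centroids until the assignment stops changing. It then reports the within-cluster and between-cluster sums of squares, which correspond to the internal homogeneity and external heterogeneity in the quotation. The points and helper names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, max_iters=50):
    """Lloyd's algorithm starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iters):
        # Assignment step: label each point with its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members (keep it if the cluster empties).
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def sums_of_squares(points, labels, centroids):
    """Split the total sum of squares into within-cluster and between-cluster parts."""
    grand = tuple(sum(col) / len(points) for col in zip(*points))
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return within, total - within

points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.8), (4.9, 5.1)]
labels, centroids = kmeans(points, [points[0], points[3]])
within, between = sums_of_squares(points, labels, centroids)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(f"within-cluster SS = {within:.3f}, between-cluster SS = {between:.3f}")
```

The result depends on the features chosen and the starting centroids. This is the kind of weakness a case-based estimator inherits from cluster analysis.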

Case-based reasoning is often used in task domains that have no strong theoretical models and where the domain rules are incomplete, ill-defined, and inconsistent (Mukhopadhyay, Vicinanza et al. 1992). The number of possible project factors is a problem for many software cost estimation models: over 74 different project factors have been identified (Wrigley and Dexter 1987). Selecting a set of project factors in advance and then running a cluster-type analysis on a published data set usually yields favorable results.

Consider the case-based reasoning model called Estor. “Estor did not perform quite as well as the human expert, but it did outperform existing algorithmic model on the data set” (Mukhopadhyay, Vicinanza et al. 1992 p.167). Estor's estimates were, on average, within 52% of actual effort, whereas COCOMO's estimates averaged 618%. However, the goal of software cost estimation is not to predict the cost of historical projects but to predict the cost of new ones. The authors write, “To be fair, Estor would almost certainly fail to accurately estimate project from different environment (e.g. embedded military systems) with additional domain knowledge” (Mukhopadhyay, Vicinanza et al. 1992 p.167). “Estimates of the accuracy of prediction obtained from a training set are always optimistic. To get a more realistic estimate of the accuracy of prediction you either have to use a new, independent data set or adopt a jack-knife approach” (Samson, Ellison et al. 1997 p.59).
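The optimism Samson et al. describe can be shown with a deliberately simple 1-nearest-neighbour analogy estimator. The estimator is scored by the mean magnitude of relative error (MMRE), a common accuracy measure in this literature. We assume the “within X%” figures quoted for Estor and COCOMO are of this kind. Here the jack-knife is read as leave-one-out validation. The project data and function names are hypothetical.

```python
# Hypothetical projects: (size in KSLOC, actual effort in person-months); sizes are distinct.
PROJECTS = [(12, 30.0), (20, 41.0), (33, 95.0), (47, 110.0), (60, 190.0), (85, 230.0)]

def nearest_effort(size, history):
    """1-nearest-neighbour analogy on size alone: return the effort of the closest project."""
    return min(history, key=lambda p: abs(p[0] - size))[1]

def mmre(pairs):
    """Mean magnitude of relative error over (actual, estimate) pairs."""
    return sum(abs(actual - est) / actual for actual, est in pairs) / len(pairs)

def training_set_mmre(projects):
    """Estimate every project from a history that still contains it."""
    return mmre([(effort, nearest_effort(size, projects)) for size, effort in projects])

def jackknife_mmre(projects):
    """Leave-one-out: estimate each project from all of the other projects."""
    pairs = []
    for i, (size, effort) in enumerate(projects):
        others = projects[:i] + projects[i + 1:]
        pairs.append((effort, nearest_effort(size, others)))
    return mmre(pairs)

print(f"training-set MMRE: {training_set_mmre(PROJECTS):.0%}")  # 0%
print(f"jack-knife MMRE:   {jackknife_mmre(PROJECTS):.0%}")
```

When a project is estimated from a history that still contains it, its nearest analogue is itself, so the training-set error is exactly zero. Holding each project out in turn gives a nonzero and more realistic error.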

An important study examined the causes of estimating error. Only one managerial practice, the use of the estimate in the performance evaluation of software managers and professionals, was shown to increase the accuracy of estimates.

Software cost estimation models were shown to be of no help. The authors write:

“… It is unexpected that the application of the algorithmic basis failed to predict estimating accuracy. Apparently, the use of complex statistics, software, and standards do not facilitate more accurate estimates. Such a finding does not imply that software managers and professionals should shun algorithm-based estimating techniques. However it intimates instead that they recognize their shortcomings: Specifically, the employment of algorithm-based estimating methods did not improve the accuracy of cost estimates for subjects in this research. When using such methods, software managers and professionals probably need to be very careful to avoid the

Holding estimators responsible for their estimates is probably the only way software cost estimation will improve. Once people are responsible for their estimates, substandard models will not be tolerated.


Satisfaction and Perceived Usefulness