This makes us wonder whether software is reliable at all, and whether we should use software in safety-critical embedded applications. With processors and software permeating the safety-critical embedded world, the reliability of software is simply a matter of life and death. Are we embedding potential disasters when we embed software into systems? Electronic and mechanical parts may become "old" and wear out with time and usage, but software will not rust or wear out during its life cycle. Software will not change over time unless intentionally changed or upgraded.
Software reliability is an important attribute of software quality, together with functionality, usability, performance, serviceability, capability, installability, maintainability, and documentation. Software reliability is hard to achieve because the complexity of software tends to be high. While any system with a high degree of complexity, including software, will find it hard to reach a given level of reliability, system developers tend to push complexity into the software layer, given the rapid growth of system size and the ease of doing so by upgrading the software.
For example, large next-generation aircraft will have over one million source lines of software on board; next-generation air traffic control systems will contain between one and two million lines; the upcoming International Space Station will have over two million lines on board and over ten million lines of ground support software; several major life-critical defense systems will have over five million source lines of software.
Emphasizing these features will tend to add more complexity to software. Software failures may be due to errors, ambiguities, oversights, or misinterpretation of the specification that the software is supposed to satisfy, carelessness or incompetence in writing code, inadequate testing, incorrect or unexpected usage of the software, or other unforeseen problems. Hardware faults are mostly physical faults, while software faults are design faults, which are harder to visualize, classify, detect, and correct. In hardware, design faults may also exist, but physical faults usually dominate.
In software, we can hardly find a strict counterpart to the hardware manufacturing process, unless the simple action of uploading software modules into place counts. Therefore, the quality of software will not change once it is uploaded into storage and starts running. Trying to achieve higher reliability by simply duplicating the same software modules will not work, because design faults cannot be masked by voting.
A partial list of the distinct characteristics of software compared to hardware is given in [Keene94]. Over time, hardware exhibits the failure characteristics shown in Figure 1, known as the bathtub curve. Periods A, B, and C stand for the burn-in phase, the useful-life phase, and the end-of-life phase. A detailed discussion about the curve can be found in the topic Traditional Reliability. Software reliability, however, does not show the same characteristics as hardware. A possible curve is shown in Figure 2 if we project software reliability onto the same axes.
One difference is that in the last phase, software does not have an increasing failure rate as hardware does. In this phase, software is approaching obsolescence; there is no motivation for any upgrades or changes to the software. Therefore, the failure rate will not change. The second difference is that in the useful-life phase, software will experience a drastic increase in failure rate each time an upgrade is made.
The failure rate levels off gradually, partly because of the defects found and fixed after the upgrades.
Figure 2. Revised bathtub curve for software reliability.
The upgrades in Figure 2 imply feature upgrades, not upgrades for reliability. For feature upgrades, the complexity of software is likely to increase, since the functionality of the software is enhanced. Even bug fixes may cause more software failures, if the fix introduces other defects into the software. For reliability upgrades, it is possible to see a drop in the software failure rate if the goal of the upgrade is to enhance software reliability, for example by redesigning or reimplementing some modules using better engineering approaches such as the clean-room method.
Supporting evidence can be found in the results of the Ballista project, which performed robustness testing of off-the-shelf software components. Since software robustness is one aspect of software reliability, this result indicates that the upgrades of those systems shown in Figure 3 should have incorporated reliability upgrades. Since software reliability is one of the most important aspects of software quality, reliability engineering approaches are practiced in the software field as well. Software Reliability Engineering (SRE) is the quantitative study of the operational behavior of software-based systems with respect to user requirements concerning reliability [IEEE95].
A proliferation of software reliability models has emerged as people try to understand how and why software fails, and try to quantify software reliability. Many models have been developed, but how to quantify software reliability remains largely unsolved.
Interested readers may refer to [RAC96] and [Lyu95]. Although many models exist and many more are emerging, none of them can capture a satisfying amount of the complexity of software; constraints and assumptions have to be made for the quantifying process. Therefore, there is no single model that can be used in all situations. No model is complete or even representative. One model may work well for a certain set of software, but may be completely off track for other kinds of problems. The mathematical function is usually higher order exponential or logarithmic.
Software modeling techniques can be divided into two subcategories: prediction modeling and estimation modeling. The major differences between the two are shown in Table 1.
Table 1. Difference between software reliability prediction models and software reliability estimation models.
The initial quest in software reliability study is based on an analogy of traditional and hardware reliability.
Using prediction models, software reliability can be predicted early in the development phase and enhancements can be initiated to improve the reliability. Representative estimation models include exponential distribution models, the Weibull distribution model, Thompson and Chelson's model, etc. Two observations hold: first, the field has matured to the point that software models can be applied in practical situations and give meaningful results; second, there is no one model that is best in all situations.
Only a limited number of factors can be taken into consideration. Doing so reduces complexity and achieves abstraction; however, the models then tend to apply only to a portion of situations and a certain class of problems. We have to carefully choose the model that suits our specific case.
Furthermore, the modeling results cannot be blindly believed and applied. Measurement is commonplace in other engineering fields, but not in software engineering. Though frustrating, the quest to quantify software reliability has never ceased. Until now, we still have no good way of measuring software reliability. Measuring software reliability remains a difficult problem because we do not have a good understanding of the nature of software. There is no clear definition of which aspects are related to software reliability.
We cannot find a suitable way to measure software reliability, or most of the aspects related to it.
Even the most obvious product metrics, such as software size, have no uniform definition. If we cannot measure reliability directly, it is tempting to measure something related to reliability that reflects its characteristics. The current practices of software reliability measurement can be divided into four categories: product metrics, project management metrics, process metrics, and fault and failure metrics. Software size is thought to be reflective of complexity, development effort, and reliability. But there is no standard way of counting.
This method cannot faithfully compare software not written in the same language. The advent of new technologies for code reuse and code generation also casts doubt on this simple method. The function point metric is a method of measuring the functionality of a proposed software development based upon a count of inputs, outputs, master files, inquiries, and interfaces.
The method can be used to estimate the size of a software system as soon as these functions can be identified. It is a measure of the functional complexity of the program. It measures the functionality delivered to the user and is independent of the programming language. It is used primarily for business systems; it is not proven in scientific or real-time applications. Complexity is directly related to software reliability, so representing complexity is important.
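As a rough illustration of the function point count described above, the sketch below sums the five counted item types with a weight per type. The weights are the commonly quoted average-complexity values for unadjusted function points and are an assumption of this example, as are the function name and the sample counts; the text itself gives no weights.

```python
# Average-complexity weights often quoted for unadjusted function points.
# These values are an assumption of this sketch, not taken from the text.
AVERAGE_WEIGHTS = {
    "inputs": 4,
    "outputs": 5,
    "inquiries": 4,
    "master_files": 10,
    "interfaces": 7,
}


def unadjusted_function_points(counts, weights=AVERAGE_WEIGHTS):
    """Return the weighted sum of the counted function types."""
    unknown = set(counts) - set(weights)
    if unknown:
        raise ValueError(f"unknown function types: {sorted(unknown)}")
    return sum(weights[kind] * number for kind, number in counts.items())


# Hypothetical counts for a small business application.
example_counts = {
    "inputs": 10,
    "outputs": 6,
    "inquiries": 4,
    "master_files": 3,
    "interfaces": 2,
}
example_ufp = unadjusted_function_points(example_counts)
```

Because the count depends only on identified functions, it can be produced before any code exists, which is what makes the measure independent of the programming language.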
Complexity-oriented metrics are a method of determining the complexity of a program's control structure by simplifying the code into a graphical representation. A representative metric is McCabe's Complexity Metric. Detailed discussion about various software testing methods can be found in the topic Software Testing.
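To make the graphical representation concrete, the sketch below computes McCabe's cyclomatic number from a control-flow graph using the standard formula V(G) = E - N + 2P (edges, nodes, connected components). The function name and the two tiny example graphs are hypothetical and chosen only for illustration.

```python
def cyclomatic_complexity(edges, components=1):
    """Compute V(G) = E - N + 2P for a control-flow graph.

    edges is a list of (source, destination) pairs; nodes are inferred
    from the edges, and components is the number of connected components.
    """
    nodes = set()
    for source, destination in edges:
        nodes.add(source)
        nodes.add(destination)
    return len(edges) - len(nodes) + 2 * components


# Hypothetical control-flow graph of a single if/else statement.
if_else_graph = [
    ("entry", "then"),
    ("entry", "else"),
    ("then", "exit"),
    ("else", "exit"),
]

# Hypothetical control-flow graph of a single while loop.
while_graph = [
    ("entry", "test"),
    ("test", "body"),
    ("body", "test"),
    ("test", "exit"),
]

if_else_complexity = cyclomatic_complexity(if_else_graph)
while_complexity = cyclomatic_complexity(while_graph)
```

Each graph contains one decision point, so both have a complexity of 2; straight-line code with no branches would have a complexity of 1.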
Researchers have realized that good management can result in better products. Research has demonstrated that a relationship exists between the development process and the ability to complete projects on time and within the desired quality objectives. Costs increase when developers use inadequate processes. Higher reliability can be achieved by using better development, risk management, and configuration management processes, among others. Based on the assumption that the quality of the product is a direct function of the process, process metrics can be used to estimate, monitor, and improve the reliability and quality of software.
ISO certification, or "quality management standards", is the generic reference for a family of standards developed by the International Standards Organization (ISO). The goal of collecting fault and failure metrics is to be able to determine when the software is approaching failure-free execution. Minimally, both the number of faults found during testing i. Test strategy is highly relevant to the effectiveness of fault metrics, because if the testing scenario does not cover the full functionality of the software, the software may pass all tests and yet be prone to failure once delivered.
Usually, failure metrics are based upon customer information regarding failures found after release of the software. The failure data collected is then used to calculate failure density, Mean Time Between Failures (MTBF), or other parameters to measure or predict software reliability. Before the deployment of software products, testing, verification, and validation are necessary steps.
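The two parameters named above can be computed from very little data. The following is a minimal sketch, assuming MTBF is taken as total operating time divided by the number of failures and failure density as failures per thousand lines of code; the function names and sample figures are hypothetical.

```python
def mean_time_between_failures(operating_hours, failure_count):
    """Return total operating time divided by the number of failures."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return operating_hours / failure_count


def failure_density(failure_count, lines_of_code):
    """Return failures per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return failure_count / (lines_of_code / 1000.0)


# Hypothetical field data: 4 failures in 1,000 hours of operation,
# and 12 reported failures against a 48,000-line product.
example_mtbf = mean_time_between_failures(1000.0, 4)
example_density = failure_density(12, 48000)
```

Both figures depend heavily on how failures are reported and counted, so they are only comparable across products when the collection process is the same.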
Software testing is heavily used to trigger, locate, and remove software defects. Software testing is still in its infancy; testing is crafted to suit specific needs in various software development projects in an ad hoc manner. Various analysis tools, such as trend analysis, fault-tree analysis, Orthogonal Defect Classification, and formal methods, can also be used to minimize the possibility of defects occurring after release and therefore improve software reliability.
After deployment of the software product, field data can be gathered and analyzed to study the behavior of software defects. Software Reliability is a part of software quality.
It relates to many areas where software quality is concerned.
Markov models require transition probabilities from state to state, where the states are defined by the current values of key variables that define the functioning of the software system. Using these transition probabilities, a stochastic model is created and analyzed for stability.
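One quantity such a stochastic model can yield is the probability that a run ends normally rather than in failure. The sketch below is an illustration under stated assumptions, not the method of any cited author: the state names, the transition probabilities, and the function name are all hypothetical, and the probability is found by repeatedly applying the transition probabilities until the values settle.

```python
def success_probability(transitions, start, success, failure, sweeps=500):
    """Estimate the probability of reaching `success` before `failure`.

    transitions maps each non-terminal state to a dict of
    {next_state: probability}; each row is expected to sum to 1.
    """
    reach = {state: 0.0 for state in transitions}
    reach[success] = 1.0
    reach[failure] = 0.0
    for _ in range(sweeps):
        for state, row in transitions.items():
            # Probability of eventual success is the weighted average
            # of the same quantity over the possible next states.
            reach[state] = sum(p * reach[nxt] for nxt, p in row.items())
    return reach[start]


# Hypothetical three-step usage model of a software system.
usage_model = {
    "start": {"process": 1.0},
    "process": {"process": 0.5, "done": 0.45, "failed": 0.05},
}
run_reliability = success_probability(usage_model, "start", "done", "failed")
```

In this toy model the run succeeds with probability 0.9. The table of transitions grows with the number of states, which is why the approach becomes hard to apply to large programs.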
A primary limitation is that there can be a very large number of states in a large software program. For details, see Whittaker.
In time-series models, fault clustering is estimated using time-series analysis. For details, see Crow and Singpurwalla.
In input-domain models, if there is a fault in the mapping of the space of inputs to the space of intended outputs, then that mapping is identified as a potential fault to be rectified. These models are often infeasible because of the very large number of possibilities in a large software system.
For details, see Bastani and Ramamoorthy and Weiss and Weyuker.
It is quite likely that for broad categories of software systems, there already exist prediction models that could be used earlier in development than performance metrics for use in tracking and assessment. It is possible that such models could also be used to help identify better performing contractors at the proposal stage. Further, there has been a substantial amount of research in the software engineering community on building generalizable prediction models i.
Given the benefits from earlier identification of problematic software, we strongly encourage the U.S. Department of Defense (DoD) to stay current with the state of the art in software reliability as is practiced in the commercial software industry, with increased emphasis on data analytics and analysis. When it is clear that there are prediction models that are broadly applicable, DoD should consider mandating their use by contractors in software development.
A number of metrics have been found to be related to software system reliability and therefore are candidates for monitoring to assess progress toward meeting reliability requirements. These include code churn, code complexity, and code dependencies (see below). We note that the course on reliability and maintainability offered by the Defense Acquisition University lists 10 factors for increasing software reliability and maintainability.
These factors are all straightforward to measure, and they can be supplied by the contractor throughout development. Metrics-based models are a special type of software reliability growth model that have not been widely used in defense acquisition. The purpose of this section is to provide an understanding of when metrics-based models are applicable during software development.
The validation of such internal metrics requires a convincing demonstration that the metric measures what it purports to measure and that the metric is associated with an important external metric, such as field reliability, maintainability, or fault-proneness (for details, see El-Emam). Software fault-proneness is defined as the probability of the presence of faults in the software.
Failure-proneness is the probability that a particular software element will fail in operation. The higher the failure-proneness of the software, logically, the lower the reliability and the quality of the software produced, and vice versa.
Using operational profiling information, it is possible to relate generic failure-proneness and fault-proneness of a product. Research on fault-proneness has focused on two areas: While software fault-proneness can be measured before deployment such as the count of faults per structural unit, e. Five types of metrics have been used to study software quality: code churn, code complexity, code dependencies, defect data, and people and social network measures. The rest of this section, although not comprehensive, discusses the type of statistical models that can be built using these measures.
Code churn measures the changes made to a component, file, or system over some period of time. The most commonly used code churn measures are the number of lines of code that are added, modified, or deleted. Other churn measures include temporal churn (churn relative to the time of release of the system) and repetitive churn (frequency of changes to the same file or component). Several research studies have used code churn as an indicator.
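The basic churn measures described above are simple aggregations over a change history. The sketch below assumes each change record is a tuple of file name and lines added, modified, and deleted; the record layout, function name, and sample history are hypothetical.

```python
from collections import defaultdict


def churn_by_file(changes):
    """Aggregate churn per file from (file, added, modified, deleted) records.

    Returns, for each file, the total churned lines and the number of
    changes touching that file (a simple repetitive-churn measure).
    """
    totals = defaultdict(lambda: {"churned_lines": 0, "change_count": 0})
    for file_name, added, modified, deleted in changes:
        totals[file_name]["churned_lines"] += added + modified + deleted
        totals[file_name]["change_count"] += 1
    return dict(totals)


# Hypothetical change history for two source files.
history = [
    ("parser.c", 40, 10, 5),
    ("parser.c", 5, 20, 0),
    ("ui.c", 3, 0, 1),
]
churn = churn_by_file(history)
```

A file that is both heavily churned and changed repeatedly, like `parser.c` in this toy history, is the kind of candidate the studies below examine for fault-proneness.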
Munson and Elbaum observed that as a system is developed, the relative complexity of each program module that has been altered will change. They studied a software component with , lines of code embedded in a real-time system with 3, modules programmed in C. Code churn metrics were found to be among the most highly correlated with problem reports.
Another kind of code churn is debug churn, which Khoshgoftaar et al. They studied two consecutive releases of a large legacy system for telecommunications that contained more than 38, procedures in modules. Discriminant analysis identified fault-prone modules on the basis of 16 static software product metrics. Their model, when used on the second release, showed type I and type II misclassification rates of Using information on files with status new, changed, and unchanged, along with other explanatory variables such as lines of code, age, prior faults as predictors in a negative binomial regression equation, Ostrand et al.
Their model had high accuracy for faults found in both early and later stages of development. In a study on Windows Server, Nagappan and Ball demonstrated the use of relative code churn measures (normalized values of the various measures obtained during the evolution of the system) to predict defect density at statistically significant levels. The top three recommendations made by their system identified a correct location for future change with an accuracy of 70 percent.
Code complexity measures range from the classical cyclomatic complexity measures (see McCabe) to the more recent object-oriented metrics, one of which is known as the CK metric suite after its authors (see Chidamber and Kemerer). McCabe designed cyclomatic complexity.
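Cyclomatic complexity is adapted from the classical graph theoretical cyclomatic number and can be defined as the number of linearly independent paths through a program. The CK metric suite identifies six object-oriented metrics: weighted methods per class (WMC), depth of inheritance tree (DIT), number of children (NOC), coupling between objects (CBO), response for a class (RFC), and lack of cohesion in methods (LCOM). The CK metrics have also been investigated in the context of fault-proneness.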
They found the first five object-oriented metrics listed above were correlated with defects while the last metric was not. Subramanyam and Krishnan present a survey on eight more empirical studies, all showing that object-oriented metrics are significantly associated with defects. Early work by Podgurski and Clarke presented a formal model of program dependencies based on the relationship between two pieces of code inferred from the program text.
They proposed an alternate way of predicting failures for Java classes. Rather than looking at the complexity of a class, they looked exclusively at the components that a class uses. For Eclipse, the open source integrated development environment, they found that using compiler packages resulted in a significantly higher failure-proneness (71 percent) than using graphical user interface packages. Zimmermann and Nagappan built a systemwide code dependency graph of Windows Server and found that models built from social network measures were more than 10 percentage points more accurate than models built from complexity metrics.
Defect growth curves i. And Biyani and Santhanam showed that for four industrial systems at IBM there was a very strong relationship between development defects per module and field defects per module. This approach allows the building of prediction models based on development defects to identify field defects. They found that the models built using such social measures revealed 58 percent of the failures in 20 percent of the files in the system.
Studies performed by Nagappan et al. In predicting software reliability with software metrics, a number of approaches have been proposed. Logistic regression is a popular technique that has been used for building metric-based reliability models. The general form of a logistic regression equation is $\Pr = \dfrac{e^{c + a_1 X_1 + a_2 X_2 + \cdots}}{1 + e^{c + a_1 X_1 + a_2 X_2 + \cdots}}$, where $X_1, X_2, \ldots$ are the independent variables and $c, a_1, a_2, \ldots$ are the regression parameters. In the case of metrics-based reliability models, the independent variables can be any combination of measures ranging from code churn and code complexity to people and social network measures.
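A minimal sketch of how such a metrics-based logistic regression model might be fitted is shown below, using plain stochastic gradient descent and no external libraries. The two independent variables (a normalized churn value and a normalized complexity value), the training data, the learning rate, and all function names are assumptions made for illustration; they do not reproduce any study cited here.

```python
import math


def predict_defect_probability(intercept, coefficients, metrics):
    """Logistic function applied to intercept + sum(a_i * X_i)."""
    z = intercept + sum(a * x for a, x in zip(coefficients, metrics))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit the intercept and coefficients by stochastic gradient descent."""
    intercept = 0.0
    coefficients = [0.0] * len(rows[0])
    for _ in range(epochs):
        for metrics, label in zip(rows, labels):
            error = predict_defect_probability(intercept, coefficients, metrics) - label
            intercept -= learning_rate * error
            coefficients = [
                a - learning_rate * error * x
                for a, x in zip(coefficients, metrics)
            ]
    return intercept, coefficients


# Hypothetical modules: (normalized churn, normalized complexity),
# labeled 1 if a defect was later found and 0 otherwise.
modules = [(0.05, 0.10), (0.10, 0.20), (0.15, 0.10),
           (0.60, 0.70), (0.70, 0.90), (0.80, 0.60)]
had_defect = [0, 0, 0, 1, 1, 1]

c, a = fit_logistic(modules, had_defect)
risky = predict_defect_probability(c, a, (0.70, 0.75))
quiet = predict_defect_probability(c, a, (0.10, 0.13))
```

On this toy data the fitted model assigns a high defect probability to the high-churn, high-complexity module and a low one to the quiet module. In practice a statistics package would be used and the model would be validated on data from a different release or system.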
Another common technique used in metrics-based prediction models is a support vector machine (for details, see Han and Kamber). For a quick overview of this technique, consider a two-dimensional training set with two classes, as shown in the figure. In part a of the figure, points representing software modules are either defect-free (circles) or have defects (boxes).
A support vector machine separates the data cloud into two sets by searching for a maximum marginal hyperplane; in the two-dimensional case, this hyperplane is simply a line. There are an infinite number of possible hyperplanes in part a of the figure that separate the two groups. Support vector machines choose the hyperplane with the margin that gives the largest separation between classes.
Part a of the figure shows a hyperplane with a small margin; part b shows one with the maximum margin. Support vector machines thus compute a decision boundary, which is used to classify or predict new points. One example is the triangle in part c of the figure. The boundary shows on which side of the hyperplane the new software module is located. In the example, the triangle is below the hyperplane; thus it is classified as defect free. Separating data with a single hyperplane is not always possible. Part d of the figure shows an example of nonlinear data for which it is not possible to separate the two-dimensional data with a line.
In this case, support vector machines transform the input data into a higher dimensional space using a nonlinear mapping. In this new space, the data are then linearly separated (for details, see Han and Kamber). Support vector machines are less prone to overfitting than some other approaches because the complexity is characterized by the number of support vectors and not by the dimensionality of the input. Other techniques that have been used instead of logistic regression and support vector machines are discriminant analysis and decision and classification trees.
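The linear case described above can be sketched in a few lines. The example below trains a soft-margin linear support vector machine by sub-gradient descent on the hinge loss, which is a simple stand-in for the solvers used in practice and not the procedure of any cited study. The two features, the data points, the hyperparameters, and the function names are hypothetical; label +1 marks a module with defects and -1 a defect-free one.

```python
def train_linear_svm(points, labels, learning_rate=0.01, reg=0.01, epochs=2000):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move the
                # hyperplane toward classifying it with a larger margin.
                w = [wi - learning_rate * (reg * wi - y * xi)
                     for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Point is safely classified: only shrink the weights,
                # which favors a wider margin.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def classify(w, b, x):
    """Return +1 (has defects) or -1 (defect free) for a new module."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical two-dimensional training set of software modules.
training_points = [(1, 1), (1, 2), (2, 1), (4, 4), (4, 5), (5, 4)]
training_labels = [-1, -1, -1, 1, 1, 1]

weights, bias = train_linear_svm(training_points, training_labels)
new_low = classify(weights, bias, (1.3, 1.3))
new_high = classify(weights, bias, (4.3, 4.3))
```

The new module at (1.3, 1.3) falls on the defect-free side of the learned line and the one at (4.3, 4.3) on the defect side. Data that cannot be separated by a line would need the nonlinear mapping mentioned above, which this sketch does not implement.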
Drawing general conclusions from empirical studies in software engineering is difficult because any process is highly dependent on a potentially large number of relevant contextual variables. Consequently, the panel does not assume a priori that the results of any study will generalize beyond the specific environment in which it was conducted, although researchers understandably become more confident in a theory when similar findings emerge in different contexts. Given that software is a vitally important aspect of reliability and that predicting software reliability early in development is a severe challenge, we suggest that DoD make a substantial effort to stay current with efforts employed in industry to produce useful predictions.
There is a generally accepted view that it is appropriate to combine software failures with hardware failures to assess system performance in a given test. However, in this section we are focusing on earlier non-system-level testing in developmental testing, akin to component-level testing for hardware. The concern is that if insufficient software testing is carried out during the early stages of developmental testing, then addressing software problems discovered in later stages of developmental testing or in operational testing will be much more expensive.
As discussed in National Research Council, to adequately test software, given the combinatorial complexity of the sequence of statements activated as a function of possible inputs, one is obligated to use some form of automated test generation, with high code coverage assessed using one of the various coverage metrics proposed in the research literature. This is necessary both to discover software defects and to evaluate the reliability of the software component or subsystem. However, given the current lack of software engineering expertise accessible in government developmental testing, the testing that can be usefully carried out, in addition to the testing done for the full system, is limited.
Consequently, we recommend that the primary testing of software components and subsystems be carried out by the developers and carefully documented and reported to DoD and that contractors provide software that can be used to run automated tests of the component or subsystem (Recommendation 14). This includes information technology systems and major automated information systems. If DoD acquires the ability to carry out automated testing, then there are model-based techniques, including those developed by Poore see, e.
Finally, if contractor code is also shared with DoD, then DoD could validate some contractor results through the use of fault injection (seeding) techniques.
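The basic arithmetic behind fault seeding can be sketched as follows. This is the classic proportional estimator often attributed to Mills, shown here as an assumption since the text above does not spell out a formula: if testing recovers a given fraction of the deliberately seeded faults, it is assumed to have recovered the same fraction of the real (indigenous) faults. The function name and the sample counts are hypothetical.

```python
def estimate_indigenous_faults(seeded_total, seeded_found, indigenous_found):
    """Estimate the total number of real faults from seeding results.

    Assumes seeded and real faults are equally likely to be detected,
    so indigenous_found / total_real equals seeded_found / seeded_total.
    """
    if seeded_found <= 0:
        raise ValueError("no seeded faults were found; estimate is undefined")
    return indigenous_found * seeded_total / seeded_found


# Hypothetical test campaign: 20 faults seeded, 15 of them rediscovered,
# along with 30 real faults.
estimated_total = estimate_indigenous_faults(20, 15, 30)
estimated_remaining = estimated_total - 30
```

Here the estimate is 40 real faults in total, of which about 10 remain undiscovered. The estimate is only as good as the assumption that seeded faults resemble real ones in how easily they are detected.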
However, operational testing of a software system can raise an issue known as fault masking, whereby the occurrence of a fault prevents the software system from continuing and therefore misses faults that are conditional on the previous code functioning properly. Therefore, fault seeding can fail to provide unbiased estimates in such cases.
The use of fault seeding could also be biased in other ways, causing problems in estimation, but there are various generalizations and extensions of the technique that can address these various problems. They include explicit recognition of order constraints and fault masking, Bayesian constructs that provide profiles for each subroutine, and segmenting system runs. One of the most important principles found in commercial best practices is the benefit from the display of collected data in terms of trend charts to track progress. Along these lines, Selby demonstrates the use of analytics dashboards in large-scale software systems.
Analytics dashboards provide easily interpretable information that can help many users, including front-line software developers, software managers, and project managers. These dashboards can cater to a variety of requirements. Several of the metrics shown in the figure, for example, the trend of post-delivery defects, can help assess the overall stability of the system.
Selby states that organizations should define data trends that are reflective of success in meeting software requirements so that, over time, one could develop statistical tests that could effectively discriminate between successful and unsuccessful development programs.
Analytics dashboards can also give context-specific help, and the ability to drill down to provide further details is also useful.
A high percentage of defense systems fail to meet their reliability requirements. This is a serious problem for the U.S. Department of Defense (DOD), as well as the nation. Those systems are not only less likely to successfully carry out their intended missions, but they also could endanger the lives of the operators.
Furthermore, reliability failures discovered after deployment can result in costly and strategic delays and the need for expensive redesign, which often limits the tactical situations in which the system can be used. Finally, systems that fail to meet their reliability requirements are much more likely to need additional scheduled and unscheduled maintenance and to need more spare parts and possibly replacement systems, all of which can substantially increase the life-cycle costs of a system. Beginning in , DOD undertook a concerted effort to raise the priority of reliability through greater use of design for reliability techniques, reliability growth testing, and formal reliability growth modeling, by both the contractors and DOD units.
To this end, handbooks, guidances, and formal memoranda were revised or newly issued to reduce the frequency of reliability deficiencies for defense systems in operational testing and the effects of those deficiencies. Reliability Growth evaluates these recent changes and, more generally, assesses how current DOD principles and practices could be modified to increase the likelihood that defense systems will satisfy their reliability requirements.
This report examines changes to the reliability requirements for proposed systems; defines modern design and testing for reliability; discusses the contractor's role in reliability testing; and summarizes the current state of formal reliability growth modeling. The recommendations of Reliability Growth will improve the reliability of defense systems and protect the health of the valuable personnel who operate them.