Causal Reasoning in Software Quality Assurance: A Systematic Review

Authors: Luca Giamattei, Antonio Guerriero, Roberto Pietrantuono, Stefano Russo

Abstract: Context: Software Quality Assurance (SQA) is a fundamental part of software engineering to assure stakeholders that software products work as expected after release in operation. Machine Learning (ML) has proven to be able to boost SQA activities and contribute to the development of quality software systems. In this context, Causal Reasoning is gaining increasing interest as a methodology to solve some of the current ML limitations. It aims to go beyond a purely data-driven approach by exploiting the use of causality for more effective SQA strategies. Objective: Provide a broad and detailed overview of the use of causal reasoning for SQA activities, in order to help researchers access this research field, identifying room for application, main challenges and research opportunities. Methods: A systematic literature review of causal reasoning in the SQA research area. Scientific papers have been searched, classified, and analyzed according to established guidelines for software engineering secondary studies. Results: Results highlight the primary areas within SQA where causal reasoning has been applied, the predominant methodologies used, and the level of maturity of the proposed solutions. Fault localization is the activity where causal reasoning is most exploited, especially in the web services/microservices domain, but other tasks like testing are rapidly gaining popularity. Both causal inference and causal discovery are exploited, with Pearl's graphical formulation of causality being preferred, likely due to its intuitiveness. Tools to favour their application are appearing at a fast pace, most of them after 2021. Conclusions: The findings show that causal reasoning is a valuable means for SQA tasks with respect to multiple quality attributes, especially during V&V, evolution and maintenance to ensure reliability, while it is not yet fully exploited for phases like ...

Submitted 30 August, 2024; originally announced August 2024.

Comments: Preprint Journal Information and Software Technology

arXiv:2403.19271

doi 10.1145/3597503.3639584

DeepSample: DNN sampling-based testing for operational accuracy assessment

Authors: Antonio Guerriero, Roberto Pietrantuono, Stefano Russo

Abstract: Deep Neural Networks (DNN) are core components for classification and regression tasks of many software systems. Companies incur high costs for testing DNN with datasets representative of the inputs expected in operation, as these need to be manually labelled. The challenge is to select a representative set of test inputs as small as possible to reduce the labelling cost, while sufficing to yield unbiased high-confidence estimates of the expected DNN accuracy. At the same time, testers are interested in exposing as many DNN mispredictions as possible to improve the DNN, resulting in the need for techniques pursuing a threefold aim: small dataset size, trustworthy estimates, and exposure of mispredictions. This study presents DeepSample, a family of DNN testing techniques for cost-effective accuracy assessment based on probabilistic sampling. We investigate whether, to what extent, and under which conditions probabilistic sampling can help to tackle the outlined challenge. We implement five new sampling-based testing techniques, and perform a comprehensive comparison of such techniques and of three further state-of-the-art techniques for both DNN classification and regression tasks. Results serve as guidance for best use of sampling-based testing for faithful and high-confidence estimates of DNN accuracy in operation at low cost.

Submitted 28 March, 2024; originally announced March 2024.

Comments: Accepted for publication at ICSE 2024, Lisbon, Portugal

arXiv:2303.01295

doi 10.1109/ICSE-NIER58687.2023.00014

Iterative Assessment and Improvement of DNN Operational Accuracy

Authors: Antonio Guerriero, Roberto Pietrantuono, Stefano Russo

Abstract: Deep Neural Networks (DNN) are nowadays largely adopted in many application domains thanks to their human-like, or even superhuman, performance in specific tasks. However, due to unpredictable/unconsidered operating conditions, unexpected failures show up in the field, making the performance of a DNN in operation very different from the one estimated prior to release. In the life cycle of DNN systems, the assessment of accuracy is typically addressed in two ways: offline, via sampling of operational inputs, or online, via pseudo-oracles. The former is considered more expensive due to the need for manual labeling of the sampled inputs. The latter is automatic but less accurate. We believe that emerging iterative industrial-strength life cycle models for Machine Learning systems, like MLOps, offer the possibility to leverage inputs observed in operation not only to provide faithful estimates of a DNN's accuracy, but also to improve it through remodeling/retraining actions. We propose DAIC (DNN Assessment and Improvement Cycle), an approach which combines "low-cost" online pseudo-oracles and "high-cost" offline sampling techniques to estimate and improve the operational accuracy of a DNN in the iterations of its life cycle. Preliminary results show the benefits of combining the two approaches and integrating them in the DNN life cycle.

Submitted 2 March, 2023; originally announced March 2023.

Comments: Paper accepted at 45th International Conference on Software Engineering (ICSE'23 NIER), May 2023

arXiv:2102.04287

doi 10.1109/ICSE43902.2021.00042

Operation is the hardest teacher: estimating DNN accuracy looking for mispredictions

Authors: Antonio Guerriero, Roberto Pietrantuono, Stefano Russo

Abstract: Deep Neural Networks (DNN) are typically tested for accuracy relying on a set of unlabelled real world data (operational dataset), from which a subset is selected, manually labelled and used as test suite. This subset is required to be small (due to manual labelling cost) yet to faithfully represent the operational context, with the resulting test suite containing roughly the same proportion of examples causing misprediction (i.e., failing test cases) as the operational dataset. However, while testing to estimate accuracy, it is desirable to also learn as much as possible from the failing tests in the operational dataset, since they inform about possible bugs of the DNN. A smart sampling strategy may make it possible to intentionally include in the test suite many examples causing misprediction, thus providing more valuable inputs for DNN improvement while preserving the ability to get trustworthy unbiased estimates. This paper presents a test selection technique (DeepEST) that actively looks for failing test cases in the operational dataset of a DNN, with the goal of assessing the DNN expected accuracy by a small and "informative" test suite (namely with a high number of mispredictions) for subsequent DNN improvement. Experiments with five subjects, combining four DNN models and three datasets, are described. The results show that DeepEST provides DNN accuracy estimates with precision close to (and often better than) those of existing sampling-based DNN testing techniques, while detecting from 5 to 30 times more mispredictions, with the same test suite size.

Submitted 8 February, 2021; originally announced February 2021.

Comments: Paper accepted at 43rd ACM/IEEE International Conference on Software Engineering, Madrid, Spain. May 2021

arXiv:0711.2971

IT services design to support coordination practices in the Luxembourguish AEC sector

Authors: Sylvain Kubicki, Annie Guerriéro, Damien Hanser, Gilles Halin

Abstract: In the Architecture, Engineering and Construction (AEC) sector, cooperation between actors is essential for project success. The configuration of the actors' organization takes different forms, as do the associated coordination mechanisms. Our approach consists in analyzing these coordination mechanisms through the identification of the "base practices" that the actors of a construction project carry out in order to cooperate. We also work with practitioners to highlight the "best practices" of cooperation. We then suggest two prototypes of IT services aiming to demonstrate the added value of IT to support cooperation. These prototype tools allow us to raise the actors' awareness through field experiments and then to bring the Luxembourgish AEC sector, inch by inch, towards electronic cooperation.

Submitted 19 November, 2007; originally announced November 2007.

Showing 1–5 of 5 results for author: Guerriero, A