Search | arXiv e-print repository

Anomaly Detection on Unstable Logs with GPT Models

Authors: Fatemeh Hadadi, Qinghua Xu, Domenico Bianculli, Lionel Briand

Abstract: Log-based anomaly detection has been widely studied in the literature as a way to increase the dependability of software-intensive systems. In reality, logs can be unstable due to changes made to the software during its evolution. This, in turn, degrades the performance of downstream log analysis activities, such as anomaly detection. The critical challenge in detecting anomalies on these unstable… ▽ More Log-based anomaly detection has been widely studied in the literature as a way to increase the dependability of software-intensive systems. In reality, logs can be unstable due to changes made to the software during its evolution. This, in turn, degrades the performance of downstream log analysis activities, such as anomaly detection. The critical challenge in detecting anomalies on these unstable logs is the lack of information about the new logs, due to insufficient log data from new software versions. The application of Large Language Models (LLMs) to many software engineering tasks has revolutionized various domains. In this paper, we report on an experimental comparison of a fine-tuned LLM and alternative models for anomaly detection on unstable logs. The main motivation is that the pre-training of LLMs on vast datasets may enable a robust understanding of diverse patterns and contextual information, which can be leveraged to mitigate the data insufficiency issue in the context of software evolution. Our experimental results on the two-version dataset of LOGEVOL-Hadoop show that the fine-tuned LLM (GPT-3) fares slightly better than supervised baselines when evaluated on unstable logs. The difference between GPT-3 and other supervised approaches tends to become more significant as the degree of changes in log sequences increases. However, it is unclear whether the difference is practically significant in all cases. Lastly, our comparison of prompt engineering (with GPT-4) and fine-tuning reveals that the latter provides significantly superior performance on both stable and unstable logs, offering valuable insights into the effective utilization of LLMs in this domain. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2401.17019 [pdf, other]

Towards Generating Executable Metamorphic Relations Using Large Language Models

Authors: Seung Yeob Shin, Fabrizio Pastore, Domenico Bianculli, Alexandra Baicoianu

Abstract: Metamorphic testing (MT) has proven to be a successful solution to automating testing and addressing the oracle problem. However, it entails manually deriving metamorphic relations (MRs) and converting them into an executable form; these steps are time-consuming and may prevent the adoption of MT. In this paper, we propose an approach for automatically deriving executable MRs (EMRs) from requireme… ▽ More Metamorphic testing (MT) has proven to be a successful solution to automating testing and addressing the oracle problem. However, it entails manually deriving metamorphic relations (MRs) and converting them into an executable form; these steps are time-consuming and may prevent the adoption of MT. In this paper, we propose an approach for automatically deriving executable MRs (EMRs) from requirements using large language models (LLMs). Instead of merely asking the LLM to produce EMRs, our approach relies on a few-shot prompting strategy to instruct the LLM to perform activities in the MT process, by providing requirements and API specifications, as one would do with software engineers. To assess the feasibility of our approach, we conducted a questionnaire-based survey in collaboration with Siemens Industry Software, a worldwide leader in providing industry software and services, focusing on four of their software applications. Additionally, we evaluated the accuracy of the generated EMRs for a Web application. The outcomes of our study are highly promising, as they demonstrate the capability of our approach to generate MRs and EMRs that are both comprehensible and pertinent for testing purposes. △ Less

Submitted 7 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: preprint - accepted for QUATIC 2024

arXiv:2311.13517 [pdf, other]

doi 10.1145/3635708

Learning-Based Relaxation of Completeness Requirements for Data Entry Forms

Authors: Hichem Belgacem, Xiaochen Li, Domenico Bianculli, Lionel C. Briand

Abstract: Data entry forms use completeness requirements to specify the fields that are required or optional to fill for collecting necessary information from different types of users. However, some required fields may not be applicable for certain types of users anymore. Nevertheless, they may still be incorrectly marked as required in the form; we call such fields obsolete required fields. Since obsol… ▽ More Data entry forms use completeness requirements to specify the fields that are required or optional to fill for collecting necessary information from different types of users. However, some required fields may not be applicable for certain types of users anymore. Nevertheless, they may still be incorrectly marked as required in the form; we call such fields obsolete required fields. Since obsolete required fields usually have not-null validation checks before submitting the form, users have to enter meaningless values in such fields in order to complete the form submission. These meaningless values threaten the quality of the filled data. To avoid users filling meaningless values, existing techniques usually rely on manually written rules to identify the obsolete required fields and relax their completeness requirements. However, these techniques are ineffective and costly. In this paper, we propose LACQUER, a learning-based automated approach for relaxing the completeness requirements of data entry forms. LACQUER builds Bayesian Network models to automatically learn conditions under which users had to fill meaningless values. To improve its learning ability, LACQUER identifies the cases where a required field is only applicable for a small group of users, and uses SMOTE, an oversampling technique, to generate more instances on such fields for effectively mining dependencies on them. Our experimental results show that LACQUER can accurately relax the completeness requirements of required fields in data entry forms with precision values ranging between 0.76 and 0.90 on different datasets. LACQUER can prevent users from filling 20% to 64% of meaningless values, with negative predictive values between 0.72 and 0.91. Furthermore, LACQUER is efficient; it takes at most 839 ms to predict the completeness requirement of an instance. △ Less

Submitted 13 December, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: Accepted for publication by ACM Transactions on Software Engineering and Methodology

Journal ref: ACM Trans. Softw. Eng. Methodol. 33, 3, Article 77 (March 2024), 32 pages

arXiv:2307.16714 [pdf, other]

A Comprehensive Study of Machine Learning Techniques for Log-Based Anomaly Detection

Authors: Shan Ali, Chaima Boufaied, Domenico Bianculli, Paula Branco, Lionel Briand

Abstract: Growth in system complexity increases the need for automated techniques dedicated to different log analysis tasks such as Log-based Anomaly Detection (LAD). The latter has been widely addressed in the literature, mostly by means of a variety of deep learning techniques. Despite their many advantages, that focus on deep learning techniques is somewhat arbitrary as traditional Machine Learning (ML… ▽ More Growth in system complexity increases the need for automated techniques dedicated to different log analysis tasks such as Log-based Anomaly Detection (LAD). The latter has been widely addressed in the literature, mostly by means of a variety of deep learning techniques. Despite their many advantages, that focus on deep learning techniques is somewhat arbitrary as traditional Machine Learning (ML) techniques may perform well in many cases, depending on the context and datasets. In the same vein, semi-supervised techniques deserve the same attention as supervised techniques since the former have clear practical advantages. Further, current evaluations mostly rely on the assessment of detection accuracy. However, this is not enough to decide whether or not a specific ML technique is suitable to address the LAD problem in a given context. Other aspects to consider include training and prediction times as well as the sensitivity to hyperparameter tuning, which in practice matters to engineers. In this paper, we present a comprehensive empirical study, in which we evaluate supervised and semi-supervised, traditional and deep ML techniques w.r.t. four evaluation criteria: detection accuracy, time performance, sensitivity of detection accuracy and time performance to hyperparameter tuning. The experimental results show that supervised traditional and deep ML techniques fare similarly in terms of their detection accuracy and prediction time. Moreover, overall, sensitivity analysis to hyperparameter tuning w.r.t. detection accuracy shows that supervised traditional ML techniques are less sensitive than deep learning techniques. Further, semi-supervised techniques yield significantly worse detection accuracy than supervised techniques. △ Less

Submitted 20 May, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

arXiv:2305.15897 [pdf, other]

doi 10.1007/s10664-024-10533-w

Impact of Log Parsing on Deep Learning-Based Anomaly Detection

Authors: Zanis Ali Khan, Donghwan Shin, Domenico Bianculli, Lionel Briand

Abstract: Software systems log massive amounts of data, recording important runtime information. Such logs are used, for example, for log-based anomaly detection, which aims to automatically detect abnormal behaviors of the system under analysis by processing the information recorded in its logs. Many log-based anomaly detection techniques based on deep learning models include a pre-processing step called l… ▽ More Software systems log massive amounts of data, recording important runtime information. Such logs are used, for example, for log-based anomaly detection, which aims to automatically detect abnormal behaviors of the system under analysis by processing the information recorded in its logs. Many log-based anomaly detection techniques based on deep learning models include a pre-processing step called log parsing. However, understanding the impact of log parsing on the accuracy of anomaly detection techniques has received surprisingly little attention so far. Investigating what are the key properties log parsing techniques should ideally have to help anomaly detection is therefore warranted. In this paper, we report on a comprehensive empirical study on the impact of log parsing on anomaly detection accuracy, using 13 log parsing techniques, seven anomaly detection techniques (five based on deep learning and two based on traditional machine learning) on three publicly available log datasets. Our empirical results show that, despite what is widely assumed, there is no strong correlation between log parsing accuracy and anomaly detection accuracy, regardless of the metric used for measuring log parsing accuracy. Moreover, we experimentally confirm existing theoretical results showing that it is a property that we refer to as distinguishability in log parsing results as opposed to their accuracy that plays an essential role in achieving accurate anomaly detection. △ Less

Submitted 19 August, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Journal ref: Empir Software Eng 29, 139 (2024)

arXiv:2303.07230 [pdf, other]

doi 10.1007/s10664-024-10501-4

Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction

Authors: Fatemeh Hadadi, Joshua H. Dawes, Donghwan Shin, Domenico Bianculli, Lionel Briand

Abstract: With the increasing complexity and scope of software systems, their dependability is crucial. The analysis of log data recorded during system execution can enable engineers to automatically predict failures at run time. Several Machine Learning (ML) techniques, including traditional ML and Deep Learning (DL), have been proposed to automate such tasks. However, current empirical studies are limited… ▽ More With the increasing complexity and scope of software systems, their dependability is crucial. The analysis of log data recorded during system execution can enable engineers to automatically predict failures at run time. Several Machine Learning (ML) techniques, including traditional ML and Deep Learning (DL), have been proposed to automate such tasks. However, current empirical studies are limited in terms of covering all main DL types -- Recurrent Neural Network (RNN), Convolutional Neural network (CNN), and transformer -- as well as examining them on a wide range of diverse datasets. In this paper, we aim to address these issues by systematically investigating the combination of log data embedding strategies and DL types for failure prediction. To that end, we propose a modular architecture to accommodate various configurations of embedding strategies and DL-based encoders. To further investigate how dataset characteristics such as dataset size and failure percentage affect model accuracy, we synthesised 360 datasets, with varying characteristics, for three distinct system behavioral models, based on a systematic and automated generation approach. Using the F1 score metric, our results show that the best overall performing configuration is a CNN-based encoder with Logkey2vec. Additionally, we provide specific dataset conditions, namely a dataset size >350 or a failure percentage >7.5%, under which this configuration demonstrates high accuracy for failure prediction. △ Less

Submitted 24 June, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Accepted by EMSE'24

Journal ref: Empir Software Eng 29, 105 (2024)

arXiv:2302.13913 [pdf, other]

doi 10.1145/3624742

Stress Testing Control Loops in Cyber-Physical Systems

Authors: Claudio Mandrioli, Seung Yeob Shin, Martina Maggio, Domenico Bianculli, Lionel Briand

Abstract: Cyber-Physical Systems (CPSs) are often safety-critical and deployed in uncertain environments. Identifying scenarios where CPSs do not comply with requirements is fundamental but difficult due to the multidisciplinary nature of CPSs. We investigate the testing of control-based CPSs, where control and software engineers develop the software collaboratively. Control engineers make design assumption… ▽ More Cyber-Physical Systems (CPSs) are often safety-critical and deployed in uncertain environments. Identifying scenarios where CPSs do not comply with requirements is fundamental but difficult due to the multidisciplinary nature of CPSs. We investigate the testing of control-based CPSs, where control and software engineers develop the software collaboratively. Control engineers make design assumptions during system development to leverage control theory and obtain guarantees on CPS behaviour. In the implemented system, however, such assumptions are not always satisfied, and their falsification can lead to loss of guarantees. We define stress testing of control-based CPSs as generating tests to falsify such design assumptions. We highlight different types of assumptions, focusing on the use of linearised physics models. To generate stress tests falsifying such assumptions, we leverage control theory to qualitatively characterise the input space of a control-based CPS. We propose a novel test parametrisation for control-based CPSs and use it with the input space characterisation to develop a stress testing approach. We evaluate our approach on three case study systems, including a drone, a continuous-current motor (in five configurations), and an aircraft.Our results show the effectiveness of the proposed testing approach in falsifying the design assumptions and highlighting the causes of assumption violations. △ Less

Submitted 18 September, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted for publication in August 2023 on the ACM Transactions on Software Engineering and Methodology (TOSEM)

Journal ref: ACM Trans. Softw. Eng. Methodol. 33, 2, Article 35 (February 2024), 58 pages

arXiv:2211.16587 [pdf, other]

doi 10.1145/3640332

Rigorous Assessment of Model Inference Accuracy using Language Cardinality

Authors: Donato Clun, Donghwan Shin, Antonio Filieri, Domenico Bianculli

Abstract: Models such as finite state automata are widely used to abstract the behavior of software systems by capturing the sequences of events observable during their execution. Nevertheless, models rarely exist in practice and, when they do, get easily outdated; moreover, manually building and maintaining models is costly and error-prone. As a result, a variety of model inference methods that automatical… ▽ More Models such as finite state automata are widely used to abstract the behavior of software systems by capturing the sequences of events observable during their execution. Nevertheless, models rarely exist in practice and, when they do, get easily outdated; moreover, manually building and maintaining models is costly and error-prone. As a result, a variety of model inference methods that automatically construct models from execution traces have been proposed to address these issues. However, performing a systematic and reliable accuracy assessment of inferred models remains an open problem. Even when a reference model is given, most existing model accuracy assessment methods may return misleading and biased results. This is mainly due to their reliance on statistical estimators over a finite number of randomly generated traces, introducing avoidable uncertainty about the estimation and being sensitive to the parameters of the random trace generative process. This paper addresses this problem by developing a systematic approach based on analytic combinatorics that minimizes bias and uncertainty in model accuracy assessment by replacing statistical estimation with deterministic accuracy measures. We experimentally demonstrate the consistency and applicability of our approach by assessing the accuracy of models inferred by state-of-the-art inference tools against reference models from established specification mining benchmarks. △ Less

Submitted 16 January, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: Accepted for publication in the ACM Transactions on Software Engineering and Methodology

Journal ref: ACM Trans. Softw. Eng. Methodol. 33, 4, Article 95 (May 2024), 39 pages

arXiv:2206.04024 [pdf, other]

doi 10.1109/TSE.2023.3242588

Trace Diagnostics for Signal-based Temporal Properties

Authors: Chaima Boufaied, Claudio Menghi, Domenico Bianculli, Lionel Briand

Abstract: Most of the trace-checking tools only yield a Boolean verdict. However, when a property is violated by a trace, engineers usually inspect the trace to understand the cause of the violation; such manual diagnostic is time-consuming and error-prone. Existing approaches that complement trace-checking tools with diagnostic capabilities either produce low-level explanations that are hardly comprehensib… ▽ More Most of the trace-checking tools only yield a Boolean verdict. However, when a property is violated by a trace, engineers usually inspect the trace to understand the cause of the violation; such manual diagnostic is time-consuming and error-prone. Existing approaches that complement trace-checking tools with diagnostic capabilities either produce low-level explanations that are hardly comprehensible by engineers or do not support complex signal-based temporal properties. In this paper, we propose TD-SB-TemPsy, a trace-diagnostic approach for properties expressed using SB-TemPsy-DSL. Given a property and a trace that violates the property, TD-SB-TemPsy determines the root cause of the property violation. TD-SB-TemPsy relies on the concepts of violation cause, which characterizes one of the behaviors of the system that may lead to a property violation, and diagnoses, which are associated with violation causes and provide additional information to help engineers understand the violation cause. As part of TD-SB-TemPsy, we propose a language-agnostic methodology to define violation causes and diagnoses. In our context, its application resulted in a catalog of 34 violation causes, each associated with one diagnosis, tailored to properties expressed in SB-TemPsy-DSL. We assessed the applicability of TD-SB-TemPsy on two datasets, including one based on a complex industrial case study.The results show that TD-SB-TemPsy could finish within a timeout of 1 min for ~83.66% of the trace-property combinations in the industrial dataset, yielding a diagnosis in ~99.84% of these cases. Moreover, it also yielded a diagnosis for all the trace-property combinations in the other dataset. These results suggest that our tool is applicable and efficient in most cases. △ Less

Submitted 31 January, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

Journal ref: IEEE Transactions on Software Engineering, vol. 49, no. 5, pp. 3131-3154, 1 May 2023

arXiv:2202.08572 [pdf, other]

doi 10.1145/3533021

A Machine Learning Approach for Automated Filling of Categorical Fields in Data Entry Forms

Authors: Hichem Belgacem, Xiaochen Li, Domenico Bianculli, Lionel C. Briand

Abstract: Users frequently interact with software systems through data entry forms. However, form filling is time-consuming and error-prone. Although several techniques have been proposed to auto-complete or pre-fill fields in the forms, they provide limited support to help users fill categorical fields, i.e., fields that require users to choose the right value among a large set of options. In this paper,… ▽ More Users frequently interact with software systems through data entry forms. However, form filling is time-consuming and error-prone. Although several techniques have been proposed to auto-complete or pre-fill fields in the forms, they provide limited support to help users fill categorical fields, i.e., fields that require users to choose the right value among a large set of options. In this paper, we propose LAFF, a learning-based automated approach for filling categorical fields in data entry forms. LAFF first builds Bayesian Network models by learning field dependencies from a set of historical input instances, representing the values of the fields that have been filled in the past. To improve its learning ability, LAFF uses local modeling to effectively mine the local dependencies of fields in a cluster of input instances. During the form filling phase, LAFF uses such models to predict possible values of a target field, based on the values in the already-filled fields of the form and their dependencies; the predicted values (endorsed based on field dependencies and prediction confidence) are then provided to the end-user as a list of suggestions. We evaluated LAFF by assessing its effectiveness and efficiency in form filling on two datasets, one of them proprietary from the banking domain. Experimental results show that LAFF is able to provide accurate suggestions with a Mean Reciprocal Rank value above 0.73. Furthermore, LAFF is efficient, requiring at most 317 ms per suggestion. △ Less

Submitted 28 April, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: Accepted for publication by ACM Transactions on Software Engineering and Methodology

Journal ref: ACM Trans. Softw. Eng. Methodol. 32, 2, Article 47 (March 2023), 40 page

arXiv:2110.14960 [pdf, other]

An AI-based Approach for Tracing Content Requirements in Financial Documents

Authors: Xiaochen Li, Domenico Bianculli, Lionel C. Briand

Abstract: The completeness (in terms of content) of financial documents is a fundamental requirement for investment funds. To ensure completeness, financial regulators spend a huge amount of time for carefully checking every financial document based on the relevant content requirements, which prescribe the information types to be included in financial documents (e.g., the description of shares' issue condit… ▽ More The completeness (in terms of content) of financial documents is a fundamental requirement for investment funds. To ensure completeness, financial regulators spend a huge amount of time for carefully checking every financial document based on the relevant content requirements, which prescribe the information types to be included in financial documents (e.g., the description of shares' issue conditions). Although several techniques have been proposed to automatically detect certain types of information in documents in various application domains, they provide limited support to help regulators automatically identify the text chunks related to financial information types, due to the complexity of financial documents and the diversity of the sentences characterizing an information type. In this paper, we propose FITI, an artificial intelligence (AI)-based method for tracing content requirements in financial documents. Given a new financial document, FITI selects a set of candidate sentences for efficient information type identification. Then, FITI uses a combination of rule-based and data-centric approaches, by leveraging information retrieval (IR) and machine learning (ML) techniques that analyze the words, sentences, and contexts related to an information type, to rank candidate sentences. Finally, using a list of indicator phrases related to each information type, a heuristic-based selector, which considers both the sentence ranking and the domain-specific phrases, determines a list of sentences corresponding to each information type. We evaluated FITI by assessing its effectiveness in tracing financial content requirements in 100 financial documents. Experimental results show that FITI provides accurate identification with average precision and recall values of 0.824 and 0.646, respectively. Furthermore, FITI can detect about 80% of missing information types in financial documents. △ Less

Submitted 28 October, 2021; originally announced October 2021.

Comments: 17 pages, 3 figures

arXiv:2106.01987 [pdf, other]

doi 10.1007/s10664-021-10111-4

PRINS: Scalable Model Inference for Component-based System Logs

Authors: Donghwan Shin, Domenico Bianculli, Lionel Briand

Abstract: Behavioral software models play a key role in many software engineering tasks; unfortunately, these models either are not available during software development or, if available, quickly become outdated as implementations evolve. Model inference techniques have been proposed as a viable solution to extract finite state models from execution logs. However, existing techniques do not scale well when… ▽ More Behavioral software models play a key role in many software engineering tasks; unfortunately, these models either are not available during software development or, if available, quickly become outdated as implementations evolve. Model inference techniques have been proposed as a viable solution to extract finite state models from execution logs. However, existing techniques do not scale well when processing very large logs that can be commonly found in practice. In this paper, we address the scalability problem of inferring the model of a component-based system from large system logs, without requiring any extra information. Our model inference technique, called PRINS, follows a divide-and-conquer approach. The idea is to first infer a model of each system component from the corresponding logs; then, the individual component models are merged together taking into account the flow of events across components, as reflected in the logs. We evaluated PRINS in terms of scalability and accuracy, using nine datasets composed of logs extracted from publicly available benchmarks and a personal computer running desktop business applications. The results show that PRINS can process large logs much faster than a publicly available and well-known state-of-the-art tool, without significantly compromising the accuracy of inferred models. △ Less

Submitted 13 April, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

Journal ref: Empir Software Eng 27, 87 (2022)

arXiv:2010.03525 [pdf]

Empirical Standards for Software Engineering Research

Authors: Paul Ralph, Nauman bin Ali, Sebastian Baltes, Domenico Bianculli, Jessica Diaz, Yvonne Dittrich, Neil Ernst, Michael Felderer, Robert Feldt, Antonio Filieri, Breno Bernard Nicolau de França, Carlo Alberto Furia, Greg Gay, Nicolas Gold, Daniel Graziotin, Pinjia He, Rashina Hoda, Natalia Juristo, Barbara Kitchenham, Valentina Lenarduzzi, Jorge Martínez, Jorge Melegati, Daniel Mendez, Tim Menzies, Jefferson Molleri , et al. (18 additional authors not shown)

Abstract: Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around resear… ▽ More Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair. △ Less

Submitted 4 March, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: For the complete standards, supplements and other resources, see https://github.com/acmsigsoft/EmpiricalStandards

arXiv:2009.12250 [pdf, other]

Trace-Checking CPS Properties: Bridging the Cyber-Physical Gap

Authors: Claudio Menghi, Enrico Viganò, Domenico Bianculli, Lionel C. Briand

Abstract: Cyber-physical systems combine software and physical components. Specification-driven trace-checking tools for CPS usually provide users with a specification language to express the requirements of interest, and an automatic procedure to check whether these requirements hold on the execution traces of a CPS. Although there exist several specification languages for CPS, they are often not sufficien… ▽ More Cyber-physical systems combine software and physical components. Specification-driven trace-checking tools for CPS usually provide users with a specification language to express the requirements of interest, and an automatic procedure to check whether these requirements hold on the execution traces of a CPS. Although there exist several specification languages for CPS, they are often not sufficiently expressive to allow the specification of complex CPS properties related to the software and the physical components and their interactions. In this paper, we propose (i) the Hybrid Logic of Signals (HLS), a logic-based language that allows the specification of complex CPS requirements, and (ii) ThEodorE, an efficient SMT-based trace-checking procedure. This procedure reduces the problem of checking a CPS requirement over an execution trace, to checking the satisfiability of an SMT formula. We evaluated our contributions by using a representative industrial case study in the satellite domain. We assessed the expressiveness of HLS by considering 212 requirements of our case study. HLS could express all the 212 requirements. We also assessed the applicability of ThEodorE by running the trace-checking procedure for 747 trace-requirement combinations. ThEodorE was able to produce a verdict in 74.5% of the cases. Finally, we compared HLS and ThEodorE with other specification languages and trace-checking tools from the literature. Our results show that, from a practical standpoint, our approach offers a better trade-off between expressiveness and performance. △ Less

Submitted 15 September, 2021; v1 submitted 25 September, 2020; originally announced September 2020.

arXiv:2004.07194 [pdf, other]

Effective Removal of Operational Log Messages: an Application to Model Inference

Authors: Donghwan Shin, Domenico Bianculli, Lionel Briand

Abstract: Model inference aims to extract accurate models from the execution logs of software systems. However, in reality, logs may contain some "noise" that could deteriorate the performance of model inference. One form of noise can commonly be found in system logs that contain not only transactional messages---logging the functional behavior of the system---but also operational messages---recording the o… ▽ More Model inference aims to extract accurate models from the execution logs of software systems. However, in reality, logs may contain some "noise" that could deteriorate the performance of model inference. One form of noise can commonly be found in system logs that contain not only transactional messages---logging the functional behavior of the system---but also operational messages---recording the operational state of the system (e.g., a periodic heartbeat to keep track of the memory usage). In low-quality logs, transactional and operational messages are randomly interleaved, leading to the erroneous inclusion of operational behaviors into a system model, that ideally should only reflect the functional behavior of the system. It is therefore important to remove operational messages in the logs before inferring models. In this paper, we propose LogCleaner, a novel technique for removing operational logs messages. LogCleaner first performs a periodicity analysis to filter out periodic messages, and then it performs a dependency analysis to calculate the degree of dependency for all log messages and to remove operational messages based on their dependencies. The experimental results on two proprietary and 11 publicly available log datasets show that LogCleaner, on average, can accurately remove 98% of the operational messages and preserve 81% of the transactional messages. Furthermore, using logs pre-processed with LogCleaner decreases the execution time of model inference (with a speed-up ranging from 1.5 to 946.7 depending on the characteristics of the system) and significantly improves the accuracy of the inferred models, by increasing their ability to accept correct system behaviors (+43.8 pp on average, with pp=percentage points) and to reject incorrect system behaviors (+15.0 pp on average). △ Less

Submitted 16 April, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

arXiv:1910.08330 [pdf, ps, other]

doi 10.1016/j.jss.2020.110881

Signal-Based Properties of Cyber-Physical Systems: Taxonomy and Logic-based Characterization

Authors: Chaima Boufaied, Maris Jukss, Domenico Bianculli, Lionel Claude Briand, Yago Isasi Parache

Abstract: The behavior of a cyber-physical system (CPS) is usually defined in terms of the input and output signals processed by sensors and actuators. Requirements specifications of CPSs are typically expressed using signal-based temporal properties. Expressing such requirements is challenging, because of (1) the many features that can be used to characterize a signal behavior; (2) the broad variation in e… ▽ More The behavior of a cyber-physical system (CPS) is usually defined in terms of the input and output signals processed by sensors and actuators. Requirements specifications of CPSs are typically expressed using signal-based temporal properties. Expressing such requirements is challenging, because of (1) the many features that can be used to characterize a signal behavior; (2) the broad variation in expressiveness of the specification languages (i.e., temporal logics) used for defining signal-based temporal properties. Thus, system and software engineers need effective guidance on selecting appropriate signal behavior types and an adequate specification language, based on the type of requirements they have to define. In this paper, we present a taxonomy of the various types of signal-based properties and provide, for each type, a comprehensive and detailed description as well as a formalization in a temporal logic. Furthermore, we review the expressiveness of state-of-the-art signal-based temporal logics in terms of the property types identified in the taxonomy. Moreover, we report on the application of our taxonomy to classify the requirements specifications of an industrial case study in the aerospace domain, in order to assess the feasibility of using the property types included in our taxonomy and the completeness of the latter. △ Less

Submitted 28 December, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

Comments: 37 pages, revised version, accepted for publication by the Elsevier Journal of Systems and Software

Journal ref: Volume 174, April 2021, 110881

arXiv:1908.02329 [pdf, other]

Scalable Inference of System-level Models from Component Logs

Authors: Donghwan Shin, Salma Messaoudi, Domenico Bianculli, Annibale Panichella, Lionel Briand, Raimondas Sasnauskas

Abstract: Behavioral software models play a key role in many software engineering tasks; unfortunately, these models either are not available during software development or, if available, they quickly become outdated as the implementations evolve. Model inference techniques have been proposed as a viable solution to extract finite-state models from execution logs. However, existing techniques do not scale w… ▽ More Behavioral software models play a key role in many software engineering tasks; unfortunately, these models either are not available during software development or, if available, they quickly become outdated as the implementations evolve. Model inference techniques have been proposed as a viable solution to extract finite-state models from execution logs. However, existing techniques do not scale well when processing very large logs, such as system-level logs obtained by combining component-level logs. Furthermore, in the case of component-based systems, existing techniques assume to know the definitions of communication channels between components. However, this information is usually not available in the case of systems integrating 3rd-party components with limited documentation. In this paper, we address the scalability problem of inferring the model of a component-based system from the individual component-level logs, when the only available information about the system are high-level architecture dependencies among components and a (possibly incomplete) list of log message templates denoting communication events between components. Our model inference technique, called SCALER, follows a divide and conquer approach. The idea is to first infer a model of each system component from the corresponding logs; then, the individual component models are merged together taking into account the dependencies among components, as reflected in the logs. We evaluated SCALER in terms of scalability and accuracy, using a dataset of logs from an industrial system; the results show that SCALER can process much larger logs than a state-of-the-art tool, while yielding more accurate models. △ Less

Submitted 16 April, 2020; v1 submitted 6 August, 2019; originally announced August 2019.

arXiv:1811.06740 [pdf, ps, other]

A Survey of Challenges for Runtime Verification from Advanced Application Domains (Beyond Software)

Authors: César Sánchez, Gerardo Schneider, Wolfgang Ahrendt, Ezio Bartocci, Domenico Bianculli, Christian Colombo, Yliés Falcone, Adrian Francalanza, Srđan Krstić, JoHao M. Lourenço, Dejan Nickovic, Gordon J. Pace, Jose Rufino, Julien Signoles, Dmitriy Traytel, Alexander Weiss

Abstract: Runtime verification is an area of formal methods that studies the dynamic analysis of execution traces against formal specifications. Typically, the two main activities in runtime verification efforts are the process of creating monitors from specifications, and the algorithms for the evaluation of traces against the generated monitors. Other activities involve the instrumentation of the system t… ▽ More Runtime verification is an area of formal methods that studies the dynamic analysis of execution traces against formal specifications. Typically, the two main activities in runtime verification efforts are the process of creating monitors from specifications, and the algorithms for the evaluation of traces against the generated monitors. Other activities involve the instrumentation of the system to generate the trace and the communication between the system under analysis and the monitor. Most of the applications in runtime verification have been focused on the dynamic analysis of software, even though there are many more potential applications to other computational devices and target systems. In this paper we present a collection of challenges for runtime verification extracted from concrete application domains, focusing on the difficulties that must be overcome to tackle these specific challenges. The computational models that characterize these domains require to devise new techniques beyond the current state of the art in runtime verification. △ Less

Submitted 16 November, 2018; originally announced November 2018.

arXiv:1508.06613 [pdf, other]

Efficient Large-scale Trace Checking Using MapReduce

Authors: Marcello M. Bersani, Domenico Bianculli, Carlo Ghezzi, Srdan Krstic, Pierluigi San Pietro

Abstract: The problem of checking a logged event trace against a temporal logic specification arises in many practical cases. Unfortunately, known algorithms for an expressive logic like MTL (Metric Temporal Logic) do not scale with respect to two crucial dimensions: the length of the trace and the size of the time interval for which logged events must be buffered to check satisfaction of the specification.… ▽ More The problem of checking a logged event trace against a temporal logic specification arises in many practical cases. Unfortunately, known algorithms for an expressive logic like MTL (Metric Temporal Logic) do not scale with respect to two crucial dimensions: the length of the trace and the size of the time interval for which logged events must be buffered to check satisfaction of the specification. The former issue can be addressed by distributed and parallel trace checking algorithms that can take advantage of modern cloud computing and programming frameworks like MapReduce. Still, the latter issue remains open with current state-of-the-art approaches. In this paper we address this memory scalability issue by proposing a new semantics for MTL, called lazy semantics. This semantics can evaluate temporal formulae and boolean combinations of temporal-only formulae at any arbitrary time instant. We prove that lazy semantics is more expressive than standard point-based semantics and that it can be used as a basis for a correct parametric decomposition of any MTL formula into an equivalent one with smaller, bounded time intervals. We use lazy semantics to extend our previous distributed trace checking algorithm for MTL. We evaluate the proposed algorithm in terms of memory scalability and time/memory tradeoffs. △ Less

Submitted 26 August, 2015; originally announced August 2015.

Comments: 13 pages, 8 figures

MSC Class: 68N30 ACM Class: D.2.4

arXiv:1409.4653 [pdf, other]

Offline Trace Checking of Quantitative Properties of Service-Based Applications

Authors: Domenico Bianculli, Carlo Ghezzi, Srdan Krstic, Pierluigi San Pietro

Abstract: Service-based applications are often developed as compositions of partner services. A service integrator needs precise methods to specify the quality attributes expected by each partner service, as well as effective techniques to verify these attributes. In previous work, we identified the most common specification patterns related to provisioning service-based applications and developed an expres… ▽ More Service-based applications are often developed as compositions of partner services. A service integrator needs precise methods to specify the quality attributes expected by each partner service, as well as effective techniques to verify these attributes. In previous work, we identified the most common specification patterns related to provisioning service-based applications and developed an expressive specification language (SOLOIST) that supports them. SOLOIST is an extension of metric temporal logic with aggregate temporal modalities that can be used to write quantitative temporal properties. In this paper we address the problem of performing offline checking of service execution traces against quantitative requirements specifications written in SOLOIST. We present a translation of SOLOIST into CLTLB(D), a variant of linear temporal logic, and reduce the trace checking of SOLOIST to bounded satisfiability checking of CLTLB(D), which is supported by ZOT, an SMT-based verification toolkit. We detail the results of applying the proposed offline trace checking procedure to different types of traces, and compare its performance with previous work. △ Less

Submitted 16 September, 2014; originally announced September 2014.

Comments: 19 pages, 7 figures, Extended version of the SOCA 2014 paper

ACM Class: D.2.4

arXiv:1406.3661 [pdf, ps, other]

Trace checking of Metric Temporal Logic with Aggregating Modalities using MapReduce

Authors: Domenico Bianculli, Carlo Ghezzi, Srdan Krstic

Abstract: Modern complex software systems produce a large amount of execution data, often stored in logs. These logs can be analyzed using trace checking techniques to check whether the system complies with its requirements specifications. Often these specifications express quantitative properties of the system, which include timing constraints as well as higher-level constraints on the occurrences of signi… ▽ More Modern complex software systems produce a large amount of execution data, often stored in logs. These logs can be analyzed using trace checking techniques to check whether the system complies with its requirements specifications. Often these specifications express quantitative properties of the system, which include timing constraints as well as higher-level constraints on the occurrences of significant events, expressed using aggregate operators. In this paper we present an algorithm that exploits the MapReduce programming model to check specifications expressed in a metric temporal logic with aggregating modalities, over large execution traces. The algorithm exploits the structure of the formula to parallelize the evaluation, with a significant gain in time. We report on the assessment of the implementation - based on the Hadoop framework - of the proposed algorithm and comment on its scalability. △ Less

Submitted 13 June, 2014; originally announced June 2014.

Comments: 16 pages, 6 figures, Extended version of the SEFM 2014 paper

ACM Class: D.2.4

arXiv:1304.8034 [pdf, ps, other]

A Syntactic-Semantic Approach to Incremental Verification

Authors: Domenico Bianculli, Antonio Filieri, Carlo Ghezzi, Dino Mandrioli

Abstract: Software verification of evolving systems is challenging mainstream methodologies and tools. Formal verification techniques often conflict with the time constraints imposed by change management practices for evolving systems. Since changes in these systems are often local to restricted parts, an incremental verification approach could be beneficial. This paper introduces SiDECAR, a general frame… ▽ More Software verification of evolving systems is challenging mainstream methodologies and tools. Formal verification techniques often conflict with the time constraints imposed by change management practices for evolving systems. Since changes in these systems are often local to restricted parts, an incremental verification approach could be beneficial. This paper introduces SiDECAR, a general framework for the definition of verification procedures, which are made incremental by the framework itself. Verification procedures are driven by the syntactic structure (defined by a grammar) of the system and encoded as semantic attributes associated with the grammar. Incrementality is achieved by coupling the evaluation of semantic attributes with an incremental parsing technique. We show the application of SiDECAR to the definition of two verification procedures: probabilistic verification of reliability requirements and verification of safety properties. △ Less

Submitted 1 May, 2013; v1 submitted 30 April, 2013; originally announced April 2013.

Comments: 22 pages, 8 figures. Corrected typos

ACM Class: D.2.4; D.3.4

Showing 1–22 of 22 results for author: Bianculli, D