-
Normative Requirements Operationalization with Large Language Models
Authors:
Nick Feng,
Lina Marsso,
S. Getir Yaman,
Isobel Standen,
Yesugen Baatartogtokh,
Reem Ayad,
Victória Oldemburgo de Mello,
Bev Townsend,
Hanne Bartels,
Ana Cavalcanti,
Radu Calinescu,
Marsha Chechik
Abstract:
Normative non-functional requirements specify constraints that a system must observe in order to avoid violations of social, legal, ethical, empathetic, and cultural norms. As these requirements are typically defined by non-technical system stakeholders with different expertise and priorities (ethicists, lawyers, social scientists, etc.), ensuring their well-formedness and consistency is very challenging. Recent research has tackled this challenge using a domain-specific language to specify normative requirements as rules whose consistency can then be analyzed with formal methods. In this paper, we propose a complementary approach that uses Large Language Models to extract semantic relationships between abstract representations of system capabilities. These relations, which are often assumed implicitly by non-technical stakeholders (e.g., based on common sense or domain knowledge), are then used to enrich the automated reasoning techniques for eliciting and analyzing the consistency of normative requirements. We show the effectiveness of our approach to normative requirements elicitation and operationalization through a range of real-world case studies.
Submitted 28 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance
Authors:
Huakun Shen,
Boyue Caroline Hu,
Krzysztof Czarnecki,
Lina Marsso,
Marsha Chechik
Abstract:
While Neural Networks (NNs) have surpassed human accuracy in image classification on ImageNet, they often lack robustness against image corruption, i.e., corruption robustness. Yet such robustness is seemingly effortless for human perception. In this paper, we propose visually-continuous corruption robustness (VCR) -- an extension of corruption robustness to allow assessing it over the wide and continuous range of changes that correspond to the human perceptive quality (i.e., from the original image to the full distortion of all perceived visual information), along with two novel human-aware metrics for NN evaluation. To compare VCR of NNs with human perception, we conducted extensive experiments on 14 commonly used image corruptions with 7,718 human participants and state-of-the-art robust NN models with different training objectives (e.g., standard, adversarial, corruption robustness), different architectures (e.g., convolution NNs, vision transformers), and different amounts of training data augmentation. Our study showed that: 1) assessing robustness against continuous corruption can reveal insufficient robustness undetected by existing benchmarks; as a result, 2) the gap between NN and human robustness is larger than previously known; and finally, 3) some image corruptions have a similar impact on human perception, offering opportunities for more cost-effective robustness assessments. Our validation set with 14 image corruptions, human robustness data, and the evaluation code is provided as a toolbox and a benchmark.
Submitted 29 February, 2024;
originally announced February 2024.
-
Analyzing and Debugging Normative Requirements via Satisfiability Checking
Authors:
Nick Feng,
Lina Marsso,
Sinem Getir Yaman,
Yesugen Baatartogtokh,
Reem Ayad,
Victória Oldemburgo de Mello,
Beverley Townsend,
Isobel Standen,
Ioannis Stefanakos,
Calum Imrie,
Genaína Nunes Rodrigues,
Ana Cavalcanti,
Radu Calinescu,
Marsha Chechik
Abstract:
As software systems increasingly interact with humans in application domains such as transportation and healthcare, they raise concerns related to the social, legal, ethical, empathetic, and cultural (SLEEC) norms and values of their stakeholders. Normative non-functional requirements (N-NFRs) are used to capture these concerns by setting SLEEC-relevant boundaries for system behavior. Since N-NFRs need to be specified by multiple stakeholders with widely different, non-technical expertise (ethicists, lawyers, regulators, end users, etc.), N-NFR elicitation is very challenging. To address this challenge, we introduce N-Check, a novel tool-supported formal approach to N-NFR analysis and debugging. N-Check employs satisfiability checking to identify a broad spectrum of N-NFR well-formedness issues (WFI), such as conflicts, redundancy, restrictiveness, and insufficiency, yielding diagnostics which pinpoint their causes in a user-friendly way that enables non-technical stakeholders to understand and fix them. We show the effectiveness and usability of our approach through nine case studies in which teams of ethicists, lawyers, philosophers, psychologists, safety analysts, and engineers used N-Check to analyse and debug 233 N-NFRs comprising 62 issues for the software underpinning the operation of systems ranging from assistive-care robots and tree-disease detection drones to manufacturing collaborative robots.
Submitted 11 January, 2024;
originally announced January 2024.
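Illustration (not taken from the paper): the entry above describes using satisfiability checking to detect conflicting and redundant requirements. The following is a minimal propositional sketch of that general idea, not N-Check itself. The three rules and the variable names (`user_fallen`, `call_support`, `user_refuses`) are hypothetical, and a brute-force enumeration stands in for a real solver.

```python
from itertools import product

# Hypothetical propositional variables describing a care-robot scenario.
VARIABLES = ["user_fallen", "call_support", "user_refuses"]


def satisfiable(constraints, variables=VARIABLES):
    """Return a satisfying assignment as a dict, or None if none exists."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(constraint(env) for constraint in constraints):
            return env
    return None


def implies(a, b):
    return (not a) or b


# Hypothetical rules, invented for this sketch.
r1 = lambda e: implies(e["user_fallen"], e["call_support"])
r2 = lambda e: implies(e["user_refuses"], not e["call_support"])
r3 = lambda e: implies(e["user_fallen"] and e["user_refuses"], e["call_support"])


def conflicting(rules, situation):
    """Rules conflict in a situation if they cannot all hold there."""
    return satisfiable(rules + [situation]) is None


def redundant(rule, others):
    """A rule is redundant if the others entail it: others AND NOT rule is unsatisfiable."""
    return satisfiable(others + [lambda e: not rule(e)]) is None


situation = lambda e: e["user_fallen"] and e["user_refuses"]
print(conflicting([r1, r2], situation))  # True: r1 demands a call, r2 forbids it
print(redundant(r3, [r1]))               # True: r1 already entails r3
print(redundant(r1, [r2]))               # False: r2 says nothing about falls
```

Enumeration is exponential in the number of variables, so this only conveys the shape of the checks; a practical tool would delegate them to a dedicated solver.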
-
Early Verification of Legal Compliance via Bounded Satisfiability Checking
Authors:
Nick Feng,
Lina Marsso,
Mehrdad Sabetzadeh,
Marsha Chechik
Abstract:
Legal properties involve reasoning about data values and time. Metric first-order temporal logic (MFOTL) provides a rich formalism for specifying legal properties. While MFOTL has been successfully used for verifying legal properties over operational systems via runtime monitoring, no solution exists for MFOTL-based verification in early-stage system development captured by requirements. Given a legal property and system requirements, both formalized in MFOTL, the compliance of the property can be verified on the requirements via satisfiability checking. In this paper, we propose a practical, sound, and complete (within a given bound) satisfiability checking approach for MFOTL. The approach, based on satisfiability modulo theories (SMT), employs a counterexample-guided strategy to incrementally search for a satisfying solution. We implemented our approach using the Z3 SMT solver and evaluated it on five case studies spanning the healthcare, business administration, banking and aviation domains. Our results indicate that our approach can efficiently determine whether legal properties of interest are met, or generate counterexamples that lead to compliance violations.
Submitted 27 May, 2023; v1 submitted 8 September, 2022;
originally announced September 2022.
-
Formally Modeling Autonomous Vehicles in LNT for Simulation and Testing
Authors:
Lina Marsso,
Radu Mateescu,
Lucie Muller,
Wendelin Serwe
Abstract:
We present two behavioral models of an autonomous vehicle and its interaction with the environment. Both models use the formal modeling language LNT provided by the CADP toolbox. This paper discusses the modeling choices and the challenges of our autonomous vehicle models, and also illustrates how formal validation tools can be applied to a single component or the overall vehicle.
Submitted 18 March, 2022;
originally announced March 2022.
-
If a Human Can See It, So Should Your System: Reliability Requirements for Machine Vision Components
Authors:
Boyue Caroline Hu,
Lina Marsso,
Krzysztof Czarnecki,
Rick Salay,
Huakun Shen,
Marsha Chechik
Abstract:
Machine Vision Components (MVC) are becoming safety-critical. Assuring their quality, including safety, is essential for their successful deployment. Assurance relies on the availability of precisely specified and, ideally, machine-verifiable requirements. MVCs with state-of-the-art performance rely on machine learning (ML) and training data but largely lack such requirements.
In this paper, we address the need for defining machine-verifiable reliability requirements for MVCs against transformations that simulate the full range of realistic and safety-critical changes in the environment. Using human performance as a baseline, we define reliability requirements as: 'if the changes in an image do not affect a human's decision, neither should they affect the MVC's.' To this end, we provide: (1) a class of safety-related image transformations; (2) reliability requirement classes to specify correctness-preservation and prediction-preservation for MVCs; (3) a method to instantiate machine-verifiable requirements from these requirements classes using human performance experiment data; (4) human performance experiment data for image recognition involving eight commonly used transformations, from about 2000 human participants; and (5) a method for automatically checking whether an MVC satisfies our requirements. Further, we show that our reliability requirements are feasible and reusable by evaluating our methods on 13 state-of-the-art pre-trained image classification models. Finally, we demonstrate that our approach detects reliability gaps in MVCs that other existing methods are unable to detect.
Submitted 8 February, 2022;
originally announced February 2022.
-
Specifying a Cryptographical Protocol in Lustre and SCADE
Authors:
Lina Marsso
Abstract:
We present SCADE and Lustre models of the Message Authenticator Algorithm (MAA), which is one of the first cryptographic functions for computing a message authentication code. The MAA was adopted between 1987 and 2001, in international standards (ISO 8730 and ISO 8731-2), to ensure the authenticity and integrity of banking transactions. This paper discusses the choices and the challenges of our MAA implementations. Our SCADE and Lustre models validate 201 official test vectors for the MAA.
Submitted 28 April, 2020;
originally announced April 2020.
-
Comparative Study of Eight Formal Specifications of the Message Authenticator Algorithm
Authors:
Hubert Garavel,
Lina Marsso
Abstract:
The Message Authenticator Algorithm (MAA) is one of the first cryptographic functions for computing a Message Authentication Code. Between 1987 and 2001, the MAA was adopted in international standards (ISO 8730 and ISO 8731-2) to ensure the authenticity and integrity of banking transactions. In 1990 and 1991, three formal, yet non-executable, specifications of the MAA (in VDM, Z, and LOTOS) were developed at NPL. Since then, five formal executable specifications of the MAA (in LOTOS, LNT, and term rewrite systems) have been designed at INRIA Grenoble. This article provides an overview of the MAA and compares its formal specifications with respect to common-sense criteria, such as conciseness, readability, and efficiency of code generation.
Submitted 27 March, 2018;
originally announced March 2018.
-
A Formal TLS Handshake Model in LNT
Authors:
Josip Bozic,
Lina Marsso,
Radu Mateescu,
Franz Wotawa
Abstract:
Testing of network services represents one of the biggest challenges in cyber security. Because new vulnerabilities are detected on a regular basis, more research is needed. These faults have their roots in the software development cycle or in intrinsic leaks in the system specification. Conformance testing checks whether a system behaves according to its specification. Here, model-based testing provides several methods for automated detection of shortcomings. The formal specification of a system's behavior represents the starting point of the testing process. In this paper, a widely used cryptographic protocol is specified and tested for conformance with a test execution framework. The first empirical results are presented and discussed.
Submitted 27 March, 2018;
originally announced March 2018.
-
A Large Term Rewrite System Modelling a Pioneering Cryptographic Algorithm
Authors:
Hubert Garavel,
Lina Marsso
Abstract:
We present a term rewrite system that formally models the Message Authenticator Algorithm (MAA), which was one of the first cryptographic functions for computing a Message Authentication Code and was adopted, between 1987 and 2001, in international standards (ISO 8730 and ISO 8731-2) to ensure the authenticity and integrity of banking transactions. Our term rewrite system is large (13 sorts, 18 constructors, 644 non-constructors, and 684 rewrite rules), confluent, and terminating. Implementations in thirteen different languages have been automatically derived from this model and used to validate 200 official test vectors for the MAA.
Submitted 19 March, 2017;
originally announced March 2017.
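Illustration (not taken from the paper): for readers unfamiliar with the terms used in the entry above, the following is a minimal sketch of a term rewrite system with constructors, one non-constructor, and rewrite rules. It models Peano addition, not the MAA. The sort, the constructors `zero` and `succ`, the non-constructor `plus`, and the two rules are chosen here purely for illustration.

```python
# Terms are nested tuples: ("zero",), ("succ", t), ("plus", t1, t2).
# "zero" and "succ" are constructors; "plus" is a non-constructor that is
# defined by the two rewrite rules below.
ZERO = ("zero",)


def succ(t):
    return ("succ", t)


def plus(a, b):
    return ("plus", a, b)


def rewrite_step(term):
    """Apply one rule at the root of the term, or return None if no rule matches."""
    if term[0] == "plus":
        _, a, b = term
        if b == ZERO:        # rule 1: plus(x, zero) -> x
            return a
        if b[0] == "succ":   # rule 2: plus(x, succ(y)) -> succ(plus(x, y))
            return succ(plus(a, b[1]))
    return None


def normalize(term):
    """Innermost strategy: normalize the arguments first, then rewrite at the root."""
    term = (term[0],) + tuple(normalize(t) for t in term[1:])
    reduced = rewrite_step(term)
    return term if reduced is None else normalize(reduced)


def to_int(term):
    """Read a constructor-only term (a normal form) back as a Python int."""
    count = 0
    while term[0] == "succ":
        count += 1
        term = term[1]
    return count


two = succ(succ(ZERO))
three = succ(two)
print(to_int(normalize(plus(two, three))))  # 5
```

This small system terminates because each application of rule 2 shrinks the second argument of `plus`, and the two rules never overlap, so every term has a single normal form.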