License: arXiv.org perpetual non-exclusive license
arXiv:2403.01359v1 [cs.SE] 03 Mar 2024

ModelWriter: Text & Model-Synchronized Document Engineering Platform

Ferhat Erata14, Claire Gardent2, Bikash Gyawali2, Anastasia Shimorina2, Yvan Lussaud6, Bedir Tekinerdogan1,
Geylani Kardas57 and Anne Monceaux3
1Information Technology Group, Wageningen University and Research Centre, The Netherlands 2CNRS, LORIA, UMR 7503 Vandoeuvre-les-Nancy, F-54500, Nancy, France 3System Engineering Platforms, Airbus Group Innovations, Toulouse, France 4UNIT Information Technologies R&D Ltd., Izmir, Turkey 5Ege University, International Computer Institute, Izmir, Turkey 7KoçSistem Information and Communication Services Inc. Istanbul, Turkey 6OBEO, Nantes, France
Abstract

The ModelWriter platform provides a generic framework for automated traceability analysis. In this paper, we demonstrate how this framework can be used to trace the consistency and completeness of technical documents that consist of a set of System Installation Design Principles used by Airbus to ensure the correctness of aircraft system installation. We show in particular, how the platform allows the integration of two types of reasoning: reasoning about the meaning of text using semantic parsing and description logic theorem proving; and reasoning about document structure using first-order relational logic and finite model finding for traceability analysis.

https://itea3.org/project/modelwriter.html

I Introduction

The complexity of software systems in safety critical domains (e.g. avionics and automotive) has significantly increased over the years. Development of such systems requires various phases which result in several artifacts (e.g., requirements documents, architecture models and test cases). In this context, traceability [1, 2] not only establishes and maintains consistency between these artifacts but also helps guarantee that each requirement is fulfilled by the source code and test cases properly cover all requirements, a very important objective in safety critical systems and the standards they need to comply with DO-178C (Software Considerations in Airborne Systems and Equipment Certification) [3] and ISO-26262 (Road Vehicles - Functional Safety) [4]. As a result, the engineers have to establish and maintain several types of traces, having different semantics, between and within various development artifacts.

Traceability is a quality concern that helps users understand each and every steps in the development or even the end to end life cycle of a product. Its implementation is highly contextual as the key artifacts produced or used along a process differ depending on the product. We want to provide a framework for users to specify which artifacts they want to precisely identify and monitor and what is the meaning for trace links between these artifacts.

The considered Artifacts represented in our context by trace locations might be of different levels of granularity, ranging from a complete document or model to fragments of text or code. Focusing on documents and text, both the structure and the content might be used to reason about traceability.

To this end, ModelWriter platform provides a generic traceability analysis applicable to Text & Model artifacts. Trace locations can be fragments of text, elements of an architectural model, and parts of program codes. Traces are relations between trace locations. ModelWriter platform allows axiomatization of these relations and reasoning about them, i.e. supporting traceability analysis for different types of artifacts.

In this paper, we focus on demonstrating the features of ModelWriter platform for the traceability analysis applied to technical documentation. A particular challenge in this use case is to take into account the meaning of natural language. We integrate techniques from Natural Language Processing (NLP) and Automated Reasoning to reason both about the meaning and about the structure of text. We use techniques from semantic parsing to assign formal meaning representations to NL text. We then use techniques from theorem proving and model building to infer traceability relations between text fragments (here SIDPs), to check consistency and to ensure completeness.

II The Airbus SIDP Usecase

We illustrate the workings of the ModelWriter platform based on a set of System Installation Design Principles (SIDP) used by Airbus to ensure the correctness of aircraft design. A SIDP rule is actually a kind of system installation requirement, that is a description of system properties which should be fulfilled. In this usecase, SIDPs are trace locations and there are five types of trace links defined between trace locations, namely contains, refines, conflicts, equals, and requires. In the following, we informally give the meaning of the trace-types.

Rule r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT contains Rule r2rnsubscript𝑟2subscript𝑟𝑛r_{2}\dots r_{n}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT … italic_r start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT if r2rnsubscript𝑟2subscript𝑟𝑛r_{2}\dots r_{n}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT … italic_r start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT are parts of the whole r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (part-whole hierarchy). The contained rule is a sub-rule of containing rule. Rule r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT refines another Rule r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT if r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is derived from r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT by adding more details to its properties. The refined rule can be seen as an abstraction of the detailed rules. In Fig. 1 contains and refines traces are illustrated. Each box represents a property of the corresponding rule.

r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPTr2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPTr3subscript𝑟3r_{3}italic_r start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPTcontainscontains\operatorname{\mathrel{contains}}roman_containscontainscontains\operatorname{\mathrel{contains}}roman_containsr4subscript𝑟4r_{4}italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPTr5subscript𝑟5r_{5}italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPTrefinesrefines\operatorname{\mathrel{refines}}roman_refines
Figure 1: r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT contains r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and r3subscript𝑟3r_{3}italic_r start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT, r4subscript𝑟4r_{4}italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT refines r5subscript𝑟5r_{5}italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT

Rule r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT conflicts with Rule r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT if the fulfillment of r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT excludes the fulfillment of r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and vice versa. The existence of a conflict trace indicates an inconsistency between two rules. Rule r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT equals to Rule r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT if r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT states exactly the same properties with their constraints with r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and vice versa. Rule r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT requires Rule r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT if r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT is fulfilled only when r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is fulfilled. The required rule can be seen as a pre-condition for the requiring rule. In the following Fig. 2 conflicts, equals and requires traces are illustrated.

r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPTr2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPTconflictsconflicts\operatorname{\mathrel{conflicts}}roman_conflictsr1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPTr2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPTequalsequals\operatorname{\mathrel{equals}}roman_equalsr1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPTr2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPTrequiresrequires\operatorname{\mathrel{requires}}roman_requires
Figure 2: Illustration of “conflicts, equals and requires

Given a set of SIDPs, the ModelWriter platform can be used to check completeness and consistency as follows. First, SIDPs are parsed and assigned Description Logic formulae representing their meaning (cf. Section III-A). Second, traces are either manually specified by the end user or can be inferred using semantic parsing and DL theorem proving (cf. Section III-B). Third, new traces can be inferred upon existing ones using Relational Logic (cf. Section III-C) and Model Finding (cf. Section III-D). Importantly, the inference of trace links allows for the detection of missing or inconsistent SIDPs.

Table I illustrates this process. Given the SIDPs r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-r6subscript𝑟6r_{6}italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT, conflicts and refines trace links are first inferred using semantic parsing and the Hermit theorem prover [5] (DL lines in the table).

Table I: Example SIDPs and inference of Trace Links
Nr. Artifact Annotations (Trace-locations)
r1subscriptr1\text{r}_{\text{1}}r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT Bracket shall be used in hydraulic area Alpha
r2subscriptr2\text{r}_{\text{2}}r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT Adhesive bonded bracket shall be used in hydraulic area
r3subscriptr3\text{r}_{\text{3}}r start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT Adhesive bonded bracket shall be used in hydraulic area Alpha
r4subscriptr4\text{r}_{\text{4}}r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT Bracket shall be used in hydraulic area
r5subscriptr5\text{r}_{\text{5}}r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT Bracket shall be installed in hydraulic area
r6subscriptr6\text{r}_{\text{6}}r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT Bracket shall be installed in fuel tank
Nr. Inferred Traces Nr. Inferred Traces
DL1subscriptDL1\text{DL}_{\text{1}}DL start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT conflicts(r5,r6)𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠subscript𝑟5subscript𝑟6conflicts(r_{5},r_{6})italic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_s ( italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT ) RL1subscriptRL1\text{RL}_{\text{1}}RL start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT conflicts(r6,r4)𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠subscript𝑟6subscript𝑟4conflicts(r_{6},r_{4})italic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_s ( italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT )
DL2subscriptDL2\text{DL}_{\text{2}}DL start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT refines(r3,r2)𝑟𝑒𝑓𝑖𝑛𝑒𝑠subscript𝑟3subscript𝑟2refines(r_{3},r_{2})italic_r italic_e italic_f italic_i italic_n italic_e italic_s ( italic_r start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) RL2subscriptRL2\text{RL}_{\text{2}}RL start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT requires(r1,r5)𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠subscript𝑟1subscript𝑟5requires(r_{1},r_{5})italic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_s ( italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT )
DL3subscriptDL3\text{DL}_{\text{3}}DL start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT refines(r2,r4)𝑟𝑒𝑓𝑖𝑛𝑒𝑠subscript𝑟2subscript𝑟4refines(r_{2},r_{4})italic_r italic_e italic_f italic_i italic_n italic_e italic_s ( italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT ) RL3subscriptRL3\text{RL}_{\text{3}}RL start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT conflicts(r6,r1)𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠subscript𝑟6subscript𝑟1conflicts(r_{6},r_{1})italic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_s ( italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT )
DL4subscriptDL4\text{DL}_{\text{4}}DL start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT refines(r1,r4)𝑟𝑒𝑓𝑖𝑛𝑒𝑠subscript𝑟1subscript𝑟4refines(r_{1},r_{4})italic_r italic_e italic_f italic_i italic_n italic_e italic_s ( italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT ) RL4subscriptRL4\text{RL}_{\text{4}}RL start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT requires(r2,r5)𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠subscript𝑟2subscript𝑟5requires(r_{2},r_{5})italic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_s ( italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT )
DL5subscriptDL5\text{DL}_{\text{5}}DL start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT requires(r4,r5)𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠subscript𝑟4subscript𝑟5requires(r_{4},r_{5})italic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_s ( italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT ) RL5subscriptRL5\text{RL}_{\text{5}}RL start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT conflicts(r2,r6)𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠subscript𝑟2subscript𝑟6conflicts(r_{2},r_{6})italic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_s ( italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT )

For example, the DL formulae obtained by parsing sentences r5subscript𝑟5r_{5}italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT and r6subscript𝑟6r_{6}italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT conflict with each other because the underlying ontology to which these axioms are added specifies that concepts “hydraulic area” and “fuel tank” are disjoint. Similarly, the axiom obtained for the sentence r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT refines the axiom obtained for r4subscript𝑟4r_{4}italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT because the ontology specifies that “Bracket” is a sub concept of “Adhesive bonded bracket”. In Fig. 3, Table I is represented as a digraph model in which the nodes represent trace-locations, i.e. SIDP rules listed in the table and edges represents traces. A red edge specifically corresponds to the trace inferred using semantic parsing and DL theorem proving. The black one is an example trace, refines(r3,r6)𝑟𝑒𝑓𝑖𝑛𝑒𝑠subscript𝑟3subscript𝑟6refines(r_{3},r_{6})italic_r italic_e italic_f italic_i italic_n italic_e italic_s ( italic_r start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT ) created by the user manually.

r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPTr2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPTr2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPTr4subscript𝑟4r_{4}italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPTr1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPTr5subscript𝑟5r_{5}italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPTr3subscript𝑟3r_{3}italic_r start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPTr6subscript𝑟6r_{6}italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPTrefines𝑟𝑒𝑓𝑖𝑛𝑒𝑠refinesitalic_r italic_e italic_f italic_i italic_n italic_e italic_srefines𝑟𝑒𝑓𝑖𝑛𝑒𝑠refinesitalic_r italic_e italic_f italic_i italic_n italic_e italic_srefines𝑟𝑒𝑓𝑖𝑛𝑒𝑠refinesitalic_r italic_e italic_f italic_i italic_n italic_e italic_srequires𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠requiresitalic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_srefines𝑟𝑒𝑓𝑖𝑛𝑒𝑠refinesitalic_r italic_e italic_f italic_i italic_n italic_e italic_sconflicts𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠conflictsitalic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_sconflicts𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠conflictsitalic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_srequires𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠requiresitalic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_sconflicts𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠conflictsitalic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_srequires𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠requiresitalic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_sconflicts𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠conflictsitalic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_s
Figure 3: Inferred Traces (red traces indicate reasoning using DL, blue indicates reasoning using RL, the black one indicates a manual trace)

Later, additional trace links are inferred using Relational Model Finding (RL lines in the Table I and dashed blue edges on Fig. 3). For instance, as part of the trace semantics of this use case, according to the axiom schema (3) formalized in Section III-C where a, b and c are artifact elements, if a refines, requires oder contains b, while b conflicts with c, then a also conflicts with c. In this way, ModelWriter  generates conflicts traces such that combination of conflicts(r5,r6)𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠subscript𝑟5subscript𝑟6conflicts(r_{5},r_{6})italic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_s ( italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT ) and requires(r4,r5)𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠subscript𝑟4subscript𝑟5requires(r_{4},r_{5})italic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_s ( italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT ) makes conflicts(r6,r4)𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠subscript𝑟6subscript𝑟4conflicts(r_{6},r_{4})italic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_s ( italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT ); on the other hand, according to axiom schema (1) described in Section III-C, the combination of refines(r2,r4)𝑟𝑒𝑓𝑖𝑛𝑒𝑠subscript𝑟2subscript𝑟4refines(r_{2},r_{4})italic_r italic_e italic_f italic_i italic_n italic_e italic_s ( italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT ) and requires(r4,r5)𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠subscript𝑟4subscript𝑟5requires(r_{4},r_{5})italic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_s ( italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT ) generates requires(r2,r5)𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠subscript𝑟2subscript𝑟5requires(r_{2},r_{5})italic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_s ( italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT ) corresponding to the patterns shown in Fig. 4.

a𝑎aitalic_ab𝑏bitalic_bc𝑐citalic_ca𝑎aitalic_ab𝑏bitalic_bc𝑐citalic_crefines𝑟𝑒𝑓𝑖𝑛𝑒𝑠refinesitalic_r italic_e italic_f italic_i italic_n italic_e italic_srequires𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠requiresitalic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_srequires𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠requiresitalic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_sconflicts𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠conflictsitalic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_srequires𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠requiresitalic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_sconflicts𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠conflictsitalic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_s
Figure 4: Inferring “requires” with “refines” and inferring “conflicts

Finally, in this example, DL-based reasoning process inferred only one conflicts trace using the meaning of the sentences, i.e. r5subscript𝑟5r_{5}italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT conflicts with r6subscript𝑟6r_{6}italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT whereas the ModelWriter detects three more conflicts traces using the meaning of trace types by means of RL-based reasoning on top of DL-based reasoning. As a result, it can be seen that not only r5subscript𝑟5r_{5}italic_r start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT and r6subscript𝑟6r_{6}italic_r start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT but also r4subscript𝑟4r_{4}italic_r start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT, r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, and r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are inconsistent.

III Overview of the Approach

We now describe the four main modules making up the ModelWriter platform. Section III-A introduces the semantic parser, i.e., the module that converts text to Description Logic formulae. Section III-B explains how the Hermit reasoner can be used to detect refines, conflicts and equals trace links between text fragments (here, SIDPs). Section III-C shows how Alloy formalism [6] can be customized to axiomatize trace types and semantics. Finally, Section III-D explains how the KodKod model finder [7] is used to infer new trace links between SIDPs to detect the inconsistent SIDPs.

III-A Mapping Text to Description Logic Formulae

The semantic parser used in ModelWriter to convert text to DL formulae is described in details in [8]. In what follows, we briefly summarize its working and some evaluation results on a set of 960 SIDPs used for testing.

The ModelWriter semantic processing framework combines an automatically derived lexicon, a small hand-written grammar, a parsing algorithm to convert text to DL formulae and a generation algorithm to generate text from DL formulae. This framework is modular, robust and reversible. It is modular in that, different lexicons or grammars may be plugged to meet the requirements of the semantic application being considered. For instance, the lexicon (which relates words and concepts) could be built using a concept extraction tool, i.e. a text mining tool that extracts concepts from text (e.g., [9]). And the grammar could be replaced by a grammar describing the syntax of other document styles such as cooking recipes. It is robust in that, in the presence of unknown words, the parser can skip words and deliver a connected (partial) parse. And, it is reversible in that the same grammar and lexicon can be used both for parsing and for generation. Fig. 5 outlines our approach showing the interaction of various components.

ptInputSIDPsSemantic ParserptGrammar (Manual)Lexicon (Automatic)OWL AxiomsSurface RealiserptGeneratedSIDPsptBLEUScoringptFull/PartialParseptSyntaxValidation
Figure 5: Parsing and Generation of Airbus SIDPs.

The lexicon maps verbs and noun phrases to grammar rules and to complex and simple concepts respectively. Fig. 6 shows an illustrating example with a lexical entry on the left and the corresponding grammar unit on the right. During generation/parsing, the semantic literals listed in the lexicon (here, Use and useArg2inv) are used to instantiate the variables (here, A2 and Rel) in the semantic schema (here, L00{}_{0}start_FLOATSUBSCRIPT 0 end_FLOATSUBSCRIPT:subset(X,L11{}_{1}start_FLOATSUBSCRIPT 1 end_FLOATSUBSCRIPT) L22{}_{2}start_FLOATSUBSCRIPT 2 end_FLOATSUBSCRIPT:exists(A2,L33{}_{3}start_FLOATSUBSCRIPT 3 end_FLOATSUBSCRIPT) L33{}_{3}start_FLOATSUBSCRIPT 3 end_FLOATSUBSCRIPT:Rel(Y)). Similarly, the Anchor value (used) is used to label the terminal node marked with the anchor sign (\diamond) and each coanchor is used to label the terminal node with corresponding name. Thus, the strings shall and be will be used to label the terminal nodes V1𝑉1V1italic_V 1 and V2𝑉2V2italic_V 2 respectively. Importantly, this separation between grammar and lexicon supports modularity in that e.g., different lexicons and/or grammars could be plugged into the system. For the work presented here, we built this lexicon by applying regular expressions and a customised NP chunker (the NLTK regular expression chunker) to extract verbal and nominal lexical entries from SIDPs.

Refer to caption
Figure 6: Example Lexical Entry and Grammar Unit

The grammar provides a declarative specification of how text relates to meaning (as represented by OWL DL [10] formulae). We use a Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG) [11] augmented with a unification-based flat semantics. Fig. 7 shows an example FB-LTAG for the words “not”, “pipes” and “shall be used”. An FB-LTAG tree is a set of initial and auxiliary trees which have been lexicalised using the lexicon and can be combined using either substitution or adjunction. Auxiliary trees are trees such as the tree for “not” which contains a foot node (marked with *) whose category (here AUX) matches that of the root node. Initial trees are trees such as that of “pipes” and “shall be used” whose terminal nodes may be substitution nodes (marked with \downarrow). Substitution inserts a tree with root category C𝐶Citalic_C into a substitution node of the same category. For instance, the tree for “pipes” may be substituted in the NPY𝑁superscriptsubscript𝑃𝑌NP_{\downarrow}^{Y}italic_N italic_P start_POSTSUBSCRIPT ↓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_Y end_POSTSUPERSCRIPT node of th “shall be used” tree. Adjunction inserts an auxiliary tree with foot node category C𝐶Citalic_C into a tree at a node of category C𝐶Citalic_C. For instance, the tree for “not” may be adjoined into the tree for “shall be used” at the AUX node.

Refer to caption
Figure 7: Example FB-LTAG with Unification-Based Semantics. The variables decorating the tree nodes (e.g., X𝑋Xitalic_X) abbreviate feature structures of the form [idx:X]delimited-[]:𝑖𝑑𝑥𝑋[idx:X][ italic_i italic_d italic_x : italic_X ] where X𝑋Xitalic_X is a unification variable.

The parser and the generator exploit the grammar and the lexicon to map natural language to OWL DL formulae (semantic parsing) and OWL DL formulae to natural language (generation) respectively. For instance, given the sentence “Pipes shall not be used”, the parser will first select the grammar trees associated with “Pipes”, “shall be used” and “not” and then combines these trees using substitution and adjunction. As shown in Fig. 8, the semantics derived for the input sentence is then the union of the semantics of these trees modulo unification. Conversely, given the flat semantics shown in the figure the generator will generate the sentence “Pipes shall not be used” by first, selecting grammar trees whose semantics subsumes the input and then combining them using substitution and operation. The generated sentences are given by the yield of the derived trees whose root is of category S (sentence) and whose semantics is exactly the input semantics.

Refer to caption
Figure 8: Derived Tree. The flat semantics representation produced by the grammar is equivalent to the Description Logic Formula shown.

While the grammar integrates a so-called flat semantics, as shown in Fig. 9, there is a direct translation from this semantics to OWL functional syntax. Further details about Semantic Parser can be found at:

τ(ϕ)={ObjectSomeValuesFrom(:R τ(C))if ϕ=li:exists(R,lj)lj:CSubClassOf(τ(C1)τ(C2))if ϕ=li:subset(lj,lk)lj:C1lk:C2ObjectIntersectionOf(τ(C1)τ(C2))if ϕ=li:and(lj,lk)lj:C1lk:C2(τ(C1)τ(C2))if ϕ=li:and(lj,lk)lj:C1lk:C2(τ(C1)τ(C2))if ϕ=li:or(lj,lk)lj:C1lk:C2not(τ(C))if ϕ=li:not(lj)lj:CRif ϕ=RinvCif ϕ=li:C(x)𝜏italic-ϕcasesObjectSomeValuesFrom:R 𝜏𝐶:if italic-ϕsubscript𝑙𝑖𝑒𝑥𝑖𝑠𝑡𝑠𝑅subscript𝑙𝑗subscript𝑙𝑗:𝐶SubClassOf𝜏subscript𝐶1𝜏subscript𝐶2:if italic-ϕsubscript𝑙𝑖𝑠𝑢𝑏𝑠𝑒𝑡subscript𝑙𝑗subscript𝑙𝑘subscript𝑙𝑗:subscript𝐶1subscript𝑙𝑘:subscript𝐶2ObjectIntersectionOf𝜏subscript𝐶1𝜏subscript𝐶2:if italic-ϕsubscript𝑙𝑖𝑎𝑛𝑑subscript𝑙𝑗subscript𝑙𝑘subscript𝑙𝑗:subscript𝐶1subscript𝑙𝑘:subscript𝐶2square-intersection𝜏𝐶1𝜏𝐶2:if italic-ϕsubscript𝑙𝑖𝑎𝑛𝑑subscript𝑙𝑗subscript𝑙𝑘subscript𝑙𝑗:𝐶1subscript𝑙𝑘:𝐶2square-union𝜏𝐶1𝜏𝐶2:if italic-ϕsubscript𝑙𝑖𝑜𝑟subscript𝑙𝑗subscript𝑙𝑘subscript𝑙𝑗:𝐶1subscript𝑙𝑘:𝐶2not𝜏𝐶:if italic-ϕsubscript𝑙𝑖𝑛𝑜𝑡subscript𝑙𝑗subscript𝑙𝑗:𝐶Rif italic-ϕ𝑅𝑖𝑛𝑣C:if italic-ϕsubscript𝑙𝑖𝐶𝑥\tau(\phi)=\begin{cases}\mbox{ObjectSomeValuesFrom}(\mbox{:R }\tau(C))&\text{% if }\phi=l_{i}:exists(R,l_{j})\;\;l_{j}:C\\ \mbox{SubClassOf}(\tau(C_{1})\;\tau(C_{2}))&\text{if }\phi=l_{i}:subset(l_{j},% l_{k})\;\;l_{j}:C_{1}\;\;l_{k}:C_{2}\\ \mbox{ObjectIntersectionOf}(\tau(C_{1})\;\tau(C_{2}))&\text{if }\phi=l_{i}:and% (l_{j},l_{k})\;\;l_{j}:C_{1}\;\;l_{k}:C_{2}\\ (\tau(C1)\sqcap\tau(C2))&\text{if }\phi=l_{i}:and(l_{j},l_{k})\;\;l_{j}:C1\;\;% l_{k}:C2\\ (\tau(C1)\sqcup\tau(C2))&\text{if }\phi=l_{i}:or(l_{j},l_{k})\;\;l_{j}:C1\;\;l% _{k}:C2\\ \mbox{not}(\tau(C))&\text{if }\phi=l_{i}:not(l_{j})\;\;l_{j}:C\\ \mbox{R${}^{-}$}&\text{if }\phi=Rinv\\ \mbox{C}&\text{if }\phi=l_{i}:C(x)\\ \end{cases}italic_τ ( italic_ϕ ) = { start_ROW start_CELL ObjectSomeValuesFrom ( :R italic_τ ( italic_C ) ) end_CELL start_CELL if italic_ϕ = italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT : italic_e italic_x italic_i italic_s italic_t italic_s ( italic_R , italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT : italic_C end_CELL end_ROW start_ROW start_CELL SubClassOf ( italic_τ ( italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) italic_τ ( italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ) end_CELL start_CELL if italic_ϕ = italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT : italic_s italic_u italic_b italic_s italic_e italic_t ( italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT : italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT : italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ObjectIntersectionOf ( italic_τ ( italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) italic_τ ( italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ) end_CELL start_CELL if italic_ϕ = italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT : italic_a italic_n italic_d ( italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT : italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_l start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT : italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ( italic_τ ( italic_C 1 ) ⊓ italic_τ ( italic_C 2 ) ) end_CELL start_CELL if italic_ϕ = italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT : italic_a italic_n italic_d ( italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT : italic_C 1 italic_l start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT : italic_C 2 end_CELL end_ROW start_ROW start_CELL ( italic_τ ( italic_C 1 ) ⊔ italic_τ ( italic_C 2 ) ) end_CELL start_CELL if italic_ϕ = italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT : italic_o italic_r ( italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_l start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT : italic_C 1 italic_l start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT : italic_C 2 end_CELL end_ROW start_ROW start_CELL not ( italic_τ ( italic_C ) ) end_CELL start_CELL if italic_ϕ = italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT : italic_n italic_o italic_t ( italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) italic_l start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT : italic_C end_CELL end_ROW start_ROW start_CELL R start_FLOATSUPERSCRIPT - end_FLOATSUPERSCRIPT end_CELL start_CELL if italic_ϕ = italic_R italic_i italic_n italic_v end_CELL end_ROW start_ROW start_CELL C end_CELL start_CELL if italic_ϕ = italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT : italic_C ( italic_x ) end_CELL end_ROW
Figure 9: Mapping Flat Semantics to Owl Functional Syntax

III-B Inferring Traces using DL Theorem Proving

We use Hermit theorem prover to detect inconsistencies, entailment and equivalence between two SIDPs s1subscript𝑠1s_{1}italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and s2subscript𝑠2s_{2}italic_s start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. Given the DL formulae ϕ1subscriptitalic-ϕ1\phi_{1}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ϕ2subscriptitalic-ϕ2\phi_{2}italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT associated by the semantic parsing process to s1subscript𝑠1s_{1}italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and s2subscript𝑠2s_{2}italic_s start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, we determine these relations as follows: (i) if ϕ1ϕ2square-intersectionsubscriptitalic-ϕ1subscriptitalic-ϕ2\phi_{1}\sqcap\phi_{2}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⊓ italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is not satisfiable, we infer a conflicts trace between s1subscript𝑠1s_{1}italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and s2subscript𝑠2s_{2}italic_s start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, (ii) if ¬ϕ1ϕ2square-unionsubscriptitalic-ϕ1subscriptitalic-ϕ2\neg\phi_{1}\sqcup\phi_{2}¬ italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⊔ italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is satisfiable, we infer a requires trace between s1subscript𝑠1s_{1}italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and s2subscript𝑠2s_{2}italic_s start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, and (iii) if ϕ1ϕ2subscriptitalic-ϕ1subscriptitalic-ϕ2\phi_{1}\equiv\phi_{2}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ≡ italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT is satisfiable, we infer an equals trace between s1subscript𝑠1s_{1}italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and s2subscript𝑠2s_{2}italic_s start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT.

III-C Formal Semantics of Trace-types

Tarski is the module of ModelWriter approach for automated reasoning about traces based on configurable trace semantics, recently described in [12]. The tool provides an enhanced text editor to allow users to define new trace types in a restricted form of Alloy [6], i.e., First-Order Relational Logic.

In the following, we axiomatize trace semantics based on the informal definition explained in Section II using First-order Predicate Logic with the signature:

ΣT:{=,}ΣT1ΣT2:subscriptΣ𝑇subscriptsuperscriptΣ1𝑇subscriptsuperscriptΣ2𝑇\displaystyle\Sigma_{T}:\{=,\in\}\cup\Sigma^{1}_{T}\cup\Sigma^{2}_{T}roman_Σ start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT : { = , ∈ } ∪ roman_Σ start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ∪ roman_Σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT
ΣT1:{Artifact,Requirement,Specification}:subscriptsuperscriptΣ1𝑇𝐴𝑟𝑡𝑖𝑓𝑎𝑐𝑡𝑅𝑒𝑞𝑢𝑖𝑟𝑒𝑚𝑒𝑛𝑡𝑆𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑎𝑡𝑖𝑜𝑛\displaystyle\Sigma^{1}_{T}:\{Artifact,Requirement,Specification\}roman_Σ start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT : { italic_A italic_r italic_t italic_i italic_f italic_a italic_c italic_t , italic_R italic_e italic_q italic_u italic_i italic_r italic_e italic_m italic_e italic_n italic_t , italic_S italic_p italic_e italic_c italic_i italic_f italic_i italic_c italic_a italic_t italic_i italic_o italic_n }
ΣT2:{requires,refines,contains,equals,conflicts}:subscriptsuperscriptΣ2𝑇𝑟𝑒𝑞𝑢𝑖𝑟𝑒𝑠𝑟𝑒𝑓𝑖𝑛𝑒𝑠𝑐𝑜𝑛𝑡𝑎𝑖𝑛𝑠𝑒𝑞𝑢𝑎𝑙𝑠𝑐𝑜𝑛𝑓𝑙𝑖𝑐𝑡𝑠\displaystyle\Sigma^{2}_{T}:\{requires,refines,contains,equals,conflicts\}roman_Σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT : { italic_r italic_e italic_q italic_u italic_i italic_r italic_e italic_s , italic_r italic_e italic_f italic_i italic_n italic_e italic_s , italic_c italic_o italic_n italic_t italic_a italic_i italic_n italic_s , italic_e italic_q italic_u italic_a italic_l italic_s , italic_c italic_o italic_n italic_f italic_l italic_i italic_c italic_t italic_s }

ΣT1subscriptsuperscriptΣ1𝑇\Sigma^{1}_{T}roman_Σ start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT is the set of unary predicate symbols and ΣT2subscriptsuperscriptΣ2𝑇\Sigma^{2}_{T}roman_Σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT is the set of binary predicate symbols. For simplicity, we assume that the universe only consists of the type, Artifact which is partitioned into disjoint subsets of Requirement and Specification. From now on, A𝐴Aitalic_A represents the set of Artifacts. === and \in symbols are interpreted and represent equality and membership respectively. In the following several axiom schemas are listed to formalize Traceability Theory, that is used in the SIDP case.

Reasoning about requires traces is stated as follows:

a,b,cA(a,b)(b,c)require(a,c)requireprovesabsentfor-all𝑎𝑏𝑐conditional𝐴𝑎𝑏𝑏𝑐require𝑎𝑐require\displaystyle\vdash\forall a,b,c\in A\mid(a,b)\in\operatorname{\mathrel{% \square}}\wedge\,(b,c)\in\operatorname{\mathrel{require}}\to(a,c)\in% \operatorname{\mathrel{require}}⊢ ∀ italic_a , italic_b , italic_c ∈ italic_A ∣ ( italic_a , italic_b ) ∈ □ ∧ ( italic_b , italic_c ) ∈ roman_require → ( italic_a , italic_c ) ∈ roman_require (1)
a,b,cA(a,b)require(b,c)(a,c)requireprovesabsentfor-all𝑎𝑏𝑐conditional𝐴𝑎𝑏require𝑏𝑐𝑎𝑐require\displaystyle\vdash\forall a,b,c\in A\mid(a,b)\in\operatorname{\mathrel{% require}}\wedge\,(b,c)\in\operatorname{\mathrel{\square}}\to(a,c)\in% \operatorname{\mathrel{require}}⊢ ∀ italic_a , italic_b , italic_c ∈ italic_A ∣ ( italic_a , italic_b ) ∈ roman_require ∧ ( italic_b , italic_c ) ∈ □ → ( italic_a , italic_c ) ∈ roman_require (2)
 where {requires,refines,contains} where requiresrefinescontains\displaystyle\mbox{\quad where \>}\,\square\in\{\operatorname{\mathrel{% requires}},\operatorname{\mathrel{refines}},\operatorname{\mathrel{contains}}\}where □ ∈ { roman_requires , roman_refines , roman_contains }

The following axiom schema is being used for generating conflicts traces.

a,b,cA(a,b)(b,c)(a,c)provesabsentfor-all𝑎𝑏𝑐conditional𝐴𝑎𝑏𝑏𝑐𝑎𝑐\displaystyle\vdash\forall a,b,c\in A\mid(a,b)\in\square\wedge(b,c)\in% \triangle\to(a,c)\in\triangle⊢ ∀ italic_a , italic_b , italic_c ∈ italic_A ∣ ( italic_a , italic_b ) ∈ □ ∧ ( italic_b , italic_c ) ∈ △ → ( italic_a , italic_c ) ∈ △ (3)
aA(a,a)provesabsentfor-all𝑎conditional𝐴𝑎𝑎\displaystyle\vdash\forall a\in A\mid(a,a)\in\triangle⊢ ∀ italic_a ∈ italic_A ∣ ( italic_a , italic_a ) ∈ △ (4)
 where {requires,refines,contains} and =conflicts where requiresrefinescontains and conflicts\displaystyle\mbox{\quad where \>}\square\in\{\operatorname{\mathrel{requires}% },\operatorname{\mathrel{refines}},\operatorname{\mathrel{contains}}\}\mbox{ % and \>}\triangle=\operatorname{\mathrel{conflicts}}where □ ∈ { roman_requires , roman_refines , roman_contains } and △ = roman_conflicts

Reasoning about equals traces:

a,b,cA(a,b)equals(b,c)(a,c)provesabsentfor-all𝑎𝑏𝑐conditional𝐴𝑎𝑏equals𝑏𝑐𝑎𝑐\displaystyle\vdash\forall a,b,c\in A\mid(a,b)\in\operatorname{\mathrel{equals% }}\wedge\,(b,c)\in\square\to(a,c)\in\square⊢ ∀ italic_a , italic_b , italic_c ∈ italic_A ∣ ( italic_a , italic_b ) ∈ roman_equals ∧ ( italic_b , italic_c ) ∈ □ → ( italic_a , italic_c ) ∈ □ (5)
a,b,cA(a,b)equals(c,b)(c,a)provesabsentfor-all𝑎𝑏𝑐conditional𝐴𝑎𝑏equals𝑐𝑏𝑐𝑎\displaystyle\vdash\forall a,b,c\in A\mid(a,b)\in\operatorname{\mathrel{equals% }}\wedge\,(c,b)\in\square\to(c,a)\in\square⊢ ∀ italic_a , italic_b , italic_c ∈ italic_A ∣ ( italic_a , italic_b ) ∈ roman_equals ∧ ( italic_c , italic_b ) ∈ □ → ( italic_c , italic_a ) ∈ □ (6)
aA(a,a)equalsprovesabsentfor-all𝑎conditional𝐴𝑎𝑎equals\displaystyle\vdash\forall a\in A\mid(a,a)\in\operatorname{\mathrel{equals}}⊢ ∀ italic_a ∈ italic_A ∣ ( italic_a , italic_a ) ∈ roman_equals (7)
where {contains,requires,refines,conflicts}where containsrequiresrefinesconflicts\displaystyle\mbox{where \>}\square\in\{\operatorname{\mathrel{contains}},% \operatorname{\mathrel{requires}},\operatorname{\mathrel{refines}},% \operatorname{\mathrel{conflicts}}\}where □ ∈ { roman_contains , roman_requires , roman_refines , roman_conflicts }

In the following axiom schema, transitivity (8) is used for reasoning new traces, whereas anti-symmetry (9) and irreflexivity (10) are used to check consistency.

a,b,cA(a,b)(b,c)(a,c),provesabsentfor-all𝑎𝑏𝑐conditional𝐴𝑎𝑏𝑏𝑐𝑎𝑐\displaystyle\vdash\forall a,b,c\in A\mid(a,b)\in\square\wedge(b,c)\in\square% \to(a,c)\in\square,⊢ ∀ italic_a , italic_b , italic_c ∈ italic_A ∣ ( italic_a , italic_b ) ∈ □ ∧ ( italic_b , italic_c ) ∈ □ → ( italic_a , italic_c ) ∈ □ , (8)
a,bA(a,b)(b,a)a=b,provesabsentfor-all𝑎𝑏conditional𝐴𝑎𝑏𝑏𝑎𝑎𝑏\displaystyle\vdash\forall a,b\in A\mid(a,b)\in\square\wedge(b,a)\in\square\to a% =b,⊢ ∀ italic_a , italic_b ∈ italic_A ∣ ( italic_a , italic_b ) ∈ □ ∧ ( italic_b , italic_a ) ∈ □ → italic_a = italic_b , (9)
aA(a,a),provesabsentfor-all𝑎conditional𝐴𝑎𝑎\displaystyle\vdash\forall a\in A\mid(a,a)\notin\square,⊢ ∀ italic_a ∈ italic_A ∣ ( italic_a , italic_a ) ∉ □ , (10)
where {contains,requires,refines}where containsrequiresrefines\displaystyle\mbox{where \>}\square\in\{\operatorname{\mathrel{contains}},% \operatorname{\mathrel{requires}},\operatorname{\mathrel{refines}}\}where □ ∈ { roman_contains , roman_requires , roman_refines }

contains traces is left-unique (injective relation) in some scenarios that induces an inconsistency when transitivity axiom (8) for contains is instantiated in the specification.

a,a,bA(a,b)(a,b)a=aprovesabsentfor-all𝑎superscript𝑎𝑏conditional𝐴𝑎𝑏superscript𝑎𝑏𝑎superscript𝑎\displaystyle\vdash\forall a,a^{\prime},b\in A\mid(a,b)\in\square\wedge(a^{% \prime},b)\in\square\to a=a^{\prime}⊢ ∀ italic_a , italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_b ∈ italic_A ∣ ( italic_a , italic_b ) ∈ □ ∧ ( italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_b ) ∈ □ → italic_a = italic_a start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT (11)
 where =contains where contains\displaystyle\mbox{\quad where \>}\,\square=\operatorname{\mathrel{contains}}where □ = roman_contains

We encode above axioms in First-order Relational Logic using the Tarski’s text editor to configure the Tarski module (see Figure 10).

Refer to caption
Figure 10: Some Example Trace Types and Trace Semantics in Tarski

III-D Inferring Trace Links using Model Finding

We employ Kodkod [13, 7], an efficient SAT-based constraint solver for FOL with relational algebra and partial models, for automated trace reasoning using the trace semantics that user provides. Once the user performs reasoning operations about traces, the result is reported back to the user by dashed traces as shown in Fig. 11. If there exists different solutions, the user can traverse them back and forth. He can also accept the inferred traces, and perform another analysis operation including inferred traces. Further details about Tarski can be found at:

Refer to caption
Figure 11: Inferred Relations based on the current snapshot

IV Evaluation

We evaluate Semantic Parsing approach of ModelWriter on a dataset of 960 SIDPs provided by Airbus which demonstrate (i) that the approach is robust (97.50% of the SIDPs can be parsed) and (ii) that DL axioms assigned to full parses are very likely to be correct in 96% of the cases. Regarding inference phase, since we observed that DL-based reasoning is relatively faster than the SAT-based reasoning in the context of SIDP case, we only focus on the Tarski module to evaluate the performance of ModelWriter approach. Table II shows the solving results of three configurations of the formal trace specification running with Alloy Analyzer [6], KodKod[7] and Z3[14]. Minisat[15] SAT Solver is chosen for both Alloy (alloy4.2-2015-02-22.jar) and KodKod (Kodkod 2.1) solvers. From Alloy to SMT solver translation for these cases, we employ the translation method proposed by El Ghazi et.al. [16] and the problems are encoded in SMT-LIB [17] syntax which is fed into Z3 solver. Transitive closure and integer arithmetic are not used in these use cases to fairly benchmark the results with the SMT solver. In SMT-LIB, the logic is set for Equality Logic with Uninterpreted Functions (UF).

Table II: Comparisons of Several Use Cases for Trace Inferring
Artifacts Traces Inferred Alloy KodKod Z3
#1#1\#1# 1 123 102 89 67922 25668 40900
#2#2\#2# 2 56 27 25 4428 84 480
#3#3\#3# 3 42 103 75 724 1 1460

Evaluation results are obtained on a machine, that runs 64 bit debian linux operating system with 8 GB of memory and 2.90GHz Intel i7-3520M CPU. Solving times are indicated in milliseconds. The best results are obtained by the direct use of KodKod API since to find satisfiable models, KodKod allows us to configure lower and upper bounds for the solution space employing different pre-processing techniques such as slicing, incremental upper bounds and unrolling transitive closures. The evaluation shows that our tool is practical and beneficial in industrial settings to specify trace semantics for automated trace reasoning. We plan to conduct more case studies to better evaluate the practical utility and usability of the platform.

V Related Work

Many existing works on semantic parsing describe the task of obtaining axiomatic representation of natural language sentences. However, they suffer from two main limitations: (i) use of controlled languages such as Attempto Controlled English[18] (e.g. [19, 20]) and/or (ii) inability to deduce complex axioms involving logical connectives, role restrictions and other expressive features of OWL (e.g. [21, 22]), as noted in [23]. In contrast, we work on human authored real-world text (Airbus SIDPs) and produce complex OWL axioms involving the following DL constructs: top\top (the most general concept), disjunction, conjunction, negation, role inverse, universal and existential restrictions. Moreover, we extended the scope of our application by deducing traces among the semantic parse outputs. Such traces were then used as baseline input to Tarski platform which could infer additional traces propagating over the whole system.

Similarly, several approaches and tools have been proposed for automated trace reasoning using the trace semantics [24, 25, 26, 27, 28, 29, 30, 31]. These approaches employ a predefined set of trace types and their corresponding semantics. For instance, Goknil et al. [29] provide a tool for inferencing and consistency checking of traces between requirements using a set of trace types and their formal semantics. Similarly, Egyed and Grünbacher [25] propose a trace generation approach. They do not allow the user to introduce new trace types and their semantics for automated reasoning. In the development of complex systems, it is required to enable the adoption of various trace types, and herewith automated reasoning using their semantics. Tarski module of ModelWriter allows the user to interactively define new trace types with their semantics to be used in automated reasoning about traces.

VI Conclusion

We presented an integrated platform for automatically mapping natural language text to trace types and performing further inferencing on those traces. Starting with the semantic parser module, we showed how complex axioms could be derived to represent text coming from real world use case. We identified the traces among the parse outputs and fed it to the Tarski tool. The Tarski tool, in turn, allowed users to specify configurable trace semantics for various forms of automated trace reasoning such as inferencing and consistency checking. The key characteristics of our tool are (1) automatic identification of traces existing in texts using semantic parsing (2) allowing user to define new trace types and their semantics which can be later configured, (3) deducing new traces based on the traces which the user has already specified, and (4) identifying traces whose existence causes a contradiction.

Acknowledgment

This work is conducted within ModelWriter project[32] labeled by the European Union’s EUREKA Cluster programme ITEA and partially supported by the Scientific and Technological Research Council of Turkey under project #9140014, #9150181 and Industry and Digital Affairs of France, Directorate-General for Enterprise under contract #142930204. The authors would like to acknowledge networking support by European Cooperation in Science and Technology Action IC1404 ”Multi-Paradigm Modelling for Cyber-Physical Systems”.

References

  • [1] B. Ramesh and M. Jarke, “Toward reference models for requirements traceability,” IEEE Transactions on Software Engineering, vol. 27, no. 1, pp. 58–93, 2001.
  • [2] I. C. Society, P. Bourque, and R. E. Fairley, Guide to the Software Engineering Body of Knowledge (SWEBOK(R)): Version 3.0, 3rd ed.   Los Alamitos, CA, USA: IEEE Computer Society Press, 2014.
  • [3] RTCA and EUROCAE, “DO-178C: Software considerations in airborne systems and equipment certification,” 2017.
  • [4] ISO, “ISO-26262: Road vehicles – functional safety,” 2017.
  • [5] R. Shearer, B. Motik, and I. Horrocks, “Hermit: A highly-efficient owl reasoner.” in 5th OWL Experienced and Directions Workshop, vol. 432, 2008, p. 91.
  • [6] D. Jackson, Software Abstractions: Logic, Language, and Analysis.   MIT press, 2012.
  • [7] E. Torlak, “A constraint solver for software engineering: Finding models and cores of large relational specifications,” Ph.D. dissertation, Massachusetts Institute of Technology, 2008.
  • [8] B. Gyawali, A. Shimorina, C. Gardent, S. Cruz-Lara, and M. Mahfoudh, “Mapping natural language to description logic,” in The Semantic Web: 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28 – June 1, 2017, Proceedings, Part I, E. Blomqvist, D. Maynard, A. Gangemi, R. Hoekstra, P. Hitzler, and O. Hartig, Eds.   Cham: Springer International Publishing, 2017, pp. 273–288. [Online]. Available: http://dx.doi.org/10.1007/978-3-319-58068-5_17
  • [9] E. Bozsak, M. Ehrig, S. Handschuh, A. Hotho, A. Maedche, B. Motik, D. Oberle, C. Schmitz, S. Staab, L. Stojanovic et al., “KAON – towards a large scale Semantic Web,” in E-Commerce and Web Technologies.   Springer, 2002, pp. 304–313.
  • [10] D. L. McGuinness, F. Van Harmelen et al., “OWL web ontology language overview,” W3C recommendation, vol. 10, no. 10, p. 2004, 2004.
  • [11] C. Gardent and L. Kallmeyer, “Semantic construction in feature-based TAG,” in Proceedings of EACL.   Association for Computational Linguistics, 2003, pp. 123–130.
  • [12] F. Erata, M. Challenger, B. Tekinerdogan, A. Monceaux, E. Tüzün, and G. Kardas, “Tarski: A platform for automated analysis of dynamically configurable traceability semantics,” in Proceedings of the Symposium on Applied Computing, ser. SAC ’17.   New York, NY, USA: ACM, 2017, pp. 1607–1614.
  • [13] E. Torlak and D. Jackson, “Kodkod: A relational model finder,” in the 13th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’07), 2007, pp. 632–647.
  • [14] L. D. Moura and N. Bjørner, “Z3: An efficient smt solver,” in International conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’08), 2008, pp. 337–340.
  • [15] N. Eén and N. Sörensson, “An extensible sat-solver,” in the 6th International Conference on Theory and Applications of Satisfiability Testing (SAT’03), 2003, pp. 502–518.
  • [16] A. A. E. Ghazi and M. Taghdiri, “Relational reasoning via smt solving,” in the 17th International Conference on Formal Methods (FM’11), 2011, pp. 133–148.
  • [17] C. Barrett, A. Stump, C. Tinelli et al., “The smt-lib standard: Version 2.0,” in the 8th International Workshop on Satisfiability Modulo Theories, vol. 13, 2010, p. 14.
  • [18] K. Kaljurand and N. E. Fuchs, “Verbalizing OWL in Attempto Controlled English,” in OWLED, vol. 258, 2007.
  • [19] V. Tablan, T. Polajnar, H. Cunningham, and K. Bontcheva, “User-friendly ontology authoring using a controlled language,” in Proceedings of LREC, 2006.
  • [20] A. Bernstein, E. Kaufmann, A. Göhring, and C. Kiefer, “Querying ontologies: A controlled english interface for end-users,” in International Semantic Web Conference.   Springer, 2005, pp. 112–126.
  • [21] P. Buitelaar, P. Cimiano, and B. Magnini, Ontology learning from text: methods, evaluation and applications.   IOS press, 2005, vol. 123.
  • [22] M. Ruiz-Casado, E. Alfonseca, and P. Castells, “Automatic Extraction of Semantic Relationships for Wordnet by Means of Pattern Learning from Wikipedia,” in International Conference on Application of Natural Language to Information Systems.   Springer, 2005, pp. 67–79.
  • [23] J. Völker, P. Hitzler, and P. Cimiano, “Acquisition of OWL DL axioms from lexical resources,” in European Semantic Web Conference.   Springer, 2007, pp. 670–685.
  • [24] N. Aizenbud-Reshef, R. F. Paige, J. Rubin, Y. Shaham-Gafni, and D. S. Kolovos, “Operational semantics for traceability,” in ECMDA Traceability Workshop (ECMDA-TW’05), 2005, pp. 8–14.
  • [25] A. Egyed and P. Grünbacher, “Supporting software understanding with automated requirements traceability,” International Journal of Software Engineering and Knowledge Engineering, vol. 15, no. 5, pp. 783–810, 2005.
  • [26] A. Egyed, “A scenario-driven approach to trace dependency analysis,” IEEE Transactions on Software Engineering, vol. 29, no. 2, pp. 116–132, 2003.
  • [27] J. Cleland-Huang, C. K. Chang, and M. J. Christensen, “Event-based traceability for managing evolutionary change,” IEEE Transactions on Software Engineering, vol. 29, no. 9, pp. 796–810, 2003.
  • [28] L. C. Lamb, W. Jirapanthong, and A. Zisman, “Formalizing traceability relations for product lines,” in the 6th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE’11), 2011, pp. 42–45.
  • [29] A. Goknil, I. Kurtev, K. van den Berg, and J.-W. Veldhuis, “Semantics of trace relations in requirements models for consistency checking and inferencing,” Software and System Modeling, vol. 10, no. 1, pp. 31–54, 2011.
  • [30] A. Goknil, I. Kurtev, and K. V. D. Berg, “Generation and validation of traces between requirements and architecture based on formal trace semantics,” Journal of Systems and Software, vol. 88, pp. 112–137, 2014.
  • [31] N. Drivalos, D. S. Kolovos, R. F. Paige, and K. J. Fernandes, “Engineering a dsl for software traceability,” in 1st International Conference on Software Language Engineering (SLE’08), 2008, pp. 151–167.
  • [32] F. Erata, “ModelWriter: Text & model synchronized document engineering platform,” https://itea3.org/project/modelwriter.html, Sep 2014.

Appendix A Availability & Open Source License

This work is being developed under technical Work Package 2 (Semantic Parser - Text Part), Work Package 3 (Tarski - Model Part), and Work Package 4 (Federated Knowledge Base) within ModelWriter project, labeled by the European Union’s EUREKA Cluster programme ITEA (Information Technology for European Advancement). Further details about the project can be found at:

A video demonstration of ModelWriter is available shows the use of ModelWriter in the context of the industrial use case SIDP presented in the paper. The video is available at:

https://youtu.be/TcVCosW8HkU

The source codes files and datasets of ModelWriter are publicly available for download and use at the project repository. Tarski and SemanticParser are components of ModelWriter platform. Source codes, screencast and datasets regarding the project are also available and can be found at:

ModelWriter is distributed with an open source software license, namely Eclipse Public License v1. This commercially friendly copyleft license provides the ability to commercially license binaries; a modern royalty-free patent license grant; and the ability for linked works to use other licenses, including commercial ones.

Appendix B Tool Demonstration Plan

There will be four parts to our presentation: (1) motivation and industrial use cases, (2) overview of the approach and tool architecture, (3) demonstration walktrough, and (4) evaluation. Parts 1, 2 and 4 are presented using slides while Part 3 is presented as a demo using the industrial use case scenario described in Section II. To present these parts, we use a combination of slides, animations, and a live demo. In the following subsections, we provide further details about our presentation plan.

B-A Motivation & Challenges

B-A1 Motivation

We will emphasize the importance of traceability by introducing ”DO-178C Software Considerations in Airborne Systems and Equipment Certification” [3] from aviation industry.

B-A2 Industrial Use Cases

We will briefly describe the challenges of Traceability Analysis Activities faced in industry by introducing industrial use cases from Airbus. We will explain the importance of semantically meaningful traceability, traceability configuration and automated traceability analysis in industry.

B-B Tool Overview

B-B1 Overview of the Solution.

We will explain the approach and the user workflow of ModelWriter by following the steps of Section III.

B-B2 ModelWriter Features

We will briefly explain tool features such as semantic parser and its reasoning engine and configurable automated traceability analysis using animated slides by giving concrete examples from the industrial use case, namely System Installation Design Principles (SIDP) presented in the paper.

B-C Walk-trough of the Tool Demonstration

In this section, we will perform a live demonstration aligns with the industrial use case SIDP, which is illustrated in Section II.

B-D Evaluation and Lessons Learned

We conclude with a summary that presents the evaluation results and the lessons learned.