-
The maximum four point condition matrix of a tree
Authors:
Ali Azimi,
Rakesh Jana,
Mukesh Kumar Nagar,
Sivaramakrishnan Sivasubramanian
Abstract:
$\newcommand{\Max}{\mathrm{Max4PC}}$ The four point condition (4PC henceforth) is a well-known condition characterising distances in trees $T$. Let $w,x,y,z$ be four vertices in $T$ and let $d_{x,y}$ denote the distance between vertices $x,y$ in $T$. The 4PC says that among the three terms $d_{w,x} + d_{y,z}$, $d_{w,y} + d_{x,z}$ and $d_{w,z} + d_{x,y}$ the maximum value equals the second maximum value.
We define an $\binom{n}{2} \times \binom{n}{2}$ sized matrix $\Max_T$ from a tree $T$ where the rows and columns are indexed by size-2 subsets. The entry of $\Max_T$ corresponding to the row indexed by $\{w,x\}$ and column $\{y,z\}$ is the maximum value among the three terms $d_{w,x} + d_{y,z}$, $d_{w,y} + d_{x,z}$ and $d_{w,z} + d_{x,y}$. In this work, we determine basic properties of this matrix like rank, give an algorithm that outputs a family of bases, and find the determinant of $\Max_T$ when restricted to our basis. We further determine the inertia and the Smith Normal Form (SNF) of $\Max_T$.
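The definition of $\Max_T$ can be made concrete with a minimal Python sketch. It is an illustration only, not the authors' code: the function names `tree_distances` and `max4pc_matrix` are hypothetical, edges are assumed to be unweighted, and the three-term formula is assumed to apply unchanged when the row and column subsets share a vertex (with $d_{v,v} = 0$).

```python
from collections import deque
from itertools import combinations


def tree_distances(n, edges):
    """All-pairs distances in an unweighted tree on vertices 0..n-1 (BFS from each vertex)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for source in range(n):
        d = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append([d[v] for v in range(n)])
    return dist


def max4pc_matrix(n, edges):
    """Return the size-2 subsets (as sorted tuples) and the matrix indexed by them.

    Entry (row {w,x}, column {y,z}) is the maximum of the three 4PC terms.
    Assumption: the same formula is used when the two subsets overlap.
    """
    d = tree_distances(n, edges)
    pairs = list(combinations(range(n), 2))
    matrix = [
        [max(d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y])
         for (y, z) in pairs]
        for (w, x) in pairs
    ]
    return pairs, matrix


# Path 0 - 1 - 2 - 3: a tree on n = 4 vertices, so the matrix is 6 x 6.
pairs, m = max4pc_matrix(4, [(0, 1), (1, 2), (2, 3)])
```

For the row $\{0,1\}$ and column $\{2,3\}$ of this path, the three terms are $1+1$, $2+2$ and $3+1$, so the maximum and the second maximum coincide, as the 4PC requires.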
Submitted 16 August, 2023;
originally announced August 2023.
-
Automating and Mechanizing Cutoff-based Verification of Distributed Protocols
Authors:
Shreesha G. Bhat,
Kartik Nagar
Abstract:
Distributed protocols are generally parametric and can be executed on a system with any number of nodes, and hence proving their correctness becomes an infinite state verification problem. The most popular approach for verifying distributed protocols is to find an inductive invariant which is strong enough to prove the required safety property. However, finding inductive invariants is known to be notoriously hard, and is especially hard in the context of distributed protocols, which are quite complex due to their asynchronous nature. In this work, we investigate an orthogonal cutoff-based approach to verifying distributed protocols which sidesteps the problem of finding an inductive invariant, and instead reduces checking correctness to a finite state verification problem. The main idea is to find a finite, fixed protocol instance called the cutoff instance, such that if the cutoff instance is safe, then any protocol instance would also be safe. Previous cutoff-based approaches have only been applied to a restricted class of protocols and specifications. We formalize the cutoff approach in the context of a general protocol modeling language (RML), and identify sufficient conditions which can be efficiently encoded in SMT to check whether a given protocol instance is a cutoff instance. Further, we propose a simple static analysis-based algorithm to automatically synthesize a cutoff instance. We have applied our approach successfully on a number of complex distributed protocols, providing the first known cutoff results for many of them.
Submitted 28 November, 2022;
originally announced November 2022.
-
Certified Mergeable Replicated Data Types
Authors:
Vimala Soundarapandian,
Adharsh Kamath,
Kartik Nagar,
KC Sivaramakrishnan
Abstract:
Replicated data types (RDTs) are data structures that permit concurrent modification of multiple, potentially geo-distributed, replicas without coordination between them. RDTs are designed in such a way that conflicting operations are eventually deterministically reconciled ensuring convergence. Constructing correct RDTs remains a difficult endeavour due to the complexity of reasoning about independently evolving states of the replicas. With the focus on the correctness of RDTs (and rightly so), existing approaches to RDTs are less efficient compared to their sequential counterparts in terms of time and space complexity of local operations. This is unfortunate since RDTs are often used in a local-first setting where the local operations far outweigh remote communication.
In this paper, we present Peepul, a pragmatic approach to building and verifying efficient RDTs. To make reasoning about correctness easier, we cast RDTs in the mould of a distributed version control system, and equip it with a three-way merge function for reconciling conflicting versions. Further, we go beyond just verifying convergence, and provide a methodology to verify arbitrarily complex specifications. We develop a replication-aware simulation relation to relate RDT specifications to their efficient purely functional implementations. We implement Peepul as an F* library that discharges proof obligations to an SMT solver. The verified efficient RDTs are extracted as OCaml code and used in Irmin, a Git-like distributed database.
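The idea of a three-way merge function can be illustrated with a small Python sketch. This is a generic illustration of merging two divergent versions against their lowest common ancestor, not Peepul's verified F* code; the function names `merge_counter` and `merge_set` and the chosen data types are assumptions made for the example.

```python
def merge_counter(lca, a, b):
    """Three-way merge for an increment-only counter.

    Each branch's contribution is its difference from the common ancestor,
    so both sets of increments are applied on top of `lca`.
    """
    return lca + (a - lca) + (b - lca)


def merge_set(lca, a, b):
    """Three-way merge for a set.

    Keep elements that survived in both branches, plus elements newly
    added in either branch relative to the common ancestor.
    """
    return (lca & a & b) | (a - lca) | (b - lca)


# Counter: ancestor 5, one replica added 2, the other added 3.
merged_count = merge_counter(5, 7, 8)

# Set: one replica added 3, the other removed 1.
merged_set = merge_set({1, 2}, {1, 2, 3}, {2})
```

Both functions are symmetric in `a` and `b`, so the two replicas reach the same merged version regardless of which side initiates the merge.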
Submitted 28 March, 2022;
originally announced March 2022.
-
Repairing Serializability Bugs in Distributed Database Programs via Automated Schema Refactoring
Authors:
Kia Rahmani,
Kartik Nagar,
Benjamin Delaware,
Suresh Jagannathan
Abstract:
Serializability is a well-understood concurrency control mechanism that eases reasoning about highly-concurrent database programs. Unfortunately, enforcing serializability has a high performance cost, especially on geographically distributed database clusters. Consequently, many databases allow programmers to choose when a transaction must be executed under serializability, with the expectation that transactions would only be so marked when necessary to avoid serious concurrency bugs. However, this is a significant burden to impose on developers, requiring them to (a) reason about subtle concurrent interactions among potentially interfering transactions, (b) determine when such interactions would violate desired invariants, and (c) then identify the minimum number of transactions whose executions should be serialized to prevent these violations. To mitigate this burden, in this paper we present a sound and fully automated schema refactoring procedure that transforms a program's data layout -- rather than its concurrency control logic -- to eliminate statically identified concurrency bugs, allowing more transactions to be safely executed under weaker and more performant database guarantees. Experimental results over a range of database benchmarks indicate that our approach is highly effective in eliminating concurrency bugs, with safe refactored programs showing an average of 120% higher throughput and 45% lower latency compared to the baselines.
Submitted 9 March, 2021;
originally announced March 2021.
-
Semantics, Specification, and Bounded Verification of Concurrent Libraries in Replicated Systems
Authors:
Kartik Nagar,
Prasita Mukherjee,
Suresh Jagannathan
Abstract:
Geo-replicated systems provide a number of desirable properties such as globally low latency, high availability, scalability, and built-in fault tolerance. Unfortunately, programming correct applications on top of such systems has proven to be very challenging, in large part because of the weak consistency guarantees they offer. These complexities are exacerbated when we try to adapt existing highly-performant concurrent libraries developed for shared-memory environments to this setting. The use of these libraries, developed with performance and scalability in mind, is highly desirable. However, identifying a suitable notion of correctness to check their validity under a weakly consistent execution model has not been well-studied, in large part because it is problematic to naively transplant criteria such as linearizability, which have a useful interpretation in a shared-memory context, to a distributed one where the cost of imposing a (logical) global ordering on all actions is prohibitive.
In this paper, we tackle these issues by proposing appropriate semantics and specifications for highly-concurrent libraries in a weakly-consistent, replicated setting. We use these specifications to develop a static analysis framework that can automatically detect correctness violations of library implementations parameterized with respect to the different consistency policies provided by the underlying system. We use our framework to analyze the behavior of a number of highly non-trivial library implementations of stacks, queues, and exchangers. Our results provide the first demonstration that automated correctness checking of concurrent libraries in a weakly geo-replicated setting is both feasible and practical.
Submitted 21 April, 2020;
originally announced April 2020.
-
CLOTHO: Directed Test Generation for Weakly Consistent Database Systems
Authors:
Kia Rahmani,
Kartik Nagar,
Benjamin Delaware,
Suresh Jagannathan
Abstract:
Relational database applications are notoriously difficult to test and debug. Concurrent execution of database transactions may violate complex structural invariants that constrain how changes to the contents of one (shared) table affect the contents of another. Simplifying the underlying concurrency model is one way to ameliorate the difficulty of understanding how concurrent accesses and updates can affect database state with respect to these sophisticated properties. Enforcing serializable execution of all transactions achieves this simplification, but it comes at a significant price in performance, especially at scale, where database state is often replicated to improve latency and availability. To address these challenges, this paper presents a novel testing framework for detecting serializability violations in (SQL) database-backed Java applications executing on weakly-consistent storage systems. We manifest our approach in a tool named CLOTHO that combines a static analyzer and a model checker to generate abstract executions, discover serializability violations in these executions, and translate them back into concrete test inputs suitable for deployment in a test environment. To the best of our knowledge, CLOTHO is the first automated test generation facility for identifying serializability anomalies of Java applications intended to operate in geo-replicated distributed environments. An experimental evaluation on a set of industry-standard benchmarks demonstrates the utility of our approach.
Submitted 15 August, 2019;
originally announced August 2019.
-
Automated Parameterized Verification of CRDTs
Authors:
Kartik Nagar,
Suresh Jagannathan
Abstract:
Maintaining multiple replicas of data is crucial to achieving scalability, availability and low latency in distributed applications. Conflict-free Replicated Data Types (CRDTs) are important building blocks in this domain because they are designed to operate correctly under the myriad behaviors possible in a weakly-consistent distributed setting. Because of the possibility of concurrent updates to the same object at different replicas, and the absence of any ordering guarantees on these updates, convergence is an important correctness criterion for CRDTs. This property asserts that two replicas which receive the same set of updates (in any order) must nonetheless converge to the same state. One way to prove that operations on a CRDT converge is to show that they commute since commutative actions by definition behave the same regardless of the order in which they execute. In this paper, we present a framework for automatically verifying convergence of CRDTs under different weak-consistency policies. Surprisingly, depending upon the consistency policy supported by the underlying system, we show that not all operations of a CRDT need to commute to achieve convergence. We develop a proof rule parameterized by a consistency specification based on the concepts of commutativity modulo consistency policy and non-interference to commutativity. We describe the design and implementation of a verification engine equipped with this rule and show how it can be used to provide the first automated convergence proofs for a number of challenging CRDTs, including sets, lists, and graphs.
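The convergence property and its dependence on the delivery order can be illustrated with a brute-force Python sketch. It is not the paper's proof rule or verification engine: the set-based state, the helpers `add`, `remove` and `converges`, and the `allowed` predicate standing in for a consistency policy are all assumptions made for the example.

```python
from itertools import permutations


def add(x):
    """Update that inserts x into a set-valued replica state."""
    return lambda state: state | {x}


def remove(x):
    """Update that deletes x from a set-valued replica state."""
    return lambda state: state - {x}


def converges(initial, updates, allowed=lambda order: True):
    """Check convergence by enumerating delivery orders.

    `order` is a tuple of indices into `updates`. Only orders accepted by
    `allowed` (a stand-in for the consistency policy) are considered, and
    the updates converge if all of those orders reach the same final state.
    """
    finals = set()
    for order in permutations(range(len(updates))):
        if not allowed(order):
            continue
        state = initial
        for i in order:
            state = updates[i](state)
        finals.add(state)
    return len(finals) <= 1


empty = frozenset()

# Additions commute, so every delivery order gives the same state.
adds_converge = converges(empty, [add(1), add(2), add(3)])

# add(1) and remove(1) do not commute: the two orders disagree.
mixed_converge = converges(empty, [add(1), remove(1)])

# If the policy only permits delivering the add before the remove,
# the non-commuting pair no longer prevents convergence.
ordered_converge = converges(
    empty,
    [add(1), remove(1)],
    allowed=lambda order: order.index(0) < order.index(1),
)
```

The last call mirrors, in a very simplified form, the observation that not all operations need to commute once the consistency policy rules out some delivery orders.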
Submitted 14 May, 2019;
originally announced May 2019.
-
Automated Detection of Serializability Violations under Weak Consistency
Authors:
Kartik Nagar,
Suresh Jagannathan
Abstract:
While a number of weak consistency mechanisms have been developed in recent years to improve performance and ensure availability in distributed, replicated systems, ensuring correctness of transactional applications running on top of such systems remains a difficult and important problem. Serializability is a well-understood correctness criterion for transactional programs; understanding whether applications are serializable when executed in a weakly-consistent environment, however, remains a challenging exercise. In this work, we combine the dependency graph-based characterization of serializability and the framework of abstract executions to develop a fully automated approach for statically finding bounded serializability violations under \emph{any} weak consistency model. We reduce the problem of serializability to satisfiability of a formula in First-Order Logic, which allows us to harness the power of existing SMT solvers. We provide rules to automatically construct the FOL encoding from programs written in SQL (allowing loops and conditionals) and the consistency specification written as a formula in FOL. In addition to detecting bounded serializability violations, we also provide two orthogonal schemes to reason about unbounded executions by providing sufficient conditions (in the form of FOL formulae) whose satisfiability would imply the absence of anomalies in any arbitrary execution. We have applied the proposed technique on TPC-C, a real world database program with complex application logic, and were able to discover anomalies under Parallel Snapshot Isolation, and verify serializability for unbounded executions under Snapshot Isolation, two consistency mechanisms substantially weaker than serializability.
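The dependency graph-based characterization can be illustrated on a single concrete execution with a short Python sketch: transactions are nodes, dependencies are directed edges, and a cycle signals a serializability violation. This is only the graph-side check, not the paper's FOL encoding or its SMT-based search; the function name `has_cycle` and the transaction labels are hypothetical.

```python
def has_cycle(nodes, edges):
    """Detect a directed cycle with a depth-first search.

    `edges` is a list of (source, target) dependency pairs between
    transactions; a cycle means no serial order respects every dependency.
    """
    adj = {n: [] for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)

    UNSEEN, ACTIVE, DONE = 0, 1, 2
    status = {n: UNSEEN for n in nodes}

    def visit(u):
        status[u] = ACTIVE
        for v in adj[u]:
            if status[v] == ACTIVE:
                return True  # back edge to a node on the current path
            if status[v] == UNSEEN and visit(v):
                return True
        status[u] = DONE
        return False

    return any(status[n] == UNSEEN and visit(n) for n in nodes)


# T1 reads x before T2 overwrites it, and T2 reads y before T1 overwrites it:
# two anti-dependencies that point in opposite directions.
anomalous = has_cycle(["T1", "T2"], [("T1", "T2"), ("T2", "T1")])

# A chain of dependencies admits the serial order T1, T2, T3.
serial = has_cycle(["T1", "T2", "T3"], [("T1", "T2"), ("T2", "T3")])
```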
Submitted 21 June, 2018;
originally announced June 2018.
-
Alone Together: Compositional Reasoning and Inference for Weak Isolation
Authors:
Gowtham Kaki,
Kartik Nagar,
Mahsa Nazafzadeh,
Suresh Jagannathan
Abstract:
Serializability is a well-understood correctness criterion that simplifies reasoning about the behavior of concurrent transactions by ensuring they are isolated from each other while they execute. However, enforcing serializable isolation comes at a steep cost in performance and hence database systems in practice support, and often encourage, developers to implement transactions using weaker alternatives. Unfortunately, the semantics of weak isolation is poorly understood, and usually explained only informally in terms of low-level implementation artifacts. Consequently, verifying high-level correctness properties in such environments remains a challenging problem.
To address this issue, we present a novel program logic that enables compositional reasoning about the behavior of concurrently executing weakly-isolated transactions. Recognizing that the proof burden necessary to use this logic may dissuade application developers, we also describe an inference procedure based on this foundation that ascertains the weakest isolation level that still guarantees the safety of high-level consistency invariants associated with such transactions. The key to effective inference is the observation that weakly-isolated transactions can be viewed as functional (monadic) computations over an abstract database state, allowing us to treat their operations as state transformers over the database. This interpretation enables automated verification using off-the-shelf SMT solvers. Case studies and experiments of real-world applications (written in an embedded DSL in OCaml) demonstrate the utility of our approach.
Submitted 9 November, 2017; v1 submitted 26 October, 2017;
originally announced October 2017.