-
SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing
Authors:
Heidi C. Zhang,
Sina J. Semnani,
Farhad Ghassemi,
Jialiang Xu,
Shicheng Liu,
Monica S. Lam
Abstract:
We introduce SPAGHETTI: Semantic Parsing Augmented Generation for Hybrid English information from Text Tables and Infoboxes, a hybrid question-answering (QA) pipeline that utilizes information from heterogeneous knowledge sources, including a knowledge base, text, tables, and infoboxes. Our LLM-augmented approach achieves state-of-the-art performance on the Compmix dataset, the most comprehensive heterogeneous open-domain QA dataset, with a 56.5% exact match (EM) rate. More importantly, manual analysis on a sample of the dataset suggests that SPAGHETTI is more than 90% accurate, indicating that EM is no longer suitable for assessing the capabilities of QA systems today.
Submitted 1 June, 2024;
originally announced June 2024.
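The gap between the 56.5% EM rate and the manually judged accuracy in the abstract above is easier to see with the metric written out. The following is a minimal sketch of a commonly used exact-match convention (lowercase, strip punctuation and English articles, compare strings). It is an assumption for illustration, not the evaluation script used for the dataset, and the function names `normalize` and `exact_match` are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """Return True only if the normalized strings are identical."""
    return normalize(prediction) == normalize(gold)


# A surface variation that normalization absorbs.
print(exact_match("The Beatles", "beatles"))   # True
# An answer a human would likely accept, but EM rejects.
print(exact_match("Barack Obama", "Obama"))    # False
```

The second call shows the failure mode: an answer that names the right entity in a longer form scores zero under string equality.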
-
Search Optimization with Query Likelihood Boosting and Two-Level Approximate Search for Edge Devices
Authors:
Jianwei Zhang,
Helian Feng,
Xin He,
Grant P. Strimel,
Farhad Ghassemi,
Ali Kebarighotbi
Abstract:
We present a novel search optimization solution for approximate nearest neighbor (ANN) search on resource-constrained edge devices. Traditional ANN approaches fall short in meeting the specific demands of real-world scenarios, e.g., skewed query likelihood distribution and search on large-scale indices with low latency and a small footprint. To address these limitations, we introduce two key components: a Query Likelihood Boosted Tree (QLBT) to optimize average search latency for frequently used small datasets, and a two-level approximate search algorithm to enable efficient retrieval with large datasets on edge devices. We perform a thorough evaluation on simulated and real data and demonstrate that QLBT can reduce latency by 15% on real data and that our two-level search algorithm achieves deployable accuracy and latency on a dataset of size 10 million for edge devices. In addition, we provide a comprehensive protocol for configuring and optimizing on-device search algorithms through extensive empirical studies.
Submitted 12 December, 2023;
originally announced December 2023.
-
Robust Nonparametric Distribution Forecast with Backtest-based Bootstrap and Adaptive Residual Selection
Authors:
Longshaokan Wang,
Lingda Wang,
Mina Georgieva,
Paulo Machado,
Abinaya Ulagappa,
Safwan Ahmed,
Yan Lu,
Arjun Bakshi,
Farhad Ghassemi
Abstract:
Distribution forecast can quantify forecast uncertainty and provide various forecast scenarios with their corresponding estimated probabilities. Accurate distribution forecast is crucial for planning, for example when making production capacity or inventory allocation decisions. We propose a practical and robust distribution forecast framework that relies on backtest-based bootstrap and adaptive residual selection. The proposed approach is robust to the choice of the underlying forecasting model, accounts for uncertainty around the input covariates, and relaxes the assumption of independence between residuals and covariates. It reduces the Absolute Coverage Error by more than 63% compared to the classic bootstrap approaches and by 2% to 32% compared to a variety of state-of-the-art deep learning approaches on in-house product sales data and M4-hourly competition data.
Submitted 16 February, 2022;
originally announced February 2022.
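To make the terms in the abstract above concrete, here is a minimal sketch of a plain residual bootstrap built from backtest residuals, together with a coverage-error score. It does not implement the paper's adaptive residual selection, which the abstract does not specify. The function names, the default parameters, and the reading of Absolute Coverage Error as |empirical coverage − nominal level| are all assumptions made for illustration.

```python
import random


def bootstrap_interval(point_forecast, backtest_residuals, level=0.8,
                       n_samples=2000, seed=0):
    """Prediction interval from resampled backtest residuals (classic bootstrap)."""
    rng = random.Random(seed)
    # Each draw is the point forecast plus one residual sampled with replacement.
    draws = sorted(point_forecast + rng.choice(backtest_residuals)
                   for _ in range(n_samples))
    lo_idx = int((1 - level) / 2 * (n_samples - 1))
    hi_idx = int((1 + level) / 2 * (n_samples - 1))
    return draws[lo_idx], draws[hi_idx]


def absolute_coverage_error(intervals, actuals, level):
    """Assumed definition: |share of actuals inside their interval - nominal level|."""
    covered = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, actuals))
    return abs(covered / len(actuals) - level)


# Hypothetical residuals (actual minus forecast) collected from past backtests.
residuals = [-3.0, -1.5, -0.5, 0.0, 0.4, 1.2, 2.5]
interval = bootstrap_interval(100.0, residuals, level=0.8)
print(interval)
print(absolute_coverage_error([interval] * 4, [99.0, 101.0, 100.5, 110.0], 0.8))
```

A score of zero would mean the intervals contain the realized values exactly as often as the nominal level promises; larger values indicate intervals that are too narrow or too wide.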
-
Specification and Verification of Timing Properties in Interoperable Medical Systems
Authors:
Mahsa Zarneshan,
Fatemeh Ghassemi,
Ehsan Khamespanah,
Marjan Sirjani,
John Hatcliff
Abstract:
To support the dynamic composition of various devices/apps into a medical system at point-of-care, a set of communication patterns to describe the communication needs of devices has been proposed. To address timing requirements, each pattern breaks common timing properties into finer ones that can be enforced locally by the components. Common timing requirements for the underlying communication substrate are derived from these local properties. The local properties of devices are assured by the vendors at development time. Although organizations procure devices that are compatible in terms of their local properties and middleware, they may not operate as desired. The latency of the organization's network interacts with the local properties of devices. To validate the interaction among the timing properties of components and the network, we formally specify such systems in Timed Rebeca. We use model checking to verify the derived timing requirements of the communication substrate in terms of the network and device models. We provide a set of templates as a guideline to specify medical systems in terms of the formal model of patterns. A composite medical system using several devices is subject to state-space explosion. We extend the reduction technique of Timed Rebeca based on the static properties of patterns. We prove that our reduction is sound and show the applicability of our approach in reducing the state space by modeling two clinical scenarios made of several instances of patterns.
Submitted 31 May, 2022; v1 submitted 7 December, 2020;
originally announced December 2020.
-
ConsiDroid: A Concolic-based Tool for Detecting SQL Injection Vulnerability in Android Apps
Authors:
Ehsan Edalat,
Babak Sadeghiyan,
Fatemeh Ghassemi
Abstract:
In this paper, we present a concolic execution technique for detecting SQL injection vulnerabilities in Android apps, with a new tool we call ConsiDroid. We extend the source code of apps using a mocking technique, such that the execution of the original source code is not affected. The extended source code can be treated as a Java application and executed by SPF (Symbolic PathFinder) with concolic execution. We automatically produce a DummyMain class from static analysis such that the essential functions are called sequentially and the events leading to vulnerable functions are triggered. We extend SPF with taint analysis in ConsiDroid. To make taint analysis possible, we introduce a new technique of symbolic mock classes that eases the propagation of tainted values in the code. An SQL injection vulnerability is detected when a vulnerable function receives a tainted value. In addition, ConsiDroid takes advantage of static analysis to adjust SPF so that it inspects only suspicious paths. To illustrate the applicability of ConsiDroid, we have inspected 140 randomly selected apps from the F-Droid repository. Among these, we found three apps vulnerable to SQL injection. To verify their vulnerability, we analyzed the apps manually based on ConsiDroid's reports by using Robolectric.
Submitted 8 August, 2019; v1 submitted 26 November, 2018;
originally announced November 2018.
-
Verification of Asynchronous Systems with an Unspecified Component
Authors:
Rosa Abbasi,
Fatemeh Ghassemi,
Ramtin Khosravi
Abstract:
Component-based systems evolve as a new component is added or an existing one is replaced by a newer version. Hence, it is appealing to assure that the new system still preserves its safety properties. However, instead of inspecting the new system as a whole, which may result in a large state space, it is beneficial to reuse the verification results by inspecting the newly added component in isolation. To this aim, we study the problem of model checking component-based asynchronously communicating systems in the presence of an unspecified component against safety properties. Our solution is based on assume-guarantee reasoning, adapted for asynchronous environments, which generates the weakest assumption. If the newly added component conforms to the assumption, then the whole system still satisfies the property. To make the approach efficient and convergent, we produce an overapproximated interface of the missing component; by composing it with the rest of the system components, we obtain an overapproximated specification of the system, from which we remove those traces that violate the property and generate an assumption for the missing component.
We have applied our approach to two case studies. Furthermore, we compared our results with the state-of-the-art direct approach. Our resulting assumptions are smaller in size and are obtained faster.
Submitted 11 September, 2017;
originally announced September 2017.
-
An Efficient Loop-free Version of AODVv2
Authors:
Behnaz Yousefi,
Fatemeh Ghassemi
Abstract:
The Ad hoc On-Demand Distance Vector (AODV) routing protocol is one of the most prominent routing protocols used in Mobile Ad-hoc Networks (MANETs). Due to the mobility of nodes, there exist many revisions, as scenarios leading to loop formation were found. We demonstrate violations of the loop freedom property in AODVv2-11, AODVv2-13, and AODVv2-16 through counterexamples. We precisely present our proposed version of AODVv2, which not only ensures loop freedom but also improves performance.
Submitted 14 October, 2017; v1 submitted 6 September, 2017;
originally announced September 2017.
-
Reliable Restricted Process Theory
Authors:
Fatemeh Ghassemi,
Wan Fokkink
Abstract:
Malfunctions of a mobile ad hoc network (MANET) protocol caused by a conceptual mistake in the protocol design, rather than unreliable communication, can often be detected only by considering communication among the nodes in the network to be reliable. In Restricted Broadcast Process Theory, which was developed for the specification and verification of MANET protocols, the communication operator is lossy. Replacing unreliable with reliable communication invalidates existing results for this process theory. We examine the effects of this adaptation on the semantics of the framework with regard to the non-blocking property of communication in MANETs, the notion of behavioral equivalence relation and its axiomatization. We illustrate the applicability of our framework through a simple routing protocol. To prove its correctness, we introduce a novel proof process, based on a precongruence relation.
Submitted 7 May, 2017;
originally announced May 2017.
-
Combinatorial Entropy Power Inequalities: A Preliminary Study of the Stam region
Authors:
Mokshay Madiman,
Farhad Ghassemi
Abstract:
We initiate the study of the Stam region, defined as the subset of the positive orthant in $\mathbb{R}^{2^n-1}$ that arises from considering entropy powers of subset sums of $n$ independent random vectors in a Euclidean space of finite dimension. We show that the class of fractionally superadditive set functions provides an outer bound to the Stam region, resolving a conjecture of A. R. Barron and the first author. On the other hand, the entropy power of a sum of independent random vectors is not supermodular in any dimension. We also develop some qualitative properties of the Stam region, showing for instance that its closure is a logarithmically convex cone.
Submitted 29 June, 2018; v1 submitted 4 April, 2017;
originally announced April 2017.
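For reference, the quantities named in the abstract above can be written out as follows. This uses the standard definitions of differential entropy $h$ and entropy power $N$ for a random vector in $\mathbb{R}^d$; the symbols $v$, $s$, $t$, and $\beta_t$ are chosen here for illustration and may differ from the paper's notation.

```latex
% Entropy power of a random vector X in R^d with differential entropy h(X)
\[ N(X) = \frac{1}{2\pi e}\, e^{2h(X)/d} \]

% Classical (Shannon-Stam) entropy power inequality for independent X and Y
\[ N(X + Y) \ge N(X) + N(Y) \]

% Fractional superadditivity of a set function v on subsets of {1, ..., n}:
% for every subset s and every family of subsets t of s with weights
% \beta_t \ge 0 such that \sum_{t \ni i} \beta_t = 1 for each i in s,
\[ v(s) \ge \sum_{t} \beta_t\, v(t) \]
```

Read this way, the outer bound in the abstract states that the set function $v(s) = N\big(\sum_{i \in s} X_i\big)$ satisfies the last inequality, while the negative result states that this same function need not be supermodular.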
-
Modeling and Efficient Verification of Wireless Ad hoc Networks
Authors:
Behnaz Yousefi,
Fatemeh Ghassemi,
Ramtin Khosravi
Abstract:
Wireless ad hoc networks, in particular mobile ad hoc networks (MANETs), are growing very fast as they make communication easier and more available. However, their protocols tend to be difficult to design due to the topology-dependent behavior of wireless communication and their distributed operations that adapt to topology dynamism. Therefore, it is desirable to have them modeled and verified using formal methods. In this paper, we present an actor-based modeling language for MANETs. We address the main challenges of modeling wireless ad hoc networks, such as local broadcast, the underlying topology, and its changes, and discuss how they can be efficiently modeled at the semantic level to make them amenable to verification. The new framework abstracts the data link layer services by providing asynchronous (local) broadcast and unicast communication, while message delivery is in order and is guaranteed for connected receivers. We illustrate the applicability of our framework through two routing protocols, namely flooding and AODVv2-11, and show how efficiently their state spaces can be reduced by the proposed techniques. Furthermore, we demonstrate a loop formation scenario in AODV, found by our analysis tool.
Submitted 17 April, 2017; v1 submitted 25 April, 2016;
originally announced April 2016.