-
What Ails Generative Structure-based Drug Design: Too Little or Too Much Expressivity?
Authors:
Rafał Karczewski,
Samuel Kaski,
Markus Heinonen,
Vikas Garg
Abstract:
Several generative models with elaborate training and sampling procedures have been proposed recently to accelerate structure-based drug design (SBDD); however, perplexingly, their empirical performance turns out to be suboptimal. We seek to better understand this phenomenon from both theoretical and empirical perspectives. Since most of these models apply graph neural networks (GNNs), one may suspect that they inherit the representational limitations of GNNs. We analyze this aspect, establishing the first such results for protein-ligand complexes. A plausible counterview may attribute the underperformance of these models to their excessive parameterizations, inducing expressivity at the expense of generalization. We also investigate this possibility with a simple metric-aware approach that learns an economical surrogate for affinity to infer an unlabelled molecular graph and optimizes for labels conditioned on this graph and molecular properties. The resulting model achieves state-of-the-art results using 100x fewer trainable parameters and affords up to 1000x speedup. Collectively, our findings underscore the need to reassess and redirect the existing paradigm and efforts for SBDD.
Submitted 12 August, 2024;
originally announced August 2024.
-
Reducing Matroid Optimization to Basis Search
Authors:
Robert Streit,
Vijay K. Garg
Abstract:
In combinatorial optimization, matroids provide one of the most elegant structures for algorithm design. This is perhaps best identified by the Edmonds-Rado theorem relating the success of the simple greedy algorithm to the anatomy of the optimal basis of a matroid [Edm71; Rad57]. As a response, much energy has been devoted to understanding a matroid's favorable computational properties. Yet surprisingly, not much is understood where parallel algorithm design is concerned. Specifically, while prior work has investigated the task of finding an arbitrary basis in parallel computing settings [KUW88], the more complex task of finding the optimal basis remains unexplored. We initiate this study by reexamining Borůvka's minimum weight spanning tree algorithm in the language of matroid theory, identifying a new characterization of the optimal basis by way of a matroid's cocircuits as a result. Furthermore, we then combine such insights with special properties of binary matroids to reduce optimization in a binary matroid to the simpler task of search for an arbitrary basis, with only logarithmic asymptotic overhead. Consequently, we are able to compose our reduction with a known basis search method of [KUW88] to obtain a novel algorithm for finding the optimal basis of a binary matroid with only sublinearly many adaptive rounds of queries to an independence oracle. To the authors' knowledge, this is the first parallel algorithm for matroid optimization to outperform the greedy algorithm in terms of adaptive complexity, for any class of matroid not represented by a graph.
Submitted 7 August, 2024;
originally announced August 2024.
-
Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness
Authors:
Satyam Kumar,
Sai Srujana Buddi,
Utkarsh Oggy Sarawgi,
Vineet Garg,
Shivesh Ranjan,
Ognjen Rudovic,
Ahmed Hussen Abdelaziz,
Saurabh Adya
Abstract:
Voice activity detection (VAD) is a critical component in various applications such as speech recognition, speech enhancement, and hands-free communication systems. With the increasing demand for personalized and context-aware technologies, the need for effective personalized VAD systems has become paramount. In this paper, we present a comparative analysis of Personalized Voice Activity Detection (PVAD) systems to assess their real-world effectiveness. We introduce a comprehensive approach to assess PVAD systems, incorporating various performance metrics such as frame-level and utterance-level error rates, detection latency and accuracy, alongside user-level analysis. Through extensive experimentation and evaluation, we provide a thorough understanding of the strengths and limitations of various PVAD variants. This paper advances the understanding of PVAD technology by offering insights into its efficacy and viability in practical applications using a comprehensive set of metrics.
Submitted 11 June, 2024;
originally announced June 2024.
-
Topological Neural Networks go Persistent, Equivariant, and Continuous
Authors:
Yogesh Verma,
Amauri H Souza,
Vikas Garg
Abstract:
Topological Neural Networks (TNNs) incorporate higher-order relational information beyond pairwise interactions, enabling richer representations than Graph Neural Networks (GNNs). Concurrently, topological descriptors based on persistent homology (PH) are being increasingly employed to augment the GNNs. We investigate the benefits of integrating these two paradigms. Specifically, we introduce TopNets as a broad framework that subsumes and unifies various methods in the intersection of GNNs/TNNs and PH such as (generalizations of) RePHINE and TOGL. TopNets can also be readily adapted to handle (symmetries in) geometric complexes, extending the scope of TNNs and PH to spatial settings. Theoretically, we show that PH descriptors can provably enhance the expressivity of simplicial message-passing networks. Empirically, (continuous and E(n)-equivariant extensions of) TopNets achieve strong performance across diverse tasks, including antibody design, molecular dynamics simulation, and drug property prediction.
Submitted 5 June, 2024;
originally announced June 2024.
-
Alignment is Key for Applying Diffusion Models to Retrosynthesis
Authors:
Najwa Laabid,
Severi Rissanen,
Markus Heinonen,
Arno Solin,
Vikas Garg
Abstract:
Retrosynthesis, the task of identifying precursors for a given molecule, can be naturally framed as a conditional graph generation task. Diffusion models are a particularly promising modelling approach, enabling post-hoc conditioning and trading off quality for speed during generation. We show mathematically that permutation equivariant denoisers severely limit the expressiveness of graph diffusion models and thus their adaptation to retrosynthesis. To address this limitation, we relax the equivariance requirement such that it only applies to aligned permutations of the conditioning and the generated graphs obtained through atom mapping. Our new denoiser achieves the highest top-$1$ accuracy ($54.7$\%) across template-free and template-based methods on USPTO-50k. We also demonstrate the ability for flexible post-training conditioning and good sample quality with small diffusion step counts, highlighting the potential for interactive applications and additional controls for multi-step planning.
Submitted 27 May, 2024;
originally announced May 2024.
-
Heteroscedastic Preferential Bayesian Optimization with Informative Noise Distributions
Authors:
Marshal Arijona Sinaga,
Julien Martinelli,
Vikas Garg,
Samuel Kaski
Abstract:
Preferential Bayesian optimization (PBO) is a sample-efficient framework for learning human preferences between candidate designs. PBO classically relies on homoscedastic noise models to represent human aleatoric uncertainty. Yet, such noise fails to accurately capture the varying levels of human aleatoric uncertainty, particularly when the user possesses partial knowledge among different pairs of candidates. For instance, a chemist with solid expertise in glucose-related molecules may easily compare two compounds from that family while struggling to compare alcohol-related molecules. Currently, PBO overlooks this uncertainty during the search for a new candidate through the maximization of the acquisition function, consequently underestimating the risk associated with human uncertainty. To address this issue, we propose a heteroscedastic noise model to capture human aleatoric uncertainty. This model adaptively assigns noise levels based on the distance of a specific input to a predefined set of reliable inputs known as anchors provided by the human. Anchors encapsulate partial knowledge and offer insight into the comparative difficulty of evaluating different candidate pairs. Such a model can be seamlessly integrated into the acquisition function, thus leading to candidate design pairs that elegantly trade informativeness and ease of comparison for the human expert. We perform an extensive empirical evaluation of the proposed approach, demonstrating a consistent improvement over homoscedastic PBO.
Submitted 23 May, 2024;
originally announced May 2024.
-
Employing Federated Learning for Training Autonomous HVAC Systems
Authors:
Fredrik Hagström,
Vikas Garg,
Fabricio Oliveira
Abstract:
Buildings account for 40 % of global energy consumption. A considerable portion of building energy consumption stems from heating, ventilation, and air conditioning (HVAC), and thus implementing smart, energy-efficient HVAC systems has the potential to significantly impact the course of climate change. In recent years, model-free reinforcement learning algorithms have been increasingly assessed for this purpose due to their ability to learn and adapt purely from experience. They have been shown to outperform classical controllers in terms of energy cost and consumption, as well as thermal comfort. However, their weakness lies in their relatively poor data efficiency, requiring long periods of training to reach acceptable policies, making them inapplicable to real-world controllers directly. Hence, common research goals are to improve the learning speed, as well as to improve their ability to generalize, in order to facilitate transfer learning to unseen building environments. In this paper, we take a federated learning approach to training the reinforcement learning controller of an HVAC system. A global control policy is learned by aggregating local policies trained on multiple data centers located in different climate zones. The goal of the policy is to simultaneously minimize energy consumption and maximize thermal comfort. The federated optimization strategy indirectly increases both the rate at which experience data is collected and the variation in the data. We demonstrate through experimental evaluation that these effects lead to a faster learning speed, as well as greater generalization capabilities in the federated policy compared to any individually trained policy.
Submitted 1 May, 2024;
originally announced May 2024.
-
Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces
Authors:
Yue Jiang,
Changkong Zhou,
Vikas Garg,
Antti Oulasvirta
Abstract:
Present-day graphical user interfaces (GUIs) exhibit diverse arrangements of text, graphics, and interactive elements such as buttons and menus, but representations of GUIs have not kept up. They do not encapsulate both semantic and visuo-spatial relationships among elements. To seize machine learning's potential for GUIs more efficiently, Graph4GUI exploits graph neural networks to capture individual elements' properties and their semantic-visuo-spatial constraints in a layout. The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task, which involved predicting the positions of remaining unplaced elements in a partially completed GUI. The new model's suggestions showed alignment and visual appeal superior to the baseline method and received higher subjective ratings for preference. Furthermore, we demonstrate the practical benefits and efficiency advantages designers perceive when utilizing our model as an autocompletion plug-in.
Submitted 21 April, 2024;
originally announced April 2024.
-
ClimODE: Climate and Weather Forecasting with Physics-informed Neural ODEs
Authors:
Yogesh Verma,
Markus Heinonen,
Vikas Garg
Abstract:
Climate and weather prediction traditionally relies on complex numerical simulations of atmospheric physics. Deep learning approaches, such as transformers, have recently challenged the simulation paradigm with complex network forecasts. However, they often act as data-driven black-box models that neglect the underlying physics and lack uncertainty quantification. We address these limitations with ClimODE, a spatiotemporal continuous-time process that implements a key principle of advection from statistical mechanics, namely, weather changes due to a spatial movement of quantities over time. ClimODE models precise weather evolution with value-conserving dynamics, learning global weather transport as a neural flow, which also enables estimating the uncertainty in predictions. Our approach outperforms existing data-driven methods in global and regional forecasting with an order of magnitude smaller parameterization, establishing a new state of the art.
Submitted 15 April, 2024;
originally announced April 2024.
-
Field-based Molecule Generation
Authors:
Alexandru Dumitrescu,
Dani Korpela,
Markus Heinonen,
Yogesh Verma,
Valerii Iakovlev,
Vikas Garg,
Harri Lähdesmäki
Abstract:
This work introduces FMG, a field-based model for drug-like molecule generation. We show how the flexibility of this method provides crucial advantages over the prevalent, point-cloud based methods, and achieves competitive molecular stability generation. We tackle optical isomerism (enantiomers), a previously omitted molecular property that is crucial for drug safety and effectiveness, and thus account for all molecular geometry aspects. We demonstrate how previous methods are invariant to a group of transformations that includes enantiomer pairs, leaving them invariant to the molecular R and S configurations, while our field-based generative model captures this property.
Submitted 24 February, 2024;
originally announced February 2024.
-
Algebraic Positional Encodings
Authors:
Konstantinos Kogkalidis,
Jean-Philippe Bernardy,
Vikas Garg
Abstract:
We introduce a novel positional encoding strategy for Transformer-style models, addressing the shortcomings of existing, often ad hoc, approaches. Our framework provides a flexible mapping from the algebraic specification of a domain to an interpretation as orthogonal operators. This design preserves the algebraic characteristics of the source domain, ensuring that the model upholds the desired structural properties. Our scheme can accommodate various structures, including sequences, grids and trees, as well as their compositions. We conduct a series of experiments to demonstrate the practical applicability of our approach. Results suggest performance on par with or surpassing the current state-of-the-art, without hyperparameter optimizations or "task search" of any kind. Code will be made available at github.com/konstantinosKokos/UnitaryPE.
Submitted 26 December, 2023;
originally announced December 2023.
-
Parallel Algorithms for Equilevel Predicates
Authors:
Vijay K. Garg,
Robert P. Streit
Abstract:
We define a new class of predicates called equilevel predicates on a distributive lattice which eases the analysis of parallel algorithms. Many combinatorial problems such as the vertex cover problem, the bipartite matching problem, and the minimum spanning tree problem can be modeled as detecting an equilevel predicate. The problem of detecting an equilevel predicate is NP-complete, but equilevel predicates with the helpful property can be detected in polynomial time in an online manner. An equilevel predicate has the helpful property with a polynomial time algorithm if the algorithm can return a nonempty set of indices such that advancing on any of them can be used to detect the predicate. Furthermore, the refined independently helpful property allows online parallel detection of such predicates in NC. When the independently helpful property holds, advancing on all the specified indices in parallel can be used to detect the predicate in polylogarithmic time.
We also define a special class of equilevel predicates called solitary predicates. Unless NP = RP, this class of predicate also does not admit efficient algorithms. Earlier work has shown that solitary predicates with the efficient advancement can be detected in polynomial time. We introduce two properties called the antimonotone advancement and the efficient rejection which yield the detection of solitary predicates in NC. Finally, we identify the minimum spanning tree, the shortest path, and the conjunctive predicate detection as problems satisfying such properties, giving alternative certifications of their NC memberships as a result.
Submitted 10 November, 2023;
originally announced November 2023.
-
Going beyond persistent homology using persistent homology
Authors:
Johanna Immonen,
Amauri H. Souza,
Vikas Garg
Abstract:
Representational limits of message-passing graph neural networks (MP-GNNs), e.g., in terms of the Weisfeiler-Leman (WL) test for isomorphism, are well understood. Augmenting these graph models with topological features via persistent homology (PH) has gained prominence, but identifying the class of attributed graphs that PH can recognize remains open. We introduce a novel concept of color-separating sets to provide a complete resolution to this important problem. Specifically, we establish the necessary and sufficient conditions for distinguishing graphs based on the persistence of their connected components, obtained from filter functions on vertex and edge colors. Our constructions expose the limits of vertex- and edge-level PH, proving that neither category subsumes the other. Leveraging these theoretical insights, we propose RePHINE for learning topological features on graphs. RePHINE efficiently combines vertex- and edge-level PH, achieving a scheme that is provably more powerful than both. Integrating RePHINE into MP-GNNs boosts their expressive power, resulting in gains over standard PH on several benchmarks for graph classification.
Submitted 10 November, 2023;
originally announced November 2023.
-
Streaming Anchor Loss: Augmenting Supervision with Temporal Significance
Authors:
Utkarsh Oggy Sarawgi,
John Berkowitz,
Vineet Garg,
Arnav Kundu,
Minsik Cho,
Sai Srujana Buddi,
Saurabh Adya,
Ahmed Tewfik
Abstract:
Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms. Hence, increasing the learning capacity of such streaming models (i.e., by adding more parameters) to improve the predictive power may not be viable for real-world tasks. In this work, we propose a new loss, Streaming Anchor Loss (SAL), to better utilize the given learning capacity by encouraging the model to learn more from essential frames. More specifically, our SAL and its focal variations dynamically modulate the frame-wise cross entropy loss based on the importance of the corresponding frames so that a higher loss penalty is assigned for frames within the temporal proximity of semantically critical events. Therefore, our loss ensures that the model training focuses on predicting the relatively rare but task-relevant frames. Experimental results with standard lightweight convolutional and recurrent streaming networks on three different speech based detection tasks demonstrate that SAL enables the model to learn the overall task more effectively with improved accuracy and latency, without any additional data, model parameters, or architectural changes.
Submitted 18 April, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Compositional Sculpting of Iterative Generative Processes
Authors:
Timur Garipov,
Sebastiaan De Peuter,
Ge Yang,
Vikas Garg,
Samuel Kaski,
Tommi Jaakkola
Abstract:
High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models, is that to realize the desired target distribution, all steps of the generative process need to be coordinated, and satisfy delicate balance conditions. In this work, we propose Compositional Sculpting: a general approach for defining compositions of iterative generative processes. We then introduce a method for sampling from these compositions built on classifier guidance. We showcase ways to accomplish compositional sculpting in both GFlowNets and diffusion models. We highlight two binary operations, the harmonic mean ($p_1 \otimes p_2$) and the contrast ($p_1 \unicode{x25D1}\,p_2$) between pairs, and the generalization of these operations to multiple component distributions. We offer empirical results on image and molecular generation tasks.
Submitted 27 September, 2023;
originally announced September 2023.
-
Does Single-channel Speech Enhancement Improve Keyword Spotting Accuracy? A Case Study
Authors:
Avamarie Brueggeman,
Takuya Higuchi,
Masood Delfarah,
Stephen Shum,
Vineet Garg
Abstract:
Noise robustness is a key aspect of successful speech applications. Speech enhancement (SE) has been investigated to improve automatic speech recognition accuracy; however, its effectiveness for keyword spotting (KWS) is still under-investigated. In this paper, we conduct a comprehensive study on single-channel speech enhancement for keyword spotting on the Google Speech Command (GSC) dataset. To investigate robustness to noise, the GSC dataset is augmented with noise signals from the WSJ0 Hipster Ambient Mixtures (WHAM!) noise dataset. Our investigation includes not only applying SE before KWS but also performing joint training of the SE frontend and KWS backend models. Moreover, we explore audio injection, a common approach to reduce distortions by using a weighted average of the enhanced and original signals. Audio injection is then further optimized by using another model that predicts the weight for each utterance. Our investigation reveals that SE can improve KWS accuracy on noisy speech when the backend model is trained on clean speech; however, despite our extensive exploration, it is difficult to improve the KWS accuracy with SE when the backend is trained on noisy speech.
Submitted 21 February, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
-
Leveraging Large Language Models for Exploiting ASR Uncertainty
Authors:
Pranay Dighe,
Yi Su,
Shangshang Zheng,
Yunshu Liu,
Vineet Garg,
Xiaochuan Niu,
Ahmed Tewfik
Abstract:
While large language models excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks, they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription, or be equipped with an in-built speech modality. This work focuses on the former scenario, where the LLM's accuracy on SLU tasks is constrained by the accuracy of a fixed ASR system on the spoken input. Specifically, we tackle the speech-intent classification task, where a high word-error-rate can limit the LLM's ability to understand the spoken intent. Instead of chasing a high accuracy by designing complex or specialized architectures regardless of deployment costs, we seek to answer how far we can go without substantially changing the underlying ASR and LLM, which can potentially be shared by multiple unrelated tasks. To this end, we propose prompting the LLM with an n-best list of ASR hypotheses instead of only the error-prone 1-best hypothesis. We explore prompt-engineering to explain the concept of n-best lists to the LLM, followed by the finetuning of Low-Rank Adapters on the downstream tasks. Our approach using n-best lists proves to be effective on a device-directed speech detection task as well as on a keyword spotting task, where systems using n-best list prompts outperform those using the 1-best ASR hypothesis, thus paving the way for an efficient method to exploit ASR uncertainty via LLMs for speech-based applications.
Submitted 12 September, 2023; v1 submitted 9 September, 2023;
originally announced September 2023.
-
AbODE: Ab Initio Antibody Design using Conjoined ODEs
Authors:
Yogesh Verma,
Markus Heinonen,
Vikas Garg
Abstract:
Antibodies are Y-shaped proteins that neutralize pathogens and constitute the core of our adaptive immune system. De novo generation of new antibodies that target specific antigens holds the key to accelerating vaccine discovery. However, this co-design of the amino acid sequence and the 3D structure subsumes and accentuates some central challenges from multiple tasks, including protein folding (sequence to structure), inverse folding (structure to sequence), and docking (binding). We strive to surmount these challenges with a new generative model AbODE that extends graph PDEs to accommodate both contextual information and external interactions. Unlike existing approaches, AbODE uses a single round of full-shot decoding and elicits continuous differential attention that encapsulates and evolves with latent interactions within the antibody as well as those involving the antigen. We unravel fundamental connections between AbODE and temporal networks as well as graph-matching networks. The proposed model significantly outperforms existing methods on standard metrics across benchmarks.
Submitted 31 May, 2023;
originally announced June 2023.
-
PACO: Provocation Involving Action, Culture, and Oppression
Authors:
Vaibhav Garg,
Ganning Xu,
Munindar P. Singh
Abstract:
In India, people identify with a particular group based on certain attributes such as religion. The same religious groups are often provoked against each other. Previous studies show the role of provocation in increasing tensions between India's two prominent religious groups: Hindus and Muslims. With the advent of the Internet, such provocation also surfaced on social media platforms such as WhatsApp.
By leveraging an existing dataset of Indian WhatsApp posts, we identified three categories of provoking sentences against Indian Muslims. Further, we labeled 7,000 sentences for three provocation categories and called this dataset PACO. We leveraged PACO to train a model that can identify provoking sentences from a WhatsApp post. Our best model is fine-tuned RoBERTa and achieved a 0.851 average AUC score over five-fold cross-validation. Automatically identifying provoking sentences could stop provoking text from reaching out to the masses, and can prevent possible discrimination or violence against the target religious group.
Further, we studied the provocative speech through a pragmatic lens, by identifying the dialog acts and impoliteness super-strategies used against the religious group.
Submitted 19 March, 2023;
originally announced March 2023.
-
iRogue: Identifying Rogue Behavior from App Reviews
Authors:
Vaibhav Garg,
Hui Guo,
Nirav Ajmeri,
Saikath Bhattacharya,
Munindar P. Singh
Abstract:
An app user can access information of other users or third parties. We define rogue mobile apps as those that enable a user (abuser) to access information of another user or third party (victim), in a way that violates the victim's privacy expectations. Such apps are dual-use and their identification is nontrivial. We propose iRogue, an approach for identifying rogue apps based on their reviews, posted by victims, abusers, and others. iRogue involves training on deep learning features extracted from their 1,884 manually labeled reviews. iRogue first identifies how alarming a review is with respect to rogue behavior and, second, generates a rogue score for an app. iRogue predicts 100 rogue apps from a seed dataset curated following a previous study. Also, iRogue examines apps in other datasets of scraped reviews, and predicts an additional 139 rogue apps. On labeled ground truth, iRogue achieves the highest recall, and outperforms baseline approaches that leverage app descriptions and reviews. A qualitative analysis of alarming reviews reveals rogue functionalities. App users, platforms, and developers should be aware of such apps and their functionalities and take measures to curb privacy risk.
Submitted 19 March, 2023;
originally announced March 2023.
-
Extracting Incidents, Effects, and Requested Advice from MeToo Posts
Authors:
Vaibhav Garg,
Jiaqing Yuan,
Rujie Xi,
Munindar P. Singh
Abstract:
Survivors of sexual harassment frequently share their experiences on social media, revealing their feelings and emotions and seeking advice. We observed that on Reddit, survivors regularly share long posts that describe a combination of (i) a sexual harassment incident, (ii) its effect on the survivor, including their feelings and emotions, and (iii) the advice being sought. We term such posts MeToo posts, even though they may not be so tagged and may appear in diverse subreddits. A prospective helper (such as a counselor or even a casual reader) must understand a survivor's needs from such posts. But long posts can be time-consuming to read and respond to.
Accordingly, we address the problem of extracting key information from a long MeToo post. We develop a natural language-based model to identify sentences from a post that describe any of the above three categories.
On ten-fold cross-validation of a dataset, our model achieves a macro F1 score of 0.82.
In addition, we contribute MeThree, a dataset comprising 8,947 labeled sentences extracted from Reddit posts. We apply the LIWC-22 toolkit on MeThree to understand how different language patterns in sentences of the three categories can reveal differences in emotional tone, authenticity, and other aspects.
Submitted 19 March, 2023;
originally announced March 2023.
-
Modular Flows: Differential Molecular Generation
Authors:
Yogesh Verma,
Samuel Kaski,
Markus Heinonen,
Vikas Garg
Abstract:
Generating new molecules is fundamental to advancing critical applications such as drug discovery and material synthesis. Flows can generate molecules effectively by inverting the encoding process, however, existing flow models either require artifactual dequantization or specific node/edge orderings, lack desiderata such as permutation invariance, or induce discrepancy between the encoding and the decoding steps that necessitates post hoc validity correction. We circumvent these issues with novel continuous normalizing E(3)-equivariant flows, based on a system of node ODEs coupled as a graph PDE, that repeatedly reconcile locally toward globally aligned densities. Our models can be cast as message-passing temporal networks, and result in superlative performance on the tasks of density estimation and molecular generation. In particular, our generated samples achieve state-of-the-art on both the standard QM9 and ZINC250K benchmarks.
Submitted 13 October, 2022; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Provably expressive temporal graph networks
Authors:
Amauri H. Souza,
Diego Mesquita,
Samuel Kaski,
Vikas Garg
Abstract:
Temporal graph networks (TGNs) have gained prominence as models for embedding dynamic interactions, but little is known about their theoretical underpinnings. We establish fundamental results about the representational power and limits of the two main categories of TGNs: those that aggregate temporal walks (WA-TGNs), and those that augment local message passing with recurrent memory modules (MP-TGNs). Specifically, novel constructions reveal the inadequacy of MP-TGNs and WA-TGNs, proving that neither category subsumes the other. We extend the 1-WL (Weisfeiler-Leman) test to temporal graphs, and show that the most powerful MP-TGNs should use injective updates, as in this case they become as expressive as the temporal WL. Also, we show that sufficiently deep MP-TGNs cannot benefit from memory, and MP/WA-TGNs fail to compute graph properties such as girth.
These theoretical insights lead us to PINT -- a novel architecture that leverages injective temporal message passing and relative positional features. Importantly, PINT is provably more expressive than both MP-TGNs and WA-TGNs. PINT significantly outperforms existing TGNs on several real-world benchmarks.
Submitted 29 September, 2022;
originally announced September 2022.
-
Lattice Linear Predicate Algorithms for the Constrained Stable Marriage Problem with Ties
Authors:
Vijay K. Garg
Abstract:
We apply Lattice-Linear Predicate Detection Technique to derive parallel and distributed algorithms for various variants of the stable matching problem. These problems are: (a) the constrained stable marriage problem (b) the super stable marriage problem in presence of ties, and (c) the strongly stable marriage in presence of ties. All these problems are solved using the Lattice-Linear Predicate (LLP) algorithm showing its generality. The constrained stable marriage problem is a version of finding the stable marriage in presence of lattice-linear constraints such as ``Peter's regret is less than that of Paul.'' For the constrained stable marriage problem, we present a distributed algorithm that takes $O(n^2)$ messages each of size $O(\log n)$ where $n$ is the number of men in the problem. Our algorithm is completely asynchronous. Our algorithms for the stable marriage problem with ties are also parallel with no synchronization.
Submitted 2 August, 2022;
originally announced August 2022.
-
Why GANs are overkill for NLP
Authors:
David Alvarez-Melis,
Vikas Garg,
Adam Tauman Kalai
Abstract:
This work offers a novel theoretical perspective on why, despite numerous attempts, adversarial approaches to generative modeling (e.g., GANs) have not been as popular for certain generation tasks, particularly sequential tasks such as Natural Language Generation, as they have in others, such as Computer Vision. In particular, on sequential data such as text, maximum-likelihood approaches are significantly more utilized than GANs. We show that, while it may seem that maximizing likelihood is inherently different than minimizing distinguishability, this distinction is largely artificial and only holds for limited models. We argue that minimizing KL-divergence (i.e., maximizing likelihood) is a more efficient approach to effectively minimizing the same distinguishability criteria that adversarial models seek to optimize. Reductions show that minimizing distinguishability can be seen as simply boosting likelihood for certain families of models including n-gram models and neural networks with a softmax output layer. To achieve a full polynomial-time reduction, a novel next-token distinguishability model is considered.
Submitted 19 May, 2022;
originally announced May 2022.
-
Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models
Authors:
Vineet Garg,
Ognjen Rudovic,
Pranay Dighe,
Ahmed H. Abdelaziz,
Erik Marchi,
Saurabh Adya,
Chandra Dhir,
Ahmed Tewfik
Abstract:
We address the problem of detecting speech directed to a device that does not contain a specific wake-word. Specifically, we focus on audio coming from a touch-based invocation. Mitigating virtual assistants (VAs) activation due to accidental button presses is critical for user experience. While the majority of approaches to false trigger mitigation (FTM) are designed to detect the presence of a target keyword, inferring user intent in absence of keyword is difficult. This also poses a challenge when creating the training/evaluation data for such systems due to inherent ambiguity in the user's data. To this end, we propose a novel FTM approach that uses weakly-labeled training data obtained with a newly introduced data sampling strategy. While this sampling strategy reduces data annotation efforts, the data labels are noisy as the data are not annotated manually. We use these data to train an acoustics-only model for the FTM task by regularizing its loss function via knowledge distillation from an ASR-based (LatticeRNN) model. This improves the model decisions, resulting in 66% gain in accuracy, as measured by equal-error-rate (EER), over the base acoustics-only model. We also show that the ensemble of the LatticeRNN and acoustic-distilled models brings further accuracy improvement of 20%.
Submitted 29 March, 2022;
originally announced March 2022.
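Regularizing a student's loss with knowledge distillation, as described in the abstract above, can be sketched for a single binary example. This is only an illustration under stated assumptions: the binary cross-entropy form, the mixing weight `alpha`, and the function names are hypothetical, and the abstract does not specify the loss actually used.

```python
import math


def bce(target: float, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a (possibly soft) target and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def distilled_loss(student_prob: float, weak_label: float,
                   teacher_prob: float, alpha: float = 0.5) -> float:
    """Mix the loss on the noisy label with a term pulling the student toward the teacher.

    alpha is a hypothetical weight: 0 ignores the teacher, 1 ignores the weak label.
    """
    label_term = bce(weak_label, student_prob)      # supervised term on the weak label
    distill_term = bce(teacher_prob, student_prob)  # soft-target term from the teacher
    return (1.0 - alpha) * label_term + alpha * distill_term
```

With `alpha = 0` the loss reduces to the plain supervised term, so the distillation term acts purely as a regularizer on top of the weakly-labeled objective.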
-
Robust 3D Garment Digitization from Monocular 2D Images for 3D Virtual Try-On Systems
Authors:
Sahib Majithia,
Sandeep N. Parameswaran,
Sadbhavana Babar,
Vikram Garg,
Astitva Srivastava,
Avinash Sharma
Abstract:
In this paper, we develop a robust 3D garment digitization solution that can generalize well on real-world fashion catalog images with cloth texture occlusions and large body pose variations. We assumed fixed topology parametric template mesh models for known types of garments (e.g., T-shirts, Trousers) and perform mapping of high-quality texture from an input catalog image to UV map panels corresponding to the parametric mesh model of the garment. We achieve this by first predicting a sparse set of 2D landmarks on the boundary of the garments. Subsequently, we use these landmarks to perform Thin-Plate-Spline-based texture transfer on UV map panels. Subsequently, we employ a deep texture inpainting network to fill the large holes (due to view variations & self-occlusions) in TPS output to generate consistent UV maps. Furthermore, to train the supervised deep networks for landmark prediction & texture inpainting tasks, we generated a large set of synthetic data with varying texture and lighting imaged from various views with the human present in a wide variety of poses. Additionally, we manually annotated a small set of fashion catalog images crawled from online fashion e-commerce platforms to finetune. We conduct thorough empirical evaluations and show impressive qualitative results of our proposed 3D garment texture solution on fashion catalog images. Such 3D garment digitization helps us solve the challenging task of enabling 3D Virtual Try-on.
Submitted 30 November, 2021;
originally announced November 2021.
-
A Deep Learning Technique using Low Sampling rate for residential Non Intrusive Load Monitoring
Authors:
Ronak Aghera,
Sahil Chilana,
Vishal Garg,
Raghunath Reddy
Abstract:
Feedback on individual device loads and energy consumption is one of the important approaches for persuading users to save energy in residences. It can help identify faulty devices and energy wasted by devices that are left on while unused. The main challenge is to identify and estimate the energy consumption of individual devices without intrusive sensors on each device. Non-intrusive load monitoring (NILM), or energy disaggregation, is a blind source separation problem which requires a system to estimate the electricity usage of individual appliances from the aggregated household energy consumption. In this paper, we propose a novel deep neural network-based approach for performing load disaggregation on low frequency power data obtained from residential households. We combine a series of one-dimensional Convolutional Neural Networks and Long Short Term Memory (1D CNN-LSTM) to extract features that can identify active appliances and retrieve their power consumption given the aggregated household power value. We used CNNs to extract features from main readings in a given time frame and then used those features to classify whether a given appliance is active in that time period. Following that, the extracted features are used to model a generation problem using LSTM. We train the LSTM to generate the disaggregated energy consumption of a particular appliance. Our neural network is capable of generating detailed demand-side feedback, providing vital insights to the end-user about their electricity consumption. The algorithm was designed for low power offline devices such as ESP32. Empirical calculations show that our model outperforms the state-of-the-art on the Reference Energy Disaggregation Dataset (REDD).
Submitted 7 November, 2021;
originally announced November 2021.
-
Minimal Envy Matchings in the Hospitals/Residents Problem with Lower Quotas
Authors:
Changyong Hu,
Vijay K. Garg
Abstract:
In the Hospitals/Residents problem, every hospital has an upper quota that limits the number of residents assigned to it. In some applications, however, each hospital also has a lower quota for the number of residents it receives. In this setting, a stable matching may not exist. Envy-freeness is introduced as a relaxation of stability that allows blocking pairs involving a resident and an empty position of a hospital. However, an envy-free matching might not exist either when lower quotas are introduced. We consider the problem of finding a feasible matching that satisfies lower quotas and upper quotas and minimizes envy in terms of envy-pairs and envy-residents in the Hospitals/Residents problem with Lower Quotas (HRLQ). We show that the problem is NP-hard under both envy measures. We also give a simple exponential-time algorithm for the Minimum-Envy-Pair HRLQ problem.
Submitted 29 October, 2021;
originally announced October 2021.
-
Reappraising Domain Generalization in Neural Networks
Authors:
Sarath Sivaprasad,
Akshay Goindani,
Vaibhav Garg,
Ritam Basu,
Saiteja Kosgi,
Vineet Gandhi
Abstract:
Given that Neural Networks generalize unreasonably well in the IID setting (with benign overfitting and betterment in performance with more parameters), OOD presents a consistent failure case to better the understanding of how they learn. This paper focuses on Domain Generalization (DG), which is perceived as the front face of OOD generalization. We find that the presence of multiple domains incentivizes domain-agnostic learning and is the primary reason for generalization in Traditional DG. We show that state-of-the-art results can be obtained by borrowing ideas from IID generalization and that DG-tailored methods fail to add any performance gains. Furthermore, we perform explorations beyond the Traditional DG (TDG) formulation and propose a novel ClassWise DG (CWDG) benchmark, where for each class, we randomly select one of the domains and keep it aside for testing. Despite being exposed to all domains during training, CWDG is more challenging than TDG evaluation. We propose a novel iterative domain feature masking approach, achieving state-of-the-art results on the CWDG benchmark. Overall, while explaining these observations, our work furthers insights into the learning mechanisms of neural networks.
Submitted 28 April, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Streaming on-device detection of device directed speech from voice and touch-based invocation
Authors:
Ognjen Rudovic,
Akanksha Bindal,
Vineet Garg,
Pramod Simha,
Pranay Dighe,
Sachin Kajarekar
Abstract:
When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device. However, in many cases, the VA can accidentally be invoked by the keyword-like speech or accidental button press, which may have implications on user experience and privacy. To this end, we propose an acoustic false-trigger-mitigation (FTM) approach for on-device device-directed speech detection that simultaneously handles the voice-trigger and touch-based invocation. To facilitate the model deployment on-device, we introduce a new streaming decision layer, derived using the notion of temporal convolutional networks (TCN) [1], known for their computational efficiency. To the best of our knowledge, this is the first approach that can detect device-directed speech from more than one invocation type in a streaming fashion. We compare this approach with streaming alternatives based on vanilla Average layer, and canonical LSTMs, and show: (i) that all the models show only a small degradation in accuracy compared with the invocation-specific models, and (ii) that the newly introduced streaming TCN consistently performs better or comparable with the alternatives, while mitigating device undirected speech faster in time, and with (relative) reduction in runtime peak-memory over the LSTM-based approach of 33% vs. 7%, when compared to a non-streaming counterpart.
Submitted 9 October, 2021;
originally announced October 2021.
-
Characterization of Super-stable Matchings
Authors:
Changyong Hu,
Vijay K. Garg
Abstract:
An instance of the super-stable matching problem with incomplete lists and ties is an undirected bipartite graph $G = (A \cup B, E)$, with an adjacency list being a linearly ordered list of ties. Ties are subsets of vertices equally good for a given vertex. An edge $(x,y) \in E \backslash M$ is a blocking edge for a matching $M$ if by getting matched to each other neither of the vertices $x$ and $y$ would become worse off. Thus, there is no disadvantage if the two vertices would like to match up. A matching $M$ is super-stable if there is no blocking edge with respect to $M$. It has previously been shown that super-stable matchings form a distributive lattice and the number of super-stable matchings can be exponential in the number of vertices. We give two compact representations of size $O(m)$ that can be used to construct all super-stable matchings, where $m$ denotes the number of edges in the graph. The construction of the second representation takes $O(mn)$ time, where $n$ denotes the number of vertices in the graph, and gives an explicit rotation poset similar to the rotation poset in the classical stable marriage problem. We also give a polyhedral characterisation of the set of all super-stable matchings and prove that the super-stable matching polytope is integral, thus solving an open problem stated in the book by Gusfield and Irving.
Submitted 20 May, 2021;
originally announced May 2021.
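The blocking-edge definition in the abstract above can be checked directly on a small instance. The sketch below is a brute-force test of super-stability, not the compact representations the paper constructs; the data layout (each vertex maps to an ordered list of ties, each tie a set) and the convention that an unmatched vertex prefers any acceptable partner are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Prefs = Dict[str, List[Set[str]]]  # vertex -> ordered list of ties (best first)


def rank(prefs: Prefs, v: str, partner: Optional[str]) -> float:
    """Index of the tie containing partner in v's list; infinity if v is unmatched."""
    if partner is None:
        return math.inf
    for i, tie in enumerate(prefs[v]):
        if partner in tie:
            return i
    raise ValueError(f"{partner} is not acceptable to {v}")


def blocking_edges(prefs: Prefs, side_a: List[str],
                   matching: Dict[str, str]) -> List[Tuple[str, str]]:
    """Edges (x, y) outside the matching where neither endpoint would become worse off."""
    result = []
    for x in side_a:
        for tie in prefs[x]:
            for y in sorted(tie):
                if matching.get(x) == y:
                    continue  # edge is in M, so it cannot block
                x_ok = rank(prefs, x, y) <= rank(prefs, x, matching.get(x))
                y_ok = rank(prefs, y, x) <= rank(prefs, y, matching.get(y))
                if x_ok and y_ok:
                    result.append((x, y))
    return result


prefs = {
    "a1": [{"b1"}, {"b2"}],
    "a2": [{"b1", "b2"}],      # a2 is indifferent between b1 and b2
    "b1": [{"a1"}, {"a2"}],
    "b2": [{"a1", "a2"}],      # b2 is indifferent between a1 and a2
}
good = {"a1": "b1", "b1": "a1", "a2": "b2", "b2": "a2"}
bad = {"a1": "b2", "b2": "a1", "a2": "b1", "b1": "a2"}
```

A matching is super-stable exactly when `blocking_edges` returns an empty list; note the non-strict comparisons, which are what distinguish super-stability from weaker notions of stability under ties.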
-
Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation
Authors:
Vineet Garg,
Wonil Chang,
Siddharth Sigtia,
Saurabh Adya,
Pramod Simha,
Pranay Dighe,
Chandra Dhir
Abstract:
We present a unified and hardware efficient architecture for two stage voice trigger detection (VTD) and false trigger mitigation (FTM) tasks. Two stage VTD systems of voice assistants can get falsely activated to audio segments acoustically similar to the trigger phrase of interest. FTM systems cancel such activations by using post trigger audio context. Traditional FTM systems rely on automatic speech recognition lattices which are computationally expensive to obtain on device. We propose a streaming transformer (TF) encoder architecture, which progressively processes incoming audio chunks and maintains audio context to perform both VTD and FTM tasks using only acoustic features. The proposed joint model yields an average 18% relative reduction in false reject rate (FRR) for the VTD task at a given false alarm rate. Moreover, our model suppresses 95% of the false triggers with an additional one second of post-trigger audio. Finally, on-device measurements show 32% reduction in runtime memory and 56% reduction in inference time compared to non-streaming version of the model.
Submitted 13 May, 2021;
originally announced May 2021.
-
Who Needs Consensus? A Distributed Monetary System Between Rational Agents via Hearsay
Authors:
Yanni Georghiades,
Robert Streit,
Vijay Garg
Abstract:
We propose a novel distributed monetary system called Hearsay that tolerates both Byzantine and rational behavior without the need for agents to reach consensus on executed transactions. Recent work [5, 10, 15] has shown that distributed monetary systems do not require consensus and can operate using a broadcast primitive with weaker guarantees, such as reliable broadcast. However, these protocols assume that some number of agents may be Byzantine and the remaining agents are perfectly correct. For the application of a monetary system in which the agents are real people with economic interests, the assumption that agents are perfectly correct may be too strong. We expand upon this line of thought by weakening the assumption of correctness and instead adopting a fault tolerance model which allows up to $t < \frac{N}{3}$ agents to be Byzantine and the remaining agents to be rational. A rational agent is one which will deviate from the protocol if it is in their own best interest. Under this fault tolerance model, Hearsay implements a monetary system in which all rational agents achieve agreement on executed transactions. Moreover, Hearsay requires only a single broadcast per transaction. In order to incentivize rational agents to behave correctly in Hearsay, agents are rewarded with transaction fees for participation in the protocol and punished for noticeable deviations from the protocol. Additionally, Hearsay uses a novel broadcast primitive called Rational Reliable Broadcast to ensure that agents can broadcast messages under Hearsay's fault tolerance model. Rational Reliable Broadcast achieves equivalent guarantees to Byzantine Reliable Broadcast [7] but can tolerate the presence of rational agents. To show this, we prove that following the Rational Reliable Broadcast protocol constitutes a Nash equilibrium between rational agents and may therefore be of independent interest.
Submitted 15 April, 2021;
originally announced April 2021.
-
A Lattice Linear Predicate Parallel Algorithm for the Dynamic Programming Problems
Authors:
Vijay K. Garg
Abstract:
It has been shown that the parallel Lattice Linear Predicate (LLP) algorithm solves many combinatorial optimization problems such as the shortest path problem, the stable marriage problem and the market clearing price problem. In this paper, we give the parallel LLP algorithm for many dynamic programming problems. In particular, we show that the LLP algorithm solves the longest subsequence problem, the optimal binary search tree problem, and the knapsack problem. Furthermore, the algorithm can be used to solve the constrained versions of these problems so long as the constraints are lattice linear. The parallel LLP algorithm requires only read-write atomicity and no higher-level atomic instructions.
Submitted 10 March, 2021;
originally announced March 2021.
-
Progressive Voice Trigger Detection: Accuracy vs Latency
Authors:
Siddharth Sigtia,
John Bridle,
Hywel Richards,
Pascal Clark,
Erik Marchi,
Vineet Garg
Abstract:
We present an architecture for voice trigger detection for virtual assistants. The main idea in this work is to exploit information in words that immediately follow the trigger phrase. We first demonstrate that by including more audio context after a detected trigger phrase, we can indeed get a more accurate decision. However, waiting to listen to more audio each time incurs a latency increase. Progressive Voice Trigger Detection allows us to trade off latency and accuracy by accepting clear trigger candidates quickly, but waiting for more context to decide whether to accept more marginal examples. Using a two-stage architecture, we show that by delaying the decision for just 3% of detected true triggers in the test set, we are able to obtain a relative improvement of 66% in false rejection rate, while incurring only a negligible increase in latency.
Submitted 2 March, 2021; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Amortized Constant Round Atomic Snapshot in Message-Passing Systems
Authors:
Vijay Garg,
Saptaparni Kumar,
Lewis Tseng,
Xiong Zheng
Abstract:
We study the lattice agreement (LA) and atomic snapshot problems in asynchronous message-passing systems where up to $f$ nodes may crash. Our main result is a crash-tolerant atomic snapshot algorithm with \textit{amortized constant round complexity}. To the best of our knowledge, the best prior result is given by Delporte et al. [TPDS, 18] with amortized $O(n)$ complexity if there are more scans than updates. Our algorithm achieves amortized constant round if there are $\Omega(\sqrt{k})$ operations, where $k$ is the number of actual failures in an execution and is bounded by $f$. Moreover, when there is no failure, our algorithm has $O(1)$ round complexity unconditionally. To achieve amortized constant round complexity, we devise a simple \textit{early-stopping} lattice agreement algorithm and use it to "order" the update and scan operations for our snapshot object. Our LA algorithm has $O(\sqrt{k})$ round complexity. It is the first early-stopping LA algorithm in asynchronous systems.
Submitted 29 August, 2020; v1 submitted 26 August, 2020;
originally announced August 2020.
-
Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering
Authors:
Saurabh Adya,
Vineet Garg,
Siddharth Sigtia,
Pramod Simha,
Chandra Dhir
Abstract:
We consider the design of two-pass voice trigger detection systems. We focus on the networks in the second pass that are used to re-score candidate segments obtained from the first pass. Our baseline is an acoustic model (AM), with BiLSTM layers, trained by minimizing the CTC loss. We replace the BiLSTM layers with self-attention layers. Results on internal evaluation sets show that self-attention networks yield better accuracy while requiring fewer parameters. We add an auto-regressive decoder network on top of the self-attention layers and jointly minimize the CTC loss on the encoder and the cross-entropy loss on the decoder. This design yields further improvements over the baseline. We retrain all the models above in a multi-task learning (MTL) setting, where one branch of a shared network is trained as an AM, while the second branch classifies the whole sequence to be true-trigger or not. Results demonstrate that networks with self-attention layers yield $\sim$60% relative reduction in false reject rates for a given false-alarm rate, while requiring 10% fewer parameters. When trained in the MTL setup, self-attention networks yield further accuracy improvements. On-device measurements show that we observe 70% relative reduction in inference time. Additionally, the proposed network architectures are $\sim$5X faster to train.
Submitted 5 August, 2020;
originally announced August 2020.
-
Improved Paths to Stability for the Stable Marriage Problem
Authors:
Vijay Kumar Garg,
Changyong Hu
Abstract:
The stable marriage problem requires one to find a marriage with no blocking pair. Given a matching that is not stable, Roth and Vande Vate have shown that there exists a sequence of matchings that leads to a stable matching in which each successive matching is obtained by satisfying a blocking pair. The sequence produced by Roth and Vande Vate's algorithm is of length $O(n^3)$ where $n$ is the number of men (and women). In this paper, we present an algorithm that achieves stability in a sequence of matchings of length $O(n^2)$. We also give an efficient algorithm to find the stable matching closest to the given initial matching under an appropriate distance function between matchings.
Submitted 16 May, 2023; v1 submitted 14 July, 2020;
originally announced July 2020.
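The notion of a blocking pair used in the abstract above can be illustrated with a short check. This is a minimal sketch and not the paper's algorithm: it assumes complete, strict preference lists and a perfect matching, and the function name `blocking_pairs` and the dictionary layout are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def blocking_pairs(
    matching: Dict[str, str],
    men_prefs: Dict[str, List[str]],
    women_prefs: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return every (man, woman) pair that blocks `matching`.

    `matching` maps each man to his partner. A pair (m, w) blocks the
    matching when m and w are not matched to each other but each
    strictly prefers the other to their current partner.
    Assumes complete strict preference lists and a perfect matching.
    """
    husband = {w: m for m, w in matching.items()}
    # Lower rank value means more preferred.
    rank_m = {m: {w: i for i, w in enumerate(p)} for m, p in men_prefs.items()}
    rank_w = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}

    pairs = []
    for m, prefs in men_prefs.items():
        for w in prefs:
            if matching[m] == w:
                continue
            m_prefers_w = rank_m[m][w] < rank_m[m][matching[m]]
            w_prefers_m = rank_w[w][m] < rank_w[w][husband[w]]
            if m_prefers_w and w_prefers_m:
                pairs.append((m, w))
    return pairs


def is_stable(matching, men_prefs, women_prefs) -> bool:
    """A matching is stable exactly when it has no blocking pair."""
    return not blocking_pairs(matching, men_prefs, women_prefs)
```

The check is quadratic in the number of participants, since it inspects every man-woman pair once.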
-
Byzantine Lattice Agreement in Asynchronous Systems
Authors:
Xiong Zheng,
Vijay Garg
Abstract:
We study the Byzantine lattice agreement (BLA) problem in asynchronous distributed message passing systems. In the BLA problem, each process proposes a value from a join semi-lattice and needs to output a value also in the lattice such that all output values of correct processes lie on a chain despite the presence of Byzantine processes. We present an algorithm for this problem with round complexity of $O(\log f)$ which tolerates $f < \frac{n}{5}$ Byzantine failures in the asynchronous setting without digital signatures, where $n$ is the number of processes. We also show how this algorithm can be modified to work in the authenticated setting (i.e., with digital signatures) to tolerate $f < \frac{n}{3}$ Byzantine failures.
Submitted 17 February, 2020;
originally announced February 2020.
-
Generalization and Representational Limits of Graph Neural Networks
Authors:
Vikas K. Garg,
Stefanie Jegelka,
Tommi Jaakkola
Abstract:
We address two fundamental questions about graph neural networks (GNNs). First, we prove that several important graph properties cannot be computed by GNNs that rely entirely on local information. Such GNNs include the standard message passing models, and more powerful spatial variants that exploit local graph structure (e.g., via relative orientation of messages, or local port ordering) to distinguish neighbors of each node. Our treatment includes a novel graph-theoretic formalism. Second, we provide the first data dependent generalization bounds for message passing GNNs. This analysis explicitly accounts for the local permutation invariance of GNNs. Our bounds are much tighter than existing VC-dimension based guarantees for GNNs, and are comparable to Rademacher bounds for recurrent neural networks.
Submitted 14 February, 2020;
originally announced February 2020.
-
Learn to Expect the Unexpected: Probably Approximately Correct Domain Generalization
Authors:
Vikas K. Garg,
Adam Kalai,
Katrina Ligett,
Zhiwei Steven Wu
Abstract:
Domain generalization is the problem of machine learning when the training data and the test data come from different data domains. We present a simple theoretical model of learning to generalize across domains in which there is a meta-distribution over data distributions, and those data distributions may even have different supports. In our model, the training data given to a learning algorithm consists of multiple datasets each from a single domain drawn in turn from the meta-distribution. We study this model in three different problem settings---a multi-domain Massart noise setting, a decision tree multi-dataset setting, and a feature selection setting, and find that computationally efficient, polynomial-sample domain generalization is possible in each. Experiments demonstrate that our feature selection algorithm indeed ignores spurious correlations and improves generalization.
Submitted 13 February, 2020;
originally announced February 2020.
-
A Generalization of Teo and Sethuraman's Median Stable Marriage Theorem
Authors:
Vijay K. Garg
Abstract:
Let $L$ be any finite distributive lattice and $B$ be any boolean predicate defined on $L$ such that the set of elements satisfying $B$ is a sublattice of $L$. Consider any subset $M$ of $L$ consisting of $k$ elements that satisfy $B$. Then, we show that $k$ generalized median elements generated from $M$ also satisfy $B$. We call this result the generalized median theorem on finite distributive lattices. When this result is applied to the stable matching, we get Teo and Sethuraman's median stable matching theorem. Our proof is much simpler than that of Teo and Sethuraman. When the generalized median theorem is applied to the assignment problem, we get an analogous result for market clearing price vectors.
Submitted 9 January, 2020;
originally announced January 2020.
-
Byzantine Lattice Agreement in Synchronous Systems
Authors:
Xiong Zheng,
Vijay Garg
Abstract:
In this paper, we study the Byzantine lattice agreement problem in synchronous systems. The lattice agreement problem in the crash failure model has been studied both in synchronous and asynchronous systems, which leads to the current best upper bound of $O(\log f)$ rounds in both systems. However, very few algorithmic results are known for the lattice agreement problem in the Byzantine failure model. The paper [Nowak et al., DISC, 2019] first gives an algorithm for a variant of the lattice agreement problem on cycle-free lattices that tolerates up to $f < n/(h(X) + 1)$ Byzantine faults, where $n$ is the number of processes and $h(X)$ is the height of the input lattice $X$. The recent preprint by Di et al. studies this problem with a slightly modified validity condition in asynchronous systems. They present an $O(f)$-round algorithm by using the reliable broadcast primitive as a first step and following an algorithmic framework similar to that of the algorithms in the crash failure model.
In this paper, we propose three algorithms for the Byzantine lattice agreement problem in synchronous systems. The first algorithm takes $\min\{3h(X) + 6, 6\sqrt{f} + 6\}$ rounds and $O(n^2 \min\{h(X), \sqrt{f}\})$ messages, where $h(X)$ is the height of the input lattice $X$ and $n$ is the total number of processes. The second algorithm runs in $3\log n + 3$ rounds and takes $O(n^2 \log n)$ messages. The third algorithm takes $4 \log f + 3$ rounds and takes $O(n^2 \log f)$ messages. All algorithms can tolerate up to $f < \frac{n}{3}$ Byzantine failures.
Submitted 16 February, 2020; v1 submitted 30 October, 2019;
originally announced October 2019.
-
NC Algorithms for Popular Matchings in One-Sided Preference Systems and Related Problems
Authors:
Changyong Hu,
Vijay K. Garg
Abstract:
The popular matching problem is that of matching a set of applicants to a set of posts, where each applicant has a preference list, ranking a non-empty subset of posts in the order of preference, possibly with ties. A matching M is popular if there is no other matching M' such that more applicants prefer M' to M. We give the first NC algorithm to solve the popular matching problem without ties. We also give an NC algorithm that solves the maximum-cardinality popular matching problem. No NC or RNC algorithms were known for the matching problem in preference systems prior to this work. Moreover, we give an NC algorithm for a weaker version of the stable matching problem, that is, the problem of finding the "next" stable matching given a stable matching.
Submitted 20 December, 2019; v1 submitted 23 October, 2019;
originally announced October 2019.
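The popularity criterion stated in the abstract above compares two matchings by a head-to-head vote among applicants. The sketch below illustrates only that comparison, not the NC algorithm of the paper. It assumes strict preference lists (no ties) and that any listed post is preferred to being unmatched; the names `prefers` and `more_popular` are hypothetical.

```python
from typing import Dict, List, Optional


def prefers(
    applicant: str,
    post_a: Optional[str],
    post_b: Optional[str],
    prefs: Dict[str, List[str]],
) -> bool:
    """True if `applicant` strictly prefers post_a to post_b.

    None stands for being unmatched, which is assumed to be worse
    than any post on the applicant's list.
    """
    if post_a is None:
        return False
    if post_b is None:
        return True
    ranking = prefs[applicant]
    return ranking.index(post_a) < ranking.index(post_b)


def more_popular(
    m_new: Dict[str, str],
    m_old: Dict[str, str],
    prefs: Dict[str, List[str]],
) -> bool:
    """True if strictly more applicants prefer m_new to m_old than the reverse."""
    votes_new = sum(prefers(a, m_new.get(a), m_old.get(a), prefs) for a in prefs)
    votes_old = sum(prefers(a, m_old.get(a), m_new.get(a), prefs) for a in prefs)
    return votes_new > votes_old
```

Under this definition a matching M is popular when `more_popular(other, M, prefs)` is false for every other matching; enumerating all matchings is exponential, which is why efficient algorithms for the problem are of interest.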
-
Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure
Authors:
Vikas K. Garg,
Inderjit S. Dhillon,
Hsiang-Fu Yu
Abstract:
The architecture of Transformer is based entirely on self-attention, and has been shown to outperform models that employ recurrence on sequence transduction tasks such as machine translation. The superior performance of Transformer has been attributed to propagating signals over shorter distances, between positions in the input and the output, compared to the recurrent architectures. We establish connections between the dynamics in Transformer and recurrent networks to argue that several factors including gradient flow along an ensemble of multiple weakly dependent paths play a paramount role in the success of Transformer. We then leverage the dynamics to introduce {\em Multiresolution Transformer Networks} as the first architecture that exploits hierarchical structure in data via self-attention. Our models significantly outperform state-of-the-art recurrent and hierarchical recurrent models on two real-world datasets for query suggestion, namely, AOL and Amazon. In particular, on AOL data, our model registers at least 20\% improvement on each precision score, and over 25\% improvement on the BLEU score with respect to the best performing recurrent model. We thus provide strong evidence that recurrence is not essential for modeling hierarchical structure.
Submitted 27 August, 2019;
originally announced August 2019.
-
Teaching DNNs to design fast fashion
Authors:
Abhinav Ravi,
Arun Patro,
Vikram Garg,
Anoop Kolar Rajagopal,
Aruna Rajan,
Rajdeep Hazra Banerjee
Abstract:
"Fast Fashion" spearheads the biggest disruption in fashion, having enabled the engineering of resilient supply chains that respond quickly to changing fashion trends. The conventional design process in commercial manufacturing is often fed through "trends" or prevailing modes of dressing around the world that indicate sudden interest in a new form of expression, cyclic patterns, and popular modes of expression for a given time frame. In this work, we propose a fully automated system to explore, detect, and finally synthesize trends in fashion into design elements by designing representative prototypes of apparel given time series signals generated from social media feeds. Our system is envisioned to be the first step in the design of Fast Fashion where the production cycle for clothes from design inception to manufacturing is meant to be rapid and responsive to current "trends". It also works to reduce wastage in fashion production by taking in customer feedback on sellability at the time of design generation. We also provide an interface wherein the designers can play with multiple trending styles in fashion and visualize designs as interpolations of elements of these styles. We aim to aid the creative process through generating interesting and inspiring combinations for a designer to mull by running them through her key customers.
Submitted 3 July, 2019; v1 submitted 27 June, 2019;
originally announced June 2019.
-
Strategic Prediction with Latent Aggregative Games
Authors:
Vikas K. Garg,
Tommi Jaakkola
Abstract:
We introduce a new class of context dependent, incomplete information games to serve as structured prediction models for settings with significant strategic interactions. Our games map the input context to outcomes by first condensing the input into private player types that specify the utilities, weighted interactions, as well as the initial strategies for the players. The game is played over multiple rounds where players respond to weighted aggregates of their neighbors' strategies. The predicted output from the model is a mixed strategy profile (a near-Nash equilibrium) and each observation is thought of as a sample from this strategy profile. We introduce two new aggregator paradigms with provably convergent game dynamics, and characterize the conditions under which our games are identifiable from data. Our games can be parameterized in a transferable manner so that the sets of players can change from one game to another. We demonstrate empirically that our games as models can recover meaningful strategic interactions from real voting data.
Submitted 28 May, 2019;
originally announced May 2019.
-
Solving graph compression via optimal transport
Authors:
Vikas K. Garg,
Tommi Jaakkola
Abstract:
We propose a new approach to graph compression by appeal to optimal transport. The transport problem is seeded with prior information about node importance, attributes, and edges in the graph. The transport formulation can be set up for either directed or undirected graphs, and its dual characterization is cast in terms of distributions over the nodes. The compression pertains to the support of node distributions and makes the problem challenging to solve directly. To this end, we introduce Boolean relaxations and specify conditions under which these relaxations are exact. The relaxations admit algorithms with provably fast convergence. Moreover, we provide an exact $O(d \log d)$ algorithm for the subproblem of projecting a $d$-dimensional vector to transformed simplex constraints. Our method outperforms state-of-the-art compression methods on graph classification.
Submitted 28 May, 2019;
originally announced May 2019.
-
Parallel and Distributed Algorithms for the housing allocation Problem
Authors:
Xiong Zheng,
Vijay Garg
Abstract:
We give parallel and distributed algorithms for the housing allocation problem. In this problem, there is a set of agents and a set of houses. Each agent has a strict preference list for a subset of houses. We need to find a matching such that some criterion is optimized. One such criterion is Pareto Optimality. A matching is Pareto optimal if no coalition of agents can be strictly better off by exchanging houses among themselves. We also study the housing market problem, a variant of the housing allocation problem, where each agent initially owns a house. In addition to Pareto optimality, we are also interested in finding the core of a housing market. A matching is in the core if there is no coalition of agents that can be better off by breaking away from other agents and switching houses only among themselves.
In the first part of this work, we show that computing a Pareto optimal matching of a house allocation is in {\bf CC} and computing the core of a housing market is {\bf CC}-hard. Given a matching, we also show that verifying whether it is in the core can be done in {\bf NC}. We then give an algorithm to show that computing a maximum Pareto optimal matching for the housing allocation problem is in {\bf RNC}^2 and quasi-{\bf NC}^2. In the second part of this work, we present a distributed version of the top trading cycle algorithm for finding the core of a housing market. To that end, we first present two algorithms for finding all the disjoint cycles in a functional graph: a Las Vegas algorithm which terminates in $O(\log l)$ rounds with high probability, where $l$ is the length of the longest cycle, and a deterministic algorithm which terminates in $O(\log^* n \log l)$ rounds, where $n$ is the number of nodes in the graph. Both algorithms work in the synchronous distributed model and use messages of size $O(\log n)$.
Submitted 8 May, 2019;
originally announced May 2019.