Search | arXiv e-print repository

NativQA: Multilingual Culturally-Aligned Natural Query for LLMs

Authors: Md. Arid Hasan, Maram Hasanain, Fatema Ahmad, Sahinur Rahman Laskar, Sunaya Upadhyay, Vrunda N Sukhadia, Mucahid Kutlu, Shammur Absar Chowdhury, Firoj Alam

Abstract: Natural Question Answering (QA) datasets play a crucial role in developing and evaluating the capabilities of large language models (LLMs), ensuring their effective usage in real-world applications. Despite the numerous QA datasets that have been developed, there is a notable lack of region-specific datasets generated by native users in their own languages. This gap hinders the effective benchmark… ▽ More Natural Question Answering (QA) datasets play a crucial role in developing and evaluating the capabilities of large language models (LLMs), ensuring their effective usage in real-world applications. Despite the numerous QA datasets that have been developed, there is a notable lack of region-specific datasets generated by native users in their own languages. This gap hinders the effective benchmarking of LLMs for regional and cultural specificities. In this study, we propose a scalable framework, NativQA, to seamlessly construct culturally and regionally aligned QA datasets in native languages, for LLM evaluation and tuning. Moreover, to demonstrate the efficacy of the proposed framework, we designed a multilingual natural QA dataset, MultiNativQA, consisting of ~72K QA pairs in seven languages, ranging from high to extremely low resource, based on queries from native speakers covering 18 topics. We benchmark the MultiNativQA dataset with open- and closed-source LLMs. We made both the framework NativQA and MultiNativQA dataset publicly available for the community. (https://nativqa.gitlab.io) △ Less

Submitted 13 July, 2024; originally announced July 2024.

Comments: LLMs, Native, Multilingual, Language Diversity, Contextual Understanding, Minority Languages, Culturally Informed, Foundation Models, Large Language Models

MSC Class: 68T50 ACM Class: F.2.2; I.2.7

arXiv:2407.04966 [pdf, other]

A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition

Authors: Shreya G. Upadhyay, Carlos Busso, Chi-Chun Lee

Abstract: Cross-lingual speech emotion recognition (SER) is important for a wide range of everyday applications. While recent SER research relies heavily on large pretrained models for emotion training, existing studies often concentrate solely on the final transformer layer of these models. However, given the task-specific nature and hierarchical architecture of these models, each transformer layer encapsu… ▽ More Cross-lingual speech emotion recognition (SER) is important for a wide range of everyday applications. While recent SER research relies heavily on large pretrained models for emotion training, existing studies often concentrate solely on the final transformer layer of these models. However, given the task-specific nature and hierarchical architecture of these models, each transformer layer encapsulates different levels of information. Leveraging this hierarchical structure, our study focuses on the information embedded across different layers. Through an examination of layer feature similarity across different languages, we propose a novel strategy called a layer-anchoring mechanism to facilitate emotion transfer in cross-lingual SER tasks. Our approach is evaluated using two distinct language affective corpora (MSP-Podcast and BIIC-Podcast), achieving a best UAR performance of 60.21% on the BIIC-podcast corpus. The analysis uncovers interesting insights into the behavior of popular pretrained models. △ Less

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2406.06519 [pdf, other]

UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor

Authors: Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Nick Craswell, Jimmy Lin

Abstract: Copious amounts of relevance judgments are necessary for the effective training and accurate evaluation of retrieval systems. Conventionally, these judgments are made by human assessors, rendering this process expensive and laborious. A recent study by Thomas et al. from Microsoft Bing suggested that large language models (LLMs) can accurately perform the relevance assessment task and provide huma… ▽ More Copious amounts of relevance judgments are necessary for the effective training and accurate evaluation of retrieval systems. Conventionally, these judgments are made by human assessors, rendering this process expensive and laborious. A recent study by Thomas et al. from Microsoft Bing suggested that large language models (LLMs) can accurately perform the relevance assessment task and provide human-quality judgments, but unfortunately their study did not yield any reusable software artifacts. Our work presents UMBRELA (a recursive acronym that stands for UMbrela is the Bing RELevance Assessor), an open-source toolkit that reproduces the results of Thomas et al. using OpenAI's GPT-4o model and adds more nuance to the original paper. Across Deep Learning Tracks from TREC 2019 to 2023, we find that LLM-derived relevance judgments correlate highly with rankings generated by effective multi-stage retrieval systems. Our toolkit is designed to be easily extensible and can be integrated into existing multi-stage retrieval and evaluation pipelines, offering researchers a valuable resource for studying retrieval evaluation methodologies. UMBRELA will be used in the TREC 2024 RAG Track to aid in relevance assessments, and we envision our toolkit becoming a foundation for further innovation in the field. UMBRELA is available at https://github.com/castorini/umbrela. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures

arXiv:2405.10311 [pdf, other]

UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models

Authors: Sahel Sharifymoghaddam, Shivani Upadhyay, Wenhu Chen, Jimmy Lin

Abstract: Recently, Multi-Modal(MM) Large Language Models(LLMs) have unlocked many complex use-cases that require MM understanding (e.g., image captioning or visual question answering) and MM generation (e.g., text-guided image generation or editing) capabilities. To further improve the output fidelity of MM-LLMs we introduce the model-agnostic UniRAG technique that adds relevant retrieved information to pr… ▽ More Recently, Multi-Modal(MM) Large Language Models(LLMs) have unlocked many complex use-cases that require MM understanding (e.g., image captioning or visual question answering) and MM generation (e.g., text-guided image generation or editing) capabilities. To further improve the output fidelity of MM-LLMs we introduce the model-agnostic UniRAG technique that adds relevant retrieved information to prompts as few-shot examples during inference. Unlike the common belief that Retrieval Augmentation (RA) mainly improves generation or understanding of uncommon entities, our evaluation results on the MSCOCO dataset with common entities show that both proprietary models like GPT4 and Gemini-Pro and smaller open-source models like Llava, LaVIT, and Emu2 significantly enhance their generation quality when their input prompts are augmented with relevant information retrieved by MM retrievers like UniIR models. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 11 pages, 7 figures

arXiv:2405.04727 [pdf, other]

LLMs Can Patch Up Missing Relevance Judgments in Evaluation

Authors: Shivani Upadhyay, Ehsan Kamalloo, Jimmy Lin

Abstract: Unjudged documents or holes in information retrieval benchmarks are considered non-relevant in evaluation, yielding no gains in measuring effectiveness. However, these missing judgments may inadvertently introduce biases into the evaluation as their prevalence for a retrieval model is heavily contingent on the pooling process. Thus, filling holes becomes crucial in ensuring reliable and accurate e… ▽ More Unjudged documents or holes in information retrieval benchmarks are considered non-relevant in evaluation, yielding no gains in measuring effectiveness. However, these missing judgments may inadvertently introduce biases into the evaluation as their prevalence for a retrieval model is heavily contingent on the pooling process. Thus, filling holes becomes crucial in ensuring reliable and accurate evaluation. Collecting human judgment for all documents is cumbersome and impractical. In this paper, we aim at leveraging large language models (LLMs) to automatically label unjudged documents. Our goal is to instruct an LLM using detailed instructions to assign fine-grained relevance judgments to holes. To this end, we systematically simulate scenarios with varying degrees of holes by randomly dropping relevant documents from the relevance judgment in TREC DL tracks. Our experiments reveal a strong correlation between our LLM-based method and ground-truth relevance judgments. Based on our simulation experiments conducted on three TREC DL datasets, in the extreme scenario of retaining only 10% of judgments, our method achieves a Kendall tau correlation of 0.87 and 0.92 on an average for Vicuña-7B and GPT-3.5 Turbo respectively. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: 5 pages, 4 figures

arXiv:2403.12031 [pdf, other]

RouterBench: A Benchmark for Multi-LLM Routing System

Authors: Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay

Abstract: As the range of applications for Large Language Models (LLMs) continues to grow, the demand for effective serving solutions becomes increasingly critical. Despite the versatility of LLMs, no single model can optimally address all tasks and applications, particularly when balancing performance with cost. This limitation has led to the development of LLM routing systems, which combine the strengths… ▽ More As the range of applications for Large Language Models (LLMs) continues to grow, the demand for effective serving solutions becomes increasingly critical. Despite the versatility of LLMs, no single model can optimally address all tasks and applications, particularly when balancing performance with cost. This limitation has led to the development of LLM routing systems, which combine the strengths of various models to overcome the constraints of individual LLMs. Yet, the absence of a standardized benchmark for evaluating the performance of LLM routers hinders progress in this area. To bridge this gap, we present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems, along with a comprehensive dataset comprising over 405k inference outcomes from representative LLMs to support the development of routing strategies. We further propose a theoretical framework for LLM routing, and deliver a comparative analysis of various routing approaches through RouterBench, highlighting their potentials and limitations within our evaluation framework. This work not only formalizes and advances the development of LLM routing systems but also sets a standard for their assessment, paving the way for more accessible and economically viable LLM deployments. The code and data are available at https://github.com/withmartian/routerbench. △ Less

Submitted 28 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content. △ Less

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.00470 [pdf, other]

Autonomous Robotic Arm Manipulation for Planetary Missions using Causal Machine Learning

Authors: C. McDonnell, M. Arana-Catania, S. Upadhyay

Abstract: Autonomous robotic arm manipulators have the potential to make planetary exploration and in-situ resource utilization missions more time efficient and productive, as the manipulator can handle the objects itself and perform goal-specific actions. We train a manipulator to autonomously study objects of which it has no prior knowledge, such as planetary rocks. This is achieved using causal machine l… ▽ More Autonomous robotic arm manipulators have the potential to make planetary exploration and in-situ resource utilization missions more time efficient and productive, as the manipulator can handle the objects itself and perform goal-specific actions. We train a manipulator to autonomously study objects of which it has no prior knowledge, such as planetary rocks. This is achieved using causal machine learning in a simulated planetary environment. Here, the manipulator interacts with objects, and classifies them based on differing causal factors. These are parameters, such as mass or friction coefficient, that causally determine the outcomes of its interactions. Through reinforcement learning, the manipulator learns to interact in ways that reveal the underlying causal factors. We show that this method works even without any prior knowledge of the objects, or any previously-collected training data. We carry out the training in planetary exploration conditions, with realistic manipulator models. △ Less

Submitted 1 March, 2024; originally announced March 2024.

Comments: 8 pages, ASTRA 2023: 17th Symposium on Advanced Space Technologies in Robotics and Automation, 18-20 October 2023, Leiden, The Netherlands

arXiv:2402.17937 [pdf, other]

Can an LLM-Powered Socially Assistive Robot Effectively and Safely Deliver Cognitive Behavioral Therapy? A Study With University Students

Authors: Mina J. Kian, Mingyu Zong, Katrin Fischer, Abhyuday Singh, Anna-Maria Velentza, Pau Sang, Shriya Upadhyay, Anika Gupta, Misha A. Faruki, Wallace Browning, Sebastien M. R. Arnold, Bhaskar Krishnamachari, Maja J. Mataric

Abstract: Cognitive behavioral therapy (CBT) is a widely used therapeutic method for guiding individuals toward restructuring their thinking patterns as a means of addressing anxiety, depression, and other challenges. We developed a large language model (LLM)-powered prompt-engineered socially assistive robot (SAR) that guides participants through interactive CBT at-home exercises. We evaluated the performa… ▽ More Cognitive behavioral therapy (CBT) is a widely used therapeutic method for guiding individuals toward restructuring their thinking patterns as a means of addressing anxiety, depression, and other challenges. We developed a large language model (LLM)-powered prompt-engineered socially assistive robot (SAR) that guides participants through interactive CBT at-home exercises. We evaluated the performance of the SAR through a 15-day study with 38 university students randomly assigned to interact daily with the robot or a chatbot (using the same LLM), or complete traditional CBT worksheets throughout the duration of the study. We measured weekly therapeutic outcomes, changes in pre-/post-session anxiety measures, and adherence to completing CBT exercises. We found that self-reported measures of general psychological distress significantly decreased over the study period in the robot and worksheet conditions but not the chatbot condition. Furthermore, the SAR enabled significant single-session improvements for more sessions than the other two conditions combined. Our findings suggest that SAR-guided LLM-powered CBT may be as effective as traditional worksheet methods in supporting therapeutic progress from the beginning to the end of the study and superior in decreasing user anxiety immediately after completing the CBT exercise. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.04881 [pdf, other]

doi 10.13140/RG.2.2.27637.35049

Epistral Network: Revolutionizing Media Curation and Consumption through Decentralization

Authors: Dipankar Sarkar, Shubham Upadhyay

Abstract: Blockchain technology has revolutionized media consumption and distribution in the digital age, allowing creators, consumers, and regulators to participate in a decentralized, fair, and engaging media environment. Epistral, an innovative media network that leverages blockchain technology, aims to be the world's first anti-mimetic media curation and consumption network, addressing the core challeng… ▽ More Blockchain technology has revolutionized media consumption and distribution in the digital age, allowing creators, consumers, and regulators to participate in a decentralized, fair, and engaging media environment. Epistral, an innovative media network that leverages blockchain technology, aims to be the world's first anti-mimetic media curation and consumption network, addressing the core challenges facing today's digital media landscape: unfair treatment of creators and manipulative consumer algorithms, and the complex task of effective regulation. This paper delves into the conceptualization, design, and potential impact of epistral and explores how it embodies McLuhan's and Girard's theories within the realm of blockchain technology and draws from Hayden's critique of democratic representation. The paper analyzes the challenges and opportunities presented by this new network, providing a broader discourse on the future of media consumption, distribution, and regulation. △ Less

Submitted 10 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2310.12963 [pdf, other]

AutoMix: Automatically Mixing Language Models

Authors: Pranjal Aggarwal, Aman Madaan, Ankit Anand, Srividya Pranavi Potharaju, Swaroop Mishra, Pei Zhou, Aditya Gupta, Dheeraj Rajagopal, Karthik Kappaganthu, Yiming Yang, Shyam Upadhyay, Manaal Faruqui, Mausam

Abstract: Large language models (LLMs) are now available from cloud API providers in various sizes and configurations. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and performance remains challenging. In this work, we present Automix, an approach that strategically routes queries to larger LMs, based on the approximate correctness… ▽ More Large language models (LLMs) are now available from cloud API providers in various sizes and configurations. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and performance remains challenging. In this work, we present Automix, an approach that strategically routes queries to larger LMs, based on the approximate correctness of outputs from a smaller LM. Central to Automix are two key technical contributions. First, it has a few-shot self-verification mechanism, which estimates the reliability of its own outputs without requiring extensive training. Second, given that self-verification can be noisy, it employs a POMDP based router that can effectively select an appropriately sized model, based on answer confidence. Experiments across five language models and five challenging datasets show that Automix consistently surpasses strong baselines, reducing computational cost by over 50% for comparable performance. △ Less

Submitted 28 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: The first two authors contributed equally. Work started and partly done during Aman's internship at Google. This version adds results on additional models and datasets

arXiv:2310.03051 [pdf, other]

How FaR Are Large Language Models From Agents with Theory-of-Mind?

Authors: Pei Zhou, Aman Madaan, Srividya Pranavi Potharaju, Aditya Gupta, Kevin R. McKee, Ari Holtzman, Jay Pujara, Xiang Ren, Swaroop Mishra, Aida Nematzadeh, Shyam Upadhyay, Manaal Faruqui

Abstract: "Thinking is for Doing." Humans can infer other people's mental states from observations--an ability called Theory-of-Mind (ToM)--and subsequently act pragmatically on those inferences. Existing question answering benchmarks such as ToMi ask models questions to make inferences about beliefs of characters in a story, but do not test whether models can then use these inferences to guide their action… ▽ More "Thinking is for Doing." Humans can infer other people's mental states from observations--an ability called Theory-of-Mind (ToM)--and subsequently act pragmatically on those inferences. Existing question answering benchmarks such as ToMi ask models questions to make inferences about beliefs of characters in a story, but do not test whether models can then use these inferences to guide their actions. We propose a new evaluation paradigm for large language models (LLMs): Thinking for Doing (T4D), which requires models to connect inferences about others' mental states to actions in social scenarios. Experiments on T4D demonstrate that LLMs such as GPT-4 and PaLM 2 seemingly excel at tracking characters' beliefs in stories, but they struggle to translate this capability into strategic action. Our analysis reveals the core challenge for LLMs lies in identifying the implicit inferences about mental states without being explicitly asked about as in ToMi, that lead to choosing the correct action in T4D. To bridge this gap, we introduce a zero-shot prompting framework, Foresee and Reflect (FaR), which provides a reasoning structure that encourages LLMs to anticipate future challenges and reason about potential actions. FaR boosts GPT-4's performance from 50% to 71% on T4D, outperforming other prompting methods such as Chain-of-Thought and Self-Ask. Moreover, FaR generalizes to diverse out-of-distribution story structures and scenarios that also require ToM inferences to choose an action, consistently outperforming other methods including few-shot in-context learning. △ Less

Submitted 4 October, 2023; originally announced October 2023.

Comments: Preprint, 18 pages, 6 figures, 6 tables

arXiv:2307.10633 [pdf, other]

Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa

Authors: Shriyash K. Upadhyay, Etan J. Ginsberg

Abstract: Large Language Models have many methods for solving the same problem. This introduces novel strengths (different methods may work well for different problems) and weaknesses (it may be difficult for users to know which method to use). In this paper, we introduce Multi-Method Self-Training (MMST), where one method is trained on the filtered outputs of another, allowing us to augment the strengths a… ▽ More Large Language Models have many methods for solving the same problem. This introduces novel strengths (different methods may work well for different problems) and weaknesses (it may be difficult for users to know which method to use). In this paper, we introduce Multi-Method Self-Training (MMST), where one method is trained on the filtered outputs of another, allowing us to augment the strengths and ameliorate the weaknesses of each method. Using a 176B parameter model trained on both language and code, we show that MMST can 1) improve the less performant method (up to 30%) making the model easier to use, 2) improve the more performant method (up to 32.2%) making the model more performant, and 3) improve the performance of related but distinct tasks (up to 10.3%) by improving the ability of the model to generate rationales. We then conduct ablation analyses to explore why MMST works. We show that MMST generates more data than traditional self-training, but the improvement in performance is driven by the use of multiple methods. We also analyze prompt-engineering and anti-correlated performance between methods as means of making MMST more effective. We hope the evidence from our paper motivates machine learning researchers to explore ways in which advances in language models allow for new forms of training. △ Less

Submitted 20 July, 2023; originally announced July 2023.

Comments: 23 pages, 3 figures

arXiv:2307.08970 [pdf, other]

A Unifying Framework for Differentially Private Sums under Continual Observation

Authors: Monika Henzinger, Jalaj Upadhyay, Sarvagya Upadhyay

Abstract: We study the problem of maintaining a differentially private decaying sum under continual observation. We give a unifying framework and an efficient algorithm for this problem for \emph{any sufficiently smooth} function. Our algorithm is the first differentially private algorithm that does not have a multiplicative error for polynomially-decaying weights. Our algorithm improves on all prior works… ▽ More We study the problem of maintaining a differentially private decaying sum under continual observation. We give a unifying framework and an efficient algorithm for this problem for \emph{any sufficiently smooth} function. Our algorithm is the first differentially private algorithm that does not have a multiplicative error for polynomially-decaying weights. Our algorithm improves on all prior works on differentially private decaying sums under continual observation and recovers exactly the additive error for the special case of continual counting from Henzinger et al. (SODA 2023) as a corollary. Our algorithm is a variant of the factorization mechanism whose error depends on the $γ_2$ and $γ_F$ norm of the underlying matrix. We give a constructive proof for an almost exact upper bound on the $γ_2$ and $γ_F$ norm and an almost tight lower bound on the $γ_2$ norm for a large class of lower-triangular matrices. This is the first non-trivial lower bound for lower-triangular matrices whose non-zero entries are not all the same. It includes matrices for all continual decaying sums problems, resulting in an upper bound on the additive error of any differentially private decaying sums algorithm under continual observation. We also explore some implications of our result in discrepancy theory and operator algebra. Given the importance of the $γ_2$ norm in computer science and the extensive work in mathematics, we believe our result will have further applications. △ Less

Submitted 18 July, 2023; originally announced July 2023.

Comments: 32 pages

arXiv:2303.05432 [pdf, other]

doi 10.1007/s42001-023-00229-4

Describing the effect of influential spreaders on the different sectors of Indian market: a complex networks perspective

Authors: Anwesha Sengupta, Shashankaditya Upadhyay, Indranil Mukherjee, Prasanta K. Panigrahi

Abstract: Market competition has a role which is directly or indirectly associated with influential effects of individual sectors on other sectors of the economy. The present work studies the relative position of a product in the market through the identification of influential spreaders and its corresponding effect on the other sectors of the market using complex network analysis during the pre-, in-, and… ▽ More Market competition has a role which is directly or indirectly associated with influential effects of individual sectors on other sectors of the economy. The present work studies the relative position of a product in the market through the identification of influential spreaders and its corresponding effect on the other sectors of the market using complex network analysis during the pre-, in-, and post-crisis induced lockdown periods using daily data of NSE from December, 2019 to June, 2021. The existing approaches using different centrality measures failed to distinguish between the positive and negative influences of the different sectors in the market which act as spreaders. To obviate this problem, this paper presents an effective measure called LIEST (Local Influential Effects for Specific Target) that can examine the positive and negative influences separately with respect to any crisis period. LIEST considers the combined impact of all possible nodes which are at most three steps away from the specific targets for the networks. The essence of non-linearity in the network dynamics without considering single node effect becomes visible particularly in the proposed network. △ Less

Submitted 17 October, 2022; originally announced March 2023.

Comments: 17 pages, 21 figures

Journal ref: J Comput Soc Sc (2023) 1-41

arXiv:2301.09244 [pdf, other]

Efficient Encoders for Streaming Sequence Tagging

Authors: Ayush Kaushal, Aditya Gupta, Shyam Upadhyay, Manaal Faruqui

Abstract: A naive application of state-of-the-art bidirectional encoders for streaming sequence tagging would require encoding each token from scratch for each new token in an incremental streaming input (like transcribed speech). The lack of re-usability of previous computation leads to a higher number of Floating Point Operations (or FLOPs) and higher number of unnecessary label flips. Increased FLOPs con… ▽ More A naive application of state-of-the-art bidirectional encoders for streaming sequence tagging would require encoding each token from scratch for each new token in an incremental streaming input (like transcribed speech). The lack of re-usability of previous computation leads to a higher number of Floating Point Operations (or FLOPs) and higher number of unnecessary label flips. Increased FLOPs consequently lead to higher wall-clock time and increased label flipping leads to poorer streaming performance. In this work, we present a Hybrid Encoder with Adaptive Restart (HEAR) that addresses these issues while maintaining the performance of bidirectional encoders over the offline (or complete) inputs while improving performance on streaming (or incomplete) inputs. HEAR has a Hybrid unidirectional-bidirectional encoder architecture to perform sequence tagging, along with an Adaptive Restart Module (ARM) to selectively guide the restart of bidirectional portion of the encoder. Across four sequence tagging tasks, HEAR offers FLOP savings in streaming settings upto 71.1% and also outperforms bidirectional encoders for streaming predictions by upto +10% streaming exact match. △ Less

Submitted 16 March, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

Comments: EACL 2023 Camera-ready

arXiv:2212.03495 [pdf, other]

Metric Elicitation; Moving from Theory to Practice

Authors: Safinah Ali, Sohini Upadhyay, Gaurush Hiranandani, Elena L. Glassman, Oluwasanmi Koyejo

Abstract: Metric Elicitation (ME) is a framework for eliciting classification metrics that better align with implicit user preferences based on the task and context. The existing ME strategy so far is based on the assumption that users can most easily provide preference feedback over classifier statistics such as confusion matrices. This work examines ME, by providing a first ever implementation of the ME s… ▽ More Metric Elicitation (ME) is a framework for eliciting classification metrics that better align with implicit user preferences based on the task and context. The existing ME strategy so far is based on the assumption that users can most easily provide preference feedback over classifier statistics such as confusion matrices. This work examines ME, by providing a first ever implementation of the ME strategy. Specifically, we create a web-based ME interface and conduct a user study that elicits users' preferred metrics in a binary classification setting. We discuss the study findings and present guidelines for future research in this direction. △ Less

Submitted 7 December, 2022; originally announced December 2022.

Comments: The paper to appear at Human-Centered AI workshop at NeurIPS, 2022. arXiv admin note: text overlap with arXiv:2208.09142

arXiv:2211.07514 [pdf, other]

CST5: Data Augmentation for Code-Switched Semantic Parsing

Authors: Anmol Agarwal, Jigar Gupta, Rahul Goel, Shyam Upadhyay, Pankaj Joshi, Rengarajan Aravamudhan

Abstract: Extending semantic parsers to code-switched input has been a challenging problem, primarily due to a lack of supervised training data. In this work, we introduce CST5, a new data augmentation technique that finetunes a T5 model using a small seed set ($\approx$100 utterances) to generate code-switched utterances from English utterances. We show that CST5 generates high quality code-switched data,… ▽ More Extending semantic parsers to code-switched input has been a challenging problem, primarily due to a lack of supervised training data. In this work, we introduce CST5, a new data augmentation technique that finetunes a T5 model using a small seed set ($\approx$100 utterances) to generate code-switched utterances from English utterances. We show that CST5 generates high quality code-switched data, both intrinsically (per human evaluation) and extrinsically by comparing baseline models which are trained without data augmentation to models which are trained with augmented data. Empirically we observe that using CST5, one can achieve the same semantic parsing performance by using up to 20x less labeled data. To aid further research in this area, we are also releasing (a) Hinglish-TOP, the largest human annotated code-switched semantic parsing dataset to date, containing 10k human annotated Hindi-English (Hinglish) code-switched utterances, and (b) Over 170K CST5 generated code-switched utterances from the TOPv2 dataset. Human evaluation shows that both the human annotated data as well as the CST5 generated data is of good quality. △ Less

Submitted 14 November, 2022; originally announced November 2022.

arXiv:2211.05006 [pdf, other]

Almost Tight Error Bounds on Differentially Private Continual Counting

Authors: Monika Henzinger, Jalaj Upadhyay, Sarvagya Upadhyay

Abstract: The first large-scale deployment of private federated learning uses differentially private counting in the continual release model as a subroutine (Google AI blog titled "Federated Learning with Formal Differential Privacy Guarantees"). In this case, a concrete bound on the error is very relevant to reduce the privacy parameter. The standard mechanism for continual counting is the binary mechanism… ▽ More The first large-scale deployment of private federated learning uses differentially private counting in the continual release model as a subroutine (Google AI blog titled "Federated Learning with Formal Differential Privacy Guarantees"). In this case, a concrete bound on the error is very relevant to reduce the privacy parameter. The standard mechanism for continual counting is the binary mechanism. We present a novel mechanism and show that its mean squared error is both asymptotically optimal and a factor 10 smaller than the error of the binary mechanism. We also show that the constants in our analysis are almost tight by giving non-asymptotic lower and upper bounds that differ only in the constants of lower-order terms. Our algorithm is a matrix mechanism for the counting matrix and takes constant time per release. We also use our explicit factorization of the counting matrix to give an upper bound on the excess risk of the private learning algorithm of Denisov et al. (NeurIPS 2022). Our lower bound for any continual counting mechanism is the first tight lower bound on continual counting under approximate differential privacy. It is achieved using a new lower bound on a certain factorization norm, denoted by $γ_F(\cdot)$, in terms of the singular values of the matrix. In particular, we show that for any complex matrix, $A \in \mathbb{C}^{m \times n}$, \[ γ_F(A) \geq \frac{1}{\sqrt{m}}\|A\|_1, \] where $\|\cdot \|$ denotes the Schatten-1 norm. We believe this technique will be useful in proving lower bounds for a larger class of linear queries. To illustrate the power of this technique, we show the first lower bound on the mean squared error for answering parity queries. △ Less

Submitted 5 February, 2024; v1 submitted 9 November, 2022; originally announced November 2022.

Comments: Updated the citations to include two papers we learned about since version 01

arXiv:2208.13322 [pdf, other]

Streaming Intended Query Detection using E2E Modeling for Continued Conversation

Authors: Shuo-yiin Chang, Guru Prakash, Zelin Wu, Qiao Liang, Tara N. Sainath, Bo Li, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman

Abstract: In voice-enabled applications, a predetermined hotword isusually used to activate a device in order to attend to the query.However, speaking queries followed by a hotword each timeintroduces a cognitive burden in continued conversations. Toavoid repeating a hotword, we propose a streaming end-to-end(E2E) intended query detector that identifies the utterancesdirected towards the device and filters… ▽ More In voice-enabled applications, a predetermined hotword isusually used to activate a device in order to attend to the query.However, speaking queries followed by a hotword each timeintroduces a cognitive burden in continued conversations. Toavoid repeating a hotword, we propose a streaming end-to-end(E2E) intended query detector that identifies the utterancesdirected towards the device and filters out other utterancesnot directed towards device. The proposed approach incor-porates the intended query detector into the E2E model thatalready folds different components of the speech recognitionpipeline into one neural network.The E2E modeling onspeech decoding and intended query detection also allows us todeclare a quick intended query detection based on early partialrecognition result, which is important to decrease latencyand make the system responsive. We demonstrate that theproposed E2E approach yields a 22% relative improvement onequal error rate (EER) for the detection accuracy and 600 mslatency improvement compared with an independent intendedquery detector. In our experiment, the proposed model detectswhether the user is talking to the device with a 8.7% EERwithin 1.4 seconds of median latency after user starts speaking. △ Less

Submitted 28 August, 2022; originally announced August 2022.

Comments: 5 pages, Interspeech 2022

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting. △ Less

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2205.12816 [pdf, other]

P4Filter: A two level defensive mechanism against attacks in SDN using P4

Authors: Ananya Saxena, Ritvik Muttreja, Shivam Upadhyay, K. Shiv Kumar, Venkanna U

Abstract: The advancements in networking technologies have led to a new paradigm of controlling networks, with data plane programmability as a basis. This facility opens up many advantages, such as flexibility in packet processing and better network management, which leads to better security in the network. However, the current literature lacks network security solutions concerning authentication and preven… ▽ More The advancements in networking technologies have led to a new paradigm of controlling networks, with data plane programmability as a basis. This facility opens up many advantages, such as flexibility in packet processing and better network management, which leads to better security in the network. However, the current literature lacks network security solutions concerning authentication and preventing unauthorized access. In this work, our goal is to avoid attacks in a two level defense mechanism (P4Filter). The first level is a dynamic firewall logic, which blocks packets generated from an unauthorized source. The second level is an authentication mechanism based on dynamic port knocking. The two security levels were tested in a virtual environment with P4 based switches. The packets arriving at the switch from unknown hosts are sent to the controller. The controller maintains an ACL using which it assigns rules for both the levels to allow or drop the packets. For port knocking a new random sequence is generated for every new host. Hosts can only connect using the correct sequence assigned to them.The tests conducted show this approach performs better than the previous P4 based firewall approaches due to two security levels. Moreover, it is successful in mitigating specific security attacks by blocking unauthorized access to the network. △ Less

Submitted 6 June, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.07277 [pdf, other]

doi 10.1145/3514094.3534159

Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations

Authors: Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju

Abstract: As post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to ensure that the quality of the resulting explanations is consistently high across various population subgroups including the minority groups. For instance, it should not be the case that explanations associated with instances belonging to a particular gender su… ▽ More As post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to ensure that the quality of the resulting explanations is consistently high across various population subgroups including the minority groups. For instance, it should not be the case that explanations associated with instances belonging to a particular gender subgroup (e.g., female) are less accurate than those associated with other genders. However, there is little to no research that assesses if there exist such group-based disparities in the quality of the explanations output by state-of-the-art explanation methods. In this work, we address the aforementioned gaps by initiating the study of identifying group-based disparities in explanation quality. To this end, we first outline the key properties which constitute explanation quality and where disparities can be particularly problematic. We then leverage these properties to propose a novel evaluation framework which can quantitatively measure disparities in the quality of explanations output by state-of-the-art methods. Using this framework, we carry out a rigorous empirical analysis to understand if and when group-based disparities in explanation quality arise. Our results indicate that such disparities are more likely to occur when the models being explained are complex and highly non-linear. In addition, we also observe that certain post hoc explanation methods (e.g., Integrated Gradients, SHAP) are more likely to exhibit the aforementioned disparities. To the best of our knowledge, this work is the first to highlight and study the problem of group-based disparities in explanation quality. In doing so, our work sheds light on previously unexplored ways in which explanation methods may introduce unfairness in real world decision making. △ Less

Submitted 1 July, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

Comments: As presented at AIES 2022

arXiv:2203.15272 [pdf, other]

Sparse Image based Navigation Architecture to Mitigate the need of precise Localization in Mobile Robots

Authors: Pranay Mathur, Rajesh Kumar, Sarthak Upadhyay

Abstract: Traditional simultaneous localization and mapping (SLAM) methods focus on improvement in the robot's localization under environment and sensor uncertainty. This paper, however, focuses on mitigating the need for exact localization of a mobile robot to pursue autonomous navigation using a sparse set of images. The proposed method consists of a model architecture - RoomNet, for unsupervised learning… ▽ More Traditional simultaneous localization and mapping (SLAM) methods focus on improvement in the robot's localization under environment and sensor uncertainty. This paper, however, focuses on mitigating the need for exact localization of a mobile robot to pursue autonomous navigation using a sparse set of images. The proposed method consists of a model architecture - RoomNet, for unsupervised learning resulting in a coarse identification of the environment and a separate local navigation policy for local identification and navigation. The former learns and predicts the scene based on the short term image sequences seen by the robot along with the transition image scenarios using long term image sequences. The latter uses sparse image matching to characterise the similarity of frames achieved vis-a-vis the frames viewed by the robot during the mapping and training stage. A sparse graph of the image sequence is created which is then used to carry out robust navigation purely on the basis of visual goals. The proposed approach is evaluated on two robots in a test environment and demonstrates the ability to navigate in dynamic environments where landmarks are obscured and classical localization methods fail. △ Less

Submitted 29 March, 2022; originally announced March 2022.

Comments: 7 Pages, 4 figures

arXiv:2203.08685 [pdf, other]

A Feasibility Study of Answer-Agnostic Question Generation for Education

Authors: Liam Dugan, Eleni Miltsakaki, Shriyash Upadhyay, Etan Ginsberg, Hannah Gonzalez, Dayheon Choi, Chuning Yuan, Chris Callison-Burch

Abstract: We conduct a feasibility study into the applicability of answer-agnostic question generation models to textbook passages. We show that a significant portion of errors in such systems arise from asking irrelevant or uninterpretable questions and that such errors can be ameliorated by providing summarized input. We find that giving these models human-written summaries instead of the original text re… ▽ More We conduct a feasibility study into the applicability of answer-agnostic question generation models to textbook passages. We show that a significant portion of errors in such systems arise from asking irrelevant or uninterpretable questions and that such errors can be ameliorated by providing summarized input. We find that giving these models human-written summaries instead of the original text results in a significant increase in acceptability of generated questions (33% $\rightarrow$ 83%) as determined by expert annotators. We also find that, in the absence of human-written summaries, automatic summarization can serve as a good middle ground. △ Less

Submitted 29 March, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

Comments: To be published in 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)

ACM Class: I.2.7

arXiv:2203.05931 [pdf, other]

FedSyn: Synthetic Data Generation using Federated Learning

Authors: Monik Raj Behera, Sudhir Upadhyay, Suresh Shetty, Sudha Priyadarshini, Palka Patel, Ker Farn Lee

Abstract: As Deep Learning algorithms continue to evolve and become more sophisticated, they require massive datasets for model training and efficacy of models. Some of those data requirements can be met with the help of existing datasets within the organizations. Current Machine Learning practices can be leveraged to generate synthetic data from an existing dataset. Further, it is well established that div… ▽ More As Deep Learning algorithms continue to evolve and become more sophisticated, they require massive datasets for model training and efficacy of models. Some of those data requirements can be met with the help of existing datasets within the organizations. Current Machine Learning practices can be leveraged to generate synthetic data from an existing dataset. Further, it is well established that diversity in generated synthetic data relies on (and is perhaps limited by) statistical properties of available dataset within a single organization or entity. The more diverse an existing dataset is, the more expressive and generic synthetic data can be. However, given the scarcity of underlying data, it is challenging to collate big data in one organization. The diverse, non-overlapping dataset across distinct organizations provides an opportunity for them to contribute their limited distinct data to a larger pool that can be leveraged to further synthesize. Unfortunately, this raises data privacy concerns that some institutions may not be comfortable with. This paper proposes a novel approach to generate synthetic data - FedSyn. FedSyn is a collaborative, privacy preserving approach to generate synthetic data among multiple participants in a federated and collaborative network. FedSyn creates a synthetic data generation model, which can generate synthetic data consisting of statistical distribution of almost all the participants in the network. FedSyn does not require access to the data of an individual participant, hence protecting the privacy of participant's data. The proposed technique in this paper leverages federated machine learning and generative adversarial network (GAN) as neural network architecture for synthetic data generation. The proposed method can be extended to many machine learning problem classes in finance, health, governance, technology and many more. △ Less

Submitted 5 April, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

arXiv:2203.00274 [pdf, other]

TableFormer: Robust Transformer Modeling for Table-Text Encoding

Authors: Jingfeng Yang, Aditya Gupta, Shyam Upadhyay, Luheng He, Rahul Goel, Shachi Paul

Abstract: Understanding tables is an important aspect of natural language understanding. Existing models for table understanding require linearization of the table structure, where row or column order is encoded as an unwanted bias. Such spurious biases make the model vulnerable to row and column order perturbations. Additionally, prior work has not thoroughly modeled the table structures or table-text alig… ▽ More Understanding tables is an important aspect of natural language understanding. Existing models for table understanding require linearization of the table structure, where row or column order is encoded as an unwanted bias. Such spurious biases make the model vulnerable to row and column order perturbations. Additionally, prior work has not thoroughly modeled the table structures or table-text alignments, hindering the table-text understanding ability. In this work, we propose a robust and structurally aware table-text encoding architecture TableFormer, where tabular structural biases are incorporated completely through learnable attention biases. TableFormer is (1) strictly invariant to row and column orders, and, (2) could understand tables better due to its tabular inductive biases. Our evaluations showed that TableFormer outperforms strong baselines in all settings on SQA, WTQ and TabFact table reasoning datasets, and achieves state-of-the-art performance on SQA, especially when facing answer-invariant row and column order perturbations (6% improvement over the best baseline), because previous SOTA models' performance drops by 4% - 6% when facing such perturbations while TableFormer is not affected. △ Less

Submitted 3 May, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: ACL 2022, 10 pages

arXiv:2202.07764 [pdf, other]

doi 10.1088/2058-9565/acd1a8

Paving the Way towards 800 Gbps Quantum-Secured Optical Channel Deployment in Mission-Critical Environments

Authors: Marco Pistoia, Omar Amer, Monik R. Behera, Joseph A. Dolphin, James F. Dynes, Benny John, Paul A. Haigh, Yasushi Kawakura, David H. Kramer, Jeffrey Lyon, Navid Moazzami, Tulasi D. Movva, Antigoni Polychroniadou, Suresh Shetty, Greg Sysak, Farzam Toudeh-Fallah, Sudhir Upadhyay, Robert I. Woodward, Andrew J. Shields

Abstract: This article describes experimental research studies conducted towards understanding the implementation aspects of high-capacity quantum-secured optical channels in mission-critical metro-scale operational environments using Quantum Key Distribution (QKD) technology. To the best of our knowledge, this is the first time that an 800 Gbps quantum-secured optical channel -- along with several other De… ▽ More This article describes experimental research studies conducted towards understanding the implementation aspects of high-capacity quantum-secured optical channels in mission-critical metro-scale operational environments using Quantum Key Distribution (QKD) technology. To the best of our knowledge, this is the first time that an 800 Gbps quantum-secured optical channel -- along with several other Dense Wavelength Division Multiplexed (DWDM) channels on the C-band and multiplexed with the QKD channel on the O-band -- was established at distances up to 100 km, with secret key-rates relevant for practical industry use cases. In addition, during the course of these trials, transporting a blockchain application over this established channel was utilized as a demonstration of securing a financial transaction in transit over a quantum-secured optical channel. The findings of this research pave the way towards the deployment of QKD-secured optical channels in high-capacity, metro-scale, mission-critical operational environments, such as Inter-Data Center Interconnects. △ Less

Submitted 2 March, 2023; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: 11 pages, 9 figures, 2 tables

Journal ref: Quantum Science and Technology, Institute of Physics, May 2023

arXiv:2108.04371 [pdf, other]

Extending LIME for Business Process Automation

Authors: Sohini Upadhyay, Vatche Isahagian, Vinod Muthusamy, Yara Rizk

Abstract: AI business process applications automate high-stakes business decisions where there is an increasing demand to justify or explain the rationale behind algorithmic decisions. Business process applications have ordering or constraints on tasks and feature values that cause lightweight, model-agnostic, existing explanation methods like LIME to fail. In response, we propose a local explanation framew… ▽ More AI business process applications automate high-stakes business decisions where there is an increasing demand to justify or explain the rationale behind algorithmic decisions. Business process applications have ordering or constraints on tasks and feature values that cause lightweight, model-agnostic, existing explanation methods like LIME to fail. In response, we propose a local explanation framework extending LIME for explaining AI business process applications. Empirical evaluation of our extension underscores the advantage of our approach in the business process setting. △ Less

Submitted 9 August, 2021; originally announced August 2021.

arXiv:2107.10243 [pdf, other]

Federated Learning using Smart Contracts on Blockchains, based on Reward Driven Approach

Authors: Monik Raj Behera, Sudhir Upadhyay, Suresh Shetty

Abstract: Over the recent years, Federated machine learning continues to gain interest and momentum where there is a need to draw insights from data while preserving the data provider's privacy. However, one among other existing challenges in the adoption of federated learning has been the lack of fair, transparent and universally agreed incentivization schemes for rewarding the federated learning contribut… ▽ More Over the recent years, Federated machine learning continues to gain interest and momentum where there is a need to draw insights from data while preserving the data provider's privacy. However, one among other existing challenges in the adoption of federated learning has been the lack of fair, transparent and universally agreed incentivization schemes for rewarding the federated learning contributors. Smart contracts on a blockchain network provide transparent, immutable and independently verifiable proofs by all participants of the network. We leverage this open and transparent nature of smart contracts on a blockchain to define incentivization rules for the contributors, which is based on a novel scalar quantity - federated contribution. Such a smart contract based reward-driven model has the potential to revolutionize the federated learning adoption in enterprises. Our contribution is two-fold: first is to show how smart contract based blockchain can be a very natural communication channel for federated learning. Second, leveraging this infrastructure, we can show how an intuitive measure of each agents' contribution can be built and integrated with the life cycle of the training and reward process. △ Less

Submitted 25 March, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

Comments: 9 pages, 7 figures and 1 table

arXiv:2106.13346 [pdf, other]

What will it take to generate fairness-preserving explanations?

Authors: Jessica Dai, Sohini Upadhyay, Stephen H. Bach, Himabindu Lakkaraju

Abstract: In situations where explanations of black-box models may be useful, the fairness of the black-box is also often a relevant concern. However, the link between the fairness of the black-box model and the behavior of explanations for the black-box is unclear. We focus on explanations applied to tabular datasets, suggesting that explanations do not necessarily preserve the fairness properties of the b… ▽ More In situations where explanations of black-box models may be useful, the fairness of the black-box is also often a relevant concern. However, the link between the fairness of the black-box model and the behavior of explanations for the black-box is unclear. We focus on explanations applied to tabular datasets, suggesting that explanations do not necessarily preserve the fairness properties of the black-box algorithm. In other words, explanation algorithms can ignore or obscure critical relevant properties, creating incorrect or misleading explanations. More broadly, we propose future research directions for evaluating and generating explanations such that they are informative and relevant from a fairness perspective. △ Less

Submitted 24 June, 2021; originally announced June 2021.

Comments: Presented at ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI

arXiv:2106.09992 [pdf, other]

Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis

Authors: Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju

Abstract: As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice. Despite the growing popularity of counterfactual explanations, a deeper understanding of these explanations is still lacking. In this work, we systematically analyze counterfactual explanations throug… ▽ More As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice. Despite the growing popularity of counterfactual explanations, a deeper understanding of these explanations is still lacking. In this work, we systematically analyze counterfactual explanations through the lens of adversarial examples. We do so by formalizing the similarities between popular counterfactual explanation and adversarial example generation methods identifying conditions when they are equivalent. We then derive the upper bounds on the distances between the solutions output by counterfactual explanation and adversarial example generation methods, which we validate on several real-world data sets. By establishing these theoretical and empirical similarities between counterfactual explanations and adversarial examples, our work raises fundamental questions about the design and development of existing counterfactual explanation algorithms. △ Less

Submitted 19 October, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS), 28-30 March 2022

arXiv:2106.04571 [pdf, other]

TIMEDIAL: Temporal Commonsense Reasoning in Dialog

Authors: Lianhui Qin, Aditya Gupta, Shyam Upadhyay, Luheng He, Yejin Choi, Manaal Faruqui

Abstract: Everyday conversations require understanding everyday events, which in turn, requires understanding temporal commonsense concepts interwoven with those events. Despite recent progress with massive pre-trained language models (LMs) such as T5 and GPT-3, their capability of temporal reasoning in dialogs remains largely under-explored. In this paper, we present the first study to investigate pre-trai… ▽ More Everyday conversations require understanding everyday events, which in turn, requires understanding temporal commonsense concepts interwoven with those events. Despite recent progress with massive pre-trained language models (LMs) such as T5 and GPT-3, their capability of temporal reasoning in dialogs remains largely under-explored. In this paper, we present the first study to investigate pre-trained LMs for their temporal reasoning capabilities in dialogs by introducing a new task and a crowd-sourced English challenge set, TIMEDIAL. We formulate TIME-DIAL as a multiple-choice cloze task with over 1.1K carefully curated dialogs. Empirical results demonstrate that even the best performing models struggle on this task compared to humans, with 23 absolute points of gap in accuracy. Furthermore, our analysis reveals that the models fail to reason about dialog context correctly; instead, they rely on shallow cues based on existing temporal patterns in context, motivating future research for modeling temporal concepts in text and robust contextual reasoning about them. The dataset is publicly available at: https://github.com/google-research-datasets/timedial. △ Less

Submitted 8 June, 2021; originally announced June 2021.

arXiv:2106.04016 [pdf, other]

Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

Authors: Aditya Gupta, Jiacheng Xu, Shyam Upadhyay, Diyi Yang, Manaal Faruqui

Abstract: Disfluencies is an under-studied topic in NLP, even though it is ubiquitous in human conversation. This is largely due to the lack of datasets containing disfluencies. In this paper, we present a new challenge question answering dataset, Disfl-QA, a derivative of SQuAD, where humans introduce contextual disfluencies in previously fluent questions. Disfl-QA contains a variety of challenging disflue… ▽ More Disfluencies is an under-studied topic in NLP, even though it is ubiquitous in human conversation. This is largely due to the lack of datasets containing disfluencies. In this paper, we present a new challenge question answering dataset, Disfl-QA, a derivative of SQuAD, where humans introduce contextual disfluencies in previously fluent questions. Disfl-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text than what was necessary in prior datasets. Experiments show that the performance of existing state-of-the-art question answering models degrades significantly when tested on Disfl-QA in a zero-shot setting.We show data augmentation methods partially recover the loss in performance and also demonstrate the efficacy of using gold data for fine-tuning. We argue that we need large-scale disfluency datasets in order for NLP models to be robust to them. The dataset is publicly available at: https://github.com/google-research-datasets/disfl-qa. △ Less

Submitted 7 June, 2021; originally announced June 2021.

Comments: Findings of ACL 2021

arXiv:2102.13620 [pdf, other]

Towards Robust and Reliable Algorithmic Recourse

Authors: Sohini Upadhyay, Shalmali Joshi, Himabindu Lakkaraju

Abstract: As predictive models are increasingly being deployed in high-stakes decision making (e.g., loan approvals), there has been growing interest in post hoc techniques which provide recourse to affected individuals. These techniques generate recourses under the assumption that the underlying predictive model does not change. However, in practice, models are often regularly updated for a variety of reas… ▽ More As predictive models are increasingly being deployed in high-stakes decision making (e.g., loan approvals), there has been growing interest in post hoc techniques which provide recourse to affected individuals. These techniques generate recourses under the assumption that the underlying predictive model does not change. However, in practice, models are often regularly updated for a variety of reasons (e.g., dataset shifts), thereby rendering previously prescribed recourses ineffective. To address this problem, we propose a novel framework, RObust Algorithmic Recourse (ROAR), that leverages adversarial training for finding recourses that are robust to model shifts. To the best of our knowledge, this work proposes the first solution to this critical problem. We also carry out detailed theoretical analysis which underscores the importance of constructing recourses that are robust to model shifts: 1) we derive a lower bound on the probability of invalidation of recourses generated by existing approaches which are not robust to model shifts. 2) we prove that the additional cost incurred due to the robust recourses output by our framework is bounded. Experimental evaluation on multiple synthetic and real-world datasets demonstrates the efficacy of the proposed framework and supports our theoretical findings. △ Less

Submitted 13 July, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

arXiv:2102.12082 [pdf, other]

Hopeful_Men@LT-EDI-EACL2021: Hope Speech Detection Using Indic Transliteration and Transformers

Authors: Ishan Sanjeev Upadhyay, Nikhil E, Anshul Wadhawan, Radhika Mamidi

Abstract: This paper aims to describe the approach we used to detect hope speech in the HopeEDI dataset. We experimented with two approaches. In the first approach, we used contextual embeddings to train classifiers using logistic regression, random forest, SVM, and LSTM based models.The second approach involved using a majority voting ensemble of 11 models which were obtained by fine-tuning pre-trained tra… ▽ More This paper aims to describe the approach we used to detect hope speech in the HopeEDI dataset. We experimented with two approaches. In the first approach, we used contextual embeddings to train classifiers using logistic regression, random forest, SVM, and LSTM based models.The second approach involved using a majority voting ensemble of 11 models which were obtained by fine-tuning pre-trained transformer models (BERT, ALBERT, RoBERTa, IndicBERT) after adding an output layer. We found that the second approach was superior for English, Tamil and Malayalam. Our solution got a weighted F1 score of 0.93, 0.75 and 0.49 for English,Malayalam and Tamil respectively. Our solution ranked first in English, eighth in Malayalam and eleventh in Tamil. △ Less

Submitted 24 February, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

arXiv:2102.10618 [pdf, other]

Towards the Unification and Robustness of Perturbation and Gradient Based Explanations

Authors: Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, Himabindu Lakkaraju

Abstract: As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad which is a gradient based method, and a variant of LIME which is a perturbati… ▽ More As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad which is a gradient based method, and a variant of LIME which is a perturbation based method. More specifically, we derive explicit closed form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite sample complexity bounds for the number of perturbations required for these methods to converge to their expected explanation. Finally, we empirically validate our theory using extensive experimentation on both synthetic and real world datasets. △ Less

Submitted 19 July, 2021; v1 submitted 21 February, 2021; originally announced February 2021.

Comments: The short version of this paper appears in the proceedings of ICML-21

arXiv:2010.12574 [pdf, other]

Online Semi-Supervised Learning with Bandit Feedback

Authors: Sohini Upadhyay, Mikhail Yurochkin, Mayank Agarwal, Yasaman Khazaeni, DjallelBouneffouf

Abstract: We formulate a new problem at the intersectionof semi-supervised learning and contextual bandits,motivated by several applications including clini-cal trials and ad recommendations. We demonstratehow Graph Convolutional Network (GCN), a semi-supervised learning approach, can be adjusted tothe new problem formulation. We also propose avariant of the linear contextual bandit with semi-supervised mis… ▽ More We formulate a new problem at the intersectionof semi-supervised learning and contextual bandits,motivated by several applications including clini-cal trials and ad recommendations. We demonstratehow Graph Convolutional Network (GCN), a semi-supervised learning approach, can be adjusted tothe new problem formulation. We also propose avariant of the linear contextual bandit with semi-supervised missing rewards imputation. We thentake the best of both approaches to develop multi-GCN embedded contextual bandit. Our algorithmsare verified on several real world datasets. △ Less

Submitted 23 October, 2020; originally announced October 2020.

arXiv:2010.09473 [pdf, other]

Double-Linear Thompson Sampling for Context-Attentive Bandits

Authors: Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni, Irina Rish

Abstract: In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration;however, the agent has a freedom to choose which variables to observe. We de… ▽ More In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration;however, the agent has a freedom to choose which variables to observe. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating advantages of the proposed approach over several baseline methods on a variety of real-life datasets △ Less

Submitted 15 October, 2020; originally announced October 2020.

Comments: arXiv admin note: text overlap with arXiv:1906.09384

arXiv:2009.02668 [pdf, ps, other]

A Framework for Private Matrix Analysis

Authors: Jalaj Upadhyay, Sarvagya Upadhyay

Abstract: We study private matrix analysis in the sliding window model where only the last $W$ updates to matrices are considered useful for analysis. We give first efficient $o(W)$ space differentially private algorithms for spectral approximation, principal component analysis, and linear regression. We also initiate and show efficient differentially private algorithms for two important variants of princip… ▽ More We study private matrix analysis in the sliding window model where only the last $W$ updates to matrices are considered useful for analysis. We give first efficient $o(W)$ space differentially private algorithms for spectral approximation, principal component analysis, and linear regression. We also initiate and show efficient differentially private algorithms for two important variants of principal component analysis: sparse principal component analysis and non-negative principal component analysis. Prior to our work, no such result was known for sparse and non-negative differentially private principal component analysis even in the static data setting. These algorithms are obtained by identifying sufficient conditions on positive semidefinite matrices formed from streamed matrices. We also show a lower bound on space required to compute low-rank approximation even if the algorithm gives multiplicative approximation and incurs additive error. This follows via reduction to a certain communication complexity problem. △ Less

Submitted 6 September, 2020; originally announced September 2020.

Comments: 41 pages

arXiv:2007.06368 [pdf, other]

Contextual Bandit with Missing Rewards

Authors: Djallel Bouneffouf, Sohini Upadhyay, Yasaman Khazaeni

Abstract: We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the reward associated with each context-based decision may not always be observed("missing rewards"). This new problem is motivated by certain online settings including clinical trial and ad recommendation applications. In order to addre… ▽ More We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the reward associated with each context-based decision may not always be observed("missing rewards"). This new problem is motivated by certain online settings including clinical trial and ad recommendation applications. In order to address the missing rewards setting, we propose to combine the standard contextual bandit approach with an unsupervised learning mechanism such as clustering. Unlike standard contextual bandit methods, by leveraging clustering to estimate missing reward, we are able to learn from each incoming event, even those with missing rewards. Promising empirical results are obtained on several real-life datasets. △ Less

Submitted 18 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

arXiv:2001.00658 [pdf, ps, other]

Compressed Quadratization of Higher Order Binary Optimization Problems

Authors: Avradip Mandal, Arnab Roy, Sarvagya Upadhyay, Hayato Ushijima-Mwesigwa

Abstract: Recent hardware advances in quantum and quantum-inspired annealers promise substantial speedup for solving NP-hard combinatorial optimization problems compared to general-purpose computers. These special-purpose hardware are built for solving hard instances of Quadratic Unconstrained Binary Optimization (QUBO) problems. In terms of number of variables and precision of these hardware are usually re… ▽ More Recent hardware advances in quantum and quantum-inspired annealers promise substantial speedup for solving NP-hard combinatorial optimization problems compared to general-purpose computers. These special-purpose hardware are built for solving hard instances of Quadratic Unconstrained Binary Optimization (QUBO) problems. In terms of number of variables and precision of these hardware are usually resource-constrained and they work either in Ising space {-1,1} or in Boolean space {0,1}. Many naturally occurring problem instances are higher-order in nature. The known method to reduce the degree of a higher-order optimization problem uses Rosenberg's polynomial. The method works in Boolean space by reducing the degree of one term by introducing one extra variable. In this work, we prove that in Ising space the degree reduction of one term requires the introduction of two variables. Our proposed method of degree reduction works directly in Ising space, as opposed to converting an Ising polynomial to Boolean space and applying previously known Rosenberg's polynomial. For sparse higher-order Ising problems, this results in a more compact representation of the resultant QUBO problem, which is crucial for utilizing resource-constrained QUBO solvers. △ Less

Submitted 2 January, 2020; originally announced January 2020.

arXiv:1911.09810 [pdf, other]

Leveraging Special-Purpose Hardware for Local Search Heuristics

Authors: Xiaoyuan Liu, Hayato Ushijima-Mwesigwa, Avradip Mandal, Sarvagya Upadhyay, Ilya Safro, Arnab Roy

Abstract: As we approach the physical limits predicted by Moore's law, a variety of specialized hardware is emerging to tackle specialized tasks in different domains. Within combinatorial optimization, adiabatic quantum computers, CMOS annealers, and optical parametric oscillators are few of the emerging specialized hardware technology aimed at solving optimization problems. In terms of mathematical framewo… ▽ More As we approach the physical limits predicted by Moore's law, a variety of specialized hardware is emerging to tackle specialized tasks in different domains. Within combinatorial optimization, adiabatic quantum computers, CMOS annealers, and optical parametric oscillators are few of the emerging specialized hardware technology aimed at solving optimization problems. In terms of mathematical framework, the Ising optimization model unifies all of these emerging special-purpose hardware. In other words, they are all designed to solve optimization problems expressed in the Ising model or equivalently as a quadratic unconstrained binary optimization model. Due to variety of constraints specific to each type of hardware, they usually suffer from a major challenge: the number of variables that the hardware can manage to solve is very limited. Given that large-scale practical problems, including problems in operations research, combinatorial scientific computing, data science and network science require significantly more variables to model than these devices provide, we are likely to witness that cloud-based deployments of these devices will be available for parallel and shared access. Thus hybrid techniques in combination with both hardware and software must be developed to utilize these technologies. Local search meta-heuristics is one of the approaches to tackle large scale problems. However, a general optimization step within local search is not traditionally formulated in the Ising form. In this work, we propose a new meta-heuristic to model local search in the Ising form for the special-purpose hardware devices. As such, we demonstrate that our method takes the limitations of the Ising model and current hardware into account, utilizes a given hardware more efficiently compared to previous approaches, while also producing high quality solutions compared to other well-known meta-heuristics. △ Less

Submitted 28 November, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

MSC Class: 68R05 90C27 90C59

arXiv:1909.11218 [pdf, other]

Attention Interpretability Across NLP Tasks

Authors: Shikhar Vashishth, Shyam Upadhyay, Gaurav Singh Tomar, Manaal Faruqui

Abstract: The attention layer in a neural network model provides insights into the model's reasoning behind its prediction, which are usually criticized for being opaque. Recently, seemingly contradictory viewpoints have emerged about the interpretability of attention weights (Jain & Wallace, 2019; Vig & Belinkov, 2019). Amid such confusion arises the need to understand attention mechanism more systematical… ▽ More The attention layer in a neural network model provides insights into the model's reasoning behind its prediction, which are usually criticized for being opaque. Recently, seemingly contradictory viewpoints have emerged about the interpretability of attention weights (Jain & Wallace, 2019; Vig & Belinkov, 2019). Amid such confusion arises the need to understand attention mechanism more systematically. In this work, we attempt to fill this gap by giving a comprehensive explanation which justifies both kinds of observations (i.e., when is attention interpretable and when it is not). Through a series of experiments on diverse NLP tasks, we validate our observations and reinforce our claim of interpretability of attention through manual evaluation. △ Less

Submitted 24 September, 2019; originally announced September 2019.

Report number: 2019

arXiv:1908.09453 [pdf, other]

OpenSpiel: A Framework for Reinforcement Learning in Games

Authors: Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes , et al. (2 additional authors not shown)

Abstract: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi- agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partia… ▽ More OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi- agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partially- and fully- observable) grid worlds and social dilemmas. OpenSpiel also includes tools to analyze learning dynamics and other common evaluation metrics. This document serves both as an overview of the code base and an introduction to the terminology, core concepts, and algorithms across the fields of reinforcement learning, computational game theory, and search. △ Less

Submitted 26 September, 2020; v1 submitted 25 August, 2019; originally announced August 2019.

arXiv:1906.09384 [pdf, other]

A Bandit Approach to Posterior Dialog Orchestration Under a Budget

Authors: Sohini Upadhyay, Mayank Agarwal, Djallel Bounneffouf, Yasaman Khazaeni

Abstract: Building multi-domain AI agents is a challenging task and an open problem in the area of AI. Within the domain of dialog, the ability to orchestrate multiple independently trained dialog agents, or skills, to create a unified system is of particular significance. In this work, we study the task of online posterior dialog orchestration, where we define posterior orchestration as the task of selecti… ▽ More Building multi-domain AI agents is a challenging task and an open problem in the area of AI. Within the domain of dialog, the ability to orchestrate multiple independently trained dialog agents, or skills, to create a unified system is of particular significance. In this work, we study the task of online posterior dialog orchestration, where we define posterior orchestration as the task of selecting a subset of skills which most appropriately answer a user input using features extracted from both the user input and the individual skills. To account for the various costs associated with extracting skill features, we consider online posterior orchestration under a skill execution budget. We formalize this setting as Context Attentive Bandit with Observations (CABO), a variant of context attentive bandits, and evaluate it on simulated non-conversational and proprietary conversational datasets. △ Less

Submitted 22 June, 2019; originally announced June 2019.

Comments: 2nd Conversational AI Workshop, NeurIPS 2018

arXiv:1809.07807 [pdf]

Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages

Authors: Shyam Upadhyay, Jordan Kodner, Dan Roth

Abstract: Generating the English transliteration of a name written in a foreign script is an important and challenging step in multilingual knowledge acquisition and information extraction. Existing approaches to transliteration generation require a large (>5000) number of training examples. This difficulty contrasts with transliteration discovery, a somewhat easier task that involves picking a plausible tr… ▽ More Generating the English transliteration of a name written in a foreign script is an important and challenging step in multilingual knowledge acquisition and information extraction. Existing approaches to transliteration generation require a large (>5000) number of training examples. This difficulty contrasts with transliteration discovery, a somewhat easier task that involves picking a plausible transliteration from a given list. In this work, we present a bootstrapping algorithm that uses constrained discovery to improve generation, and can be used with as few as 500 training examples, which we show can be sourced from annotators in a matter of hours. This opens the task to languages for which large number of training examples are unavailable. We evaluate transliteration generation performance itself, as well the improvement it brings to cross-lingual candidate generation for entity linking, a typical downstream task. We present a comprehensive evaluation of our approach on nine languages, each written in a unique script. △ Less

Submitted 20 September, 2018; originally announced September 2018.

Comments: EMNLP 2018

arXiv:1809.07657 [pdf, other]

Joint Multilingual Supervision for Cross-lingual Entity Linking

Authors: Shyam Upadhyay, Nitish Gupta, Dan Roth

Abstract: Cross-lingual Entity Linking (XEL) aims to ground entity mentions written in any language to an English Knowledge Base (KB), such as Wikipedia. XEL for most languages is challenging, owing to limited availability of resources as supervision. We address this challenge by developing the first XEL approach that combines supervision from multiple languages jointly. This enables our approach to: (a) au… ▽ More Cross-lingual Entity Linking (XEL) aims to ground entity mentions written in any language to an English Knowledge Base (KB), such as Wikipedia. XEL for most languages is challenging, owing to limited availability of resources as supervision. We address this challenge by developing the first XEL approach that combines supervision from multiple languages jointly. This enables our approach to: (a) augment the limited supervision in the target language with additional supervision from a high-resource language (like English), and (b) train a single entity linking model for multiple languages, improving upon individually trained models for each language. Extensive evaluation on three benchmark datasets across 8 languages shows that our approach significantly improves over the current state-of-the-art. We also provide analyses in two limited resource settings: (a) zero-shot setting, when no supervision in the target language is available, and in (b) low-resource setting, when some supervision in the target language is available. Our analysis provides insights into the limitations of zero-shot XEL approaches in realistic scenarios, and shows the value of joint supervision in low-resource settings. △ Less

Submitted 20 September, 2018; originally announced September 2018.

Comments: EMNLP 2018

arXiv:1805.07720 [pdf]

doi 10.1109/TMSCS.2018.2870592

A Deep Structure of Person Re-Identification using Multi-Level Gaussian Models

Authors: Dinesh Kumar Vishwakarma, Sakshi Upadhyay

Abstract: Person re-identification is being widely used in the forensic, and security and surveillance system, but person re-identification is a challenging task in real life scenario. Hence, in this work, a new feature descriptor model has been proposed using a multilayer framework of Gaussian distribution model on pixel features, which include color moments, color space values and Schmid filter responses.… ▽ More Person re-identification is being widely used in the forensic, and security and surveillance system, but person re-identification is a challenging task in real life scenario. Hence, in this work, a new feature descriptor model has been proposed using a multilayer framework of Gaussian distribution model on pixel features, which include color moments, color space values and Schmid filter responses. An image of a person usually consists of distinct body regions, usually with differentiable clothing followed by local colors and texture patterns. Thus, the image is evaluated locally by dividing the image into overlapping regions. Each region is further fragmented into a set of local Gaussians on small patches. A global Gaussian encodes, these local Gaussians for each region creating a multi-level structure. Hence, the global picture of a person is described by local level information present in it, which is often ignored. Also, we have analyzed the efficiency of earlier metric learning methods on this descriptor. The performance of the descriptor is evaluated on four public available challenging datasets and the highest accuracy achieved on these datasets are compared with similar state-of-the-arts, which demonstrate the superior performance. △ Less

Submitted 20 May, 2018; originally announced May 2018.

Comments: 9 pages

Report number: 8469037

Journal ref: IEEE Transactions on Multi-Scale Computing Systems 4 (2018) 513 - 521

Showing 1–50 of 60 results for author: Upadhyay, S