Search | arXiv e-print repository

Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game

Authors: Prisha Samadarshi, Mariam Mustafa, Anushka Kulkarni, Raven Rothkopf, Tuhin Chakrabarty, Smaranda Muresan

Abstract: The New York Times Connections game has emerged as a popular and challenging pursuit for word puzzle enthusiasts. We collect 200 Connections games to evaluate the performance of state-of-the-art large language models (LLMs) against expert and novice human players. Our results show that even the best-performing LLM, GPT-4o, which has otherwise shown impressive reasoning abilities on a wide variety… ▽ More The New York Times Connections game has emerged as a popular and challenging pursuit for word puzzle enthusiasts. We collect 200 Connections games to evaluate the performance of state-of-the-art large language models (LLMs) against expert and novice human players. Our results show that even the best-performing LLM, GPT-4o, which has otherwise shown impressive reasoning abilities on a wide variety of benchmarks, can only fully solve 8% of the games. Compared to GPT-4o, novice and expert players perform better, with expert human players significantly outperforming GPT-4o. To deepen our understanding we create a taxonomy of the knowledge types required to successfully categorize words in the Connections game, revealing that LLMs struggle with associative, encyclopedic, and linguistic knowledge. Our findings establish the New York Times Connections game as a challenging benchmark for evaluating abstract reasoning capabilities in humans and AI systems. △ Less

Submitted 15 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.09933 [pdf, other]

What Does it Take to Generalize SER Model Across Datasets? A Comprehensive Benchmark

Authors: Adham Ibrahim, Shady Shehata, Ajinkya Kulkarni, Mukhtar Mohamed, Muhammad Abdul-Mageed

Abstract: Speech emotion recognition (SER) is essential for enhancing human-computer interaction in speech-based applications. Despite improvements in specific emotional datasets, there is still a research gap in SER's capability to generalize across real-world situations. In this paper, we investigate approaches to generalize the SER system across different emotion datasets. In particular, we incorporate 1… ▽ More Speech emotion recognition (SER) is essential for enhancing human-computer interaction in speech-based applications. Despite improvements in specific emotional datasets, there is still a research gap in SER's capability to generalize across real-world situations. In this paper, we investigate approaches to generalize the SER system across different emotion datasets. In particular, we incorporate 11 emotional speech datasets and illustrate a comprehensive benchmark on the SER task. We also address the challenge of imbalanced data distribution using over-sampling methods when combining SER datasets for training. Furthermore, we explore various evaluation protocols for adeptness in the generalization of SER. Building on this, we explore the potential of Whisper for SER, emphasizing the importance of thorough evaluation. Our approach is designed to advance SER technology by integrating speaker-independent methods. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: ACCEPTED AT INTERSPEECH 2024, GREECE

arXiv:2406.09494 [pdf, other]

The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments

Authors: Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S. R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy

Abstract: The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multilingual conversational speech dataset. In the DISPLACE 2024 challenge, we also introduced the task of automatic speech recognition (ASR) on this datas… ▽ More The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multilingual conversational speech dataset. In the DISPLACE 2024 challenge, we also introduced the task of automatic speech recognition (ASR) on this dataset. The dataset containing 158 hours of speech, consisting of both supervised and unsupervised mono-channel far-field recordings, was released for LD and SD tracks. Further, 12 hours of close-field mono-channel recordings were provided for the ASR track conducted on 5 Indian languages. The details of the dataset, baseline systems and the leader board results are highlighted in this paper. We have also compared our baseline models and the team's performances on evaluation data of DISPLACE-2023 to emphasize the advancements made in this second version of the challenge. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures, Interspeech 2024

arXiv:2406.03506 [pdf]

Fuzzy Convolution Neural Networks for Tabular Data Classification

Authors: Arun D. Kulkarni

Abstract: Recently, convolution neural networks (CNNs) have attracted a great deal of attention due to their remarkable performance in various domains, particularly in image and text classification tasks. However, their application to tabular data classification remains underexplored. There are many fields such as bioinformatics, finance, medicine where nonimage data are prevalent. Adaption of CNNs to class… ▽ More Recently, convolution neural networks (CNNs) have attracted a great deal of attention due to their remarkable performance in various domains, particularly in image and text classification tasks. However, their application to tabular data classification remains underexplored. There are many fields such as bioinformatics, finance, medicine where nonimage data are prevalent. Adaption of CNNs to classify nonimage data remains highly challenging. This paper investigates the efficacy of CNNs for tabular data classification, aiming to bridge the gap between traditional machine learning approaches and deep learning techniques. We propose a novel framework fuzzy convolution neural network (FCNN) tailored specifically for tabular data to capture local patterns within feature vectors. In our approach, we map feature values to fuzzy memberships. The fuzzy membership vectors are converted into images that are used to train the CNN model. The trained CNN model is used to classify unknown feature vectors. To validate our approach, we generated six complex noisy data sets. We used randomly selected seventy percent samples from each data set for training and thirty percent for testing. The data sets were also classified using the state-of-the-art machine learning algorithms such as the decision tree (DT), support vector machine (SVM), fuzzy neural network (FNN), Bayes classifier, and Random Forest (RF). Experimental results demonstrate that our proposed model can effectively learn meaningful representations from tabular data, achieving competitive or superior performance compared to existing methods. Overall, our finding suggests that the proposed FCNN model holds promise as a viable alternative for tabular data classification tasks, offering a fresh prospective and potentially unlocking new opportunities for leveraging deep learning in structured data analysis. △ Less

Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 10 pages, 16 figures, Submitted to IEEE Access

MSC Class: I.2.10 ACM Class: I.4.6

arXiv:2405.15804 [pdf, other]

Explainable Human-AI Interaction: A Planning Perspective

Authors: Sarath Sreedharan, Anagha Kulkarni, Subbarao Kambhampati

Abstract: From its inception, AI has had a rather ambivalent relationship with humans -- swinging between their augmentation and replacement. Now, as AI technologies enter our everyday lives at an ever increasing pace, there is a greater need for AI systems to work synergistically with humans. One critical requirement for such synergistic human-AI interaction is that the AI systems be explainable to the hum… ▽ More From its inception, AI has had a rather ambivalent relationship with humans -- swinging between their augmentation and replacement. Now, as AI technologies enter our everyday lives at an ever increasing pace, there is a greater need for AI systems to work synergistically with humans. One critical requirement for such synergistic human-AI interaction is that the AI systems be explainable to the humans in the loop. To do this effectively, AI agents need to go beyond planning with their own models of the world, and take into account the mental model of the human in the loop. Drawing from several years of research in our lab, we will discuss how the AI agent can use these mental models to either conform to human expectations, or change those expectations through explanatory communication. While the main focus of the book is on cooperative scenarios, we will point out how the same mental models can be used for obfuscation and deception. Although the book is primarily driven by our own research in these areas, in every chapter, we will provide ample connections to relevant research from other groups. △ Less

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2404.13364 [pdf, other]

MahaSQuAD: Bridging Linguistic Divides in Marathi Question-Answering

Authors: Ruturaj Ghatage, Aditya Kulkarni, Rajlaxmi Patil, Sharvi Endait, Raviraj Joshi

Abstract: Question-answering systems have revolutionized information retrieval, but linguistic and cultural boundaries limit their widespread accessibility. This research endeavors to bridge the gap of the absence of efficient QnA datasets in low-resource languages by translating the English Question Answering Dataset (SQuAD) using a robust data curation approach. We introduce MahaSQuAD, the first-ever full… ▽ More Question-answering systems have revolutionized information retrieval, but linguistic and cultural boundaries limit their widespread accessibility. This research endeavors to bridge the gap of the absence of efficient QnA datasets in low-resource languages by translating the English Question Answering Dataset (SQuAD) using a robust data curation approach. We introduce MahaSQuAD, the first-ever full SQuAD dataset for the Indic language Marathi, consisting of 118,516 training, 11,873 validation, and 11,803 test samples. We also present a gold test set of manually verified 500 examples. Challenges in maintaining context and handling linguistic nuances are addressed, ensuring accurate translations. Moreover, as a QnA dataset cannot be simply converted into any low-resource language using translation, we need a robust method to map the answer translation to its span in the translated passage. Hence, to address this challenge, we also present a generic approach for translating SQuAD into any low-resource language. Thus, we offer a scalable approach to bridge linguistic and cultural gaps present in low-resource languages, in the realm of question-answering systems. The datasets and models are shared publicly at https://github.com/l3cube-pune/MarathiNLP . △ Less

Submitted 20 April, 2024; originally announced April 2024.

Comments: Accepted at the International Conference on Natural Language Processing (ICON 2023)

arXiv:2403.19183 [pdf, other]

Empirical Analysis for Unsupervised Universal Dependency Parse Tree Aggregation

Authors: Adithya Kulkarni, Oliver Eulenstein, Qi Li

Abstract: Dependency parsing is an essential task in NLP, and the quality of dependency parsers is crucial for many downstream tasks. Parsers' quality often varies depending on the domain and the language involved. Therefore, it is essential to combat the issue of varying quality to achieve stable performance. In various NLP tasks, aggregation methods are used for post-processing aggregation and have been s… ▽ More Dependency parsing is an essential task in NLP, and the quality of dependency parsers is crucial for many downstream tasks. Parsers' quality often varies depending on the domain and the language involved. Therefore, it is essential to combat the issue of varying quality to achieve stable performance. In various NLP tasks, aggregation methods are used for post-processing aggregation and have been shown to combat the issue of varying quality. However, aggregation methods for post-processing aggregation have not been sufficiently studied in dependency parsing tasks. In an extensive empirical study, we compare different unsupervised post-processing aggregation methods to identify the most suitable dependency tree structure aggregation method. △ Less

Submitted 3 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.18212 [pdf, other]

Preference-Based Planning in Stochastic Environments: From Partially-Ordered Temporal Goals to Most Preferred Policies

Authors: Hazhar Rahmani, Abhishek N. Kulkarni, Jie Fu

Abstract: Human preferences are not always represented via complete linear orders: It is natural to employ partially-ordered preferences for expressing incomparable outcomes. In this work, we consider decision-making and probabilistic planning in stochastic systems modeled as Markov decision processes (MDPs), given a partially ordered preference over a set of temporally extended goals. Specifically, each te… ▽ More Human preferences are not always represented via complete linear orders: It is natural to employ partially-ordered preferences for expressing incomparable outcomes. In this work, we consider decision-making and probabilistic planning in stochastic systems modeled as Markov decision processes (MDPs), given a partially ordered preference over a set of temporally extended goals. Specifically, each temporally extended goal is expressed using a formula in Linear Temporal Logic on Finite Traces (LTL$_f$). To plan with the partially ordered preference, we introduce order theory to map a preference over temporal goals to a preference over policies for the MDP. Accordingly, a most preferred policy under a stochastic ordering induces a stochastic nondominated probability distribution over the finite paths in the MDP. To synthesize a most preferred policy, our technical approach includes two key steps. In the first step, we develop a procedure to transform a partially ordered preference over temporal goals into a computational model, called preference automaton, which is a semi-automaton with a partial order over acceptance conditions. In the second step, we prove that finding a most preferred policy is equivalent to computing a Pareto-optimal policy in a multi-objective MDP that is constructed from the original MDP, the preference automaton, and the chosen stochastic ordering relation. Throughout the paper, we employ running examples to illustrate the proposed preference specification and solution approaches. We demonstrate the efficacy of our algorithm using these examples, providing detailed analysis, and then discuss several potential future directions. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2209.12267

arXiv:2403.08036 [pdf, other]

A Review of Cybersecurity Incidents in the Food and Agriculture Sector

Authors: Ajay Kulkarni, Yingjie Wang, Munisamy Gopinath, Dan Sobien, Abdul Rahman, Feras A. Batarseh

Abstract: The increasing utilization of emerging technologies in the Food & Agriculture (FA) sector has heightened the need for security to minimize cyber risks. Considering this aspect, this manuscript reviews disclosed and documented cybersecurity incidents in the FA sector. For this purpose, thirty cybersecurity incidents were identified, which took place between July 2011 and April 2023. The details of… ▽ More The increasing utilization of emerging technologies in the Food & Agriculture (FA) sector has heightened the need for security to minimize cyber risks. Considering this aspect, this manuscript reviews disclosed and documented cybersecurity incidents in the FA sector. For this purpose, thirty cybersecurity incidents were identified, which took place between July 2011 and April 2023. The details of these incidents are reported from multiple sources such as: the private industry and flash notifications generated by the Federal Bureau of Investigation (FBI), internal reports from the affected organizations, and available media sources. Considering the available information, a brief description of the security threat, ransom amount, and impact on the organization are discussed for each incident. This review reports an increased frequency of cybersecurity threats to the FA sector. To minimize these cyber risks, popular cybersecurity frameworks and recent agriculture-specific cybersecurity solutions are also discussed. Further, the need for AI assurance in the FA sector is explained, and the Farmer-Centered AI (FCAI) framework is proposed. The main aim of the FCAI framework is to support farmers in decision-making for agricultural production, by incorporating AI assurance. Lastly, the effects of the reported cyber incidents on other critical infrastructures, food security, and the economy are noted, along with specifying the open issues for future development. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: Preprint. Submitted for journal publication

arXiv:2402.17231 [pdf, other]

MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning

Authors: Debrup Das, Debopriyo Banerjee, Somak Aditya, Ashish Kulkarni

Abstract: Tool-augmented Large Language Models (TALMs) are known to enhance the skillset of large language models (LLMs), thereby, leading to their improved reasoning abilities across many tasks. While, TALMs have been successfully employed in different question-answering benchmarks, their efficacy on complex mathematical reasoning benchmarks, and the potential complementary benefits offered by tools for kn… ▽ More Tool-augmented Large Language Models (TALMs) are known to enhance the skillset of large language models (LLMs), thereby, leading to their improved reasoning abilities across many tasks. While, TALMs have been successfully employed in different question-answering benchmarks, their efficacy on complex mathematical reasoning benchmarks, and the potential complementary benefits offered by tools for knowledge retrieval and mathematical equation solving are open research questions. In this work, we present MathSensei, a tool-augmented large language model for mathematical reasoning. We study the complementary benefits of the tools - knowledge retriever (Bing Web Search), program generator + executor (Python), and symbolic equation solver (Wolfram-Alpha API) through evaluations on mathematical reasoning datasets. We perform exhaustive ablations on MATH, a popular dataset for evaluating mathematical reasoning on diverse mathematical disciplines. We also conduct experiments involving well-known tool planners to study the impact of tool sequencing on the model performance. MathSensei achieves 13.5% better accuracy over gpt-3.5-turbo with Chain-of-Thought on the MATH dataset. We further observe that TALMs are not as effective for simpler math word problems (in GSM-8K), and the benefit increases as the complexity and required knowledge increases (progressively over AQuA, MMLU-Math, and higher level complex questions in MATH). The code and data are available at https://github.com/Debrup-61/MathSensei. △ Less

Submitted 3 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.16862 [pdf, other]

Revisiting Common Randomness, No-signaling and Information Structure in Decentralized Control

Authors: Apurva Dhingra, Ankur A. Kulkarni

Abstract: This work revisits the no-signaling condition for decentralized information structures. We produce examples to show that within the no-signaling polytope exist strategies that cannot be achieved by passive common randomness but instead require agents to either share their observations with a mediator or communicate directly with each other. This poses a question mark on whether the no-signaling co… ▽ More This work revisits the no-signaling condition for decentralized information structures. We produce examples to show that within the no-signaling polytope exist strategies that cannot be achieved by passive common randomness but instead require agents to either share their observations with a mediator or communicate directly with each other. This poses a question mark on whether the no-signaling condition truly captures the decentralized information structure in the strictest sense. △ Less

Submitted 20 January, 2024; originally announced February 2024.

MSC Class: 93C41; 93E20; 81Q93

arXiv:2402.07513 [pdf, other]

The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese

Authors: Ajinkya Kulkarni, Anna Tokareva, Rameez Qureshi, Miguel Couceiro

Abstract: In the field of spoken language understanding, systems like Whisper and Multilingual Massive Speech (MMS) have shown state-of-the-art performances. This study is dedicated to a comprehensive exploration of the Whisper and MMS systems, with a focus on assessing biases in automatic speech recognition (ASR) inherent to casual conversation speech specific to the Portuguese language. Our investigation… ▽ More In the field of spoken language understanding, systems like Whisper and Multilingual Massive Speech (MMS) have shown state-of-the-art performances. This study is dedicated to a comprehensive exploration of the Whisper and MMS systems, with a focus on assessing biases in automatic speech recognition (ASR) inherent to casual conversation speech specific to the Portuguese language. Our investigation encompasses various categories, including gender, age, skin tone color, and geo-location. Alongside traditional ASR evaluation metrics such as Word Error Rate (WER), we have incorporated p-value statistical significance for gender bias analysis. Furthermore, we extensively examine the impact of data distribution and empirically show that oversampling techniques alleviate such stereotypical biases. This research represents a pioneering effort in quantifying biases in the Portuguese language context through the application of MMS and Whisper, contributing to a better understanding of ASR systems' performance in multilingual settings. △ Less

Submitted 12 February, 2024; originally announced February 2024.

Comments: EACL-2024 LT-EDI Workshop

arXiv:2402.03678 [pdf, other]

Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents

Authors: Yash Shukla, Tanushree Burman, Abhishek Kulkarni, Robert Wright, Alvaro Velasquez, Jivko Sinapov

Abstract: Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RM), to guide the lea… ▽ More Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and Automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and Automaton-guided RL baselines in terms of sample-efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task. △ Less

Submitted 2 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02285 [pdf, other]

SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking

Authors: Atharva Kulkarni, Bo-Hsiang Tseng, Joel Ruben Antony Moniz, Dhivya Piraviperumal, Hong Yu, Shruti Bhargava

Abstract: In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State Tracking (DST). However, the best-performing in-context learning methods involve retrieving and adding similar examples to the prompt, requiring access to labeled training data. Procuring such training data for a wide range of domains and applications is time-consuming, expensive, an… ▽ More In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State Tracking (DST). However, the best-performing in-context learning methods involve retrieving and adding similar examples to the prompt, requiring access to labeled training data. Procuring such training data for a wide range of domains and applications is time-consuming, expensive, and, at times, infeasible. While zero-shot learning requires no training data, it significantly lags behind the few-shot setup. Thus, `\textit{Can we efficiently generate synthetic data for any dialogue schema to enable few-shot prompting?}' Addressing this question, we propose \method, a data generation framework tailored for DST, utilizing LLMs. Our approach only requires the dialogue schema and a few hand-crafted dialogue templates to synthesize natural, coherent, and free-flowing dialogues with DST annotations. Few-shot learning using data from {\method} results in $4-5%$ improvement in Joint Goal Accuracy over the zero-shot baseline on MultiWOZ 2.1 and 2.4. Remarkably, our few-shot learning approach recovers nearly $98%$ of the performance compared to the few-shot setup using human-annotated training data. Our synthetic data and code can be accessed at https://github.com/apple/ml-synthdst △ Less

Submitted 3 February, 2024; originally announced February 2024.

Comments: 9 pages. 4 figures, EACL 2024 main conference

arXiv:2401.08363 [pdf, other]

Mitigating Bias in Machine Learning Models for Phishing Webpage Detection

Authors: Aditya Kulkarni, Vivek Balachandran, Dinil Mon Divakaran, Tamal Das

Abstract: The widespread accessibility of the Internet has led to a surge in online fraudulent activities, underscoring the necessity of shielding users' sensitive information from cybercriminals. Phishing, a well-known cyberattack, revolves around the creation of phishing webpages and the dissemination of corresponding URLs, aiming to deceive users into sharing their sensitive information, often for identi… ▽ More The widespread accessibility of the Internet has led to a surge in online fraudulent activities, underscoring the necessity of shielding users' sensitive information from cybercriminals. Phishing, a well-known cyberattack, revolves around the creation of phishing webpages and the dissemination of corresponding URLs, aiming to deceive users into sharing their sensitive information, often for identity theft or financial gain. Various techniques are available for preemptively categorizing zero-day phishing URLs by distilling unique attributes and constructing predictive models. However, these existing techniques encounter unresolved issues. This proposal delves into persistent challenges within phishing detection solutions, particularly concentrated on the preliminary phase of assembling comprehensive datasets, and proposes a potential solution in the form of a tool engineered to alleviate bias in ML models. Such a tool can generate phishing webpages for any given set of legitimate URLs, infusing randomly selected content and visual-based phishing features. Furthermore, we contend that the tool holds the potential to assess the efficacy of existing phishing detection solutions, especially those trained on confined datasets. △ Less

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.07148 [pdf, other]

Assessing the Effectiveness of Binary-Level CFI Techniques

Authors: Ruturaj K. Vaidya, Prasad A. Kulkarni

Abstract: Memory corruption is an important class of vulnerability that can be leveraged to craft control flow hijacking attacks. Control Flow Integrity (CFI) provides protection against such attacks. Application of type-based CFI policies requires information regarding the number and type of function arguments. Binary-level type recovery is inherently speculative, which motivates the need for an evaluation… ▽ More Memory corruption is an important class of vulnerability that can be leveraged to craft control flow hijacking attacks. Control Flow Integrity (CFI) provides protection against such attacks. Application of type-based CFI policies requires information regarding the number and type of function arguments. Binary-level type recovery is inherently speculative, which motivates the need for an evaluation framework to assess the effectiveness of binary-level CFI techniques compared with their source-level counterparts, where such type information is fully and accurately accessible. In this work, we develop a novel, generalized and extensible framework to assess how the program analysis information we get from state-of-the-art binary analysis tools affects the efficacy of type-based CFI techniques. We introduce new and insightful metrics to quantitatively compare source independent CFI policies with their ground truth source aware counterparts. We leverage our framework to evaluate binary-level CFI policies implemented using program analysis information extracted from the IDA Pro binary analyzer and compared with the ground truth information obtained from the LLVM compiler, and present our observations. △ Less

Submitted 13 January, 2024; originally announced January 2024.

Comments: 14 pages, 9 figures, 9 tables, Part of this work is to be published in 16th International Symposium on Foundations & Practice of Security (FPS - 2023)

arXiv:2401.01285 [pdf, ps, other]

doi 10.1145/3626252.3630926

Socially Responsible Computing in an Introductory Course

Authors: Aakash Gautam, Anagha Kulkarni, Sarah Hug, Jane Lehr, Ilmi Yoon

Abstract: Given the potential for technology to inflict harm and injustice on society, it is imperative that we cultivate a sense of social responsibility among our students as they progress through the Computer Science (CS) curriculum. Our students need to be able to examine the social complexities in which technology development and use are situated. Also, aligning students' personal goals and their abili… ▽ More Given the potential for technology to inflict harm and injustice on society, it is imperative that we cultivate a sense of social responsibility among our students as they progress through the Computer Science (CS) curriculum. Our students need to be able to examine the social complexities in which technology development and use are situated. Also, aligning students' personal goals and their ability to achieve them in their field of study is important for promoting motivation and a sense of belonging. Promoting communal goals while learning computing can help broaden participation, particularly among groups who have been historically marginalized in computing. Keeping these considerations in mind, we piloted an introductory Java programming course in which activities engaging students in ethical and socially responsible considerations were integrated across modules. Rather than adding social on top of the technical content, our curricular approach seeks to weave them together. The data from the class suggests that the students found the inclusion of the social context in the technical assignments to be more motivating and expressed greater agency in realizing social change. We share our approach to designing this new introductory socially responsible computing course and the students' reflections. We also highlight seven considerations for educators seeking to incorporate socially responsible computing. △ Less

Submitted 2 January, 2024; originally announced January 2024.

Journal ref: Proceedings of the 55th ACM Technical Symposium on Computer Science Education (SIGCSE 2024)

arXiv:2312.14924 [pdf, other]

Training Convolutional Neural Networks with the Forward-Forward algorithm

Authors: Riccardo Scodellaro, Ajinkya Kulkarni, Frauke Alves, Matthias Schröter

Abstract: The recent successes in analyzing images with deep neural networks are almost exclusively achieved with Convolutional Neural Networks (CNNs). The training of these CNNs, and in fact of all deep neural network architectures, uses the backpropagation algorithm where the output of the network is compared with the desired result and the difference is then used to tune the weights of the network toward… ▽ More The recent successes in analyzing images with deep neural networks are almost exclusively achieved with Convolutional Neural Networks (CNNs). The training of these CNNs, and in fact of all deep neural network architectures, uses the backpropagation algorithm where the output of the network is compared with the desired result and the difference is then used to tune the weights of the network towards the desired outcome. In a 2022 preprint, Geoffrey Hinton suggested an alternative way of training which passes the desired results together with the images at the input of the network. This so called Forward Forward (FF) algorithm has up to now only been used in fully connected networks. In this paper, we show how the FF paradigm can be extended to CNNs. Our FF-trained CNN, featuring a novel spatially-extended labeling technique, achieves a classification accuracy of 99.16% on the MNIST hand-written digits dataset. We show how different hyperparameters affect the performance of the proposed algorithm and compare the results with CNN trained with the standard backpropagation approach. Furthermore, we use Class Activation Maps to investigate which type of features are learnt by the FF algorithm. △ Less

Submitted 7 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: 19 pages, 9 figures

arXiv:2312.07504 [pdf, other]

COLMAP-Free 3D Gaussian Splatting

Authors: Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, Xiaolong Wang

Abstract: While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure an… ▽ More While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure and camera poses at the same time. On the other hand, the recently proposed 3D Gaussian Splatting provides new opportunities given its explicit point cloud representations. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes. Our project page is https://oasisyang.github.io/colmap-free-3dgs △ Less

Submitted 12 December, 2023; originally announced December 2023.

Comments: Project Page: https://oasisyang.github.io/colmap-free-3dgs

arXiv:2312.03151 [pdf, other]

Multitask Learning Can Improve Worst-Group Outcomes

Authors: Atharva Kulkarni, Lucio Dery, Amrith Setlur, Aditi Raghunathan, Ameet Talwalkar, Graham Neubig

Abstract: In order to create machine learning systems that serve a variety of users well, it is vital to not only achieve high average performance but also ensure equitable outcomes across diverse groups. However, most machine learning methods are designed to improve a model's average performance on a chosen end task without consideration for their impact on worst group error. Multitask learning (MTL) is on… ▽ More In order to create machine learning systems that serve a variety of users well, it is vital to not only achieve high average performance but also ensure equitable outcomes across diverse groups. However, most machine learning methods are designed to improve a model's average performance on a chosen end task without consideration for their impact on worst group error. Multitask learning (MTL) is one such widely used technique. In this paper, we seek not only to understand the impact of MTL on worst-group accuracy but also to explore its potential as a tool to address the challenge of group-wise fairness. We primarily consider the standard setting of fine-tuning a pre-trained model, where, following recent work \citep{gururangan2020don, dery2023aang}, we multitask the end task with the pre-training objective constructed from the end task data itself. In settings with few or no group annotations, we find that multitasking often, but not consistently, achieves better worst-group accuracy than Just-Train-Twice (JTT; \citet{pmlr-v139-liu21f}) -- a representative distributionally robust optimization (DRO) method. Leveraging insights from synthetic data experiments, we propose to modify standard MTL by regularizing the joint multitask representation space. We run a large number of fine-tuning experiments across computer vision and natural language processing datasets and find that our regularized MTL approach \emph{consistently} outperforms JTT on both average and worst-group outcomes. Our official code can be found here: \href{https://github.com/atharvajk98/MTL-group-robustness.git}{\url{https://github.com/atharvajk98/MTL-group-robustness}}. △ Less

Submitted 28 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: 20 pages, 7 tables, 6 Figures

arXiv:2311.16294 [pdf, other]

Aligning Non-Causal Factors for Transformer-Based Source-Free Domain Adaptation

Authors: Sunandini Sanyal, Ashish Ramayee Asokan, Suvaansh Bhambri, Pradyumna YM, Akshay Kulkarni, Jogendra Nath Kundu, R Venkatesh Babu

Abstract: Conventional domain adaptation algorithms aim to achieve better generalization by aligning only the task-discriminative causal factors between a source and target domain. However, we find that retaining the spurious correlation between causal and non-causal factors plays a vital role in bridging the domain gap and improving target adaptation. Therefore, we propose to build a framework that disenta… ▽ More Conventional domain adaptation algorithms aim to achieve better generalization by aligning only the task-discriminative causal factors between a source and target domain. However, we find that retaining the spurious correlation between causal and non-causal factors plays a vital role in bridging the domain gap and improving target adaptation. Therefore, we propose to build a framework that disentangles and supports causal factor alignment by aligning the non-causal factors first. We also investigate and find that the strong shape bias of vision transformers, coupled with its multi-head attention, make it a suitable architecture for realizing our proposed disentanglement. Hence, we propose to build a Causality-enforcing Source-Free Transformer framework (C-SFTrans) to achieve disentanglement via a novel two-stage alignment approach: a) non-causal factor alignment: non-causal factors are aligned using a style classification task which leads to an overall global alignment, b) task-discriminative causal factor alignment: causal factors are aligned via target adaptation. We are the first to investigate the role of vision transformers (ViTs) in a privacy-preserving source-free setting. Our approach achieves state-of-the-art results in several DA benchmarks. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: WACV 2024. Project Page: https://val.cds.iisc.ac.in/C-SFTrans/

arXiv:2311.06592 [pdf, other]

An Empirical Study of Using ChatGPT for Fact Verification Task

Authors: Mohna Chakraborty, Adithya Kulkarni, Qi Li

Abstract: ChatGPT has recently emerged as a powerful tool for performing diverse NLP tasks. However, ChatGPT has been criticized for generating nonfactual responses, raising concerns about its usability for sensitive tasks like fact verification. This study investigates three key research questions: (1) Can ChatGPT be used for fact verification tasks? (2) What are different prompts performance using ChatGPT… ▽ More ChatGPT has recently emerged as a powerful tool for performing diverse NLP tasks. However, ChatGPT has been criticized for generating nonfactual responses, raising concerns about its usability for sensitive tasks like fact verification. This study investigates three key research questions: (1) Can ChatGPT be used for fact verification tasks? (2) What are different prompts performance using ChatGPT for fact verification tasks? (3) For the best-performing prompt, what common mistakes does ChatGPT make? Specifically, this study focuses on conducting a comprehensive and systematic analysis by designing and comparing the performance of three different prompts for fact verification tasks on the benchmark FEVER dataset using ChatGPT. △ Less

Submitted 11 November, 2023; originally announced November 2023.

arXiv:2311.04179 [pdf]

On Leakage in Machine Learning Pipelines

Authors: Leonard Sasse, Eliana Nicolaisen-Sobesky, Juergen Dukart, Simon B. Eickhoff, Michael Götz, Sami Hamdan, Vera Komeyer, Abhijit Kulkarni, Juha Lahnakoski, Bradley C. Love, Federico Raimondo, Kaustubh R. Patil

Abstract: Machine learning (ML) provides powerful tools for predictive modeling. ML's popularity stems from the promise of sample-level prediction with applications across a variety of fields from physics and marketing to healthcare. However, if not properly implemented and evaluated, ML pipelines may contain leakage typically resulting in overoptimistic performance estimates and failure to generalize to ne… ▽ More Machine learning (ML) provides powerful tools for predictive modeling. ML's popularity stems from the promise of sample-level prediction with applications across a variety of fields from physics and marketing to healthcare. However, if not properly implemented and evaluated, ML pipelines may contain leakage typically resulting in overoptimistic performance estimates and failure to generalize to new data. This can have severe negative financial and societal implications. Our aim is to expand understanding associated with causes leading to leakage when designing, implementing, and evaluating ML pipelines. Illustrated by concrete examples, we provide a comprehensive overview and discussion of various types of leakage that may arise in ML pipelines. △ Less

Submitted 5 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: second draft

arXiv:2310.17654 [pdf]

ACWA: An AI-driven Cyber-Physical Testbed for Intelligent Water Systems

Authors: Feras A. Batarseh, Ajay Kulkarni, Chhayly Sreng, Justice Lin, Siam Maksud

Abstract: This manuscript presents a novel state-of-the-art cyber-physical water testbed, namely: The AI and Cyber for Water and Agriculture testbed (ACWA). ACWA is motivated by the need to advance water supply management using AI and Cybersecurity experimentation. The main goal of ACWA is to address pressing challenges in the water and agricultural domains by utilising cutting-edge AI and data-driven techn… ▽ More This manuscript presents a novel state-of-the-art cyber-physical water testbed, namely: The AI and Cyber for Water and Agriculture testbed (ACWA). ACWA is motivated by the need to advance water supply management using AI and Cybersecurity experimentation. The main goal of ACWA is to address pressing challenges in the water and agricultural domains by utilising cutting-edge AI and data-driven technologies. These challenges include Cyberbiosecurity, resources management, access to water, sustainability, and data-driven decision-making, among others. To address such issues, ACWA consists of multiple topologies, sensors, computational nodes, pumps, tanks, smart water devices, as well as databases and AI models that control the system. Moreover, we present ACWA simulator, which is a software-based water digital twin. The simulator runs on fluid and constituent transport principles that produce theoretical time series of a water distribution system. This creates a good validation point for comparing the theoretical approach with real-life results via the physical ACWA testbed. ACWA data are available to AI and water domain researchers and are hosted in an online public repository. In this paper, the system is introduced in detail and compared with existing water testbeds; additionally, example use-cases are described along with novel outcomes such as datasets, software, and AI-related scenarios. △ Less

Submitted 26 September, 2023; originally announced October 2023.

arXiv:2310.16990 [pdf, other]

doi 10.18653/v1/2023.emnlp-industry.61

STEER: Semantic Turn Extension-Expansion Recognition for Voice Assistants

Authors: Leon Liyang Zhang, Jiarui Lu, Joel Ruben Antony Moniz, Aditya Kulkarni, Dhivya Piraviperumal, Tien Dung Tran, Nicholas Tzou, Hong Yu

Abstract: In the context of a voice assistant system, steering refers to the phenomenon in which a user issues a follow-up command attempting to direct or clarify a previous turn. We propose STEER, a steering detection model that predicts whether a follow-up turn is a user's attempt to steer the previous command. Constructing a training dataset for steering use cases poses challenges due to the cold-start p… ▽ More In the context of a voice assistant system, steering refers to the phenomenon in which a user issues a follow-up command attempting to direct or clarify a previous turn. We propose STEER, a steering detection model that predicts whether a follow-up turn is a user's attempt to steer the previous command. Constructing a training dataset for steering use cases poses challenges due to the cold-start problem. To overcome this, we developed heuristic rules to sample opt-in usage data, approximating positive and negative samples without any annotation. Our experimental results show promising performance in identifying steering intent, with over 95% accuracy on our sampled data. Moreover, STEER, in conjunction with our sampling strategy, aligns effectively with real-world steering scenarios, as evidenced by its strong zero-shot performance on a human-graded evaluation set. In addition to relying solely on user transcripts as input, we introduce STEER+, an enhanced version of the model. STEER+ utilizes a semantic parse tree to provide more context on out-of-vocabulary words, such as named entities that often occur at the sentence boundary. This further improves model performance, reducing error rate in domains where entities frequently appear, such as messaging. Lastly, we present a data analysis that highlights the improvement in user experience when voice assistants support steering use cases. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Industry Track

arXiv:2310.16621 [pdf]

ArTST: Arabic Text and Speech Transformer

Authors: Hawau Olamide Toyin, Amirbek Djanibekov, Ajinkya Kulkarni, Hanan Aldarmaki

Abstract: We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework, SpeechT5, that was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model for dialectal and code-switched Arabic in future editions. We pre-traine… ▽ More We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework, SpeechT5, that was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model for dialectal and code-switched Arabic in future editions. We pre-trained the model from scratch on MSA speech and text data, and fine-tuned it for the following tasks: Automatic Speech Recognition (ASR), Text-To-Speech synthesis (TTS), and spoken dialect identification. In our experiments comparing ArTST with SpeechT5, as well as with previously reported results in these tasks, ArTST performs on a par with or exceeding the current state-of-the-art in all three tasks. Moreover, we find that our pre-training is conducive for generalization, which is particularly evident in the low-resource TTS task. The pre-trained model as well as the fine-tuned ASR and TTS models are released for research use. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: 11 pages, 1 figure, SIGARAB ArabicNLP 2023

arXiv:2310.15113 [pdf]

Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model

Authors: Leonie Weissweiler, Valentin Hofmann, Anjali Kantharuban, Anna Cai, Ritam Dutt, Amey Hengle, Anubha Kabra, Atharva Kulkarni, Abhishek Vijayakumar, Haofei Yu, Hinrich Schütze, Kemal Oflazer, David R. Mortensen

Abstract: Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (i… ▽ More Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics and overlook other capabilities that lie at the heart of human language, like morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages (specifically, English, German, Tamil, and Turkish). We apply a version of Berko's (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, our results -- through the lens of morphology -- cast a new light on the linguistic capabilities of ChatGPT, suggesting that claims of human-like language skills are premature and misleading. △ Less

Submitted 26 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: EMNLP 2023

arXiv:2310.13812 [pdf, other]

Yet Another Model for Arabic Dialect Identification

Authors: Ajinkya Kulkarni, Hanan Aldarmaki

Abstract: In this paper, we describe a spoken Arabic dialect identification (ADI) model for Arabic that consistently outperforms previously published results on two benchmark datasets: ADI-5 and ADI-17. We explore two architectural variations: ResNet and ECAPA-TDNN, coupled with two types of acoustic features: MFCCs and features exratected from the pre-trained self-supervised model UniSpeech-SAT Large, as w… ▽ More In this paper, we describe a spoken Arabic dialect identification (ADI) model for Arabic that consistently outperforms previously published results on two benchmark datasets: ADI-5 and ADI-17. We explore two architectural variations: ResNet and ECAPA-TDNN, coupled with two types of acoustic features: MFCCs and features exratected from the pre-trained self-supervised model UniSpeech-SAT Large, as well as a fusion of all four variants. We find that individually, ECAPA-TDNN network outperforms ResNet, and models with UniSpeech-SAT features outperform models with MFCCs by a large margin. Furthermore, a fusion of all four variants consistently outperforms individual models. Our best models outperform previously reported results on both datasets, with accuracies of 84.7% and 96.9% on ADI-5 and ADI-17, respectively. △ Less

Submitted 20 October, 2023; originally announced October 2023.

Comments: ACCEPTED AT ArabicNLP 2023

arXiv:2310.10085 [pdf]

Solution to Advanced Manufacturing Process Problems using Cohort Intelligence Algorithm with Improved Constraint Handling Approaches

Authors: Aniket Nargundkar, Madhav Rawal, Aryaman Patel, Anand J Kulkarni, Apoorva S Shastri

Abstract: Recently, various Artificial Intelligence (AI) based optimization metaheuristics are proposed and applied for a variety of problems. Cohort Intelligence (CI) algorithm is a socio inspired optimization technique which is successfully applied for solving several unconstrained & constrained real-world problems from the domains such as design, manufacturing, supply chain, healthcare, etc. Generally, r… ▽ More Recently, various Artificial Intelligence (AI) based optimization metaheuristics are proposed and applied for a variety of problems. Cohort Intelligence (CI) algorithm is a socio inspired optimization technique which is successfully applied for solving several unconstrained & constrained real-world problems from the domains such as design, manufacturing, supply chain, healthcare, etc. Generally, real-world problems are constrained in nature. Even though most of the Evolutionary Algorithms (EAs) can efficiently solve unconstrained problems, their performance degenerates when the constraints are involved. In this paper, two novel constraint handling approaches based on modulus and hyperbolic tangent probability distributions are proposed. Constrained CI algorithm with constraint handling approaches based on triangular, modulus and hyperbolic tangent is presented and applied for optimizing advanced manufacturing processes such as Water Jet Machining (WJM), Abrasive Jet Machining (AJM), Ultrasonic Machining (USM) and Grinding process. The solutions obtained using proposed CI algorithm are compared with contemporary algorithms such as Genetic Algorithm, Simulated Annealing, Teaching Learning Based Optimization, etc. The proposed approaches achieved 2%-127% maximization of material removal rate satisfying hard constraints. As compared to the GA, CI with Hyperbolic tangent probability distribution achieved 15%, 2%, 2%, 127%, and 4% improvement in MRR for AJMB, AJMD, WJM, USM, and Grinding processes, respectively contributing to the productivity improvement. The contributions in this paper have opened several avenues for further applicability of the proposed constraint handling approaches for solving complex constrained problems. △ Less

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.09501 [pdf, other]

DepNeCTI: Dependency-based Nested Compound Type Identification for Sanskrit

Authors: Jivnesh Sandhan, Yaswanth Narsupalli, Sreevatsa Muppirala, Sriram Krishnan, Pavankumar Satuluri, Amba Kulkarni, Pawan Goyal

Abstract: Multi-component compounding is a prevalent phenomenon in Sanskrit, and understanding the implicit structure of a compound's components is crucial for deciphering its meaning. Earlier approaches in Sanskrit have focused on binary compounds and neglected the multi-component compound setting. This work introduces the novel task of nested compound type identification (NeCTI), which aims to identify ne… ▽ More Multi-component compounding is a prevalent phenomenon in Sanskrit, and understanding the implicit structure of a compound's components is crucial for deciphering its meaning. Earlier approaches in Sanskrit have focused on binary compounds and neglected the multi-component compound setting. This work introduces the novel task of nested compound type identification (NeCTI), which aims to identify nested spans of a multi-component compound and decode the implicit semantic relations between them. To the best of our knowledge, this is the first attempt in the field of lexical semantics to propose this task. We present 2 newly annotated datasets including an out-of-domain dataset for this task. We also benchmark these datasets by exploring the efficacy of the standard problem formulations such as nested named entity recognition, constituency parsing and seq2seq, etc. We present a novel framework named DepNeCTI: Dependency-based Nested Compound Type Identifier that surpasses the performance of the best baseline with an average absolute improvement of 13.1 points F1-score in terms of Labeled Span Score (LSS) and a 5-fold enhancement in inference efficiency. In line with the previous findings in the binary Sanskrit compound identification task, context provides benefits for the NeCTI task. The codebase and datasets are publicly available at: https://github.com/yaswanth-iitkgp/DepNeCTI △ Less

Submitted 14 October, 2023; originally announced October 2023.

Comments: 9 Pages, Camera-ready version accepted at EMNLP23 (Findings)

arXiv:2310.07423 [pdf, other]

Adapting the adapters for code-switching in multilingual ASR

Authors: Atharva Kulkarni, Ajinkya Kulkarni, Miguel Couceiro, Hanan Aldarmaki

Abstract: Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages. Some of these models employ language adapters in their formulation, which helps to improve monolingual performance and avoids some of the drawbacks of multi-lingual modeling on resource-rich languages. However, this formulation restricts the usab… ▽ More Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages. Some of these models employ language adapters in their formulation, which helps to improve monolingual performance and avoids some of the drawbacks of multi-lingual modeling on resource-rich languages. However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance. In this work, we propose ways to effectively fine-tune such models on code-switched speech, by assimilating information from both language adapters at each language adaptation point in the network. We also model code-switching as a sequence of latent binary sequences that can be used to guide the flow of information from each language adapter at the frame level. The proposed approaches are evaluated on three code-switched datasets encompassing Arabic, Mandarin, and Hindi languages paired with English, showing consistent improvements in code-switching performance with at least 10\% absolute reduction in CER across all test sets. △ Less

Submitted 11 October, 2023; originally announced October 2023.

Comments: Submitted to ICASSP 2024

arXiv:2310.06080 [pdf, other]

Advancing Diagnostic Precision: Leveraging Machine Learning Techniques for Accurate Detection of Covid-19, Pneumonia, and Tuberculosis in Chest X-Ray Images

Authors: Aditya Kulkarni, Guruprasad Parasnis, Harish Balasubramanian, Vansh Jain, Anmol Chokshi, Reena Sonkusare

Abstract: Lung diseases such as COVID-19, tuberculosis (TB), and pneumonia continue to be serious global health concerns that affect millions of people worldwide. In medical practice, chest X-ray examinations have emerged as the norm for diagnosing diseases, particularly chest infections such as COVID-19. Paramedics and scientists are working intensively to create a reliable and precise approach for early-s… ▽ More Lung diseases such as COVID-19, tuberculosis (TB), and pneumonia continue to be serious global health concerns that affect millions of people worldwide. In medical practice, chest X-ray examinations have emerged as the norm for diagnosing diseases, particularly chest infections such as COVID-19. Paramedics and scientists are working intensively to create a reliable and precise approach for early-stage COVID-19 diagnosis in order to save lives. But with a variety of symptoms, medical diagnosis of these disorders poses special difficulties. It is essential to address their identification and timely diagnosis in order to successfully treat and prevent these illnesses. In this research, a multiclass classification approach using state-of-the-art methods for deep learning and image processing is proposed. This method takes into account the robustness and efficiency of the system in order to increase diagnostic precision of chest diseases. A comparison between a brand-new convolution neural network (CNN) and several transfer learning pre-trained models including VGG19, ResNet, DenseNet, EfficientNet, and InceptionNet is recommended. Publicly available and widely used research datasets like Shenzen, Montogomery, the multiclass Kaggle dataset and the NIH dataset were used to rigorously test the model. Recall, precision, F1-score, and Area Under Curve (AUC) score are used to evaluate and compare the performance of the proposed model. An AUC value of 0.95 for COVID-19, 0.99 for TB, and 0.98 for pneumonia is obtained using the proposed network. Recall and precision ratings of 0.95, 0.98, and 0.97, respectively, likewise met high standards. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: 11 pages, 18 figures, Under review in Discover Artificial Intelligence Journal by Springer Nature

arXiv:2310.04020 [pdf]

Snail Homing and Mating Search Algorithm: A Novel Bio-Inspired Metaheuristic Algorithm

Authors: Anand J Kulkarni, Ishaan R Kale, Apoorva Shastri, Aayush Khandekar

Abstract: In this paper, a novel Snail Homing and Mating Search (SHMS) algorithm is proposed. It is inspired from the biological behaviour of the snails. Snails continuously travels to find food and a mate, leaving behind a trail of mucus that serves as a guide for their return. Snails tend to navigate by following the available trails on the ground and responding to cues from nearby shelter homes. The prop… ▽ More In this paper, a novel Snail Homing and Mating Search (SHMS) algorithm is proposed. It is inspired from the biological behaviour of the snails. Snails continuously travels to find food and a mate, leaving behind a trail of mucus that serves as a guide for their return. Snails tend to navigate by following the available trails on the ground and responding to cues from nearby shelter homes. The proposed SHMS algorithm is investigated by solving several unimodal and multimodal functions. The solutions are validated using standard statistical tests such as two-sided and pairwise signed rank Wilcoxon test and Friedman rank test. The solution obtained from the SHMS algorithm exhibited superior robustness as well as search space exploration capabilities within the less computational cost. The real-world application of SHMS algorithm is successfully demonstrated in the engineering design domain by solving three cases of design and economic optimization shell and tube heat exchanger problem. The objective function value and other statistical results obtained using SHMS algorithm are compared with other well-known metaheuristic algorithms. △ Less

Submitted 6 October, 2023; originally announced October 2023.

Comments: 46 Pages, 11 Figures, 24 Tables

arXiv:2310.03055 [pdf]

Modified LAB Algorithm with Clustering-based Search Space Reduction Method for solving Engineering Design Problems

Authors: Ruturaj Reddy, Utkarsh Gupta, Ishaan Kale, Apoorva Shastri, Anand J Kulkarni

Abstract: A modified LAB algorithm is introduced in this paper. It builds upon the original LAB algorithm (Reddy et al. 2023), which is a socio-inspired algorithm that models competitive and learning behaviours within a group, establishing hierarchical roles. The proposed algorithm incorporates the roulette wheel approach and a reduction factor introducing inter-group competition and iteratively narrowing d… ▽ More A modified LAB algorithm is introduced in this paper. It builds upon the original LAB algorithm (Reddy et al. 2023), which is a socio-inspired algorithm that models competitive and learning behaviours within a group, establishing hierarchical roles. The proposed algorithm incorporates the roulette wheel approach and a reduction factor introducing inter-group competition and iteratively narrowing down the sample space. The algorithm is validated by solving the benchmark test problems from CEC 2005 and CEC 2017. The solutions are validated using standard statistical tests such as two-sided and pairwise signed rank Wilcoxon test and Friedman rank test. The algorithm exhibited improved and superior robustness as well as search space exploration capabilities. Furthermore, a Clustering-Based Search Space Reduction (C-SSR) method is proposed, making the algorithm capable to solve constrained problems. The C-SSR method enables the algorithm to identify clusters of feasible regions, satisfying the constraints and contributing to achieve the optimal solution. This method demonstrates its effectiveness as a potential alternative to traditional constraint handling techniques. The results obtained using the Modified LAB algorithm are then compared with those achieved by other recent metaheuristic algorithms. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.15164 [pdf, other]

3D Reconstruction with Generalizable Neural Fields using Scene Priors

Authors: Yang Fu, Shalini De Mello, Xueting Li, Amey Kulkarni, Jan Kautz, Xiaolong Wang, Sifei Liu

Abstract: High-fidelity 3D scene reconstruction has been substantially advanced by recent progress in neural fields. However, most existing methods train a separate network from scratch for each individual scene. This is not scalable, inefficient, and unable to yield good results given limited views. While learning-based multi-view stereo methods alleviate this issue to some extent, their multi-view setting… ▽ More High-fidelity 3D scene reconstruction has been substantially advanced by recent progress in neural fields. However, most existing methods train a separate network from scratch for each individual scene. This is not scalable, inefficient, and unable to yield good results given limited views. While learning-based multi-view stereo methods alleviate this issue to some extent, their multi-view setting makes it less flexible to scale up and to broad applications. Instead, we introduce training generalizable Neural Fields incorporating scene Priors (NFPs). The NFP network maps any single-view RGB-D image into signed distance and radiance values. A complete scene can be reconstructed by merging individual frames in the volumetric space WITHOUT a fusion module, which provides better flexibility. The scene priors can be trained on large-scale datasets, allowing for fast adaptation to the reconstruction of a new scene with fewer views. NFP not only demonstrates SOTA scene reconstruction performance and efficiency, but it also supports single-image novel-view synthesis, which is underexplored in neural fields. More qualitative results are available at: https://oasisyang.github.io/neural-prior △ Less

Submitted 28 September, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: Project Page: https://oasisyang.github.io/neural-prior

arXiv:2308.14023 [pdf, other]

Domain-Specificity Inducing Transformers for Source-Free Domain Adaptation

Authors: Sunandini Sanyal, Ashish Ramayee Asokan, Suvaansh Bhambri, Akshay Kulkarni, Jogendra Nath Kundu, R. Venkatesh Babu

Abstract: Conventional Domain Adaptation (DA) methods aim to learn domain-invariant feature representations to improve the target adaptation performance. However, we motivate that domain-specificity is equally important since in-domain trained models hold crucial domain-specific properties that are beneficial for adaptation. Hence, we propose to build a framework that supports disentanglement and learning o… ▽ More Conventional Domain Adaptation (DA) methods aim to learn domain-invariant feature representations to improve the target adaptation performance. However, we motivate that domain-specificity is equally important since in-domain trained models hold crucial domain-specific properties that are beneficial for adaptation. Hence, we propose to build a framework that supports disentanglement and learning of domain-specific factors and task-specific factors in a unified model. Motivated by the success of vision transformers in several multi-modal vision problems, we find that queries could be leveraged to extract the domain-specific factors. Hence, we propose a novel Domain-specificity-inducing Transformer (DSiT) framework for disentangling and learning both domain-specific and task-specific factors. To achieve disentanglement, we propose to construct novel Domain-Representative Inputs (DRI) with domain-specific information to train a domain classifier with a novel domain token. We are the first to utilize vision transformers for domain adaptation in a privacy-oriented source-free setting, and our approach achieves state-of-the-art performance on single-source, multi-source, and multi-target benchmarks △ Less

Submitted 27 August, 2023; originally announced August 2023.

Comments: ICCV 2023. Project page: http://val.cds.iisc.ac.in/DSiT-SFDA

arXiv:2308.03260 [pdf, other]

Exploring Different Time-series-Transformer (TST) Architectures: A Case Study in Battery Life Prediction for Electric Vehicles (EVs)

Authors: Niranjan Sitapure, Atharva Kulkarni

Abstract: In recent years, battery technology for electric vehicles (EVs) has been a major focus, with a significant emphasis on developing new battery materials and chemistries. However, accurately predicting key battery parameters, such as state-of-charge (SOC) and temperature, remains a challenge for constructing advanced battery management systems (BMS). Existing battery models do not comprehensively co… ▽ More In recent years, battery technology for electric vehicles (EVs) has been a major focus, with a significant emphasis on developing new battery materials and chemistries. However, accurately predicting key battery parameters, such as state-of-charge (SOC) and temperature, remains a challenge for constructing advanced battery management systems (BMS). Existing battery models do not comprehensively cover all parameters affecting battery performance, including non-battery-related factors like ambient temperature, cabin temperature, elevation, and regenerative braking during EV operation. Due to the difficulty of incorporating these auxiliary parameters into traditional models, a data-driven approach is suggested. Time-series-transformers (TSTs), leveraging multiheaded attention and parallelization-friendly architecture, are explored alongside LSTM models. Novel TST architectures, including encoder TST + decoder LSTM and a hybrid TST-LSTM, are also developed and compared against existing models. A dataset comprising 72 driving trips in a BMW i3 (60 Ah) is used to address battery life prediction in EVs, aiming to create accurate TST models that incorporate environmental, battery, vehicle driving, and heating circuit data to predict SOC and battery temperature for future time steps. △ Less

Submitted 6 August, 2023; originally announced August 2023.

Comments: 13 pages and 7 figures

ACM Class: J.2; J.6; I.6

arXiv:2308.00119 [pdf, ps, other]

Overtaking Moving Obstacles with Digit: Path Following for Bipedal Robots via Model Predictive Contouring Control

Authors: Kunal S. Narkhede, Dhruv A. Thanki, Abhijeet M. Kulkarni, Ioannis Poulakakis

Abstract: Humanoid robots are expected to navigate in changing environments and perform a variety of tasks. Frequently, these tasks require the robot to make decisions online regarding the speed and precision of following a reference path. For example, a robot may want to decide to temporarily deviate from its path to overtake a slowly moving obstacle that shares the same path and is ahead. In this case, pa… ▽ More Humanoid robots are expected to navigate in changing environments and perform a variety of tasks. Frequently, these tasks require the robot to make decisions online regarding the speed and precision of following a reference path. For example, a robot may want to decide to temporarily deviate from its path to overtake a slowly moving obstacle that shares the same path and is ahead. In this case, path following performance is compromised in favor of fast path traversal. Available global trajectory tracking approaches typically assume a given -- specified in advance -- time parametrization of the path and seek to minimize the norm of the Cartesian error. As a result, when the robot should be where on the path is fixed and temporary deviations from the path are strongly discouraged. Given a global path, this paper presents a Model Predictive Contouring Control (MPCC) approach to selecting footsteps that maximize path traversal while simultaneously allowing the robot to decide between faithful versus fast path following. The method is evaluated in high-fidelity simulations of the bipedal robot Digit in terms of tracking performance of curved paths under disturbances and is also applied to the case where Digit overtakes a moving obstacle. △ Less

Submitted 31 July, 2023; originally announced August 2023.

Comments: Accepted for publication in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2307.02644 [pdf, ps, other]

Achievable Rates for Information Extraction from a Strategic Sender

Authors: Anuj S. Vora, Ankur A. Kulkarni

Abstract: We consider a setting of non-cooperative communication where a receiver wants to recover randomly generated sequences of symbols that are observed by a strategic sender. The sender aims to maximize an average utility that may not align with the recovery criterion of the receiver, whereby the received signals may not be truthful. We pose this problem as a sequential game between the sender and the… ▽ More We consider a setting of non-cooperative communication where a receiver wants to recover randomly generated sequences of symbols that are observed by a strategic sender. The sender aims to maximize an average utility that may not align with the recovery criterion of the receiver, whereby the received signals may not be truthful. We pose this problem as a sequential game between the sender and the receiver with the receiver as the leader and determine `achievable strategies' for the receiver that attain arbitrarily small probability of error for large blocklengths. We show the existence of such achievable strategies under a sufficient condition on the utility of the sender. For the case of the binary alphabet, this condition is also necessary, in the absence of which, the probability of error goes to one for all choices of strategies of the receiver. We show that for reliable recovery, the receiver chooses to correctly decode only a subset of messages received from the sender and deliberately makes an error on messages outside this subset. Due to this decoding strategy, despite a clean channel, our setting exhibits a notion of maximum rate of communication above which the probability of error may not vanish asymptotically and in certain cases, may even tend to one. For the case of the binary alphabet, the maximum rate may be strictly less than unity for certain classes of utilities. △ Less

Submitted 5 July, 2023; originally announced July 2023.

Comments: Submitted to IEEE Transactions on Information Theory

arXiv:2307.02244 [pdf, other]

Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions

Authors: Sandipana Dowerah, Ajinkya Kulkarni, Romain Serizel, Denis Jouvet

Abstract: The paper introduces Diff-Filter, a multichannel speech enhancement approach based on the diffusion probabilistic model, for improving speaker verification performance under noisy and reverberant conditions. It also presents a new two-step training procedure that takes the benefit of self-supervised learning. In the first stage, the Diff-Filter is trained by conducting timedomain speech filtering… ▽ More The paper introduces Diff-Filter, a multichannel speech enhancement approach based on the diffusion probabilistic model, for improving speaker verification performance under noisy and reverberant conditions. It also presents a new two-step training procedure that takes the benefit of self-supervised learning. In the first stage, the Diff-Filter is trained by conducting timedomain speech filtering using a scoring-based diffusion model. In the second stage, the Diff-Filter is jointly optimized with a pre-trained ECAPA-TDNN speaker verification model under a self-supervised learning framework. We present a novel loss based on equal error rate. This loss is used to conduct selfsupervised learning on a dataset that is not labelled in terms of speakers. The proposed approach is evaluated on MultiSV, a multichannel speaker verification dataset, and shows significant improvements in performance under noisy multichannel conditions. △ Less

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.16621 [pdf, other]

Assessing the Performance of 1D-Convolution Neural Networks to Predict Concentration of Mixture Components from Raman Spectra

Authors: Dexter Antonio, Hannah O'Toole, Randy Carney, Ambarish Kulkarni, Ahmet Palazoglu

Abstract: An emerging application of Raman spectroscopy is monitoring the state of chemical reactors during biologic drug production. Raman shift intensities scale linearly with the concentrations of chemical species and thus can be used to analytically determine real-time concentrations using non-destructive light irradiation in a label-free manner. Chemometric algorithms are used to interpret Raman spectr… ▽ More An emerging application of Raman spectroscopy is monitoring the state of chemical reactors during biologic drug production. Raman shift intensities scale linearly with the concentrations of chemical species and thus can be used to analytically determine real-time concentrations using non-destructive light irradiation in a label-free manner. Chemometric algorithms are used to interpret Raman spectra produced from complex mixtures of bioreactor contents as a reaction evolves. Finding the optimal algorithm for a specific bioreactor environment is challenging due to the lack of freely available Raman mixture datasets. The RaMix Python package addresses this challenge by enabling the generation of synthetic Raman mixture datasets with controllable noise levels to assess the utility of different chemometric algorithm types for real-time monitoring applications. To demonstrate the capabilities of this package and compare the performance of different chemometric algorithms, 48 datasets of simulated spectra were generated using the RaMix Python package. The four tested algorithms include partial least squares regression (PLS), a simple neural network, a simple convolutional neural network (simple CNN), and a 1D convolutional neural network with a ResNet architecture (ResNet). The performance of the PLS and simple CNN model was found to be comparable, with the PLS algorithm slightly outperforming the other models on 83\% of the data sets. The simple CNN model outperforms the other models on large, high noise datasets, demonstrating the superior capability of convolutional neural networks compared to PLS in analyzing noisy spectra. These results demonstrate the promise of CNNs to automatically extract concentration information from unprocessed, noisy spectra, allowing for better process control of industrial drug production. Code for this project is available at github.com/DexterAntonio/RaMix. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: 7 pages, 7 figures

ACM Class: I.5.4

arXiv:2306.03252 [pdf, other]

doi 10.1109/IROS55552.2023.10342053

RACECAR -- The Dataset for High-Speed Autonomous Racing

Authors: Amar Kulkarni, John Chrosniak, Emory Ducote, Florian Sauerbeck, Andrew Saba, Utkarsh Chirimar, John Link, Marcello Cellina, Madhur Behl

Abstract: This paper describes the first open dataset for full-scale and high-speed autonomous racing. Multi-modal sensor data has been collected from fully autonomous Indy race cars operating at speeds of up to 170 mph (273 kph). Six teams who raced in the Indy Autonomous Challenge have contributed to this dataset. The dataset spans 11 interesting racing scenarios across two race tracks which include solo… ▽ More This paper describes the first open dataset for full-scale and high-speed autonomous racing. Multi-modal sensor data has been collected from fully autonomous Indy race cars operating at speeds of up to 170 mph (273 kph). Six teams who raced in the Indy Autonomous Challenge have contributed to this dataset. The dataset spans 11 interesting racing scenarios across two race tracks which include solo laps, multi-agent laps, overtaking situations, high-accelerations, banked tracks, obstacle avoidance, pit entry and exit at different speeds. The dataset contains data from 27 racing sessions across the 11 scenarios with over 6.5 hours of sensor data recorded from the track. The data is organized and released in both ROS2 and nuScenes format. We have also developed the ROS2-to-nuScenes conversion library to achieve this. The RACECAR data is unique because of the high-speed environment of autonomous racing. We present several benchmark problems on localization, object detection and tracking (LiDAR, Radar, and Camera), and mapping using the RACECAR data to explore issues that arise at the limits of operation of the vehicle. △ Less

Submitted 5 June, 2023; originally announced June 2023.

Comments: 9 pages, 10 figures. For links to data and reference material go to https://github.com/linklab-uva/RACECAR_DATA

arXiv:2306.01105 [pdf, other]

Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment

Authors: Atharva Kulkarni, Sarah Masud, Vikram Goyal, Tanmoy Chakraborty

Abstract: Social media is awash with hateful content, much of which is often veiled with linguistic and topical diversity. The benchmark datasets used for hate speech detection do not account for such divagation as they are predominantly compiled using hate lexicons. However, capturing hate signals becomes challenging in neutrally-seeded malicious content. Thus, designing models and datasets that mimic the… ▽ More Social media is awash with hateful content, much of which is often veiled with linguistic and topical diversity. The benchmark datasets used for hate speech detection do not account for such divagation as they are predominantly compiled using hate lexicons. However, capturing hate signals becomes challenging in neutrally-seeded malicious content. Thus, designing models and datasets that mimic the real-world variability of hate warrants further investigation. To this end, we present GOTHate, a large-scale code-mixed crowdsourced dataset of around 51k posts for hate speech detection from Twitter. GOTHate is neutrally seeded, encompassing different languages and topics. We conduct detailed comparisons of GOTHate with the existing hate speech datasets, highlighting its novelty. We benchmark it with 10 recent baselines. Our extensive empirical and benchmarking experiments suggest that GOTHate is hard to classify in a text-only setup. Thus, we investigate how adding endogenous signals enhances the hate speech detection task. We augment GOTHate with the user's timeline information and ego network, bringing the overall data source closer to the real-world setup for understanding hateful content. Our proposed solution HEN-mBERT is a modular, multilingual, mixture-of-experts model that enriches the linguistic subspace with latent endogenous signals from history, topology, and exemplars. HEN-mBERT transcends the best baseline by 2.5% and 5% in overall macro-F1 and hate class F1, respectively. Inspired by our experiments, in partnership with Wipro AI, we are developing a semi-automated pipeline to detect hateful content as a part of their mission to tackle online harm. △ Less

Submitted 15 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 15 pages, 4 figures, 11 tables. Accepted at SIGKDD'23

arXiv:2305.15690 [pdf, other]

Beryllium: Neural Search for Algorithm Implementations

Authors: Adithya Kulkarni, Mohna Chakraborty, Yonas Sium, Sai Charishma Valluri, Wei Le, Qi Li

Abstract: In this paper, we explore the feasibility of finding algorithm implementations from code. Successfully matching code and algorithms can help understand unknown code, provide reference implementations, and automatically collect data for learning-based program synthesis. To achieve the goal, we designed a new language named p-language to specify the algorithms and a static analyzer for the p-languag… ▽ More In this paper, we explore the feasibility of finding algorithm implementations from code. Successfully matching code and algorithms can help understand unknown code, provide reference implementations, and automatically collect data for learning-based program synthesis. To achieve the goal, we designed a new language named p-language to specify the algorithms and a static analyzer for the p-language to automatically extract control flow, math, and natural language information from the algorithm descriptions. We embedded the output of p-language (p-code) and source code in a common vector space using self-supervised machine learning methods to match algorithm with code without any manual annotation. We developed a tool named Beryllium. It takes pseudo code as a query and returns a list of ranked code snippets that likely match the algorithm query. Our evaluation on Stony Brook Algorithm Repository and popular GitHub projects show that Beryllium significantly outperformed the state-of-the-art code search tools in both C and Java. Specifically, for 98.5%, 93.8%, and 66.2% queries, we found the algorithm implementations in the top 25, 10, and 1 ranked list, respectively. Given 87 algorithm queries, we found implementations for 74 algorithms in the GitHub projects where we did not know the algorithms before. △ Less

Submitted 1 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.15689 [pdf, other]

Zero-shot Approach to Overcome Perturbation Sensitivity of Prompts

Authors: Mohna Chakraborty, Adithya Kulkarni, Qi Li

Abstract: Recent studies have demonstrated that natural-language prompts can help to leverage the knowledge learned by pre-trained language models for the binary sentence-level sentiment classification task. Specifically, these methods utilize few-shot learning settings to fine-tune the sentiment classification model using manual or automatically generated prompts. However, the performance of these methods… ▽ More Recent studies have demonstrated that natural-language prompts can help to leverage the knowledge learned by pre-trained language models for the binary sentence-level sentiment classification task. Specifically, these methods utilize few-shot learning settings to fine-tune the sentiment classification model using manual or automatically generated prompts. However, the performance of these methods is sensitive to the perturbations of the utilized prompts. Furthermore, these methods depend on a few labeled instances for automatic prompt generation and prompt ranking. This study aims to find high-quality prompts for the given task in a zero-shot setting. Given a base prompt, our proposed approach automatically generates multiple prompts similar to the base prompt employing positional, reasoning, and paraphrasing techniques and then ranks the prompts using a novel metric. We empirically demonstrate that the top-ranked prompts are high-quality and significantly outperform the base prompt and the prompts generated using few-shot learning for the binary sentence-level sentiment classification task. △ Less

Submitted 1 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.14707 [pdf, other]

SciFix: Outperforming GPT3 on Scientific Factual Error Correction

Authors: Dhananjay Ashok, Atharva Kulkarni, Hai Pham, Barnabás Póczos

Abstract: Due to the prohibitively high cost of creating error correction datasets, most Factual Claim Correction methods rely on a powerful verification model to guide the correction process. This leads to a significant drop in performance in domains like scientific claims, where good verification models do not always exist. In this work, we introduce SciFix, a scientific claim correction system that does… ▽ More Due to the prohibitively high cost of creating error correction datasets, most Factual Claim Correction methods rely on a powerful verification model to guide the correction process. This leads to a significant drop in performance in domains like scientific claims, where good verification models do not always exist. In this work, we introduce SciFix, a scientific claim correction system that does not require a verifier but can outperform existing methods by a considerable margin -- achieving correction accuracy of 84% on the SciFact dataset, 77% on SciFact-Open and 72% on the CovidFact dataset, compared to next best accuracies of 7%, 5%, and 15% on the same datasets respectively. Our method leverages the power of prompting with LLMs during training to create a richly annotated dataset that can be used for fully supervised training and regularization. We additionally use a claim-aware decoding procedure to improve the quality of corrected claims. Our method outperforms the very LLM that was used to generate the annotated dataset -- with Few-Shot Prompting on GPT3.5 achieving 58%, 61%, and 64% on the respective datasets, a consistently lower correction accuracy, despite using nearly 800 times as many parameters as our model. △ Less

Submitted 12 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: To appear in proceedings of EMNLP2023 (findings)

arXiv:2304.13958 [pdf, other]

Learning and Reasoning Multifaceted and Longitudinal Data for Poverty Estimates and Livelihood Capabilities of Lagged Regions in Rural India

Authors: Atharva Kulkarni, Raya Das, Ravi S. Srivastava, Tanmoy Chakraborty

Abstract: Poverty is a multifaceted phenomenon linked to the lack of capabilities of households to earn a sustainable livelihood, increasingly being assessed using multidimensional indicators. Its spatial pattern depends on social, economic, political, and regional variables. Artificial intelligence has shown immense scope in analyzing the complexities and nuances of poverty. The proposed project aims to ex… ▽ More Poverty is a multifaceted phenomenon linked to the lack of capabilities of households to earn a sustainable livelihood, increasingly being assessed using multidimensional indicators. Its spatial pattern depends on social, economic, political, and regional variables. Artificial intelligence has shown immense scope in analyzing the complexities and nuances of poverty. The proposed project aims to examine the poverty situation of rural India for the period of 1990-2022 based on the quality of life and livelihood indicators. The districts will be classified into `advanced', `catching up', `falling behind', and `lagged' regions. The project proposes to integrate multiple data sources, including conventional national-level large sample household surveys, census surveys, and proxy variables like daytime, and nighttime data from satellite images, and communication networks, to name a few, to provide a comprehensive view of poverty at the district level. The project also intends to examine causation and longitudinal analysis to examine the reasons for poverty. Poverty and inequality could be widening in developing countries due to demographic and growth-agglomerating policies. Therefore, targeting the lagging regions and the vulnerable population is essential to eradicate poverty and improve the quality of life to achieve the goal of `zero poverty'. Thus, the study also focuses on the districts with a higher share of the marginal section of the population compared to the national average to trace the performance of development indicators and their association with poverty in these regions. △ Less

Submitted 27 April, 2023; originally announced April 2023.

Comments: Accepted to IJCAI 2023 Main Conference (AI for Social Good Track)

arXiv:2304.05271 [pdf, other]

Automaton-Guided Curriculum Generation for Reinforcement Learning Agents

Authors: Yash Shukla, Abhishek Kulkarni, Robert Wright, Alvaro Velasquez, Jivko Sinapov

Abstract: Despite advances in Reinforcement Learning, many sequential decision making tasks remain prohibitively expensive and impractical to learn. Recently, approaches that automatically generate reward functions from logical task specifications have been proposed to mitigate this issue; however, they scale poorly on long-horizon tasks (i.e., tasks where the agent needs to perform a series of correct acti… ▽ More Despite advances in Reinforcement Learning, many sequential decision making tasks remain prohibitively expensive and impractical to learn. Recently, approaches that automatically generate reward functions from logical task specifications have been proposed to mitigate this issue; however, they scale poorly on long-horizon tasks (i.e., tasks where the agent needs to perform a series of correct actions to reach the goal state, considering future transitions while choosing an action). Employing a curriculum (a sequence of increasingly complex tasks) further improves the learning speed of the agent by sequencing intermediate tasks suited to the learning capacity of the agent. However, generating curricula from the logical specification still remains an unsolved problem. To this end, we propose AGCL, Automaton-guided Curriculum Learning, a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs). AGCL encodes the specification in the form of a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP (OOMDP) representation to generate a curriculum as a DAG, where the vertices correspond to tasks, and edges correspond to the direction of knowledge transfer. Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance on a complex sequential decision-making problem relative to state-of-the-art curriculum learning (e.g, teacher-student, self-play) and automaton-guided reinforcement learning baselines (e.g, Q-Learning for Reward Machines). Further, we demonstrate that AGCL performs well even in the presence of noise in the task's OOMDP description, and also when distractor objects are present that are not modeled in the logical specification of the tasks' objectives. △ Less

Submitted 11 April, 2023; originally announced April 2023.

Comments: To be presented at The International Conference on Automated Planning and Scheduling (ICAPS) 2023

arXiv:2303.06629 [pdf, ps, other]

Another Generic Setting for Entity Resolution: Basic Theory

Authors: Xiuzhan Guo, Arthur Berrill, Ajinkya Kulkarni, Kostya Belezko, Min Luo

Abstract: Benjelloun et al. \cite{BGSWW} considered the Entity Resolution (ER) problem as the generic process of matching and merging entity records judged to represent the same real world object. They treated the functions for matching and merging entity records as black-boxes and introduced four important properties that enable efficient generic ER algorithms. In this paper, we shall study the propertie… ▽ More Benjelloun et al. \cite{BGSWW} considered the Entity Resolution (ER) problem as the generic process of matching and merging entity records judged to represent the same real world object. They treated the functions for matching and merging entity records as black-boxes and introduced four important properties that enable efficient generic ER algorithms. In this paper, we shall study the properties which match and merge functions share, model matching and merging black-boxes for ER in a partial groupoid, based on the properties that match and merge functions satisfy, and show that a partial groupoid provides another generic setting for ER. The natural partial order on a partial groupoid is defined when the partial groupoid satisfies Idempotence and Catenary associativity. Given a partial order on a partial groupoid, the least upper bound and compatibility ($LU_{pg}$ and $CP_{pg}$) properties are equivalent to Idempotence, Commutativity, Associativity, and Representativity and the partial order must be the natural one we defined when the domain of the partial operation is reflexive. The partiality of a partial groupoid can be reduced using connected components and clique covers of its domain graph, and a noncommutative partial groupoid can be mapped to a commutative one homomorphically if it has the partial idempotent semigroup like structures. In a finitely generated partial groupoid $(P,D,\circ)$ without any conditions required, the ER we concern is the full elements in $P$. If $(P,D,\circ)$ satisfies Idempotence and Catenary associativity, then the ER is the maximal elements in $P$, which are full elements and form the ER defined in \cite{BGSWW}. Furthermore, in the case, since there is a transitive binary order, we consider ER as ``sorting, selecting, and querying the elements in a finitely generated partial groupoid." △ Less

Submitted 12 March, 2023; originally announced March 2023.

arXiv:2303.00822 [pdf, other]

Planning for Attacker Entrapment in Adversarial Settings

Authors: Brittany Cates, Anagha Kulkarni, Sarath Sreedharan

Abstract: In this paper, we propose a planning framework to generate a defense strategy against an attacker who is working in an environment where a defender can operate without the attacker's knowledge. The objective of the defender is to covertly guide the attacker to a trap state from which the attacker cannot achieve their goal. Further, the defender is constrained to achieve its goal within K number of… ▽ More In this paper, we propose a planning framework to generate a defense strategy against an attacker who is working in an environment where a defender can operate without the attacker's knowledge. The objective of the defender is to covertly guide the attacker to a trap state from which the attacker cannot achieve their goal. Further, the defender is constrained to achieve its goal within K number of steps, where K is calculated as a pessimistic lower bound within which the attacker is unlikely to suspect a threat in the environment. Such a defense strategy is highly useful in real world systems like honeypots or honeynets, where an unsuspecting attacker interacts with a simulated production system while assuming it is the actual production system. Typically, the interaction between an attacker and a defender is captured using game theoretic frameworks. Our problem formulation allows us to capture it as a much simpler infinite horizon discounted MDP, in which the optimal policy for the MDP gives the defender's strategy against the actions of the attacker. Through empirical evaluation, we show the merits of our problem formulation. △ Less

Submitted 5 April, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Showing 1–50 of 170 results for author: Kulkarni, A