Search | arXiv e-print repository

Synthetic Counterfactual Faces

Authors: Guruprasad V Ramesh, Harrison Rosenberg, Ashish Hooda, Shimaa Ahmed, Kassem Fawaz

Abstract: Computer vision systems have been deployed in various applications involving biometrics like human faces. These systems can identify social media users, search for missing persons, and verify the identity of individuals. While computer vision models are often evaluated for accuracy on available benchmarks, more annotated data is necessary to learn about their robustness and fairness against semantic distributional shifts in input data, especially in face data. Among annotated data, counterfactual examples offer strong explainability. Because collecting natural face data is prohibitively expensive, we put forth a generative AI-based framework to construct targeted, counterfactual, high-quality synthetic face data. Our synthetic data pipeline has many use cases, including sensitivity evaluations of face recognition systems and probes of image understanding systems. The pipeline is validated with multiple user studies. We showcase the efficacy of our face generation pipeline on a leading commercial vision model and identify facial attributes that cause vision systems to fail.

Submitted 29 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: Paper under review. Full text and results will be updated after acceptance.
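The abstract describes probing a vision model with counterfactual face pairs that differ in one targeted attribute, in order to find attributes that make the model fail. A minimal sketch of how such probes might be tallied, assuming each record holds the edited attribute plus the model's label on the original and on the counterfactual image; the record layout and the name `flip_rate_by_attribute` are hypothetical, not the authors' pipeline:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (edited attribute, label on original image, label on counterfactual image).
Record = Tuple[str, str, str]


def flip_rate_by_attribute(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of counterfactual pairs, per edited attribute, whose model label changed."""
    flips: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for attribute, original_label, counterfactual_label in records:
        totals[attribute] += 1
        if original_label != counterfactual_label:
            flips[attribute] += 1
    return {attr: flips[attr] / totals[attr] for attr in totals}


if __name__ == "__main__":
    probes = [
        ("eyeglasses", "match", "match"),
        ("eyeglasses", "match", "no_match"),
        ("beard", "match", "match"),
        ("beard", "match", "match"),
    ]
    # Attributes with a high flip rate are candidates for "attributes that cause failures".
    print(flip_rate_by_attribute(probes))  # {'eyeglasses': 0.5, 'beard': 0.0}
```

Because each pair differs only in the edited attribute, a label change can be attributed to that edit, which is what gives counterfactual probes their explanatory value.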

arXiv:2405.13077

GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation

Authors: Govind Ramesh, Yao Dou, Wei Xu

Abstract: Research on jailbreaking has been valuable for testing and understanding the safety and security issues of large language models (LLMs). In this paper, we introduce Iterative Refinement Induced Self-Jailbreak (IRIS), a novel approach that leverages the reflective capabilities of LLMs for jailbreaking with only black-box access. Unlike previous methods, IRIS simplifies the jailbreaking process by using a single model as both the attacker and the target. This method first iteratively refines adversarial prompts through self-explanation, which is crucial for ensuring that even well-aligned LLMs obey adversarial instructions. IRIS then rates and enhances the output given the refined prompt to increase its harmfulness. We find IRIS achieves jailbreak success rates of 98% on GPT-4 and 92% on GPT-4 Turbo in under 7 queries. It significantly outperforms prior approaches in automatic, black-box, and interpretable jailbreaking, while requiring substantially fewer queries, thereby establishing a new standard for interpretable jailbreaking methods.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2309.07277

Limitations of Face Image Generation

Authors: Harrison Rosenberg, Shimaa Ahmed, Guruprasad V Ramesh, Ramya Korlakai Vinayak, Kassem Fawaz

Abstract: Text-to-image diffusion models have achieved widespread popularity due to their unprecedented image generation capability. In particular, their ability to synthesize and modify human faces has spurred research into using generated face images in both training data augmentation and model performance assessments. In this paper, we study the efficacy and shortcomings of generative models in the context of face generation. Utilizing a combination of qualitative and quantitative measures, including embedding-based metrics and user studies, we present a framework to audit the characteristics of generated faces conditioned on a set of social attributes. We applied our framework to faces generated through state-of-the-art text-to-image diffusion models. We identify several limitations of face image generation, including faithfulness to the text prompt, demographic disparities, and distributional shifts. Furthermore, we present an analytical model that provides insights into how training data selection contributes to the performance of generative models.

Submitted 21 December, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: Accepted to The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

arXiv:2308.02013

Federated Representation Learning for Automatic Speech Recognition

Authors: Guruprasad V Ramesh, Gopinath Chennupati, Milind Rao, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo

Abstract: Federated Learning (FL) is a privacy-preserving paradigm that allows edge devices to learn collaboratively without sharing data. Edge devices like Alexa and Siri are prospective sources of unlabeled audio data that can be tapped to learn robust audio representations. In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition (ASR) while respecting data privacy constraints. We use the speaker and chapter information in the unlabeled speech dataset, Libri-Light, to simulate non-IID speaker-siloed data distributions and pre-train an LSTM encoder with the Contrastive Predictive Coding framework with FedSGD. We show that the ASR encoder pre-trained in FL performs as well as a centrally pre-trained model and produces a 12-15% improvement in word error rate (WER) compared to no pre-training. We further adapt the federated pre-trained models to a new language, French, and show a 20% WER improvement over no pre-training.

Submitted 7 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: Accepted at the 3rd ISCA Symposium on Security and Privacy in Speech Communication (SPSC), 2023
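The abstract pre-trains an encoder with FedSGD over non-IID, speaker-siloed clients. A toy sketch of one FedSGD round, assuming a one-parameter least-squares model in place of the LSTM encoder and CPC loss (the model, loss, and function names are illustrative only): each client computes a gradient on its own data, and the server takes one step with the sample-count-weighted average, so raw data never leaves a client.

```python
from typing import List, Tuple

# A client's local data: (input, target) pairs. Toy stand-in for unlabeled audio + CPC loss.
ClientData = List[Tuple[float, float]]


def local_gradient(w: float, data: ClientData) -> float:
    """Gradient of the mean squared error of y ~ w * x on one client's data."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def fedsgd_round(w: float, clients: List[ClientData], lr: float) -> float:
    """One FedSGD step: weight each client's gradient by its sample count, average, update."""
    total = sum(len(d) for d in clients)
    avg_grad = sum(len(d) * local_gradient(w, d) for d in clients) / total
    return w - lr * avg_grad


if __name__ == "__main__":
    # Non-IID "silos": each client only sees its own slice of x values.
    clients = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)], [(4.0, 12.0), (5.0, 15.0)]]
    w = 0.0
    for _ in range(50):
        w = fedsgd_round(w, clients, lr=0.02)
    print(round(w, 3))  # 3.0, the true slope
```

With size-weighted averaging, one FedSGD round equals one full-batch gradient step on the pooled data, which is one intuition for why a federated pre-trained encoder can match a centrally pre-trained one.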

arXiv:2307.00335

Single Sequence Prediction over Reasoning Graphs for Multi-hop QA

Authors: Gowtham Ramesh, Makesh Sreedhar, Junjie Hu

Abstract: Recent generative approaches for multi-hop question answering (QA) use the fusion-in-decoder method (Izacard and Grave, 2021) to generate a single sequence output that includes both a final answer and the reasoning path taken to arrive at that answer, such as passage titles and key facts from those passages. While such models can lead to better interpretability and high quantitative scores, they often have difficulty accurately identifying the passages corresponding to key entities in the context, resulting in incorrect passage hops and a lack of faithfulness in the reasoning path. To address this, we propose a single-sequence prediction method over a local reasoning graph (SeqGraph; code and models will be released at https://github.com/gowtham1997/SeqGraph) that integrates a graph structure connecting key entities in each context passage to relevant subsequent passages for each question. We use a graph neural network to encode this graph structure and fuse the resulting representations into the entity representations of the model. Our experiments show significant improvements in answer exact-match/F1 scores and in the faithfulness of grounding in the reasoning path on the HotpotQA dataset, and the model achieves state-of-the-art numbers on the MuSiQue dataset with only up to a 4% increase in model parameters.

Submitted 1 July, 2023; originally announced July 2023.
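The abstract builds a local reasoning graph connecting key entities in each context passage to relevant subsequent passages. A minimal sketch of constructing such edges, assuming (hypothetically) that an entity links to a passage when the entity string matches that passage's title, as in Wikipedia-style multi-hop contexts; the paper's actual linking rule and its graph neural network encoder are not reproduced here, and all names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical passage format: title -> (text, entity mentions found in the text).
Passages = Dict[str, Tuple[str, List[str]]]


def build_reasoning_graph(passages: Passages) -> List[Tuple[str, str, str]]:
    """Edges (source passage, entity, target passage): an entity mention in one passage
    links to another passage whose title matches it (case-insensitive)."""
    by_title = {title.lower(): title for title in passages}
    edges = []
    for source, (_text, entities) in passages.items():
        for entity in entities:
            target = by_title.get(entity.lower())
            if target is not None and target != source:
                edges.append((source, entity, target))
    return edges


if __name__ == "__main__":
    ctx = {
        "Inception": ("Inception is a film directed by Christopher Nolan.", ["Christopher Nolan"]),
        "Christopher Nolan": ("Christopher Nolan was born in London.", ["London"]),
        "London": ("London is the capital of England.", ["England"]),
    }
    for edge in build_reasoning_graph(ctx):
        print(edge)
```

Here "England" produces no edge because no context passage carries that title; in a full model, the resulting edges would be encoded by a GNN and fused into entity representations.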

arXiv:2212.05409

Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages

Authors: Sumanth Doddapaneni, Rahul Aralikatte, Gowtham Ramesh, Shreya Goyal, Mitesh M. Khapra, Anoop Kunchukuttan, Pratyush Kumar

Abstract: Building Natural Language Understanding (NLU) capabilities for Indic languages, which have a collective speaker base of more than one billion people, is crucial. In this work, we aim to improve the NLU capabilities of Indic languages by making contributions along three important axes: (i) monolingual corpora, (ii) NLU test sets, and (iii) multilingual LLMs focusing on Indic languages. Specifically, we curate the largest monolingual corpus, IndicCorp, with 20.9B tokens covering 24 languages from 4 language families, a 2.3x increase over prior work, while supporting 12 additional languages. Next, we create a human-supervised benchmark, IndicXTREME, consisting of nine diverse NLU tasks covering 20 languages. Across languages and tasks, IndicXTREME contains a total of 105 evaluation sets, of which 52 are new contributions to the literature. To the best of our knowledge, this is the first effort towards creating a standard benchmark for Indic languages that aims to test the multilingual zero-shot capabilities of pretrained language models. Finally, we train IndicBERT v2, a state-of-the-art model supporting all the languages. Averaged across languages and tasks, the model achieves an absolute improvement of 2 points over a strong baseline. The data and models are available at https://github.com/AI4Bharat/IndicBERT.

Submitted 24 May, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

Comments: ACL 2023

arXiv:2111.03945

Towards Building ASR Systems for the Next Billion Users

Authors: Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

Abstract: Recent methods in speech and language technology pretrain very LARGE models which are fine-tuned for specific tasks. However, the benefits of such LARGE models are often limited to a few resource-rich languages of the world. In this work, we make multiple contributions towards building ASR systems for low-resource languages from the Indian subcontinent. First, we curate 17,000 hours of raw speech data for 40 Indian languages from a wide variety of domains, including education, news, technology, and finance. Second, using this raw speech data, we pretrain several variants of wav2vec-style models for 40 Indian languages. Third, we analyze the pretrained models and find key features: codebook vectors of similar-sounding phonemes are shared across languages, representations across layers are discriminative of the language family, and attention heads often attend within small local windows. Fourth, we fine-tune this model for downstream ASR for 9 languages and obtain state-of-the-art results on 3 public datasets, including on very low-resource languages such as Sinhala and Nepali. Our work establishes that multilingual pretraining is an effective strategy for building ASR systems for the linguistically diverse speakers of the Indian subcontinent. Our code, data and models are available publicly at https://indicnlp.ai4bharat.org/indicwav2vec/ and we hope they will help advance research in ASR for Indic languages.

Submitted 22 December, 2021; v1 submitted 6 November, 2021; originally announced November 2021.

arXiv:2110.04711

SuperShaper: Task-Agnostic Super Pre-training of BERT Models with Variable Hidden Dimensions

Authors: Vinod Ganesan, Gowtham Ramesh, Pratyush Kumar

Abstract: Task-agnostic pre-training followed by task-specific fine-tuning is the default approach to training NLU models. Such models need to be deployed on devices across the cloud and the edge with varying resource and accuracy constraints. For a given task, repeating pre-training and fine-tuning across tens of devices is prohibitively expensive. We propose SuperShaper, a task-agnostic pre-training approach which simultaneously pre-trains a large number of Transformer models by varying shapes, i.e., by varying the hidden dimensions across layers. This is enabled by a backbone network with linear bottleneck matrices around each Transformer layer, which are sliced to generate differently shaped sub-networks. In spite of its simple design space and efficient implementation, SuperShaper discovers networks that effectively trade off accuracy and model size: discovered networks are more accurate than a range of hand-crafted and automatically searched networks on GLUE benchmarks. Further, we find two critical advantages of shape as a design variable for Neural Architecture Search (NAS): (a) heuristics of good shapes can be derived, and networks found with these heuristics match and even improve on carefully searched networks across a range of parameter counts, and (b) the latency of networks across multiple CPUs and GPUs is insensitive to the shape, which enables device-agnostic search. In summary, SuperShaper radically simplifies NAS for language models and discovers networks that generalize across tasks, parameter constraints, and devices.

Submitted 10 October, 2021; originally announced October 2021.
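The abstract generates differently shaped sub-networks by slicing linear bottleneck matrices around each Transformer layer. A toy sketch of that slicing idea, assuming a bottleneck pair `w_down` (d_model x h_max) and `w_up` (h_max x d_model) stored as nested lists, where a sub-network of hidden width h uses the first h columns of `w_down` and the first h rows of `w_up`; the shapes, names, and "first-h" convention are assumptions for illustration, not the released implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def slice_bottleneck(w_down: Matrix, w_up: Matrix, h: int) -> Tuple[Matrix, Matrix]:
    """Sub-network bottleneck pair for hidden width h: the first h columns of
    w_down (d_model x h_max) and the first h rows of w_up (h_max x d_model)."""
    h_max = len(w_up)
    if not 1 <= h <= h_max:
        raise ValueError(f"h must be in [1, {h_max}], got {h}")
    return [row[:h] for row in w_down], [row[:] for row in w_up[:h]]


def matvec(v: List[float], m: Matrix) -> List[float]:
    """Row vector times matrix: (1 x n) @ (n x k) -> (1 x k)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


if __name__ == "__main__":
    d_model, h_max = 3, 4
    # Shared "super-network" weights (toy values); each sub-network reuses a slice of them.
    w_down = [[float(i * h_max + j) for j in range(h_max)] for i in range(d_model)]
    w_up = [[1.0 if i == j else 0.0 for j in range(d_model)] for i in range(h_max)]
    x = [1.0, 0.5, -1.0]
    for h in (2, 4):
        down, up = slice_bottleneck(w_down, w_up, h)
        y = matvec(matvec(x, down), up)  # d_model -> h -> d_model
        print(f"h={h}: down {len(down)}x{len(down[0])}, up {len(up)}x{len(up[0])}, y={y}")
```

Because every sub-network is a slice of the same stored matrices, training one backbone updates weights shared by all shapes, which is what makes pre-training many shapes at once affordable.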

arXiv:2107.00676

A Primer on Pretrained Multilingual Language Models

Authors: Sumanth Doddapaneni, Gowtham Ramesh, Mitesh M. Khapra, Anoop Kunchukuttan, Pratyush Kumar

Abstract: Multilingual Language Models (MLLMs) such as mBERT, XLM, and XLM-R have emerged as a viable option for bringing the power of pretraining to a large number of languages. Given their success in zero-shot transfer learning, a large body of work has emerged on (i) building bigger MLLMs covering a large number of languages, (ii) creating exhaustive benchmarks covering a wider variety of tasks and languages for evaluating MLLMs, (iii) analysing the performance of MLLMs on monolingual, zero-shot cross-lingual and bilingual tasks, (iv) understanding the universal language patterns (if any) learnt by MLLMs, and (v) augmenting the (often) limited capacity of MLLMs to improve their performance on seen or even unseen languages. In this survey, we review the existing literature covering the above broad areas of research pertaining to MLLMs. Based on our survey, we recommend some promising directions of future research.

Submitted 23 December, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

arXiv:2104.05596

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages

Authors: Gowtham Ramesh, Sumanth Doddapaneni, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Mahalakshmi J, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra

Abstract: We present Samanantar, the largest publicly available parallel corpora collection for Indic languages. The collection contains a total of 49.7 million sentence pairs between English and 11 Indic languages (from two language families). Specifically, we compile 12.4 million sentence pairs from existing, publicly available parallel corpora, and additionally mine 37.4 million sentence pairs from the web, resulting in a 4x increase. We mine the parallel sentences from the web by combining many corpora, tools, and methods: (a) web-crawled monolingual corpora, (b) document OCR for extracting sentences from scanned documents, (c) multilingual representation models for aligning sentences, and (d) approximate nearest neighbor search for searching in a large collection of sentences. Human evaluation of samples from the newly mined corpora validates the high quality of the parallel sentences across 11 languages. Further, we extract 83.4 million sentence pairs between all 55 Indic language pairs from the English-centric parallel corpus using English as the pivot language. We trained multilingual NMT models spanning all these languages on Samanantar, which outperform existing models and baselines on publicly available benchmarks, such as FLORES, establishing the utility of Samanantar. Our data and models are available publicly at https://ai4bharat.iitm.ac.in/samanantar and we hope they will help advance research in NMT and multilingual NLP for Indic languages.

Submitted 12 June, 2023; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Accepted to the Transactions of the Association for Computational Linguistics (TACL)
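The abstract extracts Indic-Indic pairs by pivoting through English. With 11 Indic languages there are C(11, 2) = 55 unordered language pairs, which matches the 55 pairs in the abstract. A minimal sketch of English pivoting, assuming the corpus is given as (language, English sentence, Indic sentence) triples and that two Indic sentences are paired whenever they share an identical English side; the paper's exact matching and filtering rules may differ, and the sentence strings below are placeholders.

```python
from collections import defaultdict
from itertools import combinations
from math import comb
from typing import Dict, List, Tuple

# Hypothetical layout: (language code, English sentence, sentence in that language).
Triple = Tuple[str, str, str]


def pivot_pairs(corpus: List[Triple]) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """Pair sentences from two different Indic languages that share the same English side."""
    by_english: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for lang, en, xx in corpus:
        by_english[en][lang].append(xx)
    pairs: Dict[Tuple[str, str], List[Tuple[str, str]]] = defaultdict(list)
    for translations in by_english.values():
        for l1, l2 in combinations(sorted(translations), 2):
            for s1 in translations[l1]:
                for s2 in translations[l2]:
                    pairs[(l1, l2)].append((s1, s2))
    return dict(pairs)


if __name__ == "__main__":
    print(comb(11, 2))  # 55 language pairs among 11 Indic languages
    corpus = [
        ("hi", "Good morning.", "HI_1"),  # placeholder strings, not real translations
        ("ta", "Good morning.", "TA_1"),
        ("bn", "Thank you.", "BN_1"),
    ]
    print(pivot_pairs(corpus))  # {('hi', 'ta'): [('HI_1', 'TA_1')]}
```

The Bengali sentence yields no pair because no other language shares its English side; exact string matching is the simplest pivot rule and would likely miss near-duplicate English sentences.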

arXiv:1304.7025

Recovery of bilevel causal signals with finite rate of innovation using positive sampling kernels

Authors: Gayatri Ramesh, Elie Atallah, Qiyu Sun

Abstract: A bilevel signal $x$ with maximal local rate of innovation $R$ is a continuous-time signal that takes only the two values 0 and 1 and has at most one transition position in any time period of length $1/R$. In this note, we introduce a recovery method for bilevel causal signals $x$ with maximal local rate of innovation $R$ from their uniform samples $(x*h)(nT)$, $n \ge 1$, where the sampling kernel $h$ is causal and positive on $(0, T)$, and the sampling rate $\tau := 1/T$ is at (or above) the maximal local rate of innovation $R$. We also discuss the stability of the bilevel signal recovery procedure in the presence of bounded noise.

Submitted 25 April, 2013; originally announced April 2013.
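As an illustration only, consider the special case where the kernel is the box $h(t) = 1$ on $(0, T)$, which is causal and positive on $(0, T)$. Then each sample is $(x*h)(nT) = \int_{(n-1)T}^{nT} x(s)\,ds$, and when $T \le 1/R$ each window holds at most one transition, so the sample value pins down where that transition lies. The sketch below assumes this box kernel, noise-free samples, and a signal that starts at 0; it is a simplified stand-in, not the paper's method for general positive kernels, and all function names are hypothetical.

```python
from typing import List


def integrate_bilevel(transitions: List[float], a: float, b: float) -> float:
    """Integral of x over [a, b]; x(t) = 0 before transitions[0] (causal) and
    flips between 0 and 1 at each sorted transition time."""
    total, level, start = 0.0, 0, float("-inf")
    for t in transitions + [float("inf")]:
        if level == 1:  # x == 1 on [start, t)
            lo, hi = max(start, a), min(t, b)
            if hi > lo:
                total += hi - lo
        level, start = 1 - level, t
    return total


def sample_box(transitions: List[float], T: float, n_samples: int) -> List[float]:
    """Samples (x*h)(nT), n = 1..n_samples, for the box kernel h = 1 on (0, T)."""
    return [integrate_bilevel(transitions, (n - 1) * T, n * T) for n in range(1, n_samples + 1)]


def recover_box(samples: List[float], T: float, tol: float = 1e-9) -> List[float]:
    """Recover transition times, assuming at most one transition per window of length T."""
    recovered, level = [], 0  # causal signal: x(t) = 0 before t = 0
    for n, y in enumerate(samples, start=1):
        if level == 0 and y > tol:  # 0 -> 1: x is 1 on the last y units of the window
            recovered.append(n * T - y)
            level = 1
        elif level == 1 and y < T - tol:  # 1 -> 0: x is 1 on the first y units
            recovered.append((n - 1) * T + y)
            level = 0
    return recovered


if __name__ == "__main__":
    true_transitions = [0.3, 1.75, 3.2]  # consecutive transitions more than 1/R = 1 apart
    T = 1.0  # sampling rate 1/T equal to the maximal local rate of innovation R = 1
    ys = sample_box(true_transitions, T, n_samples=5)
    print([round(t, 6) for t in recover_box(ys, T)])  # [0.3, 1.75, 3.2]
```

The tolerance `tol` only absorbs floating-point error; handling the bounded noise discussed in the abstract would need a more careful decision rule.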

arXiv:0912.0602

A Reliable and Fault Tolerant Routing for Optical WDM Networks

Authors: G. Ramesh, S. SundaraVadivelu

Abstract: In optical WDM networks, since each lightpath can carry a huge amount of traffic, failures may seriously damage end-user applications. Hence, fault tolerance becomes an important issue in these networks. The lightpath that carries traffic during normal operation is called the primary path. The traffic is rerouted on a backup path in case of a failure. In this paper, we propose a reliable and fault-tolerant routing algorithm for establishing primary and backup paths. To establish the primary path, this algorithm uses load balancing, in which link cost metrics are estimated based on the current load of the links. In backup path setup, the source sends a small fraction of probe packets along the existing paths and calculates the blocking probability from the feedback received from the destination. It then selects the optimal lightpath with the lowest blocking probability. Based on simulation results, we show that the reliable and fault-tolerant routing algorithm reduces the blocking probability and latency while increasing the throughput and channel utilization.

Submitted 3 December, 2009; originally announced December 2009.

Comments: 7 pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS November 2009, ISSN 1947 5500, http://sites.google.com/site/ijcsis/

Report number: ISSN 1947 5500

Journal ref: International Journal of Computer Science and Information Security, IJCSIS, Vol. 6, No. 2, pp. 048-054, November 2009, USA
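The abstract sets up the primary path with load-dependent link costs and picks the backup path with the lowest blocking probability estimated from probe feedback. A minimal sketch of both steps, assuming a link cost of 1 / (1 - load) with load in [0, 1), and a blocking estimate of blocked probes / sent probes; both formulas and all names are illustrative assumptions, not the paper's exact metrics.

```python
import heapq
from typing import Dict, List, Tuple

# graph[u][v] = current load of link u -> v as a fraction of capacity, in [0, 1).
Graph = Dict[str, Dict[str, float]]


def primary_path(graph: Graph, src: str, dst: str) -> List[str]:
    """Dijkstra's shortest path with an assumed load-based link cost 1 / (1 - load)."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, load in graph.get(u, {}).items():
            nd = d + 1.0 / (1.0 - load)  # heavily loaded links become expensive
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return []  # unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def backup_path(probe_stats: Dict[Tuple[str, ...], Tuple[int, int]]) -> Tuple[str, ...]:
    """Candidate path with the lowest estimated blocking probability,
    estimated as (blocked probes) / (sent probes) reported by the destination."""
    return min(probe_stats, key=lambda p: probe_stats[p][0] / probe_stats[p][1])


if __name__ == "__main__":
    net = {"A": {"B": 0.9, "C": 0.2}, "B": {"D": 0.1}, "C": {"D": 0.3}}
    print(primary_path(net, "A", "D"))  # ['A', 'C', 'D']: avoids the 90%-loaded link A -> B
    probes = {("A", "B", "D"): (2, 50), ("A", "C", "E", "D"): (5, 50)}
    print(backup_path(probes))  # ('A', 'B', 'D'): 4% vs. 10% estimated blocking
```

The 1 / (1 - load) cost grows sharply as a link nears saturation, which is one common way to make shortest-path routing favor lightly loaded links.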

Showing 1–12 of 12 results for author: Ramesh, G