-
EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search
Authors:
Kamalkumar Rathinasamy,
Jayarama Nettar,
Amit Kumar,
Vishal Manchanda,
Arun Vijayakumar,
Ayush Kataria,
Venkateshprasanna Manjunath,
Chidambaram GS,
Jaskirat Singh Sodhi,
Shoeb Shaikh,
Wasim Akhtar Khan,
Prashant Singh,
Tanishq Dattatray Ige,
Vipin Tiwari,
Rajab Ali Mondal,
Harshini K,
S Reka,
Chetana Amancharla,
Faiz ur Rahman,
Harikrishnan P A,
Indraneel Saha,
Bhavya Tiwary,
Navin Shankar Patel,
Pradeep T S,
Balaji A J
, et al. (2 additional authors not shown)
Abstract:
Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions, designed to adeptly extract relevant insights to address employee inquiries. These solutions often leverage pre-trained embedding models and generative models as foundational components.…
▽ More
Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions, designed to adeptly extract relevant insights to address employee inquiries. These solutions often leverage pre-trained embedding models and generative models as foundational components. While pre-trained embeddings may exhibit proximity or disparity based on their original training objectives, they might not fully align with the unique characteristics of enterprise-specific data, leading to suboptimal alignment with the retrieval goals of enterprise environments. In this paper, we propose a methodology to fine-tune pre-trained embedding models specifically for enterprise environments. By adapting the embeddings to better suit the retrieval tasks prevalent in enterprises, we aim to enhance the performance of information retrieval solutions. We discuss the process of fine-tuning, its effect on retrieval accuracy, and the potential benefits for enterprise information management. Our findings demonstrate the efficacy of fine-tuned embedding models in improving the precision and relevance of search results in enterprise settings.
△ Less
Submitted 18 May, 2024;
originally announced June 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1110 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…
▽ More
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
△ Less
Submitted 8 August, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model
Authors:
Leonie Weissweiler,
Valentin Hofmann,
Anjali Kantharuban,
Anna Cai,
Ritam Dutt,
Amey Hengle,
Anubha Kabra,
Atharva Kulkarni,
Abhishek Vijayakumar,
Haofei Yu,
Hinrich Schütze,
Kemal Oflazer,
David R. Mortensen
Abstract:
Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (i…
▽ More
Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics and overlook other capabilities that lie at the heart of human language, like morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages (specifically, English, German, Tamil, and Turkish). We apply a version of Berko's (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, our results -- through the lens of morphology -- cast a new light on the linguistic capabilities of ChatGPT, suggesting that claims of human-like language skills are premature and misleading.
△ Less
Submitted 26 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Reading Comprehension as Natural Language Inference: A Semantic Analysis
Authors:
Anshuman Mishra,
Dhruvesh Patel,
Aparna Vijayakumar,
Xiang Li,
Pavan Kapanipathi,
Kartik Talamadupula
Abstract:
In the recent past, Natural language Inference (NLI) has gained significant attention, particularly given its promise for downstream NLP tasks. However, its true impact is limited and has not been well studied. Therefore, in this paper, we explore the utility of NLI for one of the most prominent downstream tasks, viz. Question Answering (QA). We transform the one of the largest available MRC datas…
▽ More
In the recent past, Natural language Inference (NLI) has gained significant attention, particularly given its promise for downstream NLP tasks. However, its true impact is limited and has not been well studied. Therefore, in this paper, we explore the utility of NLI for one of the most prominent downstream tasks, viz. Question Answering (QA). We transform the one of the largest available MRC dataset (RACE) to an NLI form, and compare the performances of a state-of-the-art model (RoBERTa) on both these forms. We propose new characterizations of questions, and evaluate the performance of QA and NLI models on these categories. We highlight clear categories for which the model is able to perform better when the data is presented in a coherent entailment form, and a structured question-answer concatenation form, respectively.
△ Less
Submitted 4 October, 2020;
originally announced October 2020.
-
Looking Beyond Sentence-Level Natural Language Inference for Downstream Tasks
Authors:
Anshuman Mishra,
Dhruvesh Patel,
Aparna Vijayakumar,
Xiang Li,
Pavan Kapanipathi,
Kartik Talamadupula
Abstract:
In recent years, the Natural Language Inference (NLI) task has garnered significant attention, with new datasets and models achieving near human-level performance on it. However, the full promise of NLI -- particularly that it learns knowledge that should be generalizable to other downstream NLP tasks -- has not been realized. In this paper, we study this unfulfilled promise from the lens of two d…
▽ More
In recent years, the Natural Language Inference (NLI) task has garnered significant attention, with new datasets and models achieving near human-level performance on it. However, the full promise of NLI -- particularly that it learns knowledge that should be generalizable to other downstream NLP tasks -- has not been realized. In this paper, we study this unfulfilled promise from the lens of two downstream tasks: question answering (QA), and text summarization. We conjecture that a key difference between the NLI datasets and these downstream tasks concerns the length of the premise; and that creating new long premise NLI datasets out of existing QA datasets is a promising avenue for training a truly generalizable NLI model. We validate our conjecture by showing competitive results on the task of QA and obtaining the best reported results on the task of Checking Factual Correctness of Summaries.
△ Less
Submitted 18 September, 2020;
originally announced September 2020.
-
Manifold-based Test Generation for Image Classifiers
Authors:
Taejoon Byun,
Abhishek Vijayakumar,
Sanjai Rayadurgam,
Darren Cofer
Abstract:
Neural networks used for image classification tasks in critical applications must be tested with sufficient realistic data to assure their correctness. To effectively test an image classification neural network, one must obtain realistic test data adequate enough to inspire confidence that differences between the implicit requirements and the learned model would be exposed. This raises two challen…
▽ More
Neural networks used for image classification tasks in critical applications must be tested with sufficient realistic data to assure their correctness. To effectively test an image classification neural network, one must obtain realistic test data adequate enough to inspire confidence that differences between the implicit requirements and the learned model would be exposed. This raises two challenges: first, an adequate subset of the data points must be carefully chosen to inspire confidence, and second, the implicit requirements must be meaningfully extrapolated to data points beyond those in the explicit training set. This paper proposes a novel framework to address these challenges. Our approach is based on the premise that patterns in a large input data space can be effectively captured in a smaller manifold space, from which similar yet novel test cases---both the input and the label---can be sampled and generated. A variant of Conditional Variational Autoencoder (CVAE) is used for capturing this manifold with a generative function, and a search technique is applied on this manifold space to efficiently find fault-revealing inputs. Experiments show that this approach enables generation of thousands of realistic yet fault-revealing test cases efficiently even for well-trained models.
△ Less
Submitted 15 February, 2020;
originally announced February 2020.
-
Physical Design Obfuscation of Hardware: A Comprehensive Investigation of Device- and Logic-Level Techniques
Authors:
Arunkumar Vijayakumar,
Vinay C. Patil,
Daniel E. Holcomb,
Christof Paar,
Sandip Kundu
Abstract:
The threat of hardware reverse engineering is a growing concern for a large number of applications. A main defense strategy against reverse engineering is hardware obfuscation. In this paper, we investigate physical obfuscation techniques, which perform alterations of circuit elements that are difficult or impossible for an adversary to observe. The examples of such stealthy manipulations are chan…
▽ More
The threat of hardware reverse engineering is a growing concern for a large number of applications. A main defense strategy against reverse engineering is hardware obfuscation. In this paper, we investigate physical obfuscation techniques, which perform alterations of circuit elements that are difficult or impossible for an adversary to observe. The examples of such stealthy manipulations are changes in the doping concentrations or dielectric manipulations. An attacker will, thus, extract a netlist, which does not correspond to the logic function of the device-under-attack. This approach of camouflaging has garnered recent attention in the literature. In this paper, we expound on this promising direction to conduct a systematic end-to-end study of the VLSI design process to find multiple ways to obfuscate a circuit for hardware security. This paper makes three major contributions. First, we provide a categorization of the available physical obfuscation techniques as it pertains to various design stages. There is a large and multidimensional design space for introducing obfuscated elements and mechanisms, and the proposed taxonomy is helpful for a systematic treatment. Second, we provide a review of the methods that have been proposed or in use. Third, we present recent and new device and logic-level techniques for design obfuscation. For each technique considered, we discuss feasibility of the approach and assess likelihood of its detection. Then we turn our focus to open research questions, and conclude with suggestions for future research directions.
△ Less
Submitted 2 October, 2019;
originally announced October 2019.
-
Input Prioritization for Testing Neural Networks
Authors:
Taejoon Byun,
Vaibhav Sharma,
Abhishek Vijayakumar,
Sanjai Rayadurgam,
Darren Cofer
Abstract:
Deep neural networks (DNNs) are increasingly being adopted for sensing and control functions in a variety of safety and mission-critical systems such as self-driving cars, autonomous air vehicles, medical diagnostics, and industrial robotics. Failures of such systems can lead to loss of life or property, which necessitates stringent verification and validation for providing high assurance. Though…
▽ More
Deep neural networks (DNNs) are increasingly being adopted for sensing and control functions in a variety of safety and mission-critical systems such as self-driving cars, autonomous air vehicles, medical diagnostics, and industrial robotics. Failures of such systems can lead to loss of life or property, which necessitates stringent verification and validation for providing high assurance. Though formal verification approaches are being investigated, testing remains the primary technique for assessing the dependability of such systems. Due to the nature of the tasks handled by DNNs, the cost of obtaining test oracle data---the expected output, a.k.a. label, for a given input---is high, which significantly impacts the amount and quality of testing that can be performed. Thus, prioritizing input data for testing DNNs in meaningful ways to reduce the cost of labeling can go a long way in increasing testing efficacy. This paper proposes using gauges of the DNN's sentiment derived from the computation performed by the model, as a means to identify inputs that are likely to reveal weaknesses. We empirically assessed the efficacy of three such sentiment measures for prioritization---confidence, uncertainty, and surprise---and compare their effectiveness in terms of their fault-revealing capability and retraining effectiveness. The results indicate that sentiment measures can effectively flag inputs that expose unacceptable DNN behavior. For MNIST models, the average percentage of inputs correctly flagged ranged from 88% to 94.8%.
△ Less
Submitted 11 January, 2019;
originally announced January 2019.
-
Sound-Word2Vec: Learning Word Representations Grounded in Sounds
Authors:
Ashwin K Vijayakumar,
Ramakrishna Vedantam,
Devi Parikh
Abstract:
To be able to interact better with humans, it is crucial for machines to understand sound - a primary modality of human perception. Previous works have used sound to learn embeddings for improved generic textual similarity assessment. In this work, we treat sound as a first-class citizen, studying downstream textual tasks which require aural grounding. To this end, we propose sound-word2vec - a ne…
▽ More
To be able to interact better with humans, it is crucial for machines to understand sound - a primary modality of human perception. Previous works have used sound to learn embeddings for improved generic textual similarity assessment. In this work, we treat sound as a first-class citizen, studying downstream textual tasks which require aural grounding. To this end, we propose sound-word2vec - a new embedding scheme that learns specialized word embeddings grounded in sounds. For example, we learn that two seemingly (semantically) unrelated concepts, like leaves and paper are similar due to the similar rustling sounds they make. Our embeddings prove useful in textual tasks requiring aural reasoning like text-based sound retrieval and discovering foley sound effects (used in movies). Moreover, our embedding space captures interesting dependencies between words and onomatopoeia and outperforms prior work on aurally-relevant word relatedness datasets such as AMEN and ASLex.
△ Less
Submitted 29 August, 2017; v1 submitted 5 March, 2017;
originally announced March 2017.
-
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
Authors:
Ashwin K Vijayakumar,
Michael Cogswell,
Ramprasath R. Selvaraju,
Qing Sun,
Stefan Lee,
David Crandall,
Dhruv Batra
Abstract:
Neural sequence models are widely used to model time-series data. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-right fashion retaining only the top-B candidates - resulting in sequences that differ only slightly from each other. Producing lists of nearly identica…
▽ More
Neural sequence models are widely used to model time-series data. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-right fashion retaining only the top-B candidates - resulting in sequences that differ only slightly from each other. Producing lists of nearly identical sequences is not only computationally wasteful but also typically fails to capture the inherent ambiguity of complex AI tasks. To overcome this problem, we propose Diverse Beam Search (DBS), an alternative to BS that decodes a list of diverse outputs by optimizing for a diversity-augmented objective. We observe that our method finds better top-1 solutions by controlling for the exploration and exploitation of the search space - implying that DBS is a better search algorithm. Moreover, these gains are achieved with minimal computational or memory over- head as compared to beam search. To demonstrate the broad applicability of our method, we present results on image captioning, machine translation and visual question generation using both standard quantitative metrics and qualitative human studies. Further, we study the role of diversity for image-grounded language generation tasks as the complexity of the image changes. We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.
△ Less
Submitted 22 October, 2018; v1 submitted 7 October, 2016;
originally announced October 2016.
-
Heredity for generalized power domination
Authors:
Paul Dorbec,
Seethu Varghese,
Ambat Vijayakumar
Abstract:
In this paper, we study the behaviour of the generalized power domination number of a graph by small changes on the graph, namely edge and vertex deletion and edge contraction. We prove optimal bounds for $γ\_{p,k}(G-e)$, $γ\_{p,k}(G/e)$ and for $γ\_{p,k}(G-v)$ in terms of $γ\_{p,k}(G)$, and give examples for which these bounds are tight. We characterize all graphs for which…
▽ More
In this paper, we study the behaviour of the generalized power domination number of a graph by small changes on the graph, namely edge and vertex deletion and edge contraction. We prove optimal bounds for $γ\_{p,k}(G-e)$, $γ\_{p,k}(G/e)$ and for $γ\_{p,k}(G-v)$ in terms of $γ\_{p,k}(G)$, and give examples for which these bounds are tight. We characterize all graphs for which $γ\_{p,k}(G-e) = γ\_{p,k}(G)+1$ for any edge $e$. We also consider the behaviour of the propagation radius of graphs by similar modifications.
△ Less
Submitted 23 March, 2016;
originally announced March 2016.
-
We Are Humor Beings: Understanding and Predicting Visual Humor
Authors:
Arjun Chandrasekaran,
Ashwin K. Vijayakumar,
Stanislaw Antol,
Mohit Bansal,
Dhruv Batra,
C. Lawrence Zitnick,
Devi Parikh
Abstract:
Humor is an integral part of human lives. Despite being tremendously impactful, it is perhaps surprising that we do not have a detailed understanding of humor yet. As interactions between humans and AI systems increase, it is imperative that these systems are taught to understand subtleties of human expressions such as humor. In this work, we are interested in the question - what content in a scen…
▽ More
Humor is an integral part of human lives. Despite being tremendously impactful, it is perhaps surprising that we do not have a detailed understanding of humor yet. As interactions between humans and AI systems increase, it is imperative that these systems are taught to understand subtleties of human expressions such as humor. In this work, we are interested in the question - what content in a scene causes it to be funny? As a first step towards understanding visual humor, we analyze the humor manifested in abstract scenes and design computational models for them. We collect two datasets of abstract scenes that facilitate the study of humor at both the scene-level and the object-level. We analyze the funny scenes and explore the different types of humor depicted in them via human studies. We model two tasks that we believe demonstrate an understanding of some aspects of visual humor. The tasks involve predicting the funniness of a scene and altering the funniness of a scene. We show that our models perform well quantitatively, and qualitatively through human studies. Our datasets are publicly available.
△ Less
Submitted 5 May, 2016; v1 submitted 14 December, 2015;
originally announced December 2015.