Search | arXiv e-print repository

Efficient Document Ranking with Learnable Late Interactions

Authors: Ziwei Ji, Himanshu Jain, Andreas Veit, Sashank J. Reddi, Sadeep Jayasumana, Ankit Singh Rawat, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer based on query and document token embeddings. However, these lightweight scorers are often hand-crafted, and there is no understanding of their approximation power; further, such scorers require access to individual document token embeddings, which imposes an increased latency and storage burden. In this paper, we propose novel learnable late-interaction models (LITE) that resolve these issues. Theoretically, we prove that LITE is a universal approximator of continuous scoring functions, even for relatively small embedding dimension. Empirically, LITE outperforms previous late-interaction models such as ColBERT on both in-domain and zero-shot re-ranking tasks. For instance, experiments on MS MARCO passage re-ranking show that LITE not only yields a model with better generalization, but also lowers latency and requires 0.25x storage compared to ColBERT.

Submitted 25 June, 2024; originally announced June 2024.
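
For context, the kind of hand-crafted late-interaction scorer the abstract contrasts with can be illustrated by ColBERT-style MaxSim scoring: each query token embedding is matched to its most similar document token embedding, and the maxima are summed. The sketch below is a toy pure-Python illustration only; it is not LITE and not the ColBERT code base, and the function names and the tiny 2-dimensional embeddings are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Hypothetical token embeddings: 2 query tokens, 3 document tokens.
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

score = maxsim_score(query, document)
print(score)
```

Note that the scorer touches every document token embedding, which is the storage and latency cost the abstract refers to.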

arXiv:2404.12063 [pdf, other]

FastVPINNs: Tensor-Driven Acceleration of VPINNs for Complex Geometries

Authors: Thivin Anandh, Divij Ghose, Himanshu Jain, Sashikumaar Ganesan

Abstract: Variational Physics-Informed Neural Networks (VPINNs) utilize a variational loss function to solve partial differential equations, mirroring Finite Element Analysis techniques. Traditional hp-VPINNs, while effective for high-frequency problems, are computationally intensive and scale poorly with increasing element counts, limiting their use in complex geometries. This work introduces FastVPINNs, a tensor-based advancement that significantly reduces computational overhead and improves scalability. Using optimized tensor operations, FastVPINNs achieve a 100-fold reduction in the median training time per epoch compared to traditional hp-VPINNs. With proper choice of hyperparameters, FastVPINNs surpass conventional PINNs in both speed and accuracy, especially in problems with high-frequency solutions. Demonstrated effectiveness in solving inverse problems on complex domains underscores FastVPINNs' potential for widespread application in scientific and engineering challenges, opening new avenues for practical implementations in scientific machine learning.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 31 pages, 19 figures, 4 algorithms

arXiv:2402.14889 [pdf]

COBIAS: Contextual Reliability in Bias Assessment

Authors: Priyanshul Govil, Hemang Jain, Vamshi Krishna Bonagiri, Aman Chadha, Ponnurangam Kumaraguru, Manas Gaur, Sanorita Dey

Abstract: Large Language Models (LLMs) are trained on extensive web corpora, which enable them to understand and generate human-like text. However, this training process also results in inherent biases within the models. These biases arise from web data's diverse and often uncurated nature, containing various stereotypes and prejudices. Previous works on debiasing models rely on benchmark datasets to measure their method's performance. However, these datasets suffer from several pitfalls due to the highly subjective understanding of bias, highlighting a critical need for contextual exploration. We propose understanding the context of inputs by considering the diverse situations in which they may arise. Our contribution is two-fold: (i) we augment 2,291 stereotyped statements from two existing bias-benchmark datasets with points for adding context; (ii) we develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to assess a statement's contextual reliability in measuring bias. Our metric aligns with human judgment on contextual reliability of statements (Spearman's $\rho = 0.65$, $p = 3.4 \times 10^{-60}$) and can be used to create reliable datasets, which would assist bias mitigation works.

Submitted 17 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2310.15141 [pdf, other]

SpecTr: Fast Speculative Decoding via Optimal Transport

Authors: Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix Yu

Abstract: Autoregressive sampling from large language models has led to state-of-the-art results in several natural language tasks. However, autoregressive sampling generates tokens one at a time making it slow, and even prohibitive in certain tasks. One way to speed up sampling is speculative decoding: use a small model to sample a draft (block or sequence of tokens), and then score all tokens in the draft by the large language model in parallel. A subset of the tokens in the draft are accepted (and the rest rejected) based on a statistical method to guarantee that the final output follows the distribution of the large model. In this work, we provide a principled understanding of speculative decoding through the lens of optimal transport (OT) with membership cost. This framework can be viewed as an extension of the well-known maximal-coupling problem. This new formulation enables us to generalize the speculative decoding method to allow for a set of $k$ candidates at the token-level, which leads to an improved optimal membership cost. We show that the optimal draft selection algorithm (transport plan) can be computed via linear programming, whose best-known runtime is exponential in $k$. We then propose a valid draft selection algorithm whose acceptance probability is $(1-1/e)$-optimal multiplicatively. Moreover, it can be computed in time almost linear with size of domain of a single token. Using this new draft selection algorithm, we develop a new autoregressive sampling algorithm called SpecTr, which provides speedup in decoding while ensuring that there is no quality degradation in the decoded output. We experimentally demonstrate that for state-of-the-art large language models, the proposed approach achieves a wall clock speedup of 2.13X, a further 1.37X speedup over speculative decoding on standard benchmarks.

Submitted 17 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023
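
To make the accept/reject step in the abstract concrete, here is a minimal sketch of the standard single-draft rule that SpecTr generalizes: a draft token x sampled from the small model q is accepted with probability min(1, p(x)/q(x)), and on rejection a token is drawn from the normalized residual max(p - q, 0). This is the baseline (k = 1) case only, not the k-candidate SpecTr algorithm; the function names and the toy three-token distributions are hypothetical. The code computes the emitted-token distribution exactly rather than by sampling, so the "output follows the large model" property can be checked directly.

```python
from typing import Dict

Dist = Dict[str, float]


def acceptance_probability(p: Dist, q: Dist) -> float:
    """Chance that a draft token drawn from q is accepted: sum_x min(p(x), q(x))."""
    return sum(min(p.get(x, 0.0), q.get(x, 0.0)) for x in set(p) | set(q))


def residual_distribution(p: Dist, q: Dist) -> Dist:
    """Distribution sampled after a rejection: normalized max(p(x) - q(x), 0)."""
    excess = {x: max(p.get(x, 0.0) - q.get(x, 0.0), 0.0) for x in set(p) | set(q)}
    total = sum(excess.values())
    if total == 0.0:
        # p and q coincide, so a rejection never happens; any valid choice works.
        return dict(p)
    return {x: e / total for x, e in excess.items()}


def output_distribution(p: Dist, q: Dist) -> Dist:
    """Exact distribution of the emitted token under the accept/reject rule."""
    accept = acceptance_probability(p, q)
    resid = residual_distribution(p, q)
    return {
        x: min(p.get(x, 0.0), q.get(x, 0.0)) + (1.0 - accept) * resid.get(x, 0.0)
        for x in set(p) | set(q)
    }


# Hypothetical next-token distributions for the large (p) and small (q) models.
large = {"a": 0.5, "b": 0.3, "c": 0.2}
small = {"a": 0.25, "b": 0.5, "c": 0.25}

accept = acceptance_probability(large, small)
emitted = output_distribution(large, small)
```

The acceptance probability equals one minus the total variation distance between p and q, which is why the abstract relates the problem to maximal coupling.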

arXiv:2302.04149 [pdf, other]

Domain Adaptation of Synthetic Driving Datasets for Real-World Autonomous Driving

Authors: Koustav Mullick, Harshil Jain, Sanchit Gupta, Amit Arvind Kale

Abstract: While developing perception based deep learning models, the benefit of synthetic data is enormous. However, performance of networks trained with synthetic data for certain computer vision tasks degrades significantly when tested on real world data due to the domain gap between them. One of the popular solutions in bridging this gap between synthetic and actual world data is to frame it as a domain adaptation task. In this paper, we propose and evaluate novel ways for the betterment of such approaches. In particular we build upon the method of UNIT-GAN. In normal GAN training for the task of domain translation, pairing of images from both the domains (viz, real and synthetic) is done randomly. We propose a novel method to efficiently incorporate semantic supervision into this pair selection, which helps in boosting the performance of the model along with improving the visual quality of such transformed images. We illustrate our empirical findings on Cityscapes and the challenging synthetic dataset Synscapes. Though the findings are reported on the base network of UNIT-GAN, they can be easily extended to any other similar network.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2211.04367 [pdf, other]

Much Easier Said Than Done: Falsifying the Causal Relevance of Linear Decoding Methods

Authors: Lucas Hayne, Abhijit Suresh, Hunar Jain, Rahul Kumar, R. McKell Carter

Abstract: Linear classifier probes are frequently utilized to better understand how neural networks function. Researchers have approached the problem of determining unit importance in neural networks by probing their learned, internal representations. Linear classifier probes identify highly selective units as the most important for network function. Whether or not a network actually relies on high selectivity units can be tested by removing them from the network using ablation. Surprisingly, when highly selective units are ablated they only produce small performance deficits, and even then only in some cases. In spite of the absence of ablation effects for selective neurons, linear decoding methods can be effectively used to interpret network function, leaving their effectiveness a mystery. To falsify the exclusive role of selectivity in network function and resolve this contradiction, we systematically ablate groups of units in subregions of activation space. Here, we find a weak relationship between neurons identified by probes and those identified by ablation. More specifically, we find that an interaction between selectivity and the average activity of the unit better predicts ablation performance deficits for groups of units in AlexNet, VGG16, MobileNetV2, and ResNet101. Linear decoders are likely somewhat effective because they overlap with those units that are causally important for network function. Interpretability methods could be improved by focusing on causally important units.

Submitted 8 November, 2022; originally announced November 2022.

Comments: 6 pages, 3 figures, to be published in I Can't Believe It's Not Better Workshop at NeurIPS 2022

arXiv:2208.09015 [pdf, other]

Treeformer: Dense Gradient Trees for Efficient Attention Computation

Authors: Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain

Abstract: Standard inference and training with transformer based architectures scale quadratically with input sequence length. This is prohibitively large for a variety of applications especially in web-page translation, query-answering etc. Consequently, several approaches have been developed recently to speed up attention computation by enforcing different attention structures such as sparsity, low-rank, approximating attention using kernels. In this work, we view attention computation as that of nearest neighbor retrieval, and use decision tree based hierarchical navigation to reduce the retrieval cost per query token from linear in sequence length to nearly logarithmic. Based on such hierarchical navigation, we design Treeformer which can use one of two efficient attention layers -- TF-Attention and TC-Attention. TF-Attention computes the attention in a fine-grained style, while TC-Attention is a coarse attention layer which also ensures that the gradients are "dense". To optimize such challenging discrete layers, we propose a two-level bootstrapped training method. Using extensive experiments on standard NLP benchmarks, especially for long-sequences, we demonstrate that our Treeformer architecture can be almost as accurate as baseline Transformer while using 30x fewer FLOPs in the attention layer. Compared to Linformer, the accuracy can be as much as 12% higher while using similar FLOPs in the attention layer.

Submitted 17 March, 2023; v1 submitted 18 August, 2022; originally announced August 2022.

Comments: ICLR 2023

arXiv:2208.06825 [pdf, other]

Teacher Guided Training: An Efficient Framework for Knowledge Transfer

Authors: Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar

Abstract: The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models to compact models for efficient deployment also necessitates a large amount of (labeled or unlabeled) training data. In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data. TGT exploits the fact that the teacher has acquired a good representation of the underlying data domain, which typically corresponds to a much lower dimensional manifold than the input space. Furthermore, we can use the teacher to explore input space more efficiently through sampling or gradient-based methods; thus, making TGT especially attractive for limited data or long-tail settings. We formally capture this benefit of proposed data-domain exploration in our generalization bounds. We find that TGT can improve accuracy on several image classification benchmarks as well as a range of text classification and retrieval tasks.

Submitted 14 August, 2022; originally announced August 2022.

arXiv:2201.07612 [pdf, other]

ReGNL: Rapid Prediction of GDP during Disruptive Events using Nightlights

Authors: Rushabh Musthyala, Rudrajit Kargupta, Hritish Jain, Dipanjan Chakraborty

Abstract: Policy makers often make decisions based on parameters such as GDP, unemployment rate, industrial output, etc. The primary methods to obtain or even estimate such information are resource intensive and time consuming. In order to make timely and well-informed decisions, it is imperative to be able to come up with proxies for these parameters which can be sampled quickly and efficiently, especially during disruptive events, like the COVID-19 pandemic. Recently, there has been a lot of focus on using remote sensing data for this purpose. The data has become cheaper to collect compared to surveys, and can be available in real time. In this work, we present Regional GDP NightLight (ReGNL), a neural network based model which is trained on a custom dataset of historical nightlights and GDP data along with the geographical coordinates of a place, and estimates the GDP of the place, given the other parameters. Taking the case of 50 US states, we find that ReGNL is disruption-agnostic and is able to predict the GDP for both normal years (2019) and for years with a disruptive event (2020). ReGNL outperforms timeseries ARIMA methods for prediction, even during the pandemic. Following from our findings, we make a case for building infrastructures to collect and make available granular data, especially in resource-poor geographies, so that these can be leveraged for policy making during disruptive events.

Submitted 19 January, 2022; originally announced January 2022.

arXiv:2112.03252 [pdf, other]

CSG0: Continual Urban Scene Generation with Zero Forgetting

Authors: Himalaya Jain, Tuan-Hung Vu, Patrick Pérez, Matthieu Cord

Abstract: With the rapid advances in generative adversarial networks (GANs), the visual quality of synthesised scenes keeps improving, including for complex urban scenes with applications to automated driving. We address in this work a continual scene generation setup in which GANs are trained on a stream of distinct domains; ideally, the learned models should eventually be able to generate new scenes in all seen domains. This setup reflects the real-life scenario where data are continuously acquired in different places at different times. In such a continual setup, we aim for learning with zero forgetting, i.e., with no degradation in synthesis quality over earlier domains due to catastrophic forgetting. To this end, we introduce a novel framework that not only (i) enables seamless knowledge transfer in continual training but also (ii) guarantees zero forgetting with a small overhead cost. While being more memory efficient, thanks to continual learning, our model obtains better synthesis quality as compared against the brute-force solution that trains one full model for each domain. Especially, under extreme low-data regimes, our approach outperforms the brute-force one by a large margin.

Submitted 2 May, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

Comments: Published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022 Workshop on Continual Learning

arXiv:2111.06685 [pdf, other]

doi 10.1145/3437963.3441810

DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents

Authors: Kunal Dahiya, Deepak Saini, Anshul Mittal, Ankush Shaw, Kushal Dave, Akshay Soni, Himanshu Jain, Sumeet Agarwal, Manik Varma

Abstract: Scalability and accuracy are well recognized challenges in deep extreme multi-label learning where the objective is to train architectures for automatically annotating a data point with the most relevant subset of labels from an extremely large label set. This paper develops the DeepXML framework that addresses these challenges by decomposing the deep extreme multi-label task into four simpler sub-tasks each of which can be trained accurately and efficiently. Choosing different components for the four sub-tasks allows DeepXML to generate a family of algorithms with varying trade-offs between accuracy and scalability. In particular, DeepXML yields the Astec algorithm that could be 2-12% more accurate and 5-30x faster to train than leading deep extreme classifiers on publicly available short text datasets. Astec could also efficiently train on Bing short text datasets containing up to 62 million labels while making predictions for billions of users and data points per day on commodity hardware. This allowed Astec to be deployed on the Bing search engine for a number of short text applications ranging from matching user queries to advertiser bid phrases to showing personalized ads where it yielded significant gains in click-through-rates, coverage, revenue and other online metrics over state-of-the-art techniques currently in production. DeepXML's code is available at https://github.com/Extreme-classification/deepxml

Submitted 12 November, 2021; originally announced November 2021.

ACM Class: F.2.2; I.2.7

Journal ref: Web Search and Data Mining 2021

arXiv:2110.06821 [pdf, other]

Leveraging redundancy in attention with Reuse Transformers

Authors: Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar

Abstract: Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision. However, a typical Transformer model computes such pairwise attention scores repeatedly for the same sequence, in multiple heads in multiple layers. We systematically analyze the empirical similarity of these scores across heads and layers and find them to be considerably redundant, especially adjacent layers showing high similarity. Motivated by these findings, we propose a novel architecture that reuses attention scores computed in one layer in multiple subsequent layers. Experiments on a number of standard benchmarks show that reusing attention delivers performance equivalent to or better than standard transformers, while reducing both compute and memory usage.

Submitted 13 October, 2021; originally announced October 2021.

arXiv:2106.08823 [pdf, other]

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation

Authors: Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit

Abstract: State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length. In this paper, we investigate the global structure of attention scores computed using this dot product mechanism on a typical distribution of inputs, and study the principal components of their variation. Through eigen analysis of full attention score matrices, as well as of their individual rows, we find that most of the variation among attention scores lies in a low-dimensional eigenspace. Moreover, we find significant overlap between these eigenspaces for different layers and even different transformer models. Based on this, we propose to compute scores only for a partial subset of token pairs, and use them to estimate scores for the remaining pairs. Beyond investigating the accuracy of reconstructing attention scores themselves, we investigate training transformer models that employ these approximations, and analyze the effect on overall accuracy. Our analysis and the proposed method provide insights into how to balance the benefits of exact pair-wise attention and its significant computational expense.

Submitted 16 June, 2021; originally announced June 2021.

Comments: 14 pages

arXiv:2106.01629 [pdf, other]

Semantic Palette: Guiding Scene Generation with Class Proportions

Authors: Guillaume Le Moing, Tuan-Hung Vu, Himalaya Jain, Patrick Pérez, Matthieu Cord

Abstract: Despite the recent progress of generative adversarial networks (GANs) at synthesizing photo-realistic images, producing complex urban scenes remains a challenging problem. Previous works break down scene generation into two consecutive phases: unconditional semantic layout synthesis and image synthesis conditioned on layouts. In this work, we propose to condition layout generation as well for higher semantic control: given a vector of class proportions, we generate layouts with matching composition. To this end, we introduce a conditional framework with novel architecture designs and learning objectives, which effectively accommodates class proportions to guide the scene generation process. The proposed architecture also allows partial layout editing with interesting applications. Thanks to the semantic control, we can produce layouts close to the real distribution, helping enhance the whole scene generation process. On different metrics and urban scene benchmarks, our models outperform existing baselines. Moreover, we demonstrate the merit of our approach for data augmentation: semantic segmenters trained on real layout-image pairs along with additional ones generated by our approach outperform models only trained on real pairs.

Submitted 3 June, 2021; originally announced June 2021.

Comments: Accepted to IEEE CVPR 2021

arXiv:2011.03705 [pdf, other]

Blind Motion Deblurring through SinGAN Architecture

Authors: Harshil Jain, Rohit Patil, Indra Deep Mastan, Shanmuganathan Raman

Abstract: Blind motion deblurring involves reconstructing a sharp image from an observation that is blurry. It is a problem that is ill-posed and lies in the categories of image restoration problems. The training data-based methods for image deblurring mostly involve training models that take a lot of time. These models are data-hungry i.e., they require a lot of training data to generate satisfactory results. Recently, there are various image feature learning methods developed which relieve us of the need for training data and perform image restoration and image synthesis, e.g., DIP, InGAN, and SinGAN. SinGAN is a generative model that is unconditional and could be learned from a single natural image. This model primarily captures the internal distribution of the patches which are present in the image and is capable of generating samples of varied diversity while preserving the visual content of the image. Images generated from the model are very much like real natural images. In this paper, we focus on blind motion deblurring through SinGAN architecture.

Submitted 7 November, 2020; originally announced November 2020.

Comments: Deep Internal Learning: Training with no prior examples. ECCV'2020 Workshop

arXiv:2011.00057 [pdf, other]

doi 10.1186/s12859-021-04249-7

A Sui Generis QA Approach using RoBERTa for Adverse Drug Event Identification

Authors: Harshit Jain, Nishant Raj, Suyash Mishra

Abstract: Extraction of adverse drug events from biomedical literature and other textual data is an important component to monitor drug-safety and this has attracted attention of many researchers in healthcare. Existing works are more pivoted around entity-relation extraction using bidirectional long short term memory networks (Bi-LSTM) which does not attain the best feature representations. In this paper, we introduce a question answering framework that exploits the robustness, masking and dynamic attention capabilities of RoBERTa by a technique of domain adaptation and attempt to overcome the aforementioned limitations. Our model outperforms the prior work by 9.53% F1-Score.

Submitted 30 October, 2020; originally announced November 2020.

Journal ref: BMC Bioinformatics 22, 330 (2021)

arXiv:2010.07447 [pdf, ps, other]

Semantic Label Smoothing for Sequence to Sequence Problems

Authors: Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

Abstract: Label smoothing has been shown to be an effective regularization strategy in classification, that prevents overfitting and helps in label de-noising. However, extending such methods directly to seq2seq settings, such as Machine Translation, is challenging: the large target output space of such problems makes it intractable to apply label smoothing over all possible outputs. Most existing approaches for seq2seq settings either do token level smoothing, or smooth over sequences generated by randomly substituting tokens in the target sequence. Unlike these works, in this paper, we propose a technique that smooths over well-formed relevant sequences that not only have sufficient n-gram overlap with the target sequence, but are also semantically similar. Our method shows a consistent and significant improvement over the state-of-the-art techniques on different datasets.

Submitted 14 October, 2020; originally announced October 2020.

arXiv:2010.05223 [pdf, other]

End to End Binarized Neural Networks for Text Classification

Authors: Harshil Jain, Akshat Agarwal, Kumar Shridhar, Denis Kleyko

Abstract: Deep neural networks have demonstrated their superior performance in almost every Natural Language Processing task, however, their increasing complexity raises concerns. In particular, these networks require high expenses on computational hardware, and training budget is a concern for many. Even for a trained network, the inference phase can be too demanding for resource-constrained devices, thus limiting its applicability. The state-of-the-art transformer models are a vivid example. Simplifying the computations performed by a network is one way of relaxing the complexity requirements. In this paper, we propose an end to end binarized neural network architecture for the intent classification task. In order to fully utilize the potential of end to end binarization, both input representations (vector embeddings of tokens statistics) and the classifier are binarized. We demonstrate the efficiency of such architecture on the intent classification of short texts over three datasets and for text classification with a larger dataset. The proposed architecture achieves comparable to the state-of-the-art results on standard intent classification datasets while utilizing ~ 20-40% lesser memory and training time. Furthermore, the individual components of the architecture, such as binarized vector embeddings of documents or binarized classifiers, can be used separately with not necessarily fully binary architectures.

Submitted 11 October, 2020; originally announced October 2020.

Comments: 14 pages. Accepted at the SustaiNLP Workshop on Simple and Efficient Natural Language Processing at EMNLP 2020

arXiv:2007.07314 [pdf, other]

Long-tail learning via logit adjustment

Authors: Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Abstract: Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples. This poses a challenge for generalisation on such labels, and also makes naïve learning biased towards dominant labels. In this paper, we present two simple modifications of standard softmax cross-entropy training to cope with these challenges. Our techniques revisit the classic idea of logit adjustment based on the label frequencies, either applied post-hoc to a trained model, or enforced in the loss during training. Such adjustment encourages a large relative margin between logits of rare versus dominant labels. These techniques unify and generalise several recent proposals in the literature, while possessing firmer statistical grounding and empirical performance.

Submitted 9 July, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: Published as a conference paper in ICLR 2021

arXiv:2007.06555 [pdf, other]

Adversarial robustness via robust low rank representations

Authors: Pranjal Awasthi, Himanshu Jain, Ankit Singh Rawat, Aravindan Vijayaraghavan

Abstract: Adversarial robustness measures the susceptibility of a classifier to imperceptible perturbations made to the inputs at test time. In this work we highlight the benefits of natural low rank representations that often exist for real data such as images, for training neural networks with certified robustness guarantees. Our first contribution is for certified robustness to perturbations measured in $\ell_2$ norm. We exploit low rank data representations to provide improved guarantees over state-of-the-art randomized smoothing-based approaches on standard benchmark datasets such as CIFAR-10 and CIFAR-100. Our second contribution is for the more challenging setting of certified robustness to perturbations measured in $\ell_\infty$ norm. We demonstrate empirically that natural low rank representations have inherent robustness properties, that can be leveraged to provide significantly better guarantees for certified robustness to $\ell_\infty$ perturbations in those representations. Our certificate of $\ell_\infty$ robustness relies on a natural quantity involving the $\infty \to 2$ matrix operator norm associated with the representation, to translate robustness guarantees from $\ell_2$ to $\ell_\infty$ perturbations. A key technical ingredient for our certification guarantees is a fast algorithm with provable guarantees based on the multiplicative weights update method to provide upper bounds on the above matrix norm. Our algorithmic guarantees improve upon the state of the art for this problem, and may be of independent interest.

Submitted 1 August, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: fixed a bug in the proof of Proposition B.2

arXiv:2002.12096 [pdf, other]

Action Quality Assessment using Siamese Network-Based Deep Metric Learning

Authors: Hiteshi Jain, Gaurav Harit, Avinash Sharma

Abstract: Automated vision-based score estimation models can be used as an alternate opinion to avoid judgment bias. In the past works the score estimation models were learned by regressing the video representations to the ground truth score provided by the judges. However such regression-based solutions lack interpretability in terms of giving reasons for the awarded score. One solution to make the scores more explicable is to compare the given action video with a reference video. This would capture the temporal variations w.r.t. the reference video and map those variations to the final score. In this work, we propose a new action scoring system as a two-phase system: (1) A Deep Metric Learning Module that learns similarity between any two action videos based on their ground truth scores given by the judges; (2) A Score Estimation Module that uses the first module to find the resemblance of a video to a reference video in order to give the assessment score. The proposed scoring model has been tested for Olympics Diving and Gymnastic vaults and the model outperforms the existing state-of-the-art scoring models.

Submitted 27 February, 2020; originally announced February 2020.

Comments: 12 pages, 5 Figures, 8 tables

arXiv:2001.09599 [pdf, other]

Achieving Multi-Port Memory Performance on Single-Port Memory with Coding Techniques

Authors: Hardik Jain, Matthew Edwards, Ethan Elenberg, Ankit Singh Rawat, Sriram Vishwanath

Abstract: Many performance critical systems today must rely on performance enhancements, such as multi-port memories, to keep up with the increasing demand of memory-access capacity. However, the large area footprints and complexity of existing multi-port memory designs limit their applicability. This paper explores a coding theoretic framework to address this problem. In particular, this paper introduces a framework to encode data across multiple single-port memory banks in order to algorithmically realize the functionality of multi-port memory. This paper proposes three code designs with significantly less storage overhead compared to the existing replication based emulations of multi-port memories. To further improve performance, we also demonstrate a memory controller design that utilizes redundancy across coded memory banks to more efficiently schedule read and write requests sent across multiple cores. Furthermore, guided by DRAM traces, the paper explores dynamic coding techniques to improve the efficiency of the coding based memory design. We then show significant performance improvements in critical word read and write latency in the proposed coded-memory design when compared to a traditional uncoded-memory design.

Submitted 27 January, 2020; originally announced January 2020.

Comments: 10 pages, 20 figures, ICICT 2020 conference

arXiv:2001.08184 [pdf, other]

doi 10.1145/3366423.3380201

GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation

Authors: Nikhil Goyal, Harsh Vardhan Jain, Sayan Ranu

Abstract: Graph generative models have been extensively studied in the data mining literature. While traditional techniques are based on generating structures that adhere to a pre-decided distribution, recent techniques have shifted towards learning this distribution directly from the data. While learning-based approaches have imparted significant improvement in quality, some limitations remain to be addressed. First, learning graph distributions introduces additional computational overhead, which limits their scalability to large graph databases. Second, many techniques only learn the structure and do not address the need to also learn node and edge labels, which encode important semantic information and influence the structure itself. Third, existing techniques often incorporate domain-specific rules and lack generalizability. Fourth, the experimentation of existing techniques is not comprehensive enough due to either using weak evaluation metrics or focusing primarily on synthetic or small datasets. In this work, we develop a domain-agnostic technique called GraphGen to overcome all of these limitations. GraphGen converts graphs to sequences using minimum DFS codes. Minimum DFS codes are canonical labels and capture the graph structure precisely along with the label information. The complex joint distributions between structure and semantic labels are learned through a novel LSTM architecture.
Extensive experiments on million-sized, real graph datasets show GraphGen to be 4 times faster on average than state-of-the-art techniques while being significantly better in quality across a comprehensive set of 11 different metrics. Our code is released at https://github.com/idea-iitd/graphgen.

Submitted 8 April, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

Comments: Fixed typo in Table 1; The Web Conference (WWW) 2020

arXiv:1912.01540 [pdf, other]

QUEST: Quantized embedding space for transferring knowledge

Authors: Himalaya Jain, Spyros Gidaris, Nikos Komodakis, Patrick Pérez, Matthieu Cord

Abstract: Knowledge distillation refers to the process of training a compact student network to achieve better accuracy by learning from a high capacity teacher network. Most of the existing knowledge distillation methods direct the student to follow the teacher by matching the teacher's output, feature maps or their distribution. In this work, we propose a novel way to achieve this goal: by distilling the knowledge through a quantized space. According to our method, the teacher's feature maps are quantized to represent the main visual concepts encompassed in the feature maps. The student is then asked to predict the quantized representation, which thus forms the task that the student uses to learn from the teacher. Despite its simplicity, we show that our approach is able to yield results that improve the state of the art on knowledge distillation. To that end, we provide an extensive evaluation across several network architectures and most commonly used benchmark datasets.

Submitted 17 July, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

Comments: Accepted at ECCV 2020

arXiv:1911.08271 [pdf]

Python vs. R: A Text Mining Approach for analyzing the Research Trends in Scopus Database

Authors: Neeraj Bhanot, Harwinder Singh, Divyansu Sharma, Harshit Jain, Shreyansh Jain

Abstract: In the contemporary world, with the incubation of advanced technologies and tremendous outbursts of research works, analyzing big data to incorporate research strategies becomes more helpful using the tools and techniques presented in the current research scenario. This paper indeed tries to tackle the most prominent challenges relating to big data analysis by utilizing a text mining approach to analyze research data published in the field of production management as a case to begin with. The study has been conducted by considering research data of International Journal of Production Research (IJPR) indexed in Scopus between 1961-2017 by dividing the analysis incurred into 3 fragments being 1961-1990, 1991-2010 and finally 2011-2017 as a case to highlight the focus of journal. This has indeed provided multi-faceted benefits such as increasing the effectiveness of the procured data with well-established comparisons between R and Python Programming along with providing detailed research trends on the research work incubated. The results of the study highlighted some most prominent topics in the existing IJPR literature such as system's optimization, supplier selection, process design, etc. providing well-established details relating to ongoing research works. The study also compared both languages suiting to a particular field of study for better comprehension and vastness of the research topics.
The current research work is one of the part of a copyright work with registration number SW-10310/2018 titled Program for Analyzing Key Trends in Research Data-set. It has been designed in Python for carrying out detailed content analysis based on the available research database in bib format as in the current context it has been applied for IJPR journal and can be replicated on articles of any domain found using keyword search.

Submitted 10 November, 2019; originally announced November 2019.

Comments: This study aims to help researchers by developing a Python based algorithm to analyse research trends using Scopus Database considering large amount of information in different domains as it will help the beginners to get fair enough idea of research being carried out in their fields of interest. A comparison with R has also been done to find as in which platform provides more relevant results

arXiv:1911.05161 [pdf, other]

All It Takes is 20 Questions!: A Knowledge Graph Based Approach

Authors: Alvin Dey, Harsh Kumar Jain, Vikash Kumar Pandey, Tanmoy Chakraborty

Abstract: 20 Questions (20Q) is a two-player game. One player is the answerer, and the other is a questioner. The answerer chooses an entity from a specified domain and does not reveal this to the other player. The questioner can ask at most 20 questions to the answerer to guess the entity. The answerer can reply to the questions asked by saying yes/no/maybe. In this paper, we propose a novel approach based on the knowledge graph for designing the 20Q game on Bollywood movies. The system assumes the role of the questioner and asks questions to predict the movie thought by the answerer. It uses a probabilistic learning model for template-based question generation and answers prediction. A dataset of interrelated entities is represented as a weighted knowledge graph, which updates as the game progresses by asking questions. An evolutionary approach helps the model to gain a better understanding of user choices and predicts the answer in fewer questions over time. Experimental results show that our model was able to predict the correct movie in less than 10 questions for more than half of the times the game was played. This kind of model can be used to design applications that can detect diseases by asking questions based on symptoms, improving recommendation systems, etc.

Submitted 12 November, 2019; originally announced November 2019.

arXiv:1911.02888 [pdf, other]

This dataset does not exist: training models from generated images

Authors: Victor Besnier, Himalaya Jain, Andrei Bursuc, Matthieu Cord, Patrick Pérez

Abstract: Current generative networks are increasingly proficient in generating high-resolution realistic images. These generative networks, especially the conditional ones, can potentially become a great tool for providing new image datasets. This naturally brings the question: Can we train a classifier only on the generated data? This potential availability of nearly unlimited amounts of training data challenges standard practices for training machine learning models, which have been crafted across the years for limited and fixed size datasets. In this work we investigate this question and its related challenges. We identify ways to improve significantly the performance over naive training on randomly generated images with regular heuristics. We propose three standalone techniques that can be applied at different stages of the pipeline, i.e., data generation, training on generated data, and deploying on real data. We evaluate our proposed approaches on a subset of the ImageNet dataset and show encouraging results compared to classifiers trained on real images.

Submitted 7 November, 2019; originally announced November 2019.

arXiv:1904.01886 [pdf, other]

DADA: Depth-aware Domain Adaptation in Semantic Segmentation

Authors: Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Pérez

Abstract: Unsupervised domain adaptation (UDA) is important for applications where large scale annotation of representative data is challenging. For semantic segmentation in particular, it helps deploy on real "target domain" data models that are trained on annotated images from a different "source domain", notably a virtual environment. To this end, most previous works consider semantic segmentation as the only mode of supervision for source domain data, while ignoring other, possibly available, information like depth. In this work, we aim at exploiting at best such a privileged information while training the UDA model. We propose a unified depth-aware UDA framework that leverages in several complementary ways the knowledge of dense depth in the source domain. As a result, the performance of the trained semantic segmentation model on the target domain is boosted. Our novel approach indeed achieves state-of-the-art performance on different challenging synthetic-2-real benchmarks.

Submitted 19 August, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: Accepted in ICCV'19

arXiv:1811.12833 [pdf, other]

ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation

Authors: Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Pérez

Abstract: Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) entropy loss and (ii) adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging "synthetic-2-real" set-ups and show that the approach can also be used for detection.

Submitted 17 April, 2019; v1 submitted 30 November, 2018; originally announced November 2018.

Comments: Accepted in CVPR'19. Code is available at https://github.com/valeoai/ADVENT

arXiv:1712.04480 [pdf, other]

Learning a Complete Image Indexing Pipeline

Authors: Himalaya Jain, Joaquin Zepeda, Patrick Pérez, Rémi Gribonval

Abstract: To work at scale, a complete image indexing system comprises two components: An inverted file index to restrict the actual search to only a subset that should contain most of the items relevant to the query; An approximate distance computation mechanism to rapidly scan these lists. While supervised deep learning has recently enabled improvements to the latter, the former continues to be based on unsupervised clustering in the literature. In this work, we propose a first system that learns both components within a unifying neural framework of structured binary encoding.

Submitted 12 December, 2017; originally announced December 2017.

arXiv:1711.10283 [pdf, other]

Data Backup Network Formation with Heterogeneous Agents

Authors: Harshit Jain, Guduru Sai Teja, Pramod Mane, Kapil Ahuja, Nagarajan Krishnamurthy

Abstract: Social storage systems are becoming increasingly popular compared to the existing data backup systems like local, centralized and P2P systems. An endogenously built symmetric social storage model and its aspects like the utility of each agent, bilateral stability, contentment, and efficiency have been extensively discussed in Mane et al. (2017). We include heterogeneity in this model by using the concept of Social Range Matrix from Kuznetsov et al. (2010). Now, each agent is concerned about its perceived utility, which is a linear combination of its utility as well as others' utilities (depending upon whether the pair are friends, enemies or do not care about each other). We derive conditions when two agents may want to add or delete a link, and provide an algorithm that checks if a bilaterally stable network is possible or not. Finally, we take some special Social Range Matrices and prove that under certain conditions on network parameters, a bilaterally stable network is unique.

Submitted 28 November, 2017; originally announced November 2017.

Comments: 3 Pages, double columns, 1 figure, extended abstract

MSC Class: 91

arXiv:1710.03027 [pdf]

A Bottom Up Procedure for Text Line Segmentation of Latin Script

Authors: Himanshu Jain, Archana Praveen Kumar

Abstract: In this paper we present a bottom up procedure for segmentation of text lines written or printed in the Latin script. The proposed method uses a combination of image morphology, feature extraction and Gaussian mixture model to perform this task. The experimental results show the validity of the procedure.

Submitted 9 October, 2017; originally announced October 2017.

Comments: Accepted and presented at the IEEE conference "International Conference on Advances in Computing, Communications and Informatics (ICACCI) 2017"

MSC Class: 68T45

arXiv:1710.03025 [pdf]

A Sequential Thinning Algorithm For Multi-Dimensional Binary Patterns

Authors: Himanshu Jain, Archana Praveen Kumar

Abstract: Thinning is the removal of contour pixels/points of connected components in an image to produce their skeleton with retained connectivity and structural properties. The output requirements of a thinning procedure often vary with application. This paper proposes a sequential algorithm that is very easy to understand and modify based on application to perform the thinning of multi-dimensional binary patterns. The algorithm was tested on 2D and 3D patterns and showed very good results. Moreover, comparisons were also made with two of the state-of-the-art methods used for 2D patterns. The results obtained prove the validity of the procedure.

Submitted 16 November, 2017; v1 submitted 9 October, 2017; originally announced October 2017.

MSC Class: 68T10

arXiv:1708.02932 [pdf, other]

SUBIC: A supervised, structured binary code for image search

Authors: Himalaya Jain, Joaquin Zepeda, Patrick Pérez, Rémi Gribonval

Abstract: For large-scale visual search, highly compressed yet meaningful representations of images are essential. Structured vector quantizers based on product quantization and its variants are usually employed to achieve such compression while minimizing the loss of accuracy. Yet, unlike binary hashing schemes, these unsupervised methods have not yet benefited from the supervision, end-to-end learning and novel architectures ushered in by the deep learning revolution. We hence propose herein a novel method to make deep convolutional neural networks produce supervised, compact, structured binary codes for visual search. Our method makes use of a novel block-softmax non-linearity and of batch-based entropy losses that together induce structure in the learned encodings. We show that our method outperforms state-of-the-art compact representations based on deep hashing or structured quantization in single and cross-domain category retrieval, instance retrieval and classification. We make our code and models publicly available online.

Submitted 9 August, 2017; originally announced August 2017.

Comments: Accepted at ICCV 2017 (Spotlight)

arXiv:1706.06651 [pdf, other]

Passive Classification of Source Printer using Text-line-level Geometric Distortion Signatures from Scanned Images of Printed Documents

Authors: Hardik Jain, Gaurav Gupta, Sharad Joshi, Nitin Khanna

Abstract: In this digital era, one thing that still holds the convention is a printed archive. Printed documents find their use in many critical domains such as contract papers, legal tenders and proof of identity documents. As more advanced printing, scanning and image editing techniques are becoming available, forgeries on these legal tenders pose a serious threat. Ability to easily and reliably identify source printer of a printed document can help a lot in reducing this menace. During printing procedure, printer hardware introduces certain distortions in printed characters' locations and shapes which are invisible to naked eyes. These distortions are referred as geometric distortions, their profile (or signature) is generally unique for each printer and can be used for printer classification purpose. This paper proposes a set of features for characterizing text-line-level geometric distortions, referred as geometric distortion signatures and presents a novel system to use them for identification of the origin of a printed document. Detailed experiments performed on a set of thirteen printers demonstrate that the proposed system achieves state of the art performance and gives much higher accuracy under small training size constraint. For four training and six test pages of three different fonts, the proposed method gives 99% classification accuracy.

Submitted 20 June, 2017; originally announced June 2017.

Comments: 20 pages

arXiv:1608.03308 [pdf, other]

Approximate search with quantized sparse representations

Authors: Himalaya Jain, Patrick Pérez, Rémi Gribonval, Joaquin Zepeda, Hervé Jégou

Abstract: This paper tackles the task of storing a large collection of vectors, such as visual descriptors, and of searching in it. To this end, we propose to approximate database vectors by constrained sparse coding, where possible atom weights are restricted to belong to a finite subset. This formulation encompasses, as particular cases, previous state-of-the-art methods such as product or residual quantization. As opposed to traditional sparse coding methods, quantized sparse coding includes memory usage as a design constraint, thereby allowing us to index a large collection such as the BIGANN billion-sized benchmark. Our experiments, carried out on standard benchmarks, show that our formulation leads to competitive solutions when considering different trade-offs between learning/coding time, index size and search quality.

Submitted 10 August, 2016; originally announced August 2016.

Comments: ECCV 2016

arXiv:1507.02743 [pdf, ps, other]

Locally Non-linear Embeddings for Extreme Multi-label Learning

Authors: Kush Bhatia, Himanshu Jain, Purushottam Kar, Prateek Jain, Manik Varma

Abstract: The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding based approaches make training and prediction tractable by assuming that the training label matrix is low-rank and hence the effective number of labels can be reduced by projecting the high dimensional label vectors onto a low dimensional linear subspace. Still, leading embedding approaches have been unable to deliver high prediction accuracies or scale to large problems as the low rank assumption is violated in most real world applications. This paper develops the X-One classifier to address both limitations. The main technical contribution in X-One is a formulation for learning a small ensemble of local distance preserving embeddings which can accurately predict infrequently occurring (tail) labels. This allows X-One to break free of the traditional low-rank assumption and boost classification accuracy by learning embeddings which preserve pairwise distances between only the nearest label vectors. We conducted extensive experiments on several real-world as well as benchmark data sets and compared our method against state-of-the-art methods for extreme multi-label classification. Experiments reveal that X-One can make significantly more accurate predictions than the state-of-the-art methods including both embeddings (by as much as 35%) as well as trees (by as much as 6%). X-One can also scale efficiently to data sets with a million labels which are beyond the pale of leading embedding methods.

Submitted 9 July, 2015; originally announced July 2015.

Showing 1–37 of 37 results for author: Jain, H