-
Preventing Reward Hacking with Occupancy Measure Regularization
Authors:
Cassidy Laidlaw,
Shivam Singhal,
Anca Dragan
Abstract:
Reward hacking occurs when an agent performs very well with respect to a "proxy" reward function (which may be hand-specified or learned), but poorly with respect to the unknown true reward. Since ensuring good alignment between the proxy and true reward is extremely difficult, one approach to prevent reward hacking is optimizing the proxy conservatively. Prior work has particularly focused on enforcing the learned policy to behave similarly to a "safe" policy by penalizing the KL divergence between their action distributions (AD). However, AD regularization doesn't always work well since a small change in action distribution at a single state can lead to potentially calamitous outcomes, while large changes might not be indicative of any dangerous activity. Our insight is that when reward hacking, the agent visits drastically different states from those reached by the safe policy, causing large deviations in state occupancy measure (OM). Thus, we propose regularizing based on the OM divergence between policies instead of AD divergence to prevent reward hacking. We theoretically establish that OM regularization can more effectively avoid large drops in true reward. Then, we empirically demonstrate in a variety of realistic environments that OM divergence is superior to AD divergence for preventing reward hacking by regularizing towards a safe policy. Furthermore, we show that occupancy measure divergence can also regularize learned policies away from reward hacking behavior. Our code and data are available at https://github.com/cassidylaidlaw/orpo
Submitted 5 March, 2024;
originally announced March 2024.
-
Effective Backdoor Mitigation Depends on the Pre-training Objective
Authors:
Sahil Verma,
Gantavya Bhatt,
Avi Schwarzschild,
Soumye Singhal,
Arnav Mohanty Das,
Chirag Shah,
John P Dickerson,
Jeff Bilmes
Abstract:
Despite the advanced capabilities of contemporary machine learning (ML) models, they remain vulnerable to adversarial and backdoor attacks. This vulnerability is particularly concerning in real-world deployments, where compromised models may exhibit unpredictable behavior in critical scenarios. Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for pre-training multimodal models, as these datasets may harbor backdoors. Various techniques have been proposed to mitigate the effects of backdooring in these models, such as CleanCLIP, which is the current state-of-the-art approach. In this work, we demonstrate that the efficacy of CleanCLIP in mitigating backdoors is highly dependent on the particular objective used during model pre-training. We observe that stronger pre-training objectives correlate with harder-to-remove backdoor behaviors. We show this by training multimodal models on two large datasets consisting of 3 million (CC3M) and 6 million (CC6M) datapoints, under various pre-training objectives, followed by poison removal using CleanCLIP. We find that CleanCLIP is ineffective when stronger pre-training objectives are used, even with extensive hyperparameter tuning. Our findings underscore critical considerations for ML practitioners who pre-train models using large-scale web-curated data and are concerned about potential backdoor threats. Notably, our results suggest that simpler pre-training objectives are more amenable to effective backdoor removal. This insight is pivotal for practitioners seeking to balance the trade-offs between using stronger pre-training objectives and security against backdoor attacks.
Submitted 5 December, 2023; v1 submitted 25 November, 2023;
originally announced November 2023.
-
Navigating Resource Conflicts: Co-opetition and Fairness
Authors:
Shiksha Singhal
Abstract:
In today's dynamic and interconnected world, resource constraints pose significant challenges across various domains, ranging from networks, logistics and manufacturing to project management and optimization, etc. Resource-constrained problems (RCPs) represent a class of complex computational problems that require efficient allocation and utilization of limited resources to achieve optimal outcomes. This thesis aims to delve into such problems involving multiple agents, where agents aim to enhance their own payoffs, or a neutral moderator aims to maximise the system revenue while distributing the resources appropriately among all agents. In the former type of problems, agents may seek collaboration to achieve higher individual shares, resulting in a cooperative game with competition, i.e., co-opetition. Cooperative and non-cooperative game theory tools are utilized to analyze such games. On the other hand, for the latter kind of problems, we use tools from optimization and Markov decision processes.
Submitted 8 November, 2023;
originally announced November 2023.
-
Social Optimal Freshness in Multi-Source, Multi-Channel Systems via MDP
Authors:
Shiksha Singhal,
Veeraruna Kavitha,
Vidya Shankar
Abstract:
Many systems necessitate frequent and consistent updates of specific information. Often this information is updated regularly, where an old packet becomes completely obsolete in the presence of a new packet. In this context, we consider a system with multiple sources, each equipped with a storage buffer of size one, communicating to a common destination via d orthogonal channels. In each slot, a packet arrives at each source with a certain probability and occupies the buffer (discarding the old packet, if any), and each transfer (to the destination) is successful with a certain other probability. Thus in any slot, there are two Age of Information (AoI) measures for each source: one corresponding to the information at the source itself and the other corresponding to the information of the same source available at the destination; some sources may not even have a packet to transmit. The aim of the controller at the destination is to maintain the freshness of information of all the sources, to the best extent possible -- it aims to design an optimal scheduling policy that assigns, in each slot, a subset of sources with packets (at most d) for transmission. This is achieved using an appropriate Markov Decision Process (MDP) framework, where the objective function is the sum of the Average AoIs (AAoI) of all the sources. We derive a very simple stationary policy that is epsilon-optimal -- in any slot, order the sources with packets in decreasing order of the difference between the AoI at the destination and the AoI at the source, and choose the top sources for transmission. With a moderate number of sources (fewer than 30), the AAoI reduces by 30-90%.
Submitted 3 October, 2023;
originally announced October 2023.
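The ordering rule stated in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `schedule`, the use of `None` to mark an empty buffer, and the example ages are all assumptions made for illustration, and the sketch covers only the per-slot selection step, not the MDP analysis.

```python
from typing import List, Optional


def schedule(dest_age: List[int], src_age: List[Optional[int]], d: int) -> List[int]:
    """Pick at most d sources to transmit in the current slot.

    dest_age[i]: AoI of source i's information at the destination.
    src_age[i]:  AoI of the packet waiting at source i, or None if the
                 buffer is empty (a hypothetical encoding for this sketch).
    """
    # Only sources that currently hold a packet are eligible.
    candidates = [i for i, age in enumerate(src_age) if age is not None]
    # Rank by the AoI difference (destination minus source), largest first.
    candidates.sort(key=lambda i: dest_age[i] - src_age[i], reverse=True)
    # Use at most d channels.
    return candidates[:d]


# Four sources, two channels; source 1 has no packet to send.
chosen = schedule(dest_age=[9, 4, 7, 6], src_age=[2, None, 6, 1], d=2)
print(chosen)  # [0, 3]
```

In this example the differences are 7, 1 and 5 for sources 0, 2 and 3, so sources 0 and 3 are selected.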
-
On the ubiquity of duopolies in constant sum congestion games
Authors:
Shiksha Singhal,
Veeraruna Kavitha,
Jayakrishnan Nair
Abstract:
We analyse a coalition formation game between strategic service providers of a congestible service. The key novelty of our formulation is that it is a constant sum game, i.e., the total payoff across all service providers (or coalitions of providers) is fixed, and dictated by the size of the market. The game thus captures the tension between resource pooling (to benefit from the resulting statistical economies of scale) and competition between coalitions over market share. In a departure from the prior literature on resource pooling for congestible services, we show that the grand coalition is in general not stable, once we allow for competition over market share. In fact, under classical notions of stability (defined via blocking by any coalition), we show that no partition is stable. This motivates us to introduce more restricted (and relevant) notions of blocking; interestingly, we find that the stable configurations under these novel notions of stability are duopolies, where the dominant coalition exploits its economies of scale to corner a disproportionate market share. Furthermore, we completely characterise the stable duopolies in heavy and light traffic regimes.
Submitted 25 April, 2023;
originally announced April 2023.
-
SSS at SemEval-2023 Task 10: Explainable Detection of Online Sexism using Majority Voted Fine-Tuned Transformers
Authors:
Sriya Rallabandi,
Sanchit Singhal,
Pratinav Seth
Abstract:
This paper describes our submission to Task 10 at SemEval 2023, Explainable Detection of Online Sexism (EDOS), which is divided into three subtasks. The recent rise of social media platforms has brought an increase in the disproportionate levels of sexism experienced by women on them. This has made detecting and explaining online sexist content more important than ever to make social media safer and more accessible for women. Our approach consists of experimenting with and fine-tuning BERT-based models and using a majority-voting ensemble that outperforms the individual baseline models. Our system achieves a macro F1 score of 0.8392 for Task A, 0.6092 for Task B, and 0.4319 for Task C.
Submitted 23 April, 2023; v1 submitted 7 April, 2023;
originally announced April 2023.
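The majority-voting step mentioned in the abstract above can be sketched in a few lines of Python. This is a generic hard-voting sketch, not the authors' code: the function name, the label strings, and the tie-breaking rule (the earliest-listed model wins) are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label predictions by hard majority voting.

    predictions[m][i] is the label that model m assigns to example i.
    """
    voted = []
    for labels in zip(*predictions):
        # most_common orders equal counts by first occurrence, so on a tie
        # the label from the earliest-listed model is kept (an assumption).
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Three hypothetical fine-tuned models, three examples.
preds = [
    ["sexist", "not sexist", "sexist"],
    ["sexist", "sexist", "not sexist"],
    ["not sexist", "not sexist", "not sexist"],
]
print(majority_vote(preds))  # ['sexist', 'not sexist', 'not sexist']
```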
-
CoReFusion: Contrastive Regularized Fusion for Guided Thermal Super-Resolution
Authors:
Aditya Kasliwal,
Pratinav Seth,
Sriya Rallabandi,
Sanchit Singhal
Abstract:
Thermal imaging has numerous advantages over regular visible-range imaging since it performs well in low-light circumstances. Super-resolution approaches can broaden its usefulness by replicating accurate high-resolution thermal pictures using measurements from low-cost, low-resolution thermal sensors. Because of the spectral range mismatch between the images, guided super-resolution of thermal images utilizing visible-range images is difficult. Moreover, failure to capture visible-range images can prevent the operation of applications in critical areas. We present a novel data fusion framework and regularization technique for guided super-resolution of thermal images. The proposed architecture is computationally inexpensive and lightweight, with the ability to maintain performance despite missing one of the modalities, i.e., the high-resolution RGB image or the lower-resolution thermal image, and is designed to be robust in the presence of missing data. The proposed method presents a promising solution to the frequently occurring problem of missing modalities in real-world scenarios. Code is available at https://github.com/Kasliwal17/CoReFusion .
Submitted 24 April, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Language Is Not All You Need: Aligning Perception with Language Models
Authors:
Shaohan Huang,
Li Dong,
Wenhui Wang,
Yaru Hao,
Saksham Singhal,
Shuming Ma,
Tengchao Lv,
Lei Cui,
Owais Khan Mohammed,
Barun Patra,
Qiang Liu,
Kriti Aggarwal,
Zewen Chi,
Johan Bjorck,
Vishrav Chaudhary,
Subhojit Som,
Xia Song,
Furu Wei
Abstract:
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. We evaluate various settings, including zero-shot, few-shot, and multimodal chain-of-thought prompting, on a wide range of tasks without any gradient updates or finetuning. Experimental results show that Kosmos-1 achieves impressive performance on (i) language understanding, generation, and even OCR-free NLP (directly fed with document images), (ii) perception-language tasks, including multimodal dialogue, image captioning, visual question answering, and (iii) vision tasks, such as image recognition with descriptions (specifying classification via text instructions). We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs.
Submitted 1 March, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Performance evaluation of deep segmentation models for Contrails detection
Authors:
Akshat Bhandari,
Sriya Rallabandi,
Sanchit Singhal,
Aditya Kasliwal,
Pratinav Seth
Abstract:
Contrails, short for condensation trails, are line-shaped ice clouds produced by aircraft engine exhaust when aircraft fly through cold and humid air. They generate a greenhouse effect by absorbing or directing back to Earth approximately 33% of emitted outgoing longwave radiation. They account for over half of the climate change resulting from aviation activities. Avoiding contrails and adjusting flight routes could be an inexpensive and effective way to reduce their impact. An accurate, automated, and reliable detection algorithm is required to develop and evaluate contrail avoidance strategies. Advancement in contrail detection has been severely limited by several factors, primarily a lack of quality-labeled data. Recently, a large human-labeled Landsat-8 contrails dataset was proposed. Each contrail is carefully labeled with various inputs in various scenes of Landsat-8 satellite imagery. In this work, we benchmark several popular segmentation models with combinations of different loss functions and encoder backbones. This work is the first to apply state-of-the-art segmentation techniques to detect contrails in low-orbit satellite imagery. Our work can also be used as an open benchmark for contrail segmentation and is publicly available.
Submitted 4 November, 2023; v1 submitted 27 November, 2022;
originally announced November 2022.
-
Squeeze flow of micro-droplets: convolutional neural network with trainable and tunable refinement
Authors:
Aryan Mehboudi,
Shrawan Singhal,
S. V. Sreenivasan
Abstract:
We propose a platform based on neural networks to solve the image-to-image translation problem in the context of squeeze flow of micro-droplets. In the first part of this paper, we present the governing partial differential equations to lay out the underlying physics of the problem. We also discuss our developed Python package, sqflow, which can potentially serve as free, flexible, and scalable standardized benchmarks in the fields of machine learning and computer vision. In the second part of this paper, we introduce a residual convolutional neural network to solve the corresponding inverse problem: to translate a high-resolution (HR) imprint image with a specific liquid film thickness to a low-resolution (LR) droplet pattern image capable of producing the given imprint image for an appropriate spread time of droplets. We propose a neural network architecture that learns to systematically tune the refinement level of its residual convolutional blocks by using the function approximators that are trained to map a given input parameter (film thickness) to an appropriate refinement level indicator. We use multiple stacks of convolutional layers the output of which is translated according to the refinement level indicators provided by the directly-connected function approximators. Together with a non-linear activation function, such a translation mechanism enables the HR imprint image to be refined sequentially in multiple steps until the target LR droplet pattern image is revealed. The proposed platform can be potentially applied to data compression and data encryption. The developed package and datasets are publicly available on GitHub at https://github.com/sqflow/sqflow.
Submitted 16 November, 2022;
originally announced November 2022.
-
Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning
Authors:
Barun Patra,
Saksham Singhal,
Shaohan Huang,
Zewen Chi,
Li Dong,
Furu Wei,
Vishrav Chaudhary,
Xia Song
Abstract:
In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications. We show that going beyond English-centric bitexts, coupled with a novel sampling strategy aimed at reducing under-utilization of training data, substantially boosts performance across model sizes for both Electra and MLM pre-training objectives. We introduce XY-LENT (X-Y bitext enhanced Language ENcodings using Transformers), which not only achieves state-of-the-art performance on 5 cross-lingual tasks within all model size bands but is also competitive across bands. Our XY-LENT XL variant outperforms XLM-R XXL and exhibits competitive performance with mT5 XXL while being 5x and 6x smaller respectively. We then show that our proposed method helps ameliorate the curse of multilinguality, with XY-LENT XL achieving 99.3% GLUE performance and 98.5% SQuAD 2.0 performance compared to a SoTA English-only model in the same size band. We then analyze our model's performance on extremely low-resource languages and posit that scaling alone may not be sufficient for improving the performance in this scenario.
Submitted 26 October, 2022;
originally announced October 2022.
-
Foundation Transformers
Authors:
Hongyu Wang,
Shuming Ma,
Shaohan Huang,
Li Dong,
Wenhui Wang,
Zhiliang Peng,
Yu Wu,
Payal Bajaj,
Saksham Singhal,
Alon Benhaim,
Barun Patra,
Zhun Liu,
Vishrav Chaudhary,
Xia Song,
Furu Wei
Abstract:
A big convergence of model architectures across language, vision, speech, and multimodal is emerging. However, under the same name "Transformers", the above areas use different implementations for better performance, e.g., Post-LayerNorm for BERT, and Pre-LayerNorm for GPT and vision Transformers. We call for the development of Foundation Transformer for true general-purpose modeling, which serves as a go-to architecture for various tasks and modalities with guaranteed training stability. In this work, we introduce a Transformer variant, named Magneto, to fulfill the goal. Specifically, we propose Sub-LayerNorm for good expressivity, and the initialization strategy theoretically derived from DeepNet for stable scaling up. Extensive experiments demonstrate its superior performance and better stability than the de facto Transformer variants designed for various applications, including language modeling (i.e., BERT, and GPT), machine translation, vision pretraining (i.e., BEiT), speech recognition, and multimodal pretraining (i.e., BEiT-3).
Submitted 19 October, 2022; v1 submitted 12 October, 2022;
originally announced October 2022.
-
DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory (Extended Version)
Authors:
Sekwon Lee,
Soujanya Ponnapalli,
Sharad Singhal,
Marcos K. Aguilera,
Kimberly Keeton,
Vijay Chidambaram
Abstract:
We present Dinomo, a novel key-value store for disaggregated persistent memory (DPM). Dinomo is the first key-value store for DPM that simultaneously achieves high common-case performance, scalability, and lightweight online reconfiguration. We observe that previously proposed key-value stores for DPM had architectural limitations that prevent them from achieving all three goals simultaneously. Dinomo uses a novel combination of techniques such as ownership partitioning, disaggregated adaptive caching, selective replication, and lock-free and log-free indexing to achieve these goals. Compared to a state-of-the-art DPM key-value store, Dinomo achieves at least 3.8x better throughput on various workloads at scale and higher scalability, while providing fast reconfiguration.
Submitted 18 September, 2022;
originally announced September 2022.
-
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Authors:
Wenhui Wang,
Hangbo Bao,
Li Dong,
Johan Bjorck,
Zhiliang Peng,
Qiang Liu,
Kriti Aggarwal,
Owais Khan Mohammed,
Saksham Singhal,
Subhojit Som,
Furu Wei
Abstract:
A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and vision-language tasks. Specifically, we advance the big convergence from three aspects: backbone architecture, pretraining task, and model scaling up. We introduce Multiway Transformers for general-purpose modeling, where the modular architecture enables both deep fusion and modality-specific encoding. Based on the shared backbone, we perform masked "language" modeling on images (Imglish), texts (English), and image-text pairs ("parallel sentences") in a unified manner. Experimental results show that BEiT-3 obtains state-of-the-art performance on object detection (COCO), semantic segmentation (ADE20K), image classification (ImageNet), visual reasoning (NLVR2), visual question answering (VQAv2), image captioning (COCO), and cross-modal retrieval (Flickr30K, COCO).
Submitted 30 August, 2022; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Investigating the impact of BTI, HCI and time-zero variability on neuromorphic spike event generation circuits
Authors:
Shaik Jani Babu,
Rohit Singh,
Siona Menezes Picardo,
Nilesh Goel,
Sonal Singhal
Abstract:
Neuromorphic computing refers to brain-inspired computing, which differentiates it from the von Neumann architecture. Analog VLSI-based neuromorphic circuits are a current research interest. Two simple spiking integrate-and-fire neuron models, namely the axon-hillock (AH) and voltage integrate-and-fire (VIF) circuits, are commonly used for generating spike events. This paper discusses the impact of reliability issues such as Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI), as well as time-zero variability, on these CMOS-based neuromorphic circuits. The AH and VIF circuits are implemented using HKMG-based 45nm technology. For reliability analysis, the industry-standard Cadence RelXpert tool is used. For time-zero variability analysis, 1000 Monte-Carlo simulations are performed.
Submitted 19 May, 2022;
originally announced May 2022.
-
Design and Mathematical Modelling of Inter Spike Interval of Temporal Neuromorphic Encoder for Image Recognition
Authors:
Aadhitiya VS,
Jani Babu Shaik,
Sonal Singhal,
Siona Menezes Picardo,
Nilesh Goel
Abstract:
Neuromorphic computing systems emulate the electrophysiological behavior of the biological nervous system using mixed-mode analog or digital VLSI circuits. These systems show superior accuracy and power efficiency in carrying out cognitive tasks. The neural network architecture used in neuromorphic computing systems is the spiking neural network (SNN), which is analogous to the biological nervous system. An SNN operates on spike trains as a function of time. A neuromorphic encoder converts sensory data into spike trains. In this paper, a low-power neuromorphic encoder for image processing is implemented. A mathematical model relating the pixels of an image to the inter-spike intervals is also formulated, in which an exponential relationship between pixels and inter-spike intervals is obtained. Finally, the mathematical equation is validated with circuit simulation.
Submitted 19 May, 2022;
originally announced May 2022.
-
On the Representation Collapse of Sparse Mixture of Experts
Authors:
Zewen Chi,
Li Dong,
Shaohan Huang,
Damai Dai,
Shuming Ma,
Barun Patra,
Saksham Singhal,
Payal Bajaj,
Xia Song,
Xian-Ling Mao,
Heyan Huang,
Furu Wei
Abstract:
Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead. It employs the routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations. However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse. In this work, we propose to estimate the routing scores between tokens and experts on a low-dimensional hypersphere. We conduct extensive experiments on cross-lingual language model pre-training and fine-tuning on downstream tasks. Experimental results across seven multilingual benchmarks show that our method achieves consistent gains. We also present a comprehensive analysis on the representation and routing behaviors of our models. Our method alleviates the representation collapse issue and achieves more consistent routing than the baseline mixture-of-experts methods.
Submitted 12 October, 2022; v1 submitted 19 April, 2022;
originally announced April 2022.
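One way to read "routing scores between tokens and experts on a low-dimensional hypersphere" in the abstract above is: project the token representation to a small dimension, L2-normalize both the projected token and the expert embeddings, and score by their dot product. The sketch below illustrates that reading only. The abstract does not give the formula, so the projection matrix, the `temperature` parameter, and all names here are assumptions rather than the authors' method.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale v onto the unit hypersphere (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def routing_scores(
    token: List[float],
    projection: List[List[float]],  # hypothetical (low_dim x hidden_dim) matrix
    experts: List[List[float]],     # hypothetical low_dim expert embeddings
    temperature: float = 1.0,       # hypothetical scaling factor
) -> List[float]:
    # Project the hidden representation down to the low-dimensional space.
    low = [sum(w * x for w, x in zip(row, token)) for row in projection]
    low = l2_normalize(low)
    # Cosine similarity between the projected token and each expert embedding.
    return [
        sum(a * b for a, b in zip(low, l2_normalize(e))) / temperature
        for e in experts
    ]


scores = routing_scores(
    token=[3.0, 4.0, 0.0, 0.0],
    projection=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    experts=[[1.0, 0.0], [0.0, 2.0]],
)
best_expert = max(range(len(scores)), key=scores.__getitem__)
print(best_expert)  # 1
```

Because both sides are normalized, each score is bounded by 1/temperature in magnitude, so the norm of a token representation does not influence which expert it is routed to.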
-
Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads
Authors:
Dharma Shukla,
Muthian Sivathanu,
Srinidhi Viswanatha,
Bhargav Gulavani,
Rimma Nehme,
Amey Agrawal,
Chen Chen,
Nipun Kwatra,
Ramachandran Ramjee,
Pankaj Sharma,
Atul Katiyar,
Vipul Modi,
Vaibhav Sharma,
Abhishek Singh,
Shreshth Singhal,
Kaustubh Welankar,
Lu Xun,
Ravi Anupindi,
Karthik Elangovan,
Hasibur Rahman,
Zhou Lin,
Rahul Seetharaman,
Cheng Xu,
Eddie Ailijiang,
Suresh Krishnappa
, et al. (1 additional author not shown)
Abstract:
Lowering costs by driving high utilization across deep learning workloads is a crucial lever for cloud providers. We present Singularity, Microsoft's globally distributed scheduling service for highly-efficient and reliable execution of deep learning training and inference workloads. At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads to drive high utilization without impacting their correctness or performance, across a global fleet of AI accelerators (e.g., GPUs, FPGAs).
All jobs in Singularity are preemptable, migratable, and dynamically resizable (elastic) by default: a live job can be dynamically and transparently (a) preempted and migrated to a different set of nodes, cluster, data center or a region and resumed exactly from the point where the execution was preempted, and (b) resized (i.e., elastically scaled-up/down) on a varying set of accelerators of a given type. Our mechanisms are transparent in that they do not require the user to make any changes to their code or require using any custom libraries that may limit flexibility. Additionally, our approach significantly improves the reliability of deep learning workloads. We show that the resulting efficiency and reliability gains with Singularity are achieved with negligible impact on the steady-state performance. Finally, our design approach is agnostic of DNN architectures and handles a variety of parallelism strategies (e.g., data/pipeline/model parallelism).
Submitted 21 February, 2022; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Discrete Simulation Optimization for Tuning Machine Learning Method Hyperparameters
Authors:
Varun Ramamohan,
Shobhit Singhal,
Aditya Raj Gupta,
Nomesh Bhojkumar Bolia
Abstract:
Machine learning (ML) methods are used in most technical areas such as image recognition, product recommendation, financial analysis, medical diagnosis, and predictive maintenance. An important aspect of implementing ML methods involves controlling the learning process for the ML method so as to maximize the performance of the method under consideration. Hyperparameter tuning is the process of selecting a suitable set of ML method parameters that control its learning process. In this work, we demonstrate the use of discrete simulation optimization methods such as ranking and selection (R&S) and random search for identifying a hyperparameter set that maximizes the performance of an ML method. Specifically, we use the KN R&S method and the stochastic ruler random search method and one of its variations for this purpose. We also construct the theoretical basis for applying the KN method, which determines the optimal solution with a statistical guarantee via solution space enumeration. In comparison, the stochastic ruler method asymptotically converges to global optima and incurs smaller computational overheads. We demonstrate the application of these methods to a wide variety of machine learning models, including deep neural network models used for time series prediction and image classification. We benchmark our application of these methods with state-of-the-art hyperparameter optimization libraries such as hyperopt and mango. The KN method consistently outperforms hyperopt's random search (RS) and Tree of Parzen Estimators (TPE) methods. The stochastic ruler method outperforms the hyperopt RS method and offers statistically comparable performance with respect to hyperopt's TPE method and the mango algorithm.
Submitted 20 June, 2023; v1 submitted 16 January, 2022;
originally announced January 2022.
-
Multi-label Iterated Learning for Image Classification with Label Ambiguity
Authors:
Sai Rajeswar,
Pau Rodriguez,
Soumye Singhal,
David Vazquez,
Aaron Courville
Abstract:
Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that datasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data. Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambiguity than the standard training procedure, even when fine-tuning from self-supervised weights. We also show that MILe is effective at reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision. Furthermore, MILe improves performance in class incremental settings such as IIRC and it is robust to distribution shifts. Code: https://github.com/rajeswar18/MILe
Submitted 23 November, 2021;
originally announced November 2021.
-
Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task
Authors:
Jian Yang,
Shuming Ma,
Haoyang Huang,
Dongdong Zhang,
Li Dong,
Shaohan Huang,
Alexandre Muzio,
Saksham Singhal,
Hany Hassan Awadalla,
Xia Song,
Furu Wei
Abstract:
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation. We participated in all three evaluation tracks: the Large Track and two Small Tracks, where the former is unconstrained and the latter two are fully constrained. Our model submissions to the shared task were initialized with DeltaLM (https://aka.ms/deltalm), a generic pre-trained multilingual encoder-decoder model, and fine-tuned correspondingly with the vast collected parallel data and allowed data sources according to track settings, together with applying progressive learning and iterative back-translation approaches to further improve the performance. Our final submissions ranked first on three tracks in terms of the automatic evaluation metric.
Submitted 3 November, 2021;
originally announced November 2021.
-
Coalition Formation in Constant Sum Queueing Games
Authors:
Shiksha Singhal,
Veeraruna Kavitha,
Jayakrishnan Nair
Abstract:
We analyse a coalition formation game between strategic service providers of a congestible service. The key novelty of our formulation is that it is a constant sum game, i.e., the total payoff across all service providers (or coalitions of providers) is fixed, and dictated by the total size of the market. The game thus captures the tension between resource pooling (to benefit from the resulting statistical economies of scale) and competition between coalitions over market share. In a departure from the prior literature on resource pooling for congestible services, we show that the grand coalition is in general not stable, once we allow for competition over market share. Instead, the stable configurations are duopolies, where the dominant coalition exploits its economies of scale to corner a disproportionate market share. We analyse the stable duopolies that emerge from this interaction, and also study a dynamic variant of this game.
Submitted 27 September, 2021;
originally announced September 2021.
-
Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training
Authors:
Bo Zheng,
Li Dong,
Shaohan Huang,
Saksham Singhal,
Wanxiang Che,
Ting Liu,
Xia Song,
Furu Wei
Abstract:
Compared to monolingual models, cross-lingual models usually require a more expressive vocabulary to represent all languages adequately. We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity. To this end, we propose an algorithm VoCap to determine the desired vocabulary capacity of each language. However, increasing the vocabulary size significantly slows down the pre-training speed. In order to address the issues, we propose k-NN-based target sampling to accelerate the expensive softmax. Our experiments show that the multilingual vocabulary learned with VoCap benefits cross-lingual language model pre-training. Moreover, k-NN-based target sampling mitigates the side-effects of increasing the vocabulary size while achieving comparable performance and faster pre-training speed. The code and the pretrained multilingual vocabularies are available at https://github.com/bozheng-hit/VoCapXLM.
Submitted 15 September, 2021;
originally announced September 2021.
-
MODC: Resilience for disaggregated memory architectures using task-based programming
Authors:
Kimberly Keeton,
Sharad Singhal,
Haris Volos,
Yupu Zhang,
Ramesh Chandra Chaurasiya,
Clarete Riana Crasta,
Sherin T George,
Nagaraju K N,
Mashood Abdulla K,
Kavitha Natarajan,
Porno Shome,
Sanish Suresh
Abstract:
Disaggregated memory architectures provide benefits to applications beyond traditional scale out environments, such as independent scaling of compute and memory resources. They also provide an independent failure model, where computations or the compute nodes they run on may fail independently of the disaggregated memory; thus, data that's resident in the disaggregated memory is unaffected by the compute failure. Blind application of traditional techniques for resilience (e.g., checkpoints or data replication) does not take advantage of these architectures. To demonstrate the potential benefit of these architectures for resilience, we develop Memory-Oriented Distributed Computing (MODC), a framework for programming disaggregated architectures that borrows and adapts ideas from task-based programming models, concurrent programming techniques, and lock-free data structures. This framework includes a task-based application programming model and a runtime system that provides scheduling, coordination, and fault tolerance mechanisms. We present highlights of our MODC prototype and experimental results demonstrating that MODC-style resilience outperforms a checkpoint-based approach in the face of failures.
Submitted 11 September, 2021;
originally announced September 2021.
-
Desk Organization: Effect of Multimodal Inputs on Spatial Relational Learning
Authors:
Ryan Rowe,
Shivam Singhal,
Daqing Yi,
Tapomayukh Bhattacharjee,
Siddhartha S. Srinivasa
Abstract:
For robots to operate in a three-dimensional world and interact with humans, learning spatial relationships among objects in their surroundings is necessary. Reasoning about the state of the world requires inputs from many different sensory modalities including vision ($V$) and haptics ($H$). We examine the problem of desk organization: learning how humans spatially position different objects on a planar surface according to organizational "preference". We model this problem by examining how humans position objects given multiple features received from vision and haptic modalities. However, organizational habits vary greatly between people both in structure and adherence. To deal with user organizational preferences, we add an additional modality, "utility" ($U$), which informs on a particular human's perceived usefulness of a given object. Models were trained as generalized (over many different people) or tailored (per person). We use two types of models: random forests, which focus on precise multi-task classification, and Markov logic networks, which provide an easily interpretable insight into organizational habits. The models were applied to both synthetic data, which proved to be learnable when using fixed organizational constraints, and human-study data, on which the random forest achieved over 90% accuracy. Over all combinations of $\{H, U, V\}$ modalities, $UV$ and $HUV$ were the most informative for organization. In a follow-up study, we gauged participants' preference of desk organizations by a generalized random forest organization vs. by a random model. On average, participants rated the random forest models as 4.15 on a 5-point Likert scale compared to 1.84 for the random model.
Submitted 2 August, 2021;
originally announced August 2021.
-
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
Authors:
Zewen Chi,
Shaohan Huang,
Li Dong,
Shuming Ma,
Bo Zheng,
Saksham Singhal,
Payal Bajaj,
Xia Song,
Xian-Ling Mao,
Heyan Huang,
Furu Wei
Abstract:
In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training. Specifically, we present two pre-training tasks, namely multilingual replaced token detection and translation replaced token detection. In addition, we pretrain the model, named XLM-E, on both multilingual and parallel corpora. Our model outperforms the baseline models on various cross-lingual understanding tasks with much less computation cost. Moreover, analysis shows that XLM-E tends to obtain better cross-lingual transferability.
Submitted 19 April, 2022; v1 submitted 30 June, 2021;
originally announced June 2021.
-
DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders
Authors:
Shuming Ma,
Li Dong,
Shaohan Huang,
Dongdong Zhang,
Alexandre Muzio,
Saksham Singhal,
Hany Hassan Awadalla,
Xia Song,
Furu Wei
Abstract:
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where the pretrained encoders can only benefit part of it. To reduce this gap, we introduce DeltaLM, a pretrained multilingual encoder-decoder model that regards the decoder as the task layer of off-the-shelf pretrained encoders. Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way. To take advantage of both the large-scale monolingual data and bilingual data, we adopt the span corruption and translation span corruption as the pre-training tasks. Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks, including machine translation, abstractive text summarization, data-to-text, and question generation. The code and pretrained models are available at https://aka.ms/deltalm.
Submitted 17 August, 2021; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Consistency Regularization for Cross-Lingual Fine-Tuning
Authors:
Bo Zheng,
Li Dong,
Shaohan Huang,
Wenhui Wang,
Zewen Chi,
Saksham Singhal,
Wanxiang Che,
Ting Liu,
Xia Song,
Furu Wei
Abstract:
Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others. In this work, we propose to improve cross-lingual fine-tuning with consistency regularization. Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentations, i.e., subword sampling, Gaussian noise, code-switch substitution, and machine translation. In addition, we employ model consistency to regularize the models trained with two augmented versions of the same training set. Experimental results on the XTREME benchmark show that our method significantly improves cross-lingual fine-tuning across various tasks, including text classification, question answering, and sequence labeling.
Submitted 15 June, 2021;
originally announced June 2021.
-
Towards Designing Computer Vision-based Explainable-AI Solution: A Use Case of Livestock Mart Industry
Authors:
Devam Dave,
Het Naik,
Smiti Singhal,
Rudresh Dwivedi,
Pankesh Patel
Abstract:
The objective of an online Mart is to match buyers and sellers, to weigh animals and to oversee their sale. A reliable pricing method can be developed by ML models that can read through historical sales data. However, when AI models suggest or recommend a price, that in itself does not reveal too much (i.e., it acts like a black box) about the qualities and the abilities of an animal. An interested buyer would like to know more about the salient features of an animal before making the right choice based on his requirements. A model capable of explaining the different factors that impact the price point is essential for the needs of the market. It can also inspire confidence in buyers and sellers about the price point offered. To achieve these objectives, we have been working with the team at MartEye, a startup based in Portershed in Galway City, Ireland. Through this paper, we report our work-in-progress research towards building a smart video analytic platform, leveraging Explainable AI techniques.
Submitted 8 February, 2021;
originally announced March 2021.
-
Factorization of Fact-Checks for Low Resource Indian Languages
Authors:
Shivangi Singhal,
Rajiv Ratn Shah,
Ponnurangam Kumaraguru
Abstract:
The advancement in technology and accessibility of the internet to each individual are revolutionizing real-time information. The liberty to express your thoughts without passing through any credibility check is leading to the dissemination of fake content in the ecosystem. It can have disastrous effects on both individuals and society as a whole. The amplification of fake news is becoming rampant in India too. Debunked information often gets republished with a replacement description, claiming that it depicts a different incident. To curb such fabricated stories, it is necessary to investigate such duplicates and false claims made in public. The majority of studies on automatic fact-checking and fake news detection are restricted to English only. But for a country like India, where only 10% of the literate population speak English, the role of regional languages in spreading falsity cannot be undermined. In this paper, we introduce FactDRIL: the first large scale multilingual Fact-checking Dataset for Regional Indian Languages. We collect an exhaustive dataset across 7 months covering 11 low-resource languages. Our proposed dataset consists of 9,058 samples belonging to English, 5,155 samples to Hindi, and the remaining 8,222 samples are distributed across various regional languages, i.e., Bangla, Marathi, Malayalam, Telugu, Tamil, Oriya, Assamese, Punjabi, Urdu, Sinhala and Burmese. We also present the detailed characterization of three M's (multi-lingual, multi-media, multi-domain) in FactDRIL, accompanied by the complete list of other varied attributes that make it a unique dataset to study. Lastly, we present some potential use cases of the dataset. We expect this dataset will be a valuable resource and serve as a starting point to fight the proliferation of fake news in low-resource languages.
Submitted 23 February, 2021;
originally announced February 2021.
-
XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
Authors:
Shuming Ma,
Jian Yang,
Haoyang Huang,
Zewen Chi,
Li Dong,
Dongdong Zhang,
Hany Hassan Awadalla,
Alexandre Muzio,
Akiko Eriguchi,
Saksham Singhal,
Xia Song,
Arul Menezes,
Furu Wei
Abstract:
Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data. This simple method achieves significant improvements on a WMT dataset with 10 language pairs and the OPUS-100 corpus with 94 pairs. Surprisingly, the method is also effective even upon the strong baseline with back-translation. Moreover, extensive analysis of XLM-T on unsupervised syntactic parsing, word alignment, and multilingual classification explains its effectiveness for machine translation. The code will be at https://aka.ms/xlm-t.
Submitted 31 December, 2020;
originally announced December 2020.
-
Graph500 from OCaml-Multicore Perspective
Authors:
Shubhendra Pal Singhal
Abstract:
OCaml is an industrial-strength, multi-paradigm programming language, widely used in industry and academia. OCaml was developed for solving numerical and scientific problems involving large scale data-intensive operations and one such classic application set is Graph Algorithms, which are a core part of most analytics workloads. In this paper, we aim to implement the graph benchmarks along with the performance analysis. Graph500 is one such serious benchmark which aims at developing data intensive applications requiring extreme computational power. We try to implement Graph Construction, BFS, and Shortest-Path problems using the desired specifications and rules posed by graph500. This paper aims to give a clear direction on the choice of the data structures used and the algorithms developed, and to give a reason behind every step of the program. The first few sections of the paper discuss a formal approach to the problem with a small guide for starters in OCaml. The latter sections describe the algorithms in detail with the possibilities of future exploration and several mistakes which we committed or encountered whilst approaching the solution. All performance metrics were tested on an Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz 24 core machine. Every section talks about the initial performance failures encountered, which will help analyse and prioritise our preferred implementation from a performance perspective.
Submitted 25 December, 2020;
originally announced December 2020.
-
Cooperative Ressource Sharing With Adamant Player
Authors:
Shiksha Singhal,
Veeraruna Kavitha
Abstract:
Cooperative game theory deals with systems where players want to cooperate to improve their payoffs. But players may choose coalitions in a non-cooperative manner, leading to a coalition-formation game. We consider such a game with several players (willing to cooperate) and an adamant player (unwilling to cooperate) involved in resource-sharing. Here, the strategy of a player is the set of players with whom it wants to form a coalition. Given a strategy profile, an appropriate partition of coalitions is formed; players in each coalition maximize their collective utilities leading to a non-cooperative resource-sharing game among the coalitions, the utilities at the resulting equilibrium are shared via Shapley-value; these shares define the utilities of players for the given strategy profile in coalition-formation game. We also consider the utilitarian solution to derive the price of anarchy (PoA). We considered a case with symmetric players and an adamant player; wherein we observed that players prefer to stay alone at Nash equilibrium when the number of players (n) is more than 4. In contrast, in the majority of the cases, the utilitarian partition is grand coalition. Interestingly the PoA is smaller with an adamant player of intermediate strength. Further, PoA grows like O(n).
Submitted 5 December, 2020;
originally announced December 2020.
-
Explainable AI meets Healthcare: A Study on Heart Disease Dataset
Authors:
Devam Dave,
Het Naik,
Smiti Singhal,
Pankesh Patel
Abstract:
With the increasing availability of structured and unstructured data and the swift progress of analytical techniques, Artificial Intelligence (AI) is bringing a revolution to the healthcare industry. With the increasingly indispensable role of AI in healthcare, there are growing concerns over the lack of transparency and explainability in addition to potential bias encountered by predictions of the model. This is where Explainable Artificial Intelligence (XAI) comes into the picture. XAI increases the trust placed in an AI system by medical practitioners as well as AI researchers, and thus, eventually, leads to an increasingly widespread deployment of AI in healthcare.
In this paper, we present different interpretability techniques. The aim is to enlighten practitioners on the understandability and interpretability of explainable AI systems using a variety of available techniques, which can be very advantageous in the healthcare domain. A medical diagnosis model is responsible for human life, and we need to be confident enough to treat a patient as instructed by a black-box model. Our paper contains examples based on the heart disease dataset and elucidates how the explainability techniques should be preferred to create trustworthiness while using AI systems in healthcare.
Submitted 6 November, 2020;
originally announced November 2020.
-
Supervised Seeded Iterated Learning for Interactive Language Learning
Authors:
Yuchen Lu,
Soumye Singhal,
Florian Strub,
Olivier Pietquin,
Aaron Courville
Abstract:
Language drift has been one of the major obstacles to training language models through interaction. When word-based conversational agents are trained towards completing a task, they tend to invent their language rather than leveraging natural language. In recent literature, two general methods partially counter this phenomenon: Supervised Selfplay (S2P) and Seeded Iterated Learning (SIL). While S2P jointly trains interactive and supervised losses to counter the drift, SIL changes the training dynamics to prevent language drift from occurring. In this paper, we first highlight their respective weaknesses, i.e., late-stage training collapses and higher negative likelihood when evaluated on a human corpus. Given these observations, we introduce Supervised Seeded Iterated Learning to combine both methods to minimize their respective weaknesses. We then show the effectiveness of Supervised Seeded Iterated Learning in the language-drift translation game.
Submitted 6 October, 2020;
originally announced October 2020.
-
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
Authors:
Zewen Chi,
Li Dong,
Furu Wei,
Nan Yang,
Saksham Singhal,
Wenhui Wang,
Xia Song,
Xian-Ling Mao,
Heyan Huang,
Ming Zhou
Abstract:
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at https://aka.ms/infoxlm.
Submitted 7 April, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Countering Language Drift with Seeded Iterated Learning
Authors:
Yuchen Lu,
Soumye Singhal,
Florian Strub,
Olivier Pietquin,
Aaron Courville
Abstract:
Pretraining on a human corpus and then finetuning in a simulator has become a standard pipeline for training a goal-oriented dialogue agent. Nevertheless, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose syntactic and semantic properties of language as they only focus on solving the task. In this paper, we propose a generic approach to counter language drift called Seeded Iterated Learning (SIL). We periodically refine a pretrained student agent by imitating data sampled from a newly generated teacher agent. At each time step, the teacher is created by copying the student agent, before being finetuned to maximize task completion. SIL requires neither external syntactic constraints nor semantic knowledge, making it a valuable task-agnostic finetuning protocol. We evaluate SIL in a toy-setting Lewis Game, and then scale it up to the translation game with natural language. In both settings, SIL helps counter language drift and also improves task completion compared to baselines.
Submitted 24 August, 2020; v1 submitted 27 March, 2020;
originally announced March 2020.
-
Interpretability of Blackbox Machine Learning Models through Dataview Extraction and Shadow Model creation
Authors:
Rupam Patir,
Shubham Singhal,
C. Anantaram,
Vikram Goyal
Abstract:
Deep learning models trained using massive amounts of data tend to capture one view of the data and its associated mapping. Different deep learning models built on the same training data may capture different views of the data based on the underlying techniques used. For explaining the decisions arrived at by blackbox deep learning models, we argue that it is essential to reproduce that model's view of the training data faithfully. This faithful reproduction can then be used for explanation generation. We investigate two methods for data view extraction: a hill-climbing approach and a GAN-driven approach. We then use this synthesized data for creating shadow models for explanation generation: a Decision-Tree model and a Formal Concept Analysis based model. We evaluate these approaches on a blackbox model trained on public datasets and show their usefulness in explanation generation.
Submitted 2 February, 2020;
originally announced February 2020.
-
Jointly Trained Image and Video Generation using Residual Vectors
Authors:
Yatin Dandi,
Aniket Das,
Soumye Singhal,
Vinay P. Namboodiri,
Piyush Rai
Abstract:
In this work, we propose a modeling technique for jointly training image and video generation models by simultaneously learning to map latent variables with a fixed prior onto real images and interpolate over images to generate videos. The proposed approach models the variations in representations using residual vectors encoding the change at each time step over a summary vector for the entire video. We utilize the technique to jointly train an image generation model with a fixed prior along with a video generation model lacking constraints such as disentanglement. The joint training enables the image generator to exploit temporal information while the video generation model learns to flexibly share information across frames. Moreover, experimental results verify our approach's compatibility with pre-training on videos or images and training on datasets containing a mixture of both. A comprehensive set of quantitative and qualitative evaluations reveals the improvements in sample quality and diversity over both video generation and image generation baselines. We further demonstrate the technique's capability to exploit similarity in features across frames by applying it to a model based on decomposing the video into motion and content. The proposed model allows minor variations in content across frames while maintaining the temporal dependence through latent vectors encoding the pose or motion features.
Submitted 17 December, 2019;
originally announced December 2019.
-
Profiling minisat based on user defined execution time -- GPROF
Authors:
Shubhendra Pal Singhal,
Sandeep Gupta,
Pierluigi Nuzzo
Abstract:
This paper explains the architecture of profilers, particularly gprof, and how to profile a program according to user-defined execution-time input. Gprof is an open-source profiler available in the binutils package. Gprof records the flow of the program, including callee and caller information and their respective execution times. This information is represented in the form of a call graph. At execution time, profilers create a call graph file that captures the full flow of the program, including the individual execution times. This paper aims to provide a better understanding of the data structure used to store this information and of how a profiler (gprof) uses this data structure to present it to the user in a readable format. The next section of the paper addresses one limitation of gprof: editing the time of a block of code without understanding the call graph. Any change in the execution time of a particular block of code affects the total execution time. If gprof is edited in a way that is consistent and platform independent, it can support experiments such as testing execution time after parallelism before even designing it, by replacing the values with theoretical or emulated ones and checking whether the total execution time is reduced by the desired amount. A gprof edit can help identify which section of code can be parallelized, which part of the code takes the most time, and which call or part can be changed to reduce the execution time. The last section of the paper walks through the application of gprof to minisat and how gprof helps with hardware acceleration in minisat by suggesting which part to parallelize and how that affects the total percentage.
Submitted 28 September, 2019;
originally announced September 2019.
-
Is change the only constant? Profile change perspective on #LokSabhaElections2019
Authors:
Kumari Neha,
Shashank Srikanth,
Sonali Singhal,
Shwetanshu Singh,
Arun Balaji Buduru,
Ponnurangam Kumaraguru
Abstract:
Users on Twitter are identified with the help of their profile attributes, which consist of username, display name, and profile image, to name a few. The profile attributes that users adopt can reflect their interests, beliefs, or thematic inclinations. Literature has proposed the implications and significance of profile attribute change for a random population of users. However, the use of profile attributes for endorsements and to start a movement has been under-explored. In this work, we consider #LokSabhaElections2019 as a movement and perform a large-scale study of the profiles of users who actively made changes to profile attributes centered around #LokSabhaElections2019. We collect the profile metadata for 49.4M users for a period of 2 months from April 5, 2019 to June 5, 2019 amid #LokSabhaElections2019. We investigate how the profile changes vary for the influential leaders and their followers over the social movement. We further differentiate the organic and inorganic ways to show political inclination through the prism of profile changes. We report how the addition of election-campaign-related keywords leads to the spread of behavior contagion and further investigate it with respect to the "Chowkidar Movement" in detail.
Submitted 22 September, 2019;
originally announced September 2019.
-
Comparative study of performance of parallel Alpha Beta Pruning for different architectures
Authors:
Shubhendra Pal Singhal,
M. Sridevi
Abstract:
Optimizing the search for the best possible action depending on various states, such as the state of the environment and the system goal, has been a major area of study in computer systems. In any search algorithm, searching for the best possible solution from the pool of every known possibility can lead to the construction of the whole state search space, popularly known as the minimax algorithm. This may lead to impractical time complexities that are not suitable for real-time searching operations. One practical solution for reducing computational time is Alpha Beta pruning. Instead of searching the whole state space, we prune the unnecessary branches, which reduces the time by a significant amount. This paper focuses on the various possible implementations of Alpha Beta pruning algorithms and gives insight into which algorithm can be used for parallelism. Various studies have been conducted on how to make Alpha Beta pruning faster. Parallelizing Alpha Beta pruning for GPU-specific architectures such as mesh (CUDA) or for the shared memory model (OpenMP) helps reduce the computational time. This paper compares sequential and different parallel forms of Alpha Beta pruning and their respective efficiency, with the game of chess as an application.
Submitted 29 October, 2019; v1 submitted 30 August, 2019;
originally announced August 2019.
-
Porting of eChronos RTOS on RISC-V Architecture
Authors:
Shubhendra Pal Singhal,
M. Sridevi,
N Sathya Narayanan,
M J Shankar Raman
Abstract:
eChronos is a formally verified Real Time Operating System (RTOS) designed for embedded micro-controllers. eChronos was targeted at tightly constrained devices without memory management units. Currently, eChronos is available on proprietary designs like ARM, PowerPC and Intel architectures. eChronos is adopted in safety-critical systems like aircraft control systems and medical implant devices. eChronos is one of the very few pieces of system software that have not been ported to RISC-V. RISC-V is an open-source Instruction Set Architecture (ISA) that enables a new era of processor development. Many standard operating systems and software toolchains have migrated to the RISC-V architecture. According to the latest trends, RISC-V is replacing many proprietary chips. As a secure RTOS, eChronos is an attractive candidate for porting to an open-source ISA. SHAKTI and PicoRV32 are some of the proven open-source RISC-V designs available. Having a secure RTOS on an open-source hardware design, itself based on an open-source ISA, makes it more interesting. In addition, the architectures currently supported by eChronos are all proprietary designs, and porting eChronos to the RISC-V architecture advances secure system development as a whole. This paper presents an idea of porting eChronos to a chip that is open-source and effective, thus reducing the cost of embedded systems. Designing a system that is completely open-source reduces the overall cost, increases security, and allows the design to be critically reviewed. This paper explores the design and architecture aspects involved in porting eChronos to RISC-V. The authors have successfully ported eChronos to the RISC-V architecture and verified it on Spike. The port of eChronos to RISC-V is made available as open source by the authors. Along with that, the safe removal of architectural dependencies and subsequent changes in eChronos are also analyzed.
Submitted 26 December, 2019; v1 submitted 30 August, 2019;
originally announced August 2019.
-
Reputation Systems -- Fair allocation of points to the editors in the collaborative community
Authors:
Shubhendra Pal Singhal
Abstract:
In this paper, we try to determine a scheme for the fair allocation of points to the contributors of a collaborative community. The major problem in fairly allocating points among contributors is that we have to analyze the improvement across versions of an article. Consider a contribution that makes a major, relevant change in content versus a contribution that adds a single comma. Contributors cannot all be given the same points in such a case. Many measures could be used, such as the number of changes in a new version. That might seem relevant, but it becomes irrelevant in terms of correct content contribution and other significant changes. There is also no AI system that can detect such a change and award points accordingly. We therefore address this problem of allocating points to contributors with an algorithm accompanied by a theoretical proof. It relies on the interaction of the users in the system, which is trivial in the case of big system design economies.
Submitted 28 June, 2019; v1 submitted 17 June, 2019;
originally announced June 2019.
-
Recall Traces: Backtracking Models for Efficient Reinforcement Learning
Authors:
Anirudh Goyal,
Philemon Brakel,
William Fedus,
Soumye Singhal,
Timothy Lillicrap,
Sergey Levine,
Hugo Larochelle,
Yoshua Bengio
Abstract:
In many environments only a tiny subset of all states yields high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high value state (or one that is estimated to have high value), predicts and samples which (state, action) tuples may have led to that high value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.
Submitted 28 January, 2019; v1 submitted 1 April, 2018;
originally announced April 2018.
-
Quantitative Assessment of TV White Space in India
Authors:
Gaurang Naik,
Sudesh Singhal,
Animesh Kumar,
Abhay Karandikar
Abstract:
Licensed but unutilized television (TV) band spectrum is called TV white space in the literature. Ultra high frequency (UHF) TV band spectrum has very good wireless radio propagation characteristics. The amount of TV white space in the UHF TV band in India is of interest. Comprehensive quantitative assessment and estimates for the TV white space in the 470-590MHz band for four zones of India (all except north) are presented in this work. This is the first effort in India to estimate TV white spaces in a comprehensive manner. The average available TV white space per unit area in these four zones is calculated using two methods: (i) the primary (licensed) user and secondary (unlicensed) user point of view; and (ii) the regulations of the Federal Communications Commission in the United States. By both methods, the average available TV white space in the UHF TV band is shown to be more than 100MHz! A TV transmitter frequency-reassignment algorithm is also described. Based on spatial-reuse ideas, a TV channel allocation scheme is presented which results in insignificant interference to the TV receivers while using the least number of TV channels for transmission across the four zones. Based on this reassignment, it is found that four TV band channels (or 32MHz) are sufficient to provide the existing UHF TV band coverage in India.
Submitted 31 October, 2013;
originally announced October 2013.