-
Facial Demorphing via Identity Preserving Image Decomposition
Authors:
Nitish Shukla,
Arun Ross
Abstract:
A face morph is created by combining the face images usually pertaining to two distinct identities. The goal is to generate an image that can be matched with two identities, thereby undermining the security of a face recognition system. To deal with this problem, several morph attack detection techniques have been developed. But these methods do not extract any information about the underlying bonafides used to create the morphs. Demorphing addresses this limitation. However, current demorphing techniques are mostly reference-based, i.e., they need an image of one of the identities to recover the other. In this work, we treat demorphing as an ill-posed decomposition problem. We propose a novel method that is reference-free and recovers the bonafides with high accuracy. Our method decomposes the morph into several identity-preserving feature components. A merger network then weighs and combines these components to recover the bonafides. Our method is observed to reconstruct high-quality bonafides in terms of definition and fidelity. Experiments on the CASIA-WebFace, SMDD and AMSL datasets demonstrate the effectiveness of our method.
Submitted 20 August, 2024;
originally announced August 2024.
-
Time series forecasting with high stakes: A field study of the air cargo industry
Authors:
Abhinav Garg,
Naman Shukla,
Maarten Wormer
Abstract:
Time series forecasting in the air cargo industry presents unique challenges due to volatile market dynamics and the significant impact of accurate forecasts on generated revenue. This paper explores a comprehensive approach to demand forecasting at the origin-destination (O&D) level, focusing on the development and implementation of machine learning models in decision-making for the air cargo industry. We leverage a mixture of experts framework, combining statistical and advanced deep learning models to provide reliable forecasts for cargo demand over a six-month horizon. The results demonstrate that our approach outperforms industry benchmarks, offering actionable insights for cargo capacity allocation and strategic decision-making in the air cargo industry. While this work is applied in the airline industry, the methodology is broadly applicable to any field where forecast-based decision-making in a volatile environment is crucial.
Submitted 13 August, 2024; v1 submitted 29 July, 2024;
originally announced July 2024.
-
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Authors:
Kanchana Ranasinghe,
Satya Narayan Shukla,
Omid Poursaeed,
Michael S. Ryoo,
Tsung-Yu Lin
Abstract:
Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA). However, existing V-LLMs (e.g. BLIP-2, LLaVA) demonstrate weak spatial reasoning and localization awareness. Despite generating highly descriptive and elaborate textual answers, these models fail at simple tasks like distinguishing a left vs right location. In this work, we explore how image-space coordinate based instruction fine-tuning objectives could inject spatial awareness into V-LLMs. We discover optimal coordinate representations, data-efficient instruction fine-tuning objectives, and pseudo-data generation strategies that lead to improved spatial awareness in V-LLMs. Additionally, our resulting model improves VQA across image and video domains, reduces undesired hallucination, and generates better contextual object descriptions. Experiments across 5 vision-language tasks involving 14 different datasets establish the clear performance improvements achieved by our proposed framework.
Submitted 10 April, 2024;
originally announced April 2024.
-
Designing a K-state P-bit Engine
Authors:
Mohammad Khairul Bashar,
Abir Hasan,
Nikhil Shukla
Abstract:
Probabilistic bit (p-bit)-based compute engines utilize the unique capability of a p-bit to probabilistically switch between two states to solve computationally challenging problems. However, when solving problems that require more than two states (e.g., problems such as Max-3-Cut, verifying if a graph is K-partite (K>2) etc.), additional pre-processing steps such as graph reduction are required to make the problem compatible with a two-state p-bit platform. Moreover, this not only increases the problem size by entailing the use of auxiliary variables but can also degrade the solution quality. In this work, we develop a unique framework for implementing a K-state (K>2) p-bit engine. Furthermore, from an implementation standpoint, we show that such a K-state p-bit engine can be implemented using N traditional (2-state) p-bits, and one multi-state p-bit -- a novel concept proposed here. Augmenting traditional p-bit platforms, our approach enables us to solve an archetypal combinatoric problem class requiring multiple states, namely Max-K-Cut (K=3, 4 shown here), without using any additional auxiliary variables. Thus, our work fundamentally advances the functional capability of p-bit engines, enabling them to solve a broader class of computationally challenging problems more efficiently.
Submitted 27 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Universal Pyramid Adversarial Training for Improved ViT Performance
Authors:
Ping-yeh Chiang,
Yipin Zhou,
Omid Poursaeed,
Satya Narayan Shukla,
Ashish Shah,
Tom Goldstein,
Ser-Nam Lim
Abstract:
Recently, Pyramid Adversarial training (Herrmann et al., 2022) has been shown to be very effective for improving clean accuracy and distribution-shift robustness of vision transformers. However, due to the iterative nature of adversarial training, the technique is up to 7 times more expensive than standard training. To make the method more efficient, we propose Universal Pyramid Adversarial training, where we learn a single pyramid adversarial pattern shared across the whole dataset instead of the sample-wise patterns. With our proposed technique, we decrease the computational cost of Pyramid Adversarial training by up to 70% while retaining the majority of its benefit on clean performance and distribution-shift robustness. In addition, to the best of our knowledge, we are also the first to find that universal adversarial training can be leveraged to improve clean model performance.
Submitted 26 December, 2023;
originally announced December 2023.
-
CMOS-based Single-Cycle In-Memory XOR/XNOR
Authors:
Shamiul Alam,
Jack Hutchins,
Nikhil Shukla,
Kazi Asifuzzaman,
Ahmedullah Aziz
Abstract:
Big data applications are on the rise, and so is the number of data centers. The ever-increasing massive data pool needs to be periodically backed up in a secure environment. Moreover, a massive amount of securely backed-up data is required for training binary convolutional neural networks for image classification. XOR and XNOR operations are essential for large-scale data copy verification, encryption, and classification algorithms. The disproportionate speed of existing compute and memory units makes the von Neumann architecture inefficient to perform these Boolean operations. Compute-in-memory (CiM) has proved to be an optimum approach for such bulk computations. The existing CiM-based XOR/XNOR techniques either require multiple cycles for computing or add to the complexity of the fabrication process. Here, we propose a CMOS-based hardware topology for single-cycle in-memory XOR/XNOR operations. Our design provides at least 2 times improvement in the latency compared with other existing CMOS-compatible solutions. We verify the proposed system through circuit/system-level simulations and evaluate its robustness using a 5000-point Monte Carlo variation analysis. This all-CMOS design paves the way for practical implementation of CiM XOR/XNOR at scaled technology nodes.
Submitted 26 October, 2023;
originally announced October 2023.
-
A Note on Analyzing the Stability of Oscillator Ising Machines
Authors:
Mohammad Khairul Bashar,
Zongli Lin,
Nikhil Shukla
Abstract:
The rich non-linear dynamics of the coupled oscillators (under second harmonic injection) can be leveraged to solve computationally hard problems in combinatorial optimization such as finding the ground state of the Ising Hamiltonian. While prior work on the stability of the so-called Oscillator Ising Machines (OIMs) has used the linearization method, in this letter, we present a complementary method to analyze stability using the second order derivative test of the energy / cost function. We establish the equivalence between the two methods, thus augmenting the tool kit for the design and implementation of OIMs.
Submitted 13 October, 2023;
originally announced October 2023.
-
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding
Authors:
Mohamed Afham,
Satya Narayan Shukla,
Omid Poursaeed,
Pengchuan Zhang,
Ashish Shah,
Sernam Lim
Abstract:
While most modern video understanding models operate on short-range clips, real-world videos are often several minutes long with semantically consistent segments of variable length. A common approach to process long videos is applying a short-form video model over uniformly sampled clips of fixed temporal length and aggregating the outputs. This approach neglects the underlying nature of long videos since fixed-length clips are often redundant or uninformative. In this paper, we aim to provide a generic and adaptive sampling approach for long-form videos in lieu of the de facto uniform sampling. Viewing videos as semantically consistent segments, we formulate a task-agnostic, unsupervised, and scalable approach based on Kernel Temporal Segmentation (KTS) for sampling and tokenizing long videos. We evaluate our method on long-form video understanding tasks such as video classification and temporal action localization, showing consistent gains over existing approaches and achieving state-of-the-art performance on long-form video modeling.
Submitted 20 September, 2023;
originally announced September 2023.
-
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Authors:
Lucas Bandarkar,
Davis Liang,
Benjamin Muller,
Mikel Artetxe,
Satya Narayan Shukla,
Donald Husa,
Naman Goyal,
Abhinandan Krishnan,
Luke Zettlemoyer,
Madian Khabsa
Abstract:
We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.
Submitted 25 July, 2024; v1 submitted 31 August, 2023;
originally announced August 2023.
-
SDeMorph: Towards Better Facial De-morphing from Single Morph
Authors:
Nitish Shukla
Abstract:
Face Recognition Systems (FRS) are vulnerable to morph attacks. A face morph is created by combining multiple identities with the intention to fool an FRS and make it match the morph with multiple identities. Current Morph Attack Detection (MAD) methods can detect the morph but are unable to recover the identities used to create the morph with satisfactory outcomes. Existing work in de-morphing is mostly reference-based, i.e., it requires the availability of one identity to recover the other. Sudipta et al. proposed a reference-free de-morphing technique, but the visual realism of the outputs produced was weak. In this work, we propose SDeMorph (Stably Diffused De-morpher), a novel de-morphing method that is reference-free and recovers the identities of bona fides. Our method produces feature-rich outputs that are of significantly high quality in terms of definition and facial fidelity. Our method utilizes Denoising Diffusion Probabilistic Models (DDPM) by destroying the input morphed signal and then reconstructing it using a branched-UNet. Experiments on AMSL, FRLL-FaceMorph, FRLL-MorDIFF, and SMDD datasets support the effectiveness of the proposed method.
Submitted 22 August, 2023;
originally announced August 2023.
-
MOPO-LSI: A User Guide
Authors:
Yong Zheng,
Kumar Neelotpal Shukla,
Jasmine Xu,
David Wang,
Michael O'Leary
Abstract:
MOPO-LSI is an open-source Multi-Objective Portfolio Optimization Library for Sustainable Investments. This document provides a user guide for MOPO-LSI version 1.0, including problem setup, workflow and the hyper-parameters in configurations.
Submitted 12 July, 2023; v1 submitted 4 July, 2023;
originally announced July 2023.
-
Generating Adversarial Attacks in the Latent Space
Authors:
Nitish Shukla,
Sudipta Banerjee
Abstract:
Adversarial attacks in the input (pixel) space typically incorporate noise margins such as $L_1$ or $L_{\infty}$-norm to produce imperceptibly perturbed data that confound deep learning networks. Such noise margins confine the magnitude of permissible noise. In this work, we propose injecting adversarial perturbations in the latent (feature) space using a generative adversarial network, removing the need for margin-based priors. Experiments on MNIST, CIFAR10, Fashion-MNIST, CIFAR100 and Stanford Dogs datasets support the effectiveness of the proposed method in generating adversarial attacks in the latent space while ensuring a high degree of visual realism with respect to pixel-based adversarial attack methods.
Submitted 10 April, 2023;
originally announced April 2023.
-
Mixed-Type Wafer Classification For Low Memory Devices Using Knowledge Distillation
Authors:
Nitish Shukla,
Anurima Dey,
Srivatsan K
Abstract:
Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for determining the root cause of production defects, which may further provide insight for yield improvement in wafer foundry. During manufacturing, various defects may appear standalone in the wafer or may appear as different combinations. Identifying multiple defects in a wafer is generally harder compared to identifying a single defect. Recently, deep learning methods have gained significant traction in mixed-type DPR. However, the complexity of defects requires complex and large models making them very difficult to operate on low-memory embedded devices typically used in fabrication labs. Another common issue is the unavailability of labeled data to train complex networks. In this work, we propose an unsupervised training routine to distill the knowledge of complex pre-trained models to lightweight deployment-ready models. We empirically show that this type of training compresses the model without sacrificing accuracy despite being up to 10 times smaller than the teacher model. The compressed model also manages to outperform contemporary state-of-the-art models.
Submitted 18 October, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Efficient Mixed-Type Wafer Defect Pattern Recognition Using Compact Deformable Convolutional Transformers
Authors:
Nitish Shukla
Abstract:
Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for finding the root cause of defects and further improving the yield in the wafer foundry. Mixed-type DPR is much more complicated than single-type DPR due to varied spatial features, the uncertainty of defects, and the number of defects present. To accurately predict the number of defects as well as the types of defects, we propose a novel compact deformable convolutional transformer (DC Transformer). Specifically, DC Transformer focuses on the global features present in the wafer map by virtue of learnable deformable kernels and multi-head attention to the global features. The proposed method succinctly models the internal relationship between the wafer maps and the defects. DC Transformer is evaluated on a real dataset containing 38 defect patterns. Experimental results show that DC Transformer performs exceptionally well in recognizing both single and mixed-type defects. The proposed method outperforms current state-of-the-art models by a considerable margin.
Submitted 16 October, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
An Embarrassingly Simple Approach for Wafer Feature Extraction and Defect Pattern Recognition
Authors:
Nitish Shukla
Abstract:
Identifying defect patterns in a wafer map during manufacturing is crucial for finding the root cause of the underlying issue and provides valuable insights for improving yield in the foundry. Current methods use deep neural networks to identify the defects. These models are generally very large and have significant inference time. They also require GPU support to operate efficiently. All these issues make these models unfit for on-line prediction in the manufacturing foundry. In this paper, we propose an extremely simple yet effective technique to extract features from wafer images. The proposed method is extremely fast, intuitive, and non-parametric while being explainable. The experimental results show that the proposed pipeline outperforms conventional deep learning models. Our feature extraction requires no training or fine-tuning while preserving the relative shape and location of data points, as revealed by our interpretability analysis.
Submitted 16 October, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
DoSA : A System to Accelerate Annotations on Business Documents with Human-in-the-Loop
Authors:
Neelesh K Shukla,
Msp Raja,
Raghu Katikeri,
Amit Vaid
Abstract:
Business documents come in a variety of structures, formats and information needs which makes information extraction a challenging task. Due to these variations, having a document generic model which can work well across all types of documents and for all the use cases seems far-fetched. For document-specific models, we would need customized document-specific labels. We introduce DoSA (Document Specific Automated Annotations), which helps annotators in generating initial annotations automatically using our novel bootstrap approach by leveraging document generic datasets and models. These initial annotations can further be reviewed by a human for correctness. An initial document-specific model can be trained and its inference can be used as feedback for generating more automated annotations. These automated annotations can be reviewed by human-in-the-loop for the correctness and a new improved model can be trained using the current model as pre-trained model before going for the next iteration. In this paper, our scope is limited to Form like documents due to limited availability of generic annotated datasets, but this idea can be extended to a variety of other documents as more datasets are built. An open-source ready-to-use implementation is made available on GitHub https://github.com/neeleshkshukla/DoSA.
Submitted 9 November, 2022;
originally announced November 2022.
-
Computational Models based on Synchronized Oscillators for Solving Combinatorial Optimization Problems
Authors:
Antik Mallick,
Mohammad Khairul Bashar,
Zongli Lin,
Nikhil Shukla
Abstract:
The equivalence between the natural minimization of energy in a dynamical system and the minimization of an objective function characterizing a combinatorial optimization problem offers a promising approach to designing dynamical system-inspired computational models and solvers for such problems. For instance, the ground state energy of coupled electronic oscillators, under second harmonic injection, can be directly mapped to the optimal solution of the Maximum Cut problem. However, prior work has focused on a limited set of such problems. Therefore, in this work, we formulate computing models based on synchronized oscillator dynamics for a broad spectrum of combinatorial optimization problems ranging from the Max-K-Cut (the general version of the Maximum Cut problem) to the Traveling Salesman Problem. We show that synchronized oscillator dynamics can be engineered to solve these different combinatorial optimization problems by appropriately designing the coupling function and the external injection to the oscillators. Our work marks a step forward towards expanding the functionalities of oscillator-based analog accelerators and furthers the scope of dynamical system solvers for combinatorial optimization problems.
Submitted 13 June, 2022;
originally announced June 2022.
-
CMOS-Compatible Ising Machines built using Bistable Latches Coupled through Ferroelectric Transistor Arrays
Authors:
Antik Mallick,
Zijian Zhao,
Mohammad Khairul Bashar,
Shamiul Alam,
Md Mazharul Islam,
Yi Xiao,
Yixin Xu,
Ahmedullah Aziz,
Vijaykrishnan Narayanan,
Kai Ni,
Nikhil Shukla
Abstract:
Realizing compact and scalable Ising machines that are compatible with CMOS-process technology is crucial to the effectiveness and practicality of using such hardware platforms for accelerating computationally intractable problems. Besides the need for realizing compact Ising spins, the implementation of the coupling network, which describes the spin interaction, is also a potential bottleneck in the scalability of such platforms. Therefore, in this work, we propose an Ising machine platform that exploits the novel behavior of compact bi-stable CMOS-latches (cross-coupled inverters) as classical Ising spins interacting through highly scalable and CMOS-process compatible ferroelectric-HfO2-based Ferroelectric FETs (FeFETs) which act as coupling elements. We experimentally demonstrate the prototype building blocks of this system, and evaluate the behavior of the scaled system using simulations. We project that the proposed architecture can compute Ising solutions with an efficiency of ~1.04 x 10^8 solutions/W/second. Our work not only provides a pathway to realizing CMOS-compatible designs but also to overcoming their scaling challenges.
Submitted 29 May, 2022;
originally announced May 2022.
-
2-speed network ensemble for efficient classification of incremental land-use/land-cover satellite image chips
Authors:
Michael James Horry,
Subrata Chakraborty,
Biswajeet Pradhan,
Nagesh Shukla,
Sanjoy Paul
Abstract:
The ever-growing volume of satellite imagery data presents a challenge for industry and governments making data-driven decisions based on the timely analysis of very large data sets. Commonly used deep learning algorithms for automatic classification of satellite images are time and resource-intensive to train. The cost of retraining in the context of Big Data presents a practical challenge when new image data and/or classes are added to a training corpus. Recognizing the need for an adaptable, accurate, and scalable satellite image chip classification scheme, in this research we present an ensemble of: i) a slow to train but high accuracy vision transformer; and ii) a fast to train, low-parameter convolutional neural network. The vision transformer model provides a scalable and accurate foundation model. The high-speed CNN provides an efficient means of incorporating newly labelled data into analysis, at the expense of lower accuracy. To simulate incremental data, the very large (~400,000 images) So2Sat LCZ42 satellite image chip dataset is divided into four intervals, with the high-speed CNN retrained every interval and the vision transformer trained every half interval. This experimental setup mimics an increase in data volume and diversity over time. For the task of automated land-cover/land-use classification, the ensemble models for each data increment outperform each of the component models, with best accuracy of 65% against a holdout test partition of the So2Sat dataset. The proposed ensemble and staggered training schedule provide a scalable and cost-effective satellite image classification scheme that is optimized to process very large volumes of satellite data.
Submitted 15 March, 2022;
originally announced March 2022.
-
Distribution Shift in Airline Customer Behavior during COVID-19
Authors:
Abhinav Garg,
Naman Shukla,
Lavanya Marla,
Sriram Somanchi
Abstract:
Traditional AI approaches in customized (personalized) contextual pricing applications assume that the data distribution at the time of online pricing is similar to that observed during training. However, this assumption may be violated in practice because of the dynamic nature of customer buying patterns, particularly due to unanticipated system shocks such as COVID-19. We study the changes in customer behavior for a major airline during the COVID-19 pandemic by framing it as a covariate shift and concept drift detection problem. We identify which customers changed their travel and purchase behavior and the attributes affecting that change using (i) Fast Generalized Subset Scanning and (ii) Causal Forests. In our experiments with simulated and real-world data, we present how these two techniques can be used through qualitative analysis.
Submitted 23 December, 2021; v1 submitted 29 November, 2021;
originally announced November 2021.
-
FREGAN: an application of generative adversarial networks in enhancing the frame rate of videos
Authors:
Rishik Mishra,
Neeraj Gupta,
Nitya Shukla
Abstract:
A digital video is a collection of individual frames; during streaming, each frame of the scene is displayed for its own time slice. High refresh rates and high frame rates are in demand across high-technology applications. A high refresh rate makes action tracking in videos easier and motion in gaming applications smoother, and it provides a faster response because less time elapses between the frames displayed on the screen. The FREGAN (Frame Rate Enhancement Generative Adversarial Network) model is proposed, which predicts future frames of a video sequence based on a sequence of past frames. In this paper, we investigate the GAN model and propose FREGAN for the enhancement of frame rate in videos. We use the Huber loss as the loss function in the proposed FREGAN. It has provided excellent results in super-resolution, and we have tried to replicate that performance in the application of frame rate enhancement. We validate the effectiveness of the proposed model on standard datasets (UCF101 and RFree500). The experimental outcomes show that the proposed model achieves a peak signal-to-noise ratio (PSNR) of 34.94 and a structural similarity index (SSIM) of 0.95.
Submitted 1 November, 2021;
originally announced November 2021.
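The FREGAN abstract above names the Huber loss as its training loss and PSNR as one of its evaluation metrics. As a minimal sketch of the standard definitions of both quantities (not the authors' implementation), the following works on flat lists of pixel values. The function names, the `delta` threshold, and the `max_value` default of 255 are assumptions chosen for illustration.

```python
import math


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss over two equal-length sequences of floats.

    Quadratic for residuals up to `delta`, linear beyond it.
    `delta` is a hypothetical default, not a value from the paper.
    """
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        if r <= delta:
            total += 0.5 * r * r
        else:
            total += delta * (r - 0.5 * delta)
    return total / len(pred)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

The linear tail of the Huber loss makes it less sensitive to large per-pixel errors than a squared loss, which is one common reason for choosing it in image generation tasks.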
-
Negotiating Networks in Oligopoly Markets for Price-Sensitive Products
Authors:
Naman Shukla,
Kartik Yellepeddi
Abstract:
We present a novel framework to learn functions that estimate decisions of sellers and buyers simultaneously in an oligopoly market for a price-sensitive product. In this setting, the aim of the seller network is to come up with a price for a given context such that the expected revenue is maximized while also considering the buyer's satisfaction. On the other hand, the aim of the buyer network is to assign a probability of purchase to the offered price so as to mimic real-world buyers' responses while also showing price sensitivity through its action, that is, by rejecting unnecessarily high-priced products. Similar to generative adversarial networks, this framework corresponds to a minimax two-player game. In our experiments with simulated and real-world transaction data, we compared our framework with the baseline model and demonstrated its potential through proposed evaluation metrics.
Submitted 25 October, 2021;
originally announced October 2021.
-
An Oscillator-based MaxSAT solver
Authors:
Mohammad Khairul Bashar,
Jaykumar Vaidya,
Antik Mallick,
R S Surya Kanthi,
Shamiul Alam,
Nazmul Amin,
Chonghan Lee,
Feng Shi,
Ahmedullah Aziz,
Vijaykrishnan Narayanan,
Nikhil Shukla
Abstract:
The quest to solve hard combinatorial optimization problems efficiently -- still a longstanding challenge for traditional digital computers -- has inspired the exploration of many alternate computing models and platforms. As a case in point, oscillator networks offer a potentially promising energy efficient and scalable option. However, prior oscillator-based combinatorial optimization solvers have primarily focused on quadratic combinatorial optimization problems that consider only pairwise interaction among the oscillators. In this work, we propose a new computational model based on the maximum entropy production (MEP) principle that exploits higher order interactions among the oscillators, and demonstrate its application in solving the non-quadratic maximum satisfiability (MaxSAT) problem. We demonstrate that the solution to the MaxSAT problem can be directly mapped to the entropy production rate in the oscillator network, and subsequently, propose an area-efficient hardware implementation that leverages Compute-in-Memory (CiM) primitives. Using experiments along with analytical and circuit simulations, we elucidate the performance of the proposed approach in computing high-quality optimal / near-optimal solutions to the MaxSAT problem. Our work not only reveals how oscillators can solve non-quadratic combinatorial optimization problems such as MaxSAT but also extends the application of this dynamical system-based approach to a broader class of problems that can be easily decomposed to the MaxSAT solution.
Submitted 20 September, 2021;
originally announced September 2021.
-
Creating Electronic Oscillator-based Ising Machines without External Injection Locking
Authors:
Jaykumar Vaidya,
R S Surya Kanthi,
Nikhil Shukla
Abstract:
Coupled electronic oscillators have recently been explored as a compact hardware platform, compatible with integrated circuits and room-temperature operation, for designing Ising machines. However, such implementations presently require the injection of an externally generated second-harmonic signal to impose the phase bipartition among the oscillators. In this work, we experimentally demonstrate a new electronic autaptic oscillator (EAO) that uses engineered feedback to eliminate the need for the generation and injection of the external second-harmonic signal to minimize the Ising Hamiltonian. The feedback in the EAO is engineered to effectively generate the second-harmonic signal internally. Using this oscillator design, we show experimentally that a system of capacitively coupled EAOs exhibits the desired bipartition in the oscillator phases, and subsequently demonstrate its application in solving the computationally hard Maximum Cut (MaxCut) problem. Our work not only establishes a new oscillator design aligned to the needs of the oscillator Ising machine but also advances efforts toward creating application-specific analog computing platforms.
Submitted 23 August, 2021;
originally announced August 2021.
-
Heteroscedastic Temporal Variational Autoencoder For Irregularly Sampled Time Series
Authors:
Satya Narayan Shukla,
Benjamin M. Marlin
Abstract:
Irregularly sampled time series commonly occur in several domains where they present a significant challenge to standard deep learning models. In this paper, we propose a new deep learning framework for probabilistic interpolation of irregularly sampled time series that we call the Heteroscedastic Temporal Variational Autoencoder (HeTVAE). HeTVAE includes a novel input layer to encode information about input observation sparsity, a temporal VAE architecture to propagate uncertainty due to input sparsity, and a heteroscedastic output layer to enable variable uncertainty in output interpolations. Our results show that the proposed architecture is better able to reflect variable uncertainty through time due to sparse and irregular sampling than a range of baseline and traditional models, as well as recently proposed deep latent variable models that use homoscedastic output layers.
Submitted 23 July, 2021;
originally announced July 2021.
-
Transformer-based Machine Learning for Fast SAT Solvers and Logic Synthesis
Authors:
Feng Shi,
Chonghan Lee,
Mohammad Khairul Bashar,
Nikhil Shukla,
Song-Chun Zhu,
Vijaykrishnan Narayanan
Abstract:
CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems. The increasing popularity of these constraint problems in electronic design automation encourages studies on different SAT problems and their properties for further computational efficiency. Modern conflict-driven clause learning (CDCL) SAT solvers have achieved both theoretical and practical success, allowing very large industrial instances to be solved in a relatively short amount of time. Recently, machine learning approaches have provided a new dimension to solving this challenging problem. Neural symbolic models could serve as generic solvers that can be specialized for specific domains based on data without any changes to the structure of the model. In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem, which is the optimization version of SAT where the goal is to satisfy the maximum number of clauses. Our model has a scale-free structure that can process instances of varying size. We use meta-path and self-attention mechanisms to capture interactions among homogeneous nodes. We adopt cross-attention mechanisms on the bipartite graph to capture interactions among heterogeneous nodes. We further apply an iterative algorithm to our model to satisfy additional clauses, enabling a solution approaching that of an exact-SAT problem. The attention mechanisms leverage parallelism for speedup. Our evaluation indicates improved speedup compared to heuristic approaches and improved completion rate compared to machine learning approaches.
Submitted 15 July, 2021;
originally announced July 2021.
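The abstract above defines MaxSAT as the optimization version of SAT in which the goal is to satisfy the maximum number of clauses. To make that objective concrete, here is a small sketch that scores an assignment against a CNF formula and exhaustively searches tiny instances. It illustrates the problem definition only and is not the Transformer-based model from the paper; the DIMACS-style integer encoding of literals and both function names are assumptions made for this example.

```python
from itertools import product


def count_satisfied(clauses, assignment):
    """Count clauses satisfied by `assignment`.

    clauses: list of clauses, each a list of nonzero ints
             (k means variable k, -k means its negation).
    assignment: dict mapping variable index -> bool.
    """
    satisfied = 0
    for clause in clauses:
        # A clause holds if at least one literal matches the assignment.
        if any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            satisfied += 1
    return satisfied


def brute_force_maxsat(clauses, num_vars):
    """Exhaustive search; only practical for very small num_vars."""
    best_count, best_assignment = -1, None
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        count = count_satisfied(clauses, assignment)
        if count > best_count:
            best_count, best_assignment = count, assignment
    return best_count, best_assignment
```

The exhaustive search grows as 2^num_vars, which is exactly why heuristic and learned solvers such as the one described in the abstract are of interest.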
-
Multi-Time Attention Networks for Irregularly Sampled Time Series
Authors:
Satya Narayan Shukla,
Benjamin M. Marlin
Abstract:
Irregular sampling occurs in many time series modeling applications where it presents a significant challenge to standard deep learning models. This work is motivated by the analysis of physiological time series data in electronic health records, which are sparse, irregularly sampled, and multivariate. In this paper, we propose a new deep learning framework for this setting that we call Multi-Time Attention Networks. Multi-Time Attention Networks learn an embedding of continuous-time values and use an attention mechanism to produce a fixed-length representation of a time series containing a variable number of observations. We investigate the performance of this framework on interpolation and classification tasks using multiple datasets. Our results show that the proposed approach performs as well as or better than a range of baseline and recently proposed models while offering significantly faster training times than current state-of-the-art methods.
Submitted 7 June, 2021; v1 submitted 25 January, 2021;
originally announced January 2021.
-
Using Noise to Augment Synchronization among Oscillators
Authors:
Jaykumar Vaidya,
Mohammad Khairul Bashar,
Nikhil Shukla
Abstract:
Noise is expected to play an important role in the dynamics of analog systems such as coupled oscillators which have recently been explored as a hardware platform for application in computing. In this work, we experimentally investigate the effect of noise on the synchronization of relaxation oscillators and their computational properties. Specifically, in contrast to its typically expected adverse effect, we first demonstrate that a common white noise input induces global frequency synchronization among uncoupled oscillators. Experiments show that the minimum noise voltage required to induce synchronization increases linearly with the amplitude of the oscillator output whereas it decreases with increasing number of oscillators. Further, our work reveals that in a coupled system of oscillators - relevant to solving computational problems such as graph coloring, the injection of white noise helps reduce the minimum required capacitive coupling strength. With the injection of noise, the coupled system demonstrates frequency synchronization along with the desired phase-based computational properties at 5x lower coupling strength than that required when no external noise is introduced. Consequently, this can reduce the footprint of the coupling element and the corresponding area-intensive coupling architecture. Our work shows that noise can be utilized as an effective knob to optimize the implementation of coupled oscillator-based computing platforms.
Submitted 10 January, 2021; v1 submitted 2 December, 2020;
originally announced December 2020.
-
A Survey on Principles, Models and Methods for Learning from Irregularly Sampled Time Series
Authors:
Satya Narayan Shukla,
Benjamin M. Marlin
Abstract:
Irregularly sampled time series data arise naturally in many application domains including biology, ecology, climate science, astronomy, and health. Such data represent fundamental challenges to many classical models from machine learning and statistics due to the presence of non-uniform intervals between observations. However, there has been significant progress within the machine learning community over the last decade on developing specialized models and architectures for learning from irregularly sampled univariate and multivariate time series data. In this survey, we first describe several axes along which approaches to learning from irregularly sampled time series differ including what data representations they are based on, what modeling primitives they leverage to deal with the fundamental problem of irregular sampling, and what inference tasks they are designed to perform. We then survey the recent literature organized primarily along the axis of modeling primitives. We describe approaches based on temporal discretization, interpolation, recurrence, attention and structural invariance. We discuss similarities and differences between approaches and highlight primary strengths and weaknesses.
Submitted 5 January, 2021; v1 submitted 30 November, 2020;
originally announced December 2020.
-
Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks
Authors:
Anit Kumar Sahu,
Satya Narayan Shukla,
J. Zico Kolter
Abstract:
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle, providing us with loss function evaluations. Although this setting has been investigated in previous work, most past approaches using zeroth order optimization implicitly assume that the gradients of the loss function with respect to the input images are unstructured. In this work, we show that in fact substantial correlations exist within these gradients, and we propose to capture these correlations via a Gaussian Markov random field (GMRF). Given the intractability of the explicit covariance structure of the MRF, we show that the covariance structure can be efficiently represented using the Fast Fourier Transform (FFT), along with low-rank updates to perform exact posterior estimation under this model. We use this modeling technique to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM), and show that the method uses fewer queries and achieves higher attack success rates than the current state of the art. We also highlight the general applicability of this gradient modeling setup.
Submitted 8 October, 2020;
originally announced October 2020.
-
Experimental Demonstration of a Reconfigurable Coupled Oscillator Platform to Solve the Max-Cut Problem
Authors:
Mohammad Khairul Bashar,
Antik Mallick,
Daniel S Truesdell,
Benton H. Calhoun,
Siddharth Joshi,
Nikhil Shukla
Abstract:
In this work, we experimentally demonstrate an integrated circuit (IC) of 30 relaxation oscillators with reconfigurable capacitive coupling to solve the NP-Hard Maximum Cut (Max-Cut) problem. We show that under the influence of an external second-harmonic injection signal, the oscillator phases exhibit a bi-partition that can be used to calculate a high-quality approximate Max-Cut solution. Leveraging the all-to-all reconfigurable coupling architecture, we experimentally evaluate the computational properties of the oscillators using randomly generated graph instances of varying size and edge density. Further, comparing the Max-Cut solutions with the optimal values, we show that the oscillators (after simple post-processing) produce a Max-Cut that is within 99% of the optimal value in 28 of the 36 measured graphs; importantly, the oscillators are particularly effective in dense graphs, with the Max-Cut being optimal in seven out of nine measured graphs with edge density 0.8. Our work marks a step towards creating an efficient, room-temperature-compatible non-Boolean hardware-based solver for hard combinatorial optimization problems.
Submitted 12 October, 2020; v1 submitted 10 August, 2020;
originally announced August 2020.
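The Max-Cut abstract above says that the oscillator phases split into two groups and that this bi-partition yields an approximate cut. A hedged sketch of that read-out step follows: each node is assigned to a side depending on whether its phase is closer to 0 or to pi, and the cut is the number of edges crossing sides. The reference phase of 0, the thresholding rule, and the function names are assumptions for illustration; the abstract does not specify the read-out or the "simple post-processing" it mentions.

```python
import math


def partition_from_phases(phases):
    """Map each oscillator phase (radians) to side 0 or side 1.

    cos(phi) >= 0 means the phase is closer to 0 than to pi (mod 2*pi).
    """
    return [0 if math.cos(phi) >= 0 else 1 for phi in phases]


def cut_value(edges, sides):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if sides[u] != sides[v])


# Example: a 4-cycle whose measured phases alternate near 0 and near pi.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
phases = [0.1, 3.0, -0.2, 3.3]
sides = partition_from_phases(phases)
print(sides, cut_value(edges, sides))
```

For unweighted graphs the cut value is a simple edge count; a weighted variant would sum edge weights instead.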
-
Simple and Efficient Hard Label Black-box Adversarial Attacks in Low Query Budget Regimes
Authors:
Satya Narayan Shukla,
Anit Kumar Sahu,
Devin Willmott,
J. Zico Kolter
Abstract:
We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models based solely on the output label (hard label) returned for a queried data input. We propose a simple and efficient Bayesian Optimization (BO) based approach for developing black-box adversarial attacks. Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace. We demonstrate the efficacy of our proposed attack method by evaluating both $\ell_\infty$ and $\ell_2$ norm constrained untargeted and targeted hard label black-box attacks on three standard datasets: MNIST, CIFAR-10 and ImageNet. Our proposed approach consistently achieves 2x to 10x higher attack success rate while requiring 10x to 20x fewer queries compared to the current state-of-the-art black-box adversarial attacks.
Submitted 11 June, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Integrating Physiological Time Series and Clinical Notes with Deep Learning for Improved ICU Mortality Prediction
Authors:
Satya Narayan Shukla,
Benjamin M. Marlin
Abstract:
Intensive Care Unit Electronic Health Records (ICU EHRs) store multimodal data about patients including clinical notes, sparse and irregularly sampled physiological time series, lab results, and more. To date, most methods designed to learn predictive models from ICU EHR data have focused on a single modality. In this paper, we leverage the recently proposed interpolation-prediction deep learning architecture (Shukla and Marlin 2019) as a basis for exploring how physiological time series data and clinical notes can be integrated into a unified mortality prediction model. We study both early and late fusion approaches and demonstrate how the relative predictive value of clinical text and physiological data changes over time. Our results show that a late fusion approach can provide a statistically significant improvement in mortality prediction performance over using individual modalities in isolation.
Submitted 18 March, 2021; v1 submitted 24 March, 2020;
originally announced March 2020.
-
Assessing the Adversarial Robustness of Monte Carlo and Distillation Methods for Deep Bayesian Neural Network Classification
Authors:
Meet P. Vadera,
Satya Narayan Shukla,
Brian Jalaian,
Benjamin M. Marlin
Abstract:
In this paper, we consider the problem of assessing the adversarial robustness of deep neural network models under both Markov chain Monte Carlo (MCMC) and Bayesian Dark Knowledge (BDK) inference approximations. We characterize the robustness of each method to two types of adversarial attacks: the fast gradient sign method (FGSM) and projected gradient descent (PGD). We show that full MCMC-based inference has excellent robustness, significantly outperforming standard point estimation-based learning. On the other hand, BDK provides marginal improvements. As an additional contribution, we present a storage-efficient approach to computing adversarial examples for large Monte Carlo ensembles using both the FGSM and PGD attacks.
Submitted 7 February, 2020;
originally announced February 2020.
-
Black-box Adversarial Attacks with Bayesian Optimization
Authors:
Satya Narayan Shukla,
Anit Kumar Sahu,
Devin Willmott,
J. Zico Kolter
Abstract:
We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization (BO) to specifically cater to scenarios involving low query budgets and to develop query-efficient adversarial attacks. We alleviate the issues that BO faces when optimizing over high-dimensional deep learning models by using effective dimension upsampling techniques. Our proposed approach achieves performance comparable to the state-of-the-art black-box adversarial attacks, albeit with a much lower average query count. In particular, in low query budget regimes, our proposed method reduces the query count by up to 80% with respect to the state-of-the-art methods.
Submitted 30 September, 2019;
originally announced September 2019.
-
How to Incorporate Monotonicity in Deep Networks While Preserving Flexibility?
Authors:
Akhil Gupta,
Naman Shukla,
Lavanya Marla,
Arinbjörn Kolbeinsson,
Kartik Yellepeddi
Abstract:
The importance of domain knowledge in enhancing model performance and making reliable predictions in the real-world is critical. This has led to an increased focus on specific model properties for interpretability. We focus on incorporating monotonic trends, and propose a novel gradient-based point-wise loss function for enforcing partial monotonicity with deep neural networks. While recent developments have relied on structural changes to the model, our approach aims at enhancing the learning process. Our model-agnostic point-wise loss function acts as a plug-in to the standard loss and penalizes non-monotonic gradients. We demonstrate that the point-wise loss produces comparable (and sometimes better) results on both AUC and monotonicity measure, as opposed to state-of-the-art deep lattice networks that guarantee monotonicity. Moreover, it is able to learn differentiated individual trends and produces smoother conditional curves which are important for personalized decisions, while preserving the flexibility of deep networks.
Submitted 2 December, 2019; v1 submitted 23 September, 2019;
originally announced September 2019.
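The monotonicity abstract above describes a point-wise loss that plugs into the standard loss and penalizes non-monotonic gradients. One way such a penalty could look is sketched below: for each input point, the partial derivative of the model output with respect to a chosen feature is estimated, and only negative slopes contribute. This is an illustrative sketch under stated assumptions, not the authors' loss: it uses central finite differences on an arbitrary callable instead of network gradients, and the names `monotonicity_penalty`, `feature_idx`, and `eps` are hypothetical.

```python
def monotonicity_penalty(f, points, feature_idx, eps=1e-4):
    """Average hinge penalty on negative slopes of f along one feature.

    f: callable taking a list of floats and returning a float.
    points: list of input vectors (lists of floats).
    feature_idx: index of the feature expected to be non-decreasing.
    """
    total = 0.0
    for x in points:
        hi = list(x)
        lo = list(x)
        hi[feature_idx] += eps
        lo[feature_idx] -= eps
        # Central finite-difference estimate of the partial derivative.
        grad = (f(hi) - f(lo)) / (2 * eps)
        # Only slopes that violate monotonicity (grad < 0) are penalized.
        total += max(0.0, -grad)
    return total / len(points)
```

In training, a term like this would typically be added to the task loss with a weighting coefficient, so the model stays free to fit the data wherever the monotonic trend is already respected.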
-
Interpolation-Prediction Networks for Irregularly Sampled Time Series
Authors:
Satya Narayan Shukla,
Benjamin M. Marlin
Abstract:
In this paper, we present a new deep learning architecture for addressing the problem of supervised learning with sparse and irregularly sampled multivariate time series. The architecture is based on the use of a semi-parametric interpolation network followed by the application of a prediction network. The interpolation network allows for information to be shared across multiple dimensions of a multivariate time series during the interpolation stage, while any standard deep learning model can be used for the prediction network. This work is motivated by the analysis of physiological time series data in electronic health records, which are sparse, irregularly sampled, and multivariate. We investigate the performance of this architecture on both classification and regression tasks, showing that our approach outperforms a range of baseline and recently proposed models.
Submitted 13 September, 2019;
originally announced September 2019.
-
Adaptive Model Selection Framework: An Application to Airline Pricing
Authors:
Naman Shukla,
Arinbjörn Kolbeinsson,
Lavanya Marla,
Kartik Yellepeddi
Abstract:
Multiple machine learning and prediction models are often used for the same prediction or recommendation task. In our recent work, where we develop and deploy airline ancillary pricing models in an online setting, we found that among the multiple pricing models developed, no one model clearly dominates the others for all incoming customer requests. Thus, as algorithm designers, we face an exploration-exploitation dilemma. In this work, we introduce an adaptive meta-decision framework that uses Thompson sampling, a popular multi-armed bandit solution method, to route customer requests to various pricing models based on their online performance. We show that this adaptive approach outperforms a uniformly random selection policy by improving the expected revenue per offer by 43% and conversion score by 58% in an offline simulation.
Submitted 21 May, 2019;
originally announced May 2019.
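The routing idea in this abstract can be illustrated with a minimal Beta-Bernoulli Thompson sampling sketch. It is not the authors' implementation: the binary "converted or not" reward, the uniform Beta(1, 1) prior, and the names `thompson_route` and `simulate` are assumptions made for illustration only.

```python
import random


def thompson_route(successes, failures, rng):
    """Pick a model index by sampling each model's Beta posterior.

    successes[i] / failures[i] count past conversions / non-conversions
    for hypothetical pricing model i (Beta(1, 1) prior assumed).
    """
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    # Route the request to the model with the highest sampled rate.
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_rates, n_requests, seed=0):
    """Offline simulation: route n_requests and record the outcomes.

    true_rates are made-up conversion probabilities, one per model.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_requests):
        arm = thompson_route(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = simulate([0.05, 0.5], n_requests=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("requests routed to each model:", pulls)
```

As the counts accumulate, the posterior of the weaker model concentrates on a low rate, so it is sampled as the winner less often and receives less traffic, while some exploration continues.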
-
Dynamic Pricing for Airline Ancillaries with Customer Context
Authors:
Naman Shukla,
Arinbjörn Kolbeinsson,
Ken Otwell,
Lavanya Marla,
Kartik Yellepeddi
Abstract:
Ancillaries have become a major source of revenue and profitability in the travel industry. Yet, conventional pricing strategies are based on business rules that are poorly optimized and do not respond to changing market conditions. This paper describes the dynamic pricing model developed by Deepair solutions, an AI technology provider for travel suppliers. We present a pricing model that provides dynamic pricing recommendations specific to each customer interaction and optimizes expected revenue per customer. The unique nature of personalized pricing provides the opportunity to search over the market space to find the optimal price-point of each ancillary for each customer, without violating customer privacy. In this paper, we present and compare three approaches for dynamic pricing of ancillaries, with increasing levels of sophistication: (1) a two-stage forecasting and optimization model using a logistic mapping function; (2) a two-stage model that uses a deep neural network for forecasting, coupled with a revenue maximization technique using discrete exhaustive search; (3) a single-stage end-to-end deep neural network that recommends the optimal price. We describe the performance of these models based on both offline and online evaluations. We also measure the real-world business impact of these approaches by deploying them in an A/B test on an airline's internet booking website. We show that traditional machine learning techniques outperform human rule-based approaches in an online setting by improving conversion by 36% and revenue per offer by 10%. We also provide results for our offline experiments which show that deep learning algorithms outperform traditional machine learning techniques for this problem. Our end-to-end deep learning model is currently being deployed by the airline in their booking system.
Submitted 6 February, 2019;
originally announced February 2019.
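The two-stage idea described in this abstract (forecast the purchase probability, then search a discrete price grid for the revenue-maximizing price) can be sketched as follows. This is an illustrative sketch, not the paper's model: the logistic form, its `midpoint` and `steepness` parameters, and the function names are all assumptions.

```python
import math


def purchase_probability(price, midpoint=30.0, steepness=0.15):
    """Hypothetical logistic demand curve: probability falls as price rises."""
    return 1.0 / (1.0 + math.exp(steepness * (price - midpoint)))


def best_price(candidate_prices, prob_fn):
    """Exhaustive search over a discrete grid of prices.

    Returns the candidate that maximizes expected revenue,
    defined here as price * P(purchase | price).
    """
    return max(candidate_prices, key=lambda p: p * prob_fn(p))


if __name__ == "__main__":
    grid = [5.0 * i for i in range(1, 21)]  # made-up grid: 5, 10, ..., 100
    price = best_price(grid, purchase_probability)
    print("recommended price:", price)
```

In a per-customer setting, `prob_fn` would be replaced by a forecasting model conditioned on the customer context, with the same exhaustive search run for each request.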
-
Modeling Irregularly Sampled Clinical Time Series
Authors:
Satya Narayan Shukla,
Benjamin M. Marlin
Abstract:
While the volume of electronic health records (EHR) data continues to grow, it remains rare for hospital systems to capture dense physiological data streams, even in the data-rich intensive care unit setting. Instead, typical EHR records consist of sparse and irregularly observed multivariate time series, which are well understood to present particularly challenging problems for machine learning methods. In this paper, we present a new deep learning architecture for addressing this problem based on the use of a semi-parametric interpolation network followed by the application of a prediction network. The interpolation network allows for information to be shared across multiple dimensions during the interpolation stage, while any standard deep learning model can be used for the prediction network. We investigate the performance of this architecture on the problems of mortality and length of stay prediction.
Submitted 2 December, 2018;
originally announced December 2018.
-
Vertex coloring of graphs via phase dynamics of coupled oscillatory networks
Authors:
Abhinav Parihar,
Nikhil Shukla,
Matthew Jerry,
Suman Datta,
Arijit Raychowdhury
Abstract:
While Boolean logic has been the backbone of digital information processing, there are classes of computationally hard problems wherein this conventional paradigm is fundamentally inefficient. Vertex coloring of graphs, which belongs to the class of combinatorial optimization problems, is one such problem and is well studied for its wide spectrum of applications in data sciences, life sciences, social sciences, and engineering and technology. This motivates alternative, more efficient non-Boolean pathways to solving such problems. Here, we demonstrate a coupled relaxation oscillator based dynamical system that exploits the insulator-metal transition in vanadium dioxide (VO2) to efficiently solve the vertex coloring of graphs. By harnessing the natural analogy between optimization, pertinent to graph coloring solutions, and energy minimization processes in highly parallel, interconnected dynamical systems, we use the physical manifestation of the latter process to approximate the optimal coloring of k-partite graphs. We further indicate a fundamental connection between the eigenproperties of a linear dynamical system and the spectral algorithms that can solve approximate graph coloring. Our work not only elucidates a physics-based computing approach but also presents tantalizing opportunities for building customized analog co-processors for solving hard problems efficiently.
Submitted 16 March, 2017; v1 submitted 7 September, 2016;
originally announced September 2016.
-
Computing with Dynamical Systems Based on Insulator-Metal-Transition Oscillators
Authors:
Abhinav Parihar,
Nikhil Shukla,
Matthew Jerry,
Suman Datta,
Arijit Raychowdhury
Abstract:
In this paper we review recent work on novel computing paradigms using coupled oscillatory dynamical systems. We explore systems of relaxation oscillators based on linear state transitioning devices, which switch between two discrete states with hysteresis. By harnessing the dynamics of complex, connected systems we embrace the philosophy of "let physics do the computing" and demonstrate how complex phase and frequency dynamics of such systems can be controlled, programmed and observed to solve computationally hard problems. Although our discussion in this paper is limited to Insulator-to-Metallic (IMT) state transition devices, the general philosophy of such computing paradigms can be translated to other media, including optical systems. We present the mathematical treatment necessary to understand the time evolution of these systems and demonstrate through recent experimental results the potential of such computational primitives.
Submitted 19 August, 2016;
originally announced August 2016.
-
Stronger Error Disturbance Relations for Incompatible Quantum Measurements
Authors:
Chiranjib Mukhopadhyay,
Namrata Shukla,
Arun Kumar Pati
Abstract:
We formulate a new error-disturbance relation, which is free from explicit dependence upon variances in observables. This error-disturbance relation shows improvement over the one provided by the Branciard inequality and the Ozawa inequality for some initial states and for a particular class of joint measurements under consideration. We also prove a modified form of Ozawa's error-disturbance relation. The latter relation provides a tighter bound compared to the Ozawa and the Branciard inequalities for a small number of states.
Submitted 13 December, 2016; v1 submitted 17 March, 2015;
originally announced March 2015.