
Showing 1–50 of 68 results for author: Chaudhuri, A

  1. arXiv:2403.08627  [pdf, other]

    stat.ML cs.CE cs.LG

    Multifidelity linear regression for scientific machine learning from scarce data

    Authors: Elizabeth Qian, Dayoung Kang, Vignesh Sella, Anirban Chaudhuri

    Abstract: Machine learning (ML) methods, which fit to data the parameters of a given parameterized model class, have garnered significant interest as potential methods for learning surrogate models for complex engineering systems for which traditional simulation is expensive. However, in many scientific and engineering settings, generating high-fidelity data on which to train ML models is expensive, and the…

    Submitted 1 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  2. arXiv:2403.05033  [pdf, other]

    cs.LG cs.AI

    Quantifying Manifolds: Do the manifolds learned by Generative Adversarial Networks converge to the real data manifold

    Authors: Anupam Chaudhuri, Anj Simmons, Mohamed Abdelrazek

    Abstract: This paper presents our experiments to quantify the manifolds learned by ML models (in our experiment, we use a GAN model) as they train. We compare the manifolds learned at each epoch to the real manifolds representing the real data. To quantify a manifold, we study the intrinsic dimensions and topological features of the manifold learned by the ML model, how these metrics change as we continue t…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.13102

  3. arXiv:2402.11682  [pdf, other]

    cs.LG cs.CV

    Learning Conditional Invariances through Non-Commutativity

    Authors: Abhra Chaudhuri, Serban Georgescu, Anjan Dutta

    Abstract: Invariance learning algorithms that conditionally filter out domain-specific random variables as distractors, do so based only on the data semantics, and not the target domain under evaluation. We show that a provably optimal and sample-efficient way of learning conditional invariances is by relaxing the invariance criterion to be non-commutatively directed towards the target domain. Under domain…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: International Conference on Learning Representations (ICLR) 2024

  4. arXiv:2401.10068  [pdf, other]

    cs.DC q-bio.QM

    GPU Acceleration of a Conjugate Exponential Model for Cancer Tissue Heterogeneity

    Authors: Anik Chaudhuri, Anwoy Mohanty, Manoranjan Satpathy

    Abstract: Heterogeneity in the cell population of cancer tissues poses many challenges in cancer diagnosis and treatment. Studying the heterogeneity in cell populations from gene expression measurement data in the context of cancer research is a problem of paramount importance. In addition, reducing the computation time of the algorithms that deal with high volumes of data has its obvious merits. Paralleliz…

    Submitted 18 January, 2024; originally announced January 2024.

  5. arXiv:2312.02345  [pdf, other]

    cs.CV

    CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis

    Authors: Nityanand Mathur, Shyam Marjit, Abhra Chaudhuri, Anjan Dutta

    Abstract: With the goal of understanding the visual concepts that CLIP associates with text prompts, we show that the latent space of CLIP can be visualized solely in terms of linear transformations on simple geometric primitives like circles and straight lines. Although existing approaches achieve this by sketch-synthesis-through-optimization, they do so on the space of Bézier curves, which exhibit a waste…

    Submitted 4 December, 2023; originally announced December 2023.

  6. arXiv:2311.13102  [pdf, other]

    cs.CL cs.LG math.AT

    Detecting out-of-distribution text using topological features of transformer-based language models

    Authors: Andres Pollano, Anupam Chaudhuri, Anj Simmons

    Abstract: To safeguard machine learning systems that operate on textual data against out-of-distribution (OOD) inputs that could cause unpredictable behaviour, we explore the use of topological features of self-attention maps from transformer-based language models to detect when input text is out of distribution. Self-attention forms the core of transformer-based language models, dynamically assigning vecto…

    Submitted 18 July, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 8 pages, 6 figures, 3 tables, to be published in proceedings of the IJCAI-2024 AISafety Workshop

  7. arXiv:2311.10099  [pdf]

    cs.CV cs.AI eess.IV

    Smart Traffic Management of Vehicles using Faster R-CNN based Deep Learning Method

    Authors: Arindam Chaudhuri

    Abstract: With the constant growth of civilization and the modernization of cities across the world over the past few centuries, smart traffic management of vehicles is one of the most sought-after problems in the research community. It is a challenging problem in the computer vision and artificial intelligence domain. Smart traffic management basically involves segmentation of vehicles, estimation of traffic density and tr…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: Book Chapter

  8. arXiv:2311.00176  [pdf, other]

    cs.CL

    ChipNeMo: Domain-Adapted LLMs for Chip Design

    Authors: Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Ankit Jindal, Brucek Khailany, George Kokai, et al. (17 additional authors not shown)

    Abstract: ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we instead adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models. We e…

    Submitted 4 April, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: Updated results for ChipNeMo-70B model

  9. arXiv:2310.15999  [pdf, other]

    cs.CV cs.LG

    Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships

    Authors: Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta

    Abstract: Recent advances in fine-grained representation learning leverage local-to-global (emergent) relationships for achieving state-of-the-art results. The relational representations relied upon by such methods, however, are abstract. We aim to deconstruct this abstraction by expressing them as interpretable graphs over image views. We begin by theoretically showing that abstract relational representati…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Neural Information Processing Systems (NeurIPS) 2023

  10. arXiv:2310.01430  [pdf, other]

    cs.CL cs.AI

    Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection

    Authors: Swapnil Bhosale, Abhra Chaudhuri, Alex Lee Robert Williams, Divyank Tiwari, Anjan Dutta, Xiatian Zhu, Pushpak Bhattacharyya, Diptesh Kanojia

    Abstract: The introduction of the MUStARD dataset, and its emotion recognition extension MUStARD++, have identified sarcasm to be a multi-modal phenomenon -- expressed not only in natural language text, but also through manners of speech (like tonality and intonation) and visual cues (facial expression). With this work, we aim to perform a rigorous benchmarking of the MUStARD++ dataset by considering state-…

    Submitted 29 September, 2023; originally announced October 2023.

  11. arXiv:2308.12429  [pdf, other]

    cs.CE math.OC math.PR

    Predictive Digital Twin for Optimizing Patient-Specific Radiotherapy Regimens under Uncertainty in High-Grade Gliomas

    Authors: Anirban Chaudhuri, Graham Pash, David A. Hormuth II, Guillermo Lorenzo, Michael Kapteyn, Chengyue Wu, Ernesto A. B. F. Lima, Thomas E. Yankeelov, Karen Willcox

    Abstract: We develop a methodology to create data-driven predictive digital twins for optimal risk-aware clinical decision-making. We illustrate the methodology as an enabler for an anticipatory personalized treatment that accounts for uncertainties in the underlying tumor biology in high-grade gliomas, where heterogeneity in the response to standard-of-care (SOC) radiotherapy contributes to sub-optimal pat…

    Submitted 23 August, 2023; originally announced August 2023.

    Journal ref: Frontiers in Artificial Intelligence, 6, 2023

  12. arXiv:2303.07775  [pdf, other]

    cs.CV

    Data-Free Sketch-Based Image Retrieval

    Authors: Abhra Chaudhuri, Ayan Kumar Bhunia, Yi-Zhe Song, Anjan Dutta

    Abstract: Rising concerns about privacy and anonymity preservation of deep learning models have facilitated research in data-free learning (DFL). For the first time, we identify that for data-scarce tasks like Sketch-Based Image Retrieval (SBIR), where the difficulty in acquiring paired photos and hand-drawn sketches limits data-dependent cross-modal learning algorithms, DFL can prove to be a much more prac…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Computer Vision and Pattern Recognition (CVPR) 2023

  13. arXiv:2212.05933  [pdf, other]

    q-fin.ST cs.LG

    Nostradamus: Weathering Worth

    Authors: Alapan Chaudhuri, Zeeshan Ahmed, Ashwin Rao, Shivansh Subramanian, Shreyas Pradhan, Abhishek Mittal

    Abstract: Nostradamus, inspired by the French astrologer and reputed seer, is a detailed study exploring relations between environmental factors and changes in the stock market. In this paper, we analyze associative correlation and causation between environmental elements (including natural disasters, climate and weather conditions) and stock prices, using historical stock market data, historical climate da…

    Submitted 17 January, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: 13 pages, 13 figures; updated abstract; updated format to Springer LNCS

  14. A Dynamic Weighted Federated Learning for Android Malware Classification

    Authors: Ayushi Chaudhuri, Arijit Nandi, Buddhadeb Pradhan

    Abstract: Android malware attacks are increasing daily at a tremendous volume, making Android users more vulnerable to cyber-attacks. Researchers have developed many machine learning (ML)/ deep learning (DL) techniques to detect and mitigate android malware attacks. However, due to technological advancement, there is a rise in android mobile devices. Furthermore, the devices are geographically dispersed, re…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted in SoCTA 2022

    Report number: Lecture Notes in Networks and Systems book series (LNNS,volume 627)-978-981-19-9857-7

    Journal ref: 25 April 2023

  15. arXiv:2210.10486  [pdf, other]

    cs.CV cs.LG

    Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval

    Authors: Abhra Chaudhuri, Massimiliano Mancini, Yanbei Chen, Zeynep Akata, Anjan Dutta

    Abstract: Representation learning for sketch-based image retrieval has mostly been tackled by learning embeddings that discard modality-specific information. As instances from different modalities can often provide complementary information describing the underlying concept, we propose a cross-attention framework for Vision Transformers (XModalViT) that fuses modality-specific information instead of discard…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: British Machine Vision Conference (BMVC) 2022

  16. arXiv:2210.02149  [pdf, other]

    cs.CV cs.LG

    Relational Proxies: Emergent Relationships as Fine-Grained Discriminators

    Authors: Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta

    Abstract: Fine-grained categories that largely share the same set of parts cannot be discriminated based on part information alone, as they mostly differ in the way the local parts relate to the overall global structure of the object. We propose Relational Proxies, a novel approach that leverages the relational information between the global and local views of an object for encoding its semantic label. Star…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: Neural Information Processing Systems (NeurIPS) 2022

  17. arXiv:2210.01860  [pdf, other]

    cs.LG cs.AI stat.ML

    ProtoBandit: Efficient Prototype Selection via Multi-Armed Bandits

    Authors: Arghya Roy Chaudhuri, Pratik Jawanpuria, Bamdev Mishra

    Abstract: In this work, we propose a multi-armed bandit-based framework for identifying a compact set of informative data instances (i.e., the prototypes) from a source dataset $S$ that best represents a given target set $T$. Prototypical examples of a given dataset offer interpretable insights into the underlying data distribution and assist in example-based reasoning, thereby influencing every sphere of h…

    Submitted 23 August, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Erratum corrected

  18. arXiv:2112.07096  [pdf, other]

    cs.LG cs.CE math.NA

    Learning High-Dimensional Parametric Maps via Reduced Basis Adaptive Residual Networks

    Authors: Thomas O'Leary-Roseberry, Xiaosong Du, Anirban Chaudhuri, Joaquim R. R. A. Martins, Karen Willcox, Omar Ghattas

    Abstract: We propose a scalable framework for the learning of high-dimensional parametric maps via adaptively constructed residual network (ResNet) maps between reduced bases of the inputs and outputs. When just few training data are available, it is beneficial to have a compact parametrization in order to ameliorate the ill-posedness of the neural network training problem. By linearly restricting high-dime…

    Submitted 15 November, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

  19. arXiv:2108.06617  [pdf]

    cs.CV

    B-Splines

    Authors: Arindam Chaudhuri

    Abstract: B-Splines are among the most promising curves in computer graphics. They have superior geometric properties that make them an ideal candidate for several applications in the computer-aided design industry. In this article, some basic properties of B-Spline curves are presented. Two significant B-Spline properties, viz. the convex hull property and the effect of repeated points, are discussed. The…

    Submitted 14 August, 2021; originally announced August 2021.

    Comments: This work is published in Encyclopedia of Computer Graphics and Games

  20. arXiv:2107.07831  [pdf, ps, other]

    cs.IR cs.LG

    Modeling User Behaviour in Research Paper Recommendation System

    Authors: Arpita Chaudhuri, Debasis Samanta, Monalisa Sarma

    Abstract: User intention which often changes dynamically is considered to be an important factor for modeling users in the design of recommendation systems. Recent studies are starting to focus on predicting user intention (what users want) beyond user preference (what users like). In this work, a user intention model is proposed based on deep sequential topic analysis. The model predicts a user's intention…

    Submitted 16 July, 2021; originally announced July 2021.

    Comments: 23 pages

  21. arXiv:2107.05942  [pdf]

    cs.CV

    A Novel Deep Learning Method for Thermal to Annotated Thermal-Optical Fused Images

    Authors: Suranjan Goswami, Satish Kumar Singh, Bidyut B. Chaudhuri

    Abstract: Thermal Images profile the passive radiation of objects and capture them in grayscale images. Such images have a very different distribution of data compared to optical colored images. We present here a work that produces a grayscale thermo-optical fused mask given a thermal input. This is a deep learning based pioneering work since to the best of our knowledge, there exists no other work on therm…

    Submitted 13 July, 2021; originally announced July 2021.

  22. arXiv:2104.09568  [pdf]

    cs.CV

    Detecting Vehicle Type and License Plate Number of different Vehicles on Images

    Authors: Aashna Ahuja, Arindam Chaudhuri

    Abstract: With an ever-increasing number of vehicles, vehicular tracking is one of the major challenges faced by urban areas. In this paper we try to develop a model that can locate a particular vehicle that the user is looking for depending on two factors: (1) the type of vehicle and (2) the license plate number of the car. The proposed system uses a unique mixture consisting of Mask R-CNN model for vehicle type…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: Present Research Work in Progress

  23. arXiv:2101.05129  [pdf, other]

    math.OC cs.CE physics.data-an stat.CO

    Certifiable Risk-Based Engineering Design Optimization

    Authors: Anirban Chaudhuri, Boris Kramer, Matthew Norton, Johannes O. Royset, Karen Willcox

    Abstract: Reliable, risk-averse design of complex engineering systems with optimized performance requires dealing with uncertainties. A conventional approach is to add safety margins to a design that was obtained from deterministic optimization. Safer engineering designs require appropriate cost and constraint function definitions that capture the risk associated with unwanted system behavior in th…

    Submitted 13 July, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Journal ref: AIAA Journal, 60(2), pp.551-565, 2022

  24. arXiv:2012.07678  [pdf]

    cs.CC

    Classifying CELESTE as NP Complete

    Authors: Zeeshan Ahmed, Alapan Chaudhuri, Kunwar Shaanjeet Singh Grover, Ashwin Rao, Kushagra Garg, Pulak Malhotra

    Abstract: We analyze the computational complexity of the video game "CELESTE" and prove that solving a generalized level in it is NP-Complete. Further, we also show how, upon introducing a small change in the game mechanics (adding a new game entity), we can make it PSPACE-complete.

    Submitted 1 December, 2022; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: Keywords: complexity analysis, NP completeness, algorithmic analysis, game analysis

    Journal ref: CST 2022

  25. arXiv:2009.13836  [pdf, other]

    cs.CV cs.AI cs.IR cs.LG

    SIR: Similar Image Retrieval for Product Search in E-Commerce

    Authors: Theban Stanley, Nihar Vanjara, Yanxin Pan, Ekaterina Pirogova, Swagata Chakraborty, Abon Chaudhuri

    Abstract: We present a similar image retrieval (SIR) platform that is used to quickly discover visually similar products in a catalog of millions. Given the size, diversity, and dynamism of our catalog, product search poses many challenges. It can be addressed by building supervised models to tag product images with labels representing themes and later retrieving them by labels. This approach suffices f…

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: Accepted in 13th International Conference on Similarity Search and Applications, SISAP 2020

  26. arXiv:2007.08711  [pdf, other]

    stat.ML cs.CV cs.LG

    Visualizing the Finer Cluster Structure of Large-Scale and High-Dimensional Data

    Authors: Yu Liang, Arin Chaudhuri, Haoyu Wang

    Abstract: Dimension reduction and visualization of high-dimensional data have become very important research topics because of the rapid growth of large databases in data science. In this paper, we propose using a generalized sigmoid function to model the distance similarity in both high- and low-dimensional spaces. In particular, the parameter b is introduced to the generalized sigmoid function in low-dime…

    Submitted 16 July, 2020; originally announced July 2020.

  27. arXiv:1910.02497  [pdf, other]

    stat.ML cs.LG physics.data-an stat.CO

    mfEGRA: Multifidelity Efficient Global Reliability Analysis through Active Learning for Failure Boundary Location

    Authors: Anirban Chaudhuri, Alexandre N. Marques, Karen E. Willcox

    Abstract: This paper develops mfEGRA, a multifidelity active learning method using data-driven adaptively refined surrogates for failure boundary location in reliability analysis. This work addresses the issue of prohibitive cost of reliability analysis using Monte Carlo sampling for expensive-to-evaluate high-fidelity models by using cheaper-to-evaluate approximations of the high-fidelity model. The method…

    Submitted 23 September, 2021; v1 submitted 6 October, 2019; originally announced October 2019.

    MSC Class: 62K05; 62L05; 60G15; 68M15

    Journal ref: Structural and Multidisciplinary Optimization 64, 797-811, 2021

  28. A Visual Technique to Analyze Flow of Information in a Machine Learning System

    Authors: Abon Chaudhuri

    Abstract: Machine learning (ML) algorithms and machine learning based software systems implicitly or explicitly involve complex flow of information between various entities such as training data, feature space, validation set and results. Understanding the statistical distribution of such information and how they flow from one entity to another influence the operation and correctness of such systems, especi…

    Submitted 2 August, 2019; originally announced August 2019.

    Comments: Published in Visualization and Data Analysis (VDA), part of IS&T Electronic Imaging Symposium 2018

  29. arXiv:1905.02234  [pdf, other]

    cs.CV cs.AI cs.LG

    Image Matters: Scalable Detection of Offensive and Non-Compliant Content / Logo in Product Images

    Authors: Shreyansh Gandhi, Samrat Kokkula, Abon Chaudhuri, Alessandro Magnani, Theban Stanley, Behzad Ahmadi, Venkatesh Kandaswamy, Omer Ovenc, Shie Mannor

    Abstract: In e-commerce, product content, especially product images have a significant influence on a customer's journey from product discovery to evaluation and finally, purchase decision. Since many e-commerce retailers sell items from other third-party marketplace sellers besides their own, the content published by both internal and external content creators needs to be monitored and enriched, wherever p…

    Submitted 2 August, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: 10 pages

  30. arXiv:1902.07808  [pdf, other]

    cs.PL

    Optimizing and Evaluating Transient Gradual Typing

    Authors: Michael M. Vitousek, Jeremy G. Siek, Avik Chaudhuri

    Abstract: Gradual typing enables programmers to combine static and dynamic typing in the same language. However, ensuring a sound interaction between the static and dynamic parts can incur significant runtime cost. In this paper, we perform a detailed performance analysis of the transient gradual typing approach implemented in Reticulated Python, a gradually typed variant of Python. The transient approach i…

    Submitted 20 February, 2019; originally announced February 2019.

  31. Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection

    Authors: Zekun Xu, Deovrat Kakde, Arin Chaudhuri

    Abstract: In recent years, there have been many practical applications of anomaly detection such as in predictive maintenance, detection of credit fraud, network intrusion, and system failure. The goal of anomaly detection is to identify in the test data anomalous behaviors that are either rare or unseen in the training data. This is a common goal in predictive maintenance, which aims to forecast the immine…

    Submitted 1 February, 2019; originally announced February 2019.

    Comments: 15 pages, 5 figures

  32. arXiv:1901.08387  [pdf, ps, other]

    cs.LG cs.AI

    Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

    Authors: Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

    Abstract: In this paper, we propose a constant word (RAM model) algorithm for regret minimisation for both finite and infinite Stochastic Multi-Armed Bandit (MAB) instances. Most of the existing regret minimisation algorithms need to remember the statistics of all the arms they encounter. This may become a problem for the cases where the number of available words of memory is limited. Designing an efficient…

    Submitted 24 January, 2019; originally announced January 2019.

  33. arXiv:1901.08386  [pdf, ps, other]

    cs.LG stat.ML

    PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits

    Authors: Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

    Abstract: We consider the problem of identifying any $k$ out of the best $m$ arms in an $n$-armed stochastic multi-armed bandit. Framed in the PAC setting, this particular problem generalises both the problem of `best subset selection' and that of selecting `one out of the best m' arms [arcsk 2017]. In applications such as crowd-sourcing and drug-designing, identifying a single good solution is often not su…

    Submitted 24 January, 2019; originally announced January 2019.

  34. arXiv:1811.07996  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    A Smart System for Selection of Optimal Product Images in E-Commerce

    Authors: Abon Chaudhuri, Paolo Messina, Samrat Kokkula, Aditya Subramanian, Abhinandan Krishnan, Shreyansh Gandhi, Alessandro Magnani, Venkatesh Kandaswamy

    Abstract: In e-commerce, content quality of the product catalog plays a key role in delivering a satisfactory experience to the customers. In particular, visual content such as product images influences customers' engagement and purchase decisions. With the rapid growth of e-commerce and the advent of artificial intelligence, traditional content management systems are giving way to automated scalable system…

    Submitted 11 November, 2018; originally announced November 2018.

    Comments: Accepted in IEEE Big Data Conference 2018 (Industry & Government Track)

  35. arXiv:1811.06838  [pdf, other]

    stat.ML cs.LG math.NA

    The Trace Criterion for Kernel Bandwidth Selection for Support Vector Data Description

    Authors: Arin Chaudhuri, Carol Sadek, Deovrat Kakde, Wenhao Hu, Hansi Jiang, Seunghyun Kong, Yuewei Liao, Sergiy Peredriy, Haoyu Wang

    Abstract: Support vector data description (SVDD) is a popular anomaly detection technique. The SVDD classifier partitions the whole data space into an inlier region, which consists of the region near the training data, and an outlier region, which consists of points away from the training data. The computation of the SVDD classifier requires a kernel function, for which the Gaussian kernel is a common choic…

    Submitted 5 February, 2020; v1 submitted 15 November, 2018; originally announced November 2018.

    Comments: note: some text overlap with arXiv:1708.05106 because common background material is covered in both papers

  36. arXiv:1811.05561  [pdf, other]

    stat.AP cs.LG stat.ML

    A New SVDD-Based Multivariate Non-parametric Process Capability Index

    Authors: Deovrat Kakde, Arin Chaudhuri, Diana Shaw

    Abstract: Process capability index (PCI) is a commonly used statistic to measure ability of a process to operate within the given specifications or to produce products which meet the required quality specifications. PCI can be univariate or multivariate depending upon the number of process specifications or quality characteristics of interest. Most PCIs make distributional assumptions which are often unreal…

    Submitted 13 November, 2018; originally announced November 2018.

  37. A fast algorithm for computing distance correlation

    Authors: Arin Chaudhuri, Wenhao Hu

    Abstract: Classical dependence measures such as Pearson correlation, Spearman's $ρ$, and Kendall's $τ$ can detect only monotonic or linear dependence. To overcome these limitations, Szekely et al. (2007) proposed distance covariance as a weighted $L_2$ distance between the joint characteristic function and the product of marginal distributions. The distance covariance is $0$ if and only if two random vectors…

    Submitted 15 November, 2018; v1 submitted 26 October, 2018; originally announced October 2018.

    MSC Class: 62H20; 68P10 ACM Class: F.2.2; G.3

  38. arXiv:1806.09612  [pdf]

    cs.AI

    Predictive Maintenance for Industrial IoT of Vehicle Fleets using Hierarchical Modified Fuzzy Support Vector Machine

    Authors: Arindam Chaudhuri

    Abstract: Connected vehicle fleets are deployed worldwide in several industrial IoT scenarios. With the gradual increase of machines being controlled and managed through networked smart devices, the predictive maintenance potential grows rapidly. Predictive maintenance has the potential of optimizing uptime as well as performance such that time and labor associated with inspections and preventive maintenanc…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: Research work done at Samsung R & D Institute Delhi India

  39. arXiv:1709.00139  [pdf, other]

    stat.ML cs.LG

    Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel

    Authors: Hansi Jiang, Haoyu Wang, Wenhao Hu, Deovrat Kakde, Arin Chaudhuri

    Abstract: Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around data. When dealing with online or large data, existing batch SVDD methods have to be rerun in each iteration. We propose an incremental learning algorithm for SVDD that uses the…

    Submitted 1 November, 2018; v1 submitted 31 August, 2017; originally announced September 2017.

    Comments: 18 pages, 1 table, 4 figures

  40. arXiv:1708.08021  [pdf, other]

    cs.PL

    Fast and Precise Type Checking for JavaScript

    Authors: Avik Chaudhuri, Panagiotis Vekris, Sam Goldman, Marshall Roch, Gabriel Levi

    Abstract: In this paper we present the design and implementation of Flow, a fast and precise type checker for JavaScript that is used by thousands of developers on millions of lines of code at Facebook every day. Flow uses sophisticated type inference to understand common JavaScript idioms precisely. This helps it find non-trivial bugs in code and provide code intelligence to editors without requiring signi…

    Submitted 30 August, 2017; v1 submitted 26 August, 2017; originally announced August 2017.

  41. arXiv:1708.05106  [pdf, other]

    cs.LG cs.AI stat.ML

    The Mean and Median Criterion for Automatic Kernel Bandwidth Selection for Support Vector Data Description

    Authors: Arin Chaudhuri, Deovrat Kakde, Carol Sadek, Laura Gonzalez, Seunghyun Kong

    Abstract: Support vector data description (SVDD) is a popular technique for detecting anomalies. The SVDD classifier partitions the whole space into an inlier region, which consists of the region near the training data, and an outlier region, which consists of points away from the training data. The computation of the SVDD classifier requires a kernel function, and the Gaussian kernel is a common choice for…

    Submitted 21 August, 2017; v1 submitted 16 August, 2017; originally announced August 2017.

    ACM Class: I.2.7

  42. Convergence Analysis of Backpropagation Algorithm for Designing an Intelligent System for Sensing Manhole Gases

    Authors: Varun Kumar Ojha, Paramartha Dutta, Atal Chaudhuri, Hiranmay Saha

    Abstract: Human fatalities are reported due to the excessive proportional presence of hazardous gas components in the manhole, such as Hydrogen Sulfide, Ammonia, Methane, Carbon Dioxide, Nitrogen Oxide, Carbon Monoxide, etc. Hence, predetermination of these gases is imperative. A neural network (NN) based intelligent sensory system is proposed for the avoidance of such fatalities. Backpropagation (BP) was a…

    Submitted 6 July, 2017; originally announced July 2017.

    Journal ref: Hybrid Soft Computing Approaches (2015) pp 215-236

  43. Identifying hazardousness of sewer pipeline gas mixture using classification methods: a comparative study

    Authors: Varun Kumar Ojha, Paramartha Dutta, Atal Chaudhuri

    Abstract: In this work, we formulated a real-world problem related to sewer pipeline gas detection using classification-based approaches. The primary goal of this work was to identify the hazardousness of a sewer pipeline to offer safe and non-hazardous access to sewer pipeline workers so that human fatalities, which occur due to toxic exposure to sewer gas components, can be avoided. The dataset…

    Submitted 16 May, 2017; originally announced July 2017.

    Journal ref: Neural Comput & Applic (2017) 28: 1343

  44. arXiv:1702.05698  [pdf, ps, other]

    cs.LG cs.CV stat.AP stat.CO stat.ML

    Online Robust Principal Component Analysis with Change Point Detection

    Authors: Wei Xiao, Xiaolin Huang, Jorge Silva, Saba Emrani, Arin Chaudhuri

    Abstract: Robust PCA methods are typically batch algorithms which require loading all observations into memory before processing. This makes them inefficient for processing big data. In this paper, we develop an efficient online robust principal component method, namely online moving window robust principal component analysis (OMWRPCA). Unlike existing algorithms, OMWRPCA can successfully track not only slowl…

    Submitted 20 March, 2017; v1 submitted 18 February, 2017; originally announced February 2017.

  45. Kernel Bandwidth Selection for SVDD: Peak Criterion Approach for Large Data

    Authors: Sergiy Peredriy, Deovrat Kakde, Arin Chaudhuri

    Abstract: Support Vector Data Description (SVDD) provides a useful approach to construct a description of multivariate data for single-class classification and outlier detection with various practical applications. The Gaussian kernel used in the SVDD formulation allows a flexible data description defined by observations designated as support vectors. The data boundary of such a description is non-spherical and conform…

    Submitted 19 May, 2017; v1 submitted 31 October, 2016; originally announced November 2016.

    MSC Class: 68T10; 62H99; 65Y20; 68T05 ACM Class: G.3; G.4; I.2.6

  46. arXiv:1610.09455  [pdf]

    cs.CV

    Selective De-noising of Sparse-Coloured Images

    Authors: Arjun Chaudhuri

    Abstract: Since time immemorial, noise has been a constant source of disturbance to the various entities known to mankind. Noise models of different kinds have been developed to study noise in a more detailed fashion over the years. Image processing, particularly, has extensively implemented several algorithms to reduce noise in photographs and pictorial documents to alleviate the effect of noise. Images with…

    Submitted 29 October, 2016; originally announced October 2016.

    Comments: 4 pages, 5 figures, International Journal of Computer Science and Information Technologies, ISSN: 0975-9646, March-April, 2016, Website: http://www.ijcsit.com/

  47. arXiv:1607.07745  [pdf]

    cs.AI stat.AP stat.ME stat.ML

    Leveraging Unstructured Data to Detect Emerging Reliability Issues

    Authors: Deovrat Kakde, Arin Chaudhuri

    Abstract: Unstructured data refers to information that does not have a predefined data model or is not organized in a pre-defined manner. Loosely speaking, unstructured data refers to text data that is generated by humans. In after-sales service businesses, there are two main sources of unstructured data: customer complaints, which generally describe symptoms, and technician comments, which outline diagnost…

    Submitted 26 July, 2016; originally announced July 2016.

  48. arXiv:1607.07423  [pdf]

    cs.LG stat.AP stat.ME stat.ML

    A Non-Parametric Control Chart For High Frequency Multivariate Data

    Authors: Deovrat Kakde, Sergriy Peredriy, Arin Chaudhuri, Anya Mcguirk

    Abstract: Support Vector Data Description (SVDD) is a machine learning technique used for single-class classification and outlier detection. The SVDD-based K-chart was first introduced by Sun and Tsung for monitoring multivariate processes when the underlying distribution of process parameters or quality characteristics departs from normality. The method first trains an SVDD model on data obtained from stable or in-c…

    Submitted 29 July, 2016; v1 submitted 25 July, 2016; originally announced July 2016.

    MSC Class: 62N05; 90B25 ACM Class: G.3; H.2.8

  49. arXiv:1606.05382  [pdf, other]

    cs.LG stat.AP stat.ML

    Sampling Method for Fast Training of Support Vector Data Description

    Authors: Arin Chaudhuri, Deovrat Kakde, Maria Jahja, Wei Xiao, Hansi Jiang, Seunghyun Kong, Sergiy Peredriy

    Abstract: Support Vector Data Description (SVDD) is a popular outlier detection technique which constructs a flexible description of the input data. SVDD computation time is high for large training datasets, which limits its use in big-data process-monitoring applications. We propose a new iterative sampling-based method for SVDD training. The method incrementally learns the training data description at each…

    Submitted 25 September, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

  50. arXiv:1602.05257  [pdf, other]

    cs.LG stat.AP stat.ML

    Peak Criterion for Choosing Gaussian Kernel Bandwidth in Support Vector Data Description

    Authors: Deovrat Kakde, Arin Chaudhuri, Seunghyun Kong, Maria Jahja, Hansi Jiang, Jorge Silva

    Abstract: Support Vector Data Description (SVDD) is a machine-learning technique used for single-class classification and outlier detection. The SVDD formulation with a kernel function provides a flexible boundary around data. The values of the kernel function parameters affect the nature of the data boundary. For example, it is observed that with a Gaussian kernel, as the value of kernel bandwidth is lowered, the da…

    Submitted 8 August, 2017; v1 submitted 16 February, 2016; originally announced February 2016.