Too Late to Train, Too Early To Use? A Study on Necessity and Viability of Low-Resource Bengali LLMs

Authors: Tamzeed Mahfuz, Satak Kumar Dey, Ruwad Naswan, Hasnaen Adil, Khondker Salman Sayeed, Haz Sameen Shahgir

Abstract: Each new generation of English-oriented Large Language Models (LLMs) exhibits enhanced cross-lingual transfer capabilities and significantly outperforms older LLMs on low-resource languages. This prompts the question: Is there a need for LLMs dedicated to a particular low-resource language? We aim to explore this question for Bengali, a low-to-moderate resource Indo-Aryan language native to the Bengal region of South Asia. We compare the performance of open-weight and closed-source LLMs such as LLaMA-3 and GPT-4 against fine-tuned encoder-decoder models across a diverse set of Bengali downstream tasks, including translation, summarization, paraphrasing, question-answering, and natural language inference. Our findings reveal that while LLMs generally excel in reasoning tasks, their performance in tasks requiring Bengali script generation is inconsistent. Key challenges include inefficient tokenization of Bengali script by existing LLMs, leading to increased computational costs and potential performance degradation. Additionally, we highlight biases in machine-translated datasets commonly used for Bengali NLP tasks. We conclude that there is a significant need for a Bengali-oriented LLM, but the field currently lacks the high-quality pretraining and instruction-tuning datasets necessary to develop a highly effective model.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.15953 [pdf, other]

Exploring the Influence of Online Videos on Parents or Caregivers of Children with Developmental Delays

Authors: Saquib Ahmed, Md Nazmus Sakib, Sanorita Dey

Abstract: Developmental Delays and Disabilities (DDDs) refer to conditions where children are slower or unable to reach developmental milestones compared to typically developing children. This can cause significant stress for parents, leading to social isolation and loneliness. Online videos, particularly those on YouTube, aim to support these parents and caregivers by offering guidance and assistance. Studies show that parents of children with DDDs create videos on YouTube to enhance authenticity and build connections. However, there is limited knowledge about how other parents with children with DDDs perceive and are impacted by these videos. Our study used a mixed-method approach to annotate and analyze more than fifteen hundred YouTube videos on children's DDDs. We found that these videos provide crucial informational content and offer mental and emotional support through shared personal experiences. Comments analysis revealed a strong sense of community among YouTubers and viewers. Interviews with parents of children with DDDs showed that they find these videos relatable and essential for managing their children's diagnosis and treatments. We concluded by discussing platform-centric design implications for supporting parents and other caregivers of children with DDDs.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.04360 [pdf, other]

Size biased Multinomial Modelling of detection data in Software testing

Authors: Pallabi Ghosh, Ashis Kr. Chakraborty, Soumen Dey

Abstract: Estimation of software reliability often poses a considerable challenge, particularly for critical software. Several methods for estimating software reliability are already available in the literature. However, almost nobody has so far used the concept of the size of a bug for estimating software reliability. In this article we make use of the bug size, or the eventual bug size, which helps us determine the reliability of software more precisely. The size-biased model developed here can also be used in similar fields such as hydrocarbon exploration. The model has been validated through simulation and subsequently applied to testing data from a critical space application software. The estimated results match the actual observations to a large extent.

Submitted 24 May, 2024; originally announced June 2024.

Comments: Submitted to OPSEARCH

arXiv:2405.04757 [pdf, other]

Communication-efficient and Differentially-private Distributed Nash Equilibrium Seeking with Linear Convergence

Authors: Xiaomeng Chen, Wei Huo, Kemi Ding, Subhrakanti Dey, Ling Shi

Abstract: The distributed computation of a Nash equilibrium (NE) for non-cooperative games has been gaining increased attention recently. Due to the nature of distributed systems, privacy and communication efficiency are two critical concerns. Traditional approaches often address these critical concerns in isolation. This work introduces a unified framework, named CDP-NES, designed to improve communication efficiency in the privacy-preserving NE seeking algorithm for distributed non-cooperative games over directed graphs. Leveraging both general compression operators and the noise adding mechanism, CDP-NES perturbs local states with Laplacian noise and applies difference compression prior to their exchange among neighbors. We prove that CDP-NES not only achieves linear convergence to a neighborhood of the NE in games with restricted monotone mappings but also guarantees $ε$-differential privacy, addressing privacy and communication efficiency simultaneously. Finally, simulations are provided to illustrate the effectiveness of the proposed method.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03106 [pdf, other]

Compression-based Privacy Preservation for Distributed Nash Equilibrium Seeking in Aggregative Games

Authors: Wei Huo, Xiaomeng Chen, Kemi Ding, Subhrakanti Dey, Ling Shi

Abstract: This paper explores distributed aggregative games in multi-agent systems. Current methods for finding distributed Nash equilibrium require players to send original messages to their neighbors, leading to communication burden and privacy issues. To jointly address these issues, we propose an algorithm that uses stochastic compression to save communication resources and conceal information through random errors induced by compression. Our theoretical analysis shows that the algorithm guarantees convergence accuracy, even with aggressive compression errors used to protect privacy. We prove that the algorithm achieves differential privacy through a stochastic quantization scheme. Simulation results for energy consumption games support the effectiveness of our approach.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.01114 [pdf, other]

Continual Imitation Learning for Prosthetic Limbs

Authors: Sharmita Dey, Benjamin Paassen, Sarath Ravindran Nair, Sabri Boughorbel, Arndt F. Schilling

Abstract: Lower limb amputations and neuromuscular impairments severely restrict mobility, necessitating advancements beyond conventional prosthetics. Motorized bionic limbs offer promise, but their utility depends on mimicking the evolving synergy of human movement in various settings. In this context, we present a novel model for bionic prostheses' application that leverages camera-based motion capture and wearable sensor data to learn the synergistic coupling of the lower limbs during human locomotion, empowering it to infer the kinematic behavior of a missing lower limb across varied tasks, such as climbing inclines and stairs. We propose a model that can multitask, adapt continually, anticipate movements, and refine. The core of our method lies in an approach we call multitask prospective rehearsal, which anticipates and synthesizes future movements based on the previous prediction and employs a corrective mechanism for subsequent predictions. We design an evolving architecture that merges lightweight, task-specific modules on a shared backbone, ensuring both specificity and scalability. We empirically validate our model against various baselines using real-world human gait datasets, including experiments with transtibial amputees, which encompass a broad spectrum of locomotion tasks. The results show that our approach consistently outperforms baseline models, particularly under scenarios affected by distributional shifts, adversarial perturbations, and noise.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.16997 [pdf, ps, other]

Probabilistic Interval Analysis of Unreliable Programs

Authors: Dibyendu Das, Soumyajit Dey

Abstract: Advancement of chip technology will make future computer chips faster. Power consumption of such chips shall also decrease. But this speed gain shall not come free of cost: there is going to be a trade-off between speed and efficiency, i.e., accuracy of the computation. In order to achieve this extra speed we will simply have to let our computers make more mistakes in computations. Consequently, systems built with these types of chips will possess an innate unreliability. Programs written for these systems will also have to incorporate this unreliability. Researchers have already started developing programming frameworks for such unreliable architectures. In the present work, we use a restricted version of C-type languages to model the programs written for unreliable architectures. We propose a technique for statically analyzing code written for these kinds of architectures. Our technique, which primarily focuses on Interval/Range Analysis of this type of program, uses the well-established theory of abstract interpretation. Any discussion of hardware unreliability implicitly involves the failure of hardware components. There are two types of failure models, namely: 1) the permanent failure model, where the hardware stops execution on failure, and 2) the transient failure model, where on failure, the hardware continues subsequent operations with wrong operand values. In this paper, we have only taken the transient failure model into consideration.
The goal of this analysis is to predict the probability with which a program variable assumes values from a given range at a given program point.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.12415 [pdf]

Soil Fertility Prediction Using Combined USB-microscope Based Soil Image, Auxiliary Variables, and Portable X-Ray Fluorescence Spectrometry

Authors: Shubhadip Dasgupta, Satwik Pate, Divya Rathore, L. G. Divyanth, Ayan Das, Anshuman Nayak, Subhadip Dey, Asim Biswas, David C. Weindorf, Bin Li, Sergio Henrique Godinho Silva, Bruno Teixeira Ribeiro, Sanjay Srivastava, Somsubhra Chakraborty

Abstract: This study explored the application of portable X-ray fluorescence (PXRF) spectrometry and soil image analysis to rapidly assess soil fertility, focusing on critical parameters such as available B, organic carbon (OC), available Mn, available S, and the sulfur availability index (SAI). Analyzing 1,133 soil samples from various agro-climatic zones in Eastern India, the research combined color and texture features from microscopic soil images, PXRF data, and auxiliary soil variables (AVs) using a Random Forest model. Results indicated that integrating image features (IFs) with auxiliary variables (AVs) significantly enhanced prediction accuracy for available B (R^2 = 0.80) and OC (R^2 = 0.88). A data fusion approach, incorporating IFs, AVs, and PXRF data, further improved predictions for available Mn and SAI with R^2 values of 0.72 and 0.70, respectively. The study demonstrated how these integrated technologies have the potential to provide quick and affordable options for soil testing, opening up access to more sophisticated prediction models and a better comprehension of the fertility and health of the soil. Future research should focus on the application of deep learning models on a larger dataset of soil images, developed using soils from a broader range of agro-climatic zones under field conditions.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 37 pages, 10 figures; manuscript under peer-review for publication in the journal 'Computers and Electronics in Agriculture'

arXiv:2404.00471 [pdf, other]

doi 10.1109/ICASSP48485.2024.10447579

Score-Based Diffusion Models for Photoacoustic Tomography Image Reconstruction

Authors: Sreemanti Dey, Snigdha Saha, Berthy T. Feng, Manxiu Cui, Laure Delisle, Oscar Leong, Lihong V. Wang, Katherine L. Bouman

Abstract: Photoacoustic tomography (PAT) is a rapidly-evolving medical imaging modality that combines optical absorption contrast with ultrasound imaging depth. One challenge in PAT is image reconstruction with inadequate acoustic signals due to limited sensor coverage or due to the density of the transducer array. Such cases call for solving an ill-posed inverse reconstruction problem. In this work, we use score-based diffusion models to solve the inverse problem of reconstructing an image from limited PAT measurements. The proposed approach allows us to incorporate an expressive prior learned by a diffusion model on simulated vessel structures while still being robust to varying transducer sparsity conditions.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 5 pages

Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, pp. 2470-2474

arXiv:2403.10885 [pdf, other]

Could We Generate Cytology Images from Histopathology Images? An Empirical Study

Authors: Soumyajyoti Dey, Sukanta Chakraborty, Utso Guha Roy, Nibaran Das

Abstract: Automation in medical imaging is quite challenging due to the unavailability of annotated datasets and the scarcity of domain experts. In recent years, deep learning techniques have solved some complex medical imaging tasks like disease classification, important object localization, segmentation, etc. However, most of these tasks require a large amount of annotated data for their successful implementation. To mitigate the shortage of data, different generative models are proposed for data augmentation purposes which can boost the classification performances. For this, different synthetic medical image data generation models are developed to increase the dataset. Unpaired image-to-image translation models here shift the source domain to the target domain. In the breast malignancy identification domain, FNAC is one of the low-cost low-invasive modalities normally used by medical practitioners. However, the availability of public datasets in this domain is very poor, whereas automation of cytology image analysis needs a large amount of annotated data. Therefore synthetic cytology images are generated by translating breast histopathology samples which are publicly available. In this study, we have explored traditional image-to-image transfer models like CycleGAN and Neural Style Transfer. Further, it is observed that the generated cytology images are quite similar to real breast cytology samples by measuring FID and KID scores.

Submitted 16 March, 2024; originally announced March 2024.

Comments: Accepted at International Conference on Advanced Computing and Applications (ICACA-2024)

arXiv:2403.10884 [pdf, other]

Fuzzy Rank-based Late Fusion Technique for Cytology image Segmentation

Authors: Soumyajyoti Dey, Sukanta Chakraborty, Utso Guha Roy, Nibaran Das

Abstract: Cytology image segmentation is quite challenging due to its complex cellular structure and multiple overlapping regions. On the other hand, for supervised machine learning techniques, we need a large amount of annotated data, which is costly. In recent years, late fusion techniques have given some promising performances in the field of image classification. In this paper, we have explored a fuzzy-based late fusion technique for cytology image segmentation. This fusion rule integrates three traditional semantic segmentation models: UNet, SegNet, and PSPNet. The technique is applied on two cytology image datasets, i.e., cervical cytology (HErlev) and breast cytology (JUCYT-v1) image datasets. With the proposed late fusion technique, we achieved maximum MeanIoU scores of 84.27% and 83.79% on the HErlev and JUCYT-v1 datasets, respectively, which are better than those of traditional fusion rules such as average probability, geometric mean, Borda Count, etc. The codes of the proposed model are available on GitHub.

Submitted 16 March, 2024; originally announced March 2024.

Comments: Accepted at International Conference on Data, Electronics and Computing (ICDEC-2023)

arXiv:2403.06569 [pdf, other]

Enhancing Joint Motion Prediction for Individuals with Limb Loss Through Model Reprogramming

Authors: Sharmita Dey, Sarath R. Nair

Abstract: Mobility impairment caused by limb loss is a significant challenge faced by millions of individuals worldwide. The development of advanced assistive technologies, such as prosthetic devices, has the potential to greatly improve the quality of life for amputee patients. A critical component in the design of such technologies is the accurate prediction of reference joint motion for the missing limb. However, this task is hindered by the scarcity of joint motion data available for amputee patients, in contrast to the substantial quantity of data from able-bodied subjects. To overcome this, we leverage deep learning's reprogramming property to repurpose well-trained models for a new goal without altering the model parameters. With only data-level manipulation, we adapt models originally designed for able-bodied people to forecast joint motion in amputees. The findings in this study have significant implications for advancing assistive tech and amputee mobility.

Submitted 12 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Journal ref: ICLR 2024 Workshop: Learning from Time Series for Health

arXiv:2403.04974 [pdf, other]

Embracing Large Language and Multimodal Models for Prosthetic Technologies

Authors: Sharmita Dey, Arndt F. Schilling

Abstract: This article presents a vision for the future of prosthetic devices, leveraging the advancements in large language models (LLMs) and Large Multimodal Models (LMMs) to revolutionize the interaction between humans and assistive technologies. Unlike traditional prostheses, which rely on limited and predefined commands, this approach aims to develop intelligent prostheses that understand and respond to users' needs through natural language and multimodal inputs. The realization of this vision involves developing a control system capable of understanding and translating a wide array of natural language and multimodal inputs into actionable commands for prosthetic devices. This includes the creation of models that can extract and interpret features from both textual and multimodal data, ensuring devices not only follow user commands but also respond intelligently to the environment and user intent, thus marking a significant leap forward in prosthetic technology.

Submitted 11 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.14889 [pdf]

COBIAS: Contextual Reliability in Bias Assessment

Authors: Priyanshul Govil, Hemang Jain, Vamshi Krishna Bonagiri, Aman Chadha, Ponnurangam Kumaraguru, Manas Gaur, Sanorita Dey

Abstract: Large Language Models (LLMs) are trained on extensive web corpora, which enable them to understand and generate human-like text. However, this training process also results in inherent biases within the models. These biases arise from web data's diverse and often uncurated nature, containing various stereotypes and prejudices. Previous works on debiasing models rely on benchmark datasets to measure their method's performance. However, these datasets suffer from several pitfalls due to the highly subjective understanding of bias, highlighting a critical need for contextual exploration. We propose understanding the context of inputs by considering the diverse situations in which they may arise. Our contribution is two-fold: (i) we augment 2,291 stereotyped statements from two existing bias-benchmark datasets with points for adding context; (ii) we develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to assess a statement's contextual reliability in measuring bias. Our metric aligns with human judgment on contextual reliability of statements (Spearman's $ρ = 0.65$, $p = 3.4 \times 10^{-60}$) and can be used to create reliable datasets, which would assist bias mitigation works.

Submitted 17 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.11955 [pdf, other]

Analysis of Multidomain Abstractive Summarization Using Salience Allocation

Authors: Tohida Rehman, Raghubir Bose, Soumik Dey, Samiran Chattopadhyay

Abstract: This paper explores the realm of abstractive text summarization through the lens of the SEASON (Salience Allocation as Guidance for Abstractive SummarizatiON) technique, a model designed to enhance summarization by leveraging salience allocation techniques. The study evaluates SEASON's efficacy by comparing it with prominent models like BART, PEGASUS, and ProphetNet, all fine-tuned for various text summarization tasks. The assessment is conducted using diverse datasets including CNN/Dailymail, SAMSum, and Financial-news based Event-Driven Trading (EDT), with a specific focus on a financial dataset containing a substantial volume of news articles from 2020/03/01 to 2021/05/06. This paper employs various evaluation metrics such as ROUGE, METEOR, BERTScore, and MoverScore to evaluate the performance of these models fine-tuned for generating abstractive summaries. The analysis of these metrics offers a thorough insight into the strengths and weaknesses demonstrated by each model in summarizing news, dialogue, and financial text datasets. The results presented in this paper not only contribute to the evaluation of the SEASON model's effectiveness but also illuminate the intricacies of salience allocation techniques across various types of datasets.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 11 pages, 1 figure, 4 tables

arXiv:2401.16424 [pdf, other]

Computer Vision for Primate Behavior Analysis in the Wild

Authors: Richard Vogg, Timo Lüddecke, Jonathan Henrich, Sharmita Dey, Matthias Nuske, Valentin Hassler, Derek Murphy, Julia Fischer, Julia Ostner, Oliver Schülke, Peter M. Kappeler, Claudia Fichtel, Alexander Gail, Stefan Treue, Hansjörg Scherberger, Florentin Wörgötter, Alexander S. Ecker

Abstract: Advances in computer vision as well as increasingly widespread video-based behavioral monitoring have great potential for transforming how we study animal cognition and behavior. However, there is still a fairly large gap between the exciting prospects and what can actually be achieved in practice today, especially in videos from the wild. With this perspective paper, we want to contribute towards closing this gap, by guiding behavioral scientists in what can be expected from current methods and steering computer vision researchers towards problems that are relevant to advance research in animal behavior. We start with a survey of the state-of-the-art methods for computer vision problems that are directly relevant to the video-based study of animal behavior, including object detection, multi-individual tracking, (inter)action recognition and individual identification. We then review methods for effort-efficient learning, which is one of the biggest challenges from a practical perspective. Finally, we close with an outlook into the future of the emerging field of computer vision for animal behavior, where we argue that the field should move fast beyond the common frame-by-frame processing and treat video as a first-class citizen.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2312.11283 [pdf, other]

The 2010 Census Confidentiality Protections Failed, Here's How and Why

Authors: John M. Abowd, Tamara Adams, Robert Ashmead, David Darais, Sourya Dey, Simson L. Garfinkel, Nathan Goldschlag, Daniel Kifer, Philip Leclerc, Ethan Lew, Scott Moore, Rolando A. Rodríguez, Ramy N. Tadros, Lars Vilhuber

Abstract: Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here.
Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.00507 [pdf, other]

VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity

Authors: S. VenkataKeerthy, Soumya Banerjee, Sayan Dey, Yashas Andaluri, Raghul PS, Subrahmanyam Kalyanasundaram, Fernando Magno Quintão Pereira, Ramakrishna Upadrasta

Abstract: Binary similarity involves determining whether two binary programs exhibit similar functionality, often originating from the same source code. In this work, we propose VexIR2Vec, an approach for binary similarity using VEX-IR, an architecture-neutral Intermediate Representation (IR). We extract the embeddings from sequences of basic blocks, termed peepholes, derived by random walks on the control-flow graph. The peepholes are normalized using transformations inspired by compiler optimizations. The VEX-IR Normalization Engine mitigates, with these transformations, the architectural and compiler-induced variations in binaries while exposing semantic similarities. We then learn the vocabulary of representations at the entity level of the IR using the knowledge graph embedding techniques in an unsupervised manner. This vocabulary is used to derive function embeddings for similarity assessment using VexNet, a feed-forward Siamese network designed to position similar functions closely and separate dissimilar ones in an n-dimensional space. This approach is amenable for both diffing and searching tasks, ensuring robustness against Out-Of-Vocabulary (OOV) issues. We evaluate VexIR2Vec on a dataset comprising 2.7M functions and 15.5K binaries from 7 projects compiled across 12 compilers targeting x86 and ARM architectures. In diffing experiments, VexIR2Vec outperforms the nearest baselines by $40\%$, $18\%$, $21\%$, and $60\%$ in cross-optimization, cross-compilation, cross-architecture, and obfuscation settings, respectively. In the searching experiment, VexIR2Vec achieves a mean average precision of $0.76$, outperforming the nearest baseline by $46\%$. Our framework is highly scalable and is built as a lightweight, multi-threaded, parallel library using only open-source tools. VexIR2Vec is $3.1$-$3.5 \times$ faster than the closest baselines and orders-of-magnitude faster than other tools.

Submitted 9 July, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.05870 [pdf]

Automated Heterogeneous Low-Bit Quantization of Multi-Model Deep Learning Inference Pipeline

Authors: Jayeeta Mondal, Swarnava Dey, Arijit Mukherjee

Abstract: Multiple Deep Neural Networks (DNNs) integrated into single Deep Learning (DL) inference pipelines e.g. Multi-Task Learning (MTL) or Ensemble Learning (EL), etc., albeit very accurate, pose challenges for edge deployment. In these systems, models vary in their quantization tolerance and resource demands, requiring meticulous tuning for accuracy-latency balance. This paper introduces an automated heterogeneous quantization approach for DL inference pipelines with multiple DNNs.

Submitted 10 November, 2023; originally announced November 2023.

Journal ref: LBQNN@ICCV2023

arXiv:2311.00343 [pdf, other]

Analyzing Head Orientation of Neurotypical and Autistic Individuals in Triadic Conversations

Authors: Onur N. Tepencelik, Wenchuan Wei, Pamela C. Cosman, Sujit Dey

Abstract: We propose a system that estimates people's body and head orientations using low-resolution point cloud data from two LiDAR sensors. Our models make accurate estimations in real-world conversation settings where the subject moves naturally with varying head and body poses. The body orientation estimation model uses ellipse fitting while the head orientation estimation model is a pipeline of geometric feature extraction and an ensemble of neural network regressors. Compared with other body and head orientation estimation systems using RGB cameras, our proposed system uses LiDAR sensors to preserve user privacy, while achieving comparable accuracy. Unlike other body/head orientation estimation systems, our sensors do not require a specified placement in front of the subject. Our models achieve a mean absolute estimation error of 5.2 degrees for body orientation and 13.7 degrees for head orientation. We use our models to quantify behavioral differences between neurotypical and autistic individuals in triadic conversations. Tests of significance show that people with autism spectrum disorder display significantly different behavior compared to neurotypical individuals in terms of distributing attention between participants in a conversation, suggesting that the approach could be a component of a behavioral analysis or coaching system.

Submitted 1 November, 2023; originally announced November 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2310.16592 [pdf, other]

Over-the-air Federated Policy Gradient

Authors: Huiwen Yang, Lingying Huang, Subhrakanti Dey, Ling Shi

Abstract: In recent years, over-the-air aggregation has been widely considered in large-scale distributed learning, optimization, and sensing. In this paper, we propose the over-the-air federated policy gradient algorithm, where all agents simultaneously broadcast an analog signal carrying local information to a common wireless channel, and a central controller uses the received aggregated waveform to update the policy parameters. We investigate the effect of noise and channel distortion on the convergence of the proposed algorithm, and establish the complexities of communication and sampling for finding an $ε$-approximate stationary point. Finally, we present some simulation results to show the effectiveness of the algorithm.

Submitted 25 February, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: To appear at IEEE ICC 2024

arXiv:2310.00659 [pdf, other]

Liveness Detection Competition -- Noncontact-based Fingerprint Algorithms and Systems (LivDet-2023 Noncontact Fingerprint)

Authors: Sandip Purnapatra, Humaira Rezaie, Bhavin Jawade, Yu Liu, Yue Pan, Luke Brosell, Mst Rumana Sumi, Lambert Igene, Alden Dimarco, Srirangaraj Setlur, Soumyabrata Dey, Stephanie Schuckers, Marco Huber, Jan Niklas Kolf, Meiling Fang, Naser Damer, Banafsheh Adami, Raul Chitic, Karsten Seelert, Vishesh Mistry, Rahul Parthe, Umit Kacar

Abstract: Liveness Detection (LivDet) is an international competition series open to academia and industry with the objective to assess and report state-of-the-art in Presentation Attack Detection (PAD). LivDet-2023 Noncontact Fingerprint is the first edition of the noncontact fingerprint-based PAD competition for algorithms and systems. The competition serves as an important benchmark in noncontact-based fingerprint PAD, offering (a) an independent assessment of the state-of-the-art in noncontact-based fingerprint PAD for algorithms and systems, (b) a common evaluation protocol, which includes finger photos of a variety of Presentation Attack Instruments (PAIs) and live fingers, for the biometric research community, and (c) standard algorithm and system evaluation protocols, along with a comparative analysis of state-of-the-art algorithms from academia and industry on both old and new Android smartphones. The winning algorithm achieved an APCER of 11.35% averaged over all PAIs and a BPCER of 0.62%. The winning system achieved an APCER of 13.0.4%, averaged over all PAIs tested over all the smartphones, and a BPCER of 1.68% over all smartphones tested. Four-finger systems that make individual finger-based PAD decisions were also tested. The dataset used for the competition will be available to all researchers as per the data share protocol.

Submitted 1 October, 2023; originally announced October 2023.

arXiv:2310.00602 [pdf, ps, other]

Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification

Authors: Spandan Dey, Premjeet Singh, Goutam Saha

Abstract: Commonly used features in spoken language identification (LID), such as mel-spectrogram or MFCC, lose high-frequency information due to windowing. The loss further increases for longer temporal contexts. To improve generalization of the low-resourced LID systems, we investigate an alternate feature representation, wavelet scattering transform (WST), that compensates for the shortcomings. To our knowledge, WST has not been explored earlier in LID tasks. We first optimize WST features for multiple South Asian LID corpora. We show that LID requires low octave resolution and frequency-scattering is not useful. Further, cross-corpora evaluations show that the optimal WST hyper-parameters depend on both train and test corpora. Hence, we develop fused ECAPA-TDNN based LID systems with different sets of WST hyper-parameters to improve generalization for unknown data. Compared to MFCC, EER is reduced by up to 14.05% and 6.40% for same-corpora and blind VoxLingua107 evaluations, respectively.

Submitted 3 October, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: Accepted and presented in INTERSPEECH 2023

arXiv:2309.17174 [pdf, other]

FedZeN: Towards superlinear zeroth-order federated learning via incremental Hessian estimation

Authors: Alessio Maritan, Subhrakanti Dey, Luca Schenato

Abstract: Federated learning is a distributed learning framework that allows a set of clients to collaboratively train a model under the orchestration of a central server, without sharing raw data samples. Although in many practical scenarios the derivatives of the objective function are not available, only few works have considered the federated zeroth-order setting, in which functions can only be accessed through a budgeted number of point evaluations. In this work we focus on convex optimization and design the first federated zeroth-order algorithm to estimate the curvature of the global objective, with the purpose of achieving superlinear convergence. We take an incremental Hessian estimator whose error norm converges linearly, and we adapt it to the federated zeroth-order setting, sampling the random search directions from the Stiefel manifold for improved performance. In particular, both the gradient and Hessian estimators are built at the central server in a communication-efficient and privacy-preserving way by leveraging synchronized pseudo-random number generators. We provide a theoretical analysis of our algorithm, named FedZeN, proving local quadratic convergence with high probability and global linear convergence up to zeroth-order precision. Numerical simulations confirm the superlinear convergence rate and show that our algorithm outperforms the federated zeroth-order methods available in the literature.

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.03294 [pdf, other]

MALITE: Lightweight Malware Detection and Classification for Constrained Devices

Authors: Sidharth Anand, Barsha Mitra, Soumyadeep Dey, Abhinav Rao, Rupsa Dhar, Jaideep Vaidya

Abstract: Today, malware is one of the primary cyberthreats to organizations. Malware has pervaded almost every type of computing device including the ones having limited memory, battery and computation power such as mobile phones, tablets and embedded devices like Internet-of-Things (IoT) devices. Consequently, the privacy and security of the malware infected systems and devices have been heavily jeopardized. In recent years, researchers have leveraged machine learning based strategies for malware detection and classification. Malware analysis approaches can only be employed in resource constrained environments if the methods are lightweight in nature. In this paper, we present MALITE, a lightweight malware analysis system, that can classify various malware families and distinguish between benign and malicious binaries. MALITE converts a binary into a gray scale or an RGB image and employs low memory and battery power consuming as well as computationally inexpensive malware analysis strategies. We have designed MALITE-MN, a lightweight neural network based architecture and MALITE-HRF, an ultra lightweight random forest based method that uses histogram features extracted by a sliding window. We evaluate the performance of both on six publicly available datasets (Malimg, Microsoft BIG, Dumpware10, MOTIF, Drebin and CICAndMal2017), and compare them to four state-of-the-art malware classification techniques. The results show that MALITE-MN and MALITE-HRF not only accurately identify and classify malware but also respectively consume several orders of magnitude lower resources (in terms of both memory as well as computation capabilities), making them much more suitable for resource constrained environments.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2309.00993

A Boosted Machine Learning Framework for the Improvement of Phase and Crystal Structure Prediction of High Entropy Alloys Using Thermodynamic and Configurational Parameters

Authors: Debsundar Dey, Suchandan Das, Anik Pal, Santanu Dey, Chandan Kumar Raul, Arghya Chatterjee

Abstract: The reason behind the remarkable properties of High-Entropy Alloys (HEAs) is rooted in the diverse phases and the crystal structures they contain. In the realm of material informatics, employing machine learning (ML) techniques to classify phases and crystal structures of HEAs has gained considerable significance. In this study, we assembled a new collection of 1345 HEAs with varying compositions to predict phases. Within this collection, there were 705 sets of data that were utilized to predict the crystal structures with the help of thermodynamics and electronic configuration. Our study introduces a methodical framework, i.e., the Pearson correlation coefficient, that helps in selecting the strongly correlated features to increase the prediction accuracy. This study employed five distinct boosting algorithms to predict phases and crystal structures, offering an enhanced guideline for improving the accuracy of these predictions. Among all these algorithms, XGBoost gives the highest accuracy of prediction (94.05%) for phases and LightGBM gives the highest accuracy of prediction of crystal structure of the phases (90.07%). The quantification of the influence exerted by parameters on the model's accuracy was conducted and a new approach was made to elucidate the contribution of individual parameters in the process of phase prediction and crystal structure prediction.

Submitted 31 December, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

Comments: We want to modify this paper and extend some parts of it

arXiv:2308.10015 [pdf, other]

DyFFPAD: Dynamic Fusion of Convolutional and Handcrafted Features for Fingerprint Presentation Attack Detection

Authors: Anuj Rai, Parsheel Kumar Tiwari, Jyotishna Baishya, Ram Prakash Sharma, Somnath Dey

Abstract: Automatic fingerprint recognition systems suffer from the threat of presentation attacks due to their wide range of applications in areas including national borders and commercial applications. Presentation attacks can be performed by fabricating the fake fingerprint of a user with or without the intention of the subject. This paper presents a dynamic ensemble of deep learning and handcrafted features to detect presentation attacks in known-material and unknown-material protocols. The proposed model is a dynamic ensemble of deep CNN and handcrafted features empowered deep neural networks both of which learn their parameters together. The proposed presentation attack detection model, in this way, utilizes the capabilities of both classification techniques and exhibits better performance than their individual results. The proposed model's performance is validated using benchmark LivDet 2015, 2017, and 2019 databases, with an overall accuracy of 96.10%, 96.49%, and 95.99% attained on them, respectively. The proposed model outperforms state-of-the-art methods in benchmark protocols of presentation attack detection in terms of classification accuracy.

Submitted 19 August, 2023; originally announced August 2023.

Comments: arXiv admin note: text overlap with arXiv:2305.09397

arXiv:2308.08941 [pdf, other]

Automatic Signboard Recognition in Low Quality Night Images

Authors: Manas Kagde, Priyanka Choudhary, Rishi Joshi, Somnath Dey

Abstract: An essential requirement for driver assistance systems and autonomous driving technology is implementing a robust system for detecting and recognizing traffic signs. This system enables the vehicle to autonomously analyze the environment and make appropriate decisions regarding its movement, even when operating at higher frame rates. However, traffic sign images captured in inadequate lighting and adverse weather conditions are poorly visible, blurred, faded, and damaged. Consequently, the recognition of traffic signs in such circumstances becomes inherently difficult. This paper addresses the challenges of recognizing traffic signs from images captured in low light, noise, and blurriness. To achieve this goal, a two-step methodology has been employed. The first step involves enhancing traffic sign images by applying a modified MIRNet model and producing enhanced images. In the second step, the Yolov4 model recognizes the traffic signs in an unconstrained environment. The proposed method has achieved a 5.40% increment in mAP@0.5 for low quality images on Yolov4. An overall mAP@0.5 of 96.75% has been achieved on the GTSRB dataset. It has also attained an mAP@0.5 of 100% on the GTSDB dataset for the broad categories, comparable with the state-of-the-art work.

Submitted 17 August, 2023; originally announced August 2023.

Comments: 13 pages, CVIP 2023

arXiv:2307.16710 [pdf, other]

Learning whom to trust in navigation: dynamically switching between classical and neural planning

Authors: Sombit Dey, Assem Sadek, Gianluca Monaci, Boris Chidlovskii, Christian Wolf

Abstract: Navigation of terrestrial robots is typically addressed either with localization and mapping (SLAM) followed by classical planning on the dynamically created maps, or by machine learning (ML), often through end-to-end training with reinforcement learning (RL) or imitation learning (IL). Recently, modular designs have achieved promising results, and hybrid algorithms that combine ML with classical planning have been proposed. Existing methods implement these combinations with hand-crafted functions, which cannot fully exploit the complementary nature of the policies and the complex regularities between scene structure and planning performance. Our work builds on the hypothesis that the strengths and weaknesses of neural planners and classical planners follow some regularities, which can be learned from training data, in particular from interactions. This is grounded on the assumption that both trained planners and the mapping algorithms underlying classical planning are subject to failure cases depending on the semantics of the scene and that this dependence is learnable: for instance, certain areas, objects or scene structures can be reconstructed more easily than others. We propose a hierarchical method composed of a high-level planner dynamically switching between a classical and a neural planner. We fully train all neural policies in simulation and evaluate the method in both simulation and real experiments with a LoCoBot robot, showing significant gains in performance, in particular in the real environment. We also qualitatively conjecture on the nature of data regularities exploited by the high-level planner.

Submitted 31 July, 2023; originally announced July 2023.

Comments: 8 pages including references. International Conference on Intelligent Robots and Systems (IROS 2023)

arXiv:2306.09239 [pdf, ps, other]

Exploiting the Brain's Network Structure for Automatic Identification of ADHD Subjects

Authors: Soumyabrata Dey, Ravishankar Rao, Mubarak Shah

Abstract: Attention Deficit Hyperactive Disorder (ADHD) is a common behavioral problem affecting children. In this work, we investigate the automatic classification of ADHD subjects using the resting state Functional Magnetic Resonance Imaging (fMRI) sequences of the brain. We show that the brain can be modeled as a functional network, and certain properties of the networks differ in ADHD subjects from control subjects. We compute the pairwise correlation of brain voxels' activity over the time frame of the experimental protocol which helps to model the function of a brain as a network. Different network features are computed for each of the voxels constructing the network. The concatenation of the network features of all the voxels in a brain serves as the feature vector. Feature vectors from a set of subjects are then used to train a PCA-LDA (principal component analysis-linear discriminant analysis) based classifier. We hypothesized that ADHD-related differences lie in some specific regions of the brain and using features only from those regions is sufficient to discriminate ADHD and control subjects. We propose a method to create a brain mask that includes the useful regions only and demonstrate that using the feature from the masked regions improves classification accuracy on the test data set. We train our classifier with 776 subjects and test on 171 subjects provided by The Neuro Bureau for the ADHD-200 challenge. We demonstrate the utility of graph-motif features, specifically the maps that represent the frequency of participation of voxels in network cycles of length 3. The best classification performance (69.59%) is achieved using 3-cycle map features with masking. Our proposed approach holds promise in being able to diagnose and understand the disorder.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.09206 [pdf, other]

Concealing CAN Message Sequences to Prevent Schedule-based Bus-off Attacks

Authors: Sunandan Adhikary, Ipsita Koley, Arkaprava Sain, Soumyadeep das, Shuvam Saha, Soumyajit Dey

Abstract: This work focuses on eliminating timing side channels in real-time safety-critical cyber-physical network protocols like Controller Area Networks (CAN). Automotive Electronic Control Units (ECUs) implement predictable scheduling decisions based on task level response time estimation. Such levels of determinism expose timing information about task executions and therefore corresponding message transmissions via the network buses (that connect the ECUs and actuators). With proper analysis, such timing side channels can be utilized to launch several schedule-based attacks that can lead to eventual denial-of-service or man-in-the-middle-type attacks. To eliminate this determinism, we propose a novel schedule obfuscation strategy by skipping certain control task executions and related data transmissions along with random shifting of the victim task instance. While doing this, our strategy contemplates the performance of the control task as well by bounding the number of control execution skips. We analytically demonstrate how the attack success probability (ASP) is reduced under this proposed attack-aware skipping and randomization. We also demonstrate the efficacy and real-time applicability of our attack-aware schedule obfuscation strategy Hide-n-Seek by applying it to synthesized automotive task sets in a real-time Hardware-in-loop (HIL) setup.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.09057 [pdf, other]

A Learning Assisted Method for Uncovering Power Grid Generation and Distribution System Vulnerabilities

Authors: Suman Maiti, Anjana B, Sunandan Adhikary, Ipsita Koley, Soumyajit Dey

Abstract: Intelligent attackers can suitably tamper with sensor/actuator data at various smart grid surfaces, causing intentional power oscillations which, if left undetected, can lead to voltage disruptions. We develop a novel combination of formal methods and machine learning tools that learns power system dynamics with the objective of generating unsafe yet stealthy false data based attack sequences. We enable the grid with anomaly detectors in a generalized manner so that it is difficult for an attacker to remain undetected. Our methodology, when applied on an IEEE 14 bus power grid model, uncovers stealthy attack vectors even in the presence of such detectors.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.03577 [pdf, other]

An Open Patch Generator based Fingerprint Presentation Attack Detection using Generative Adversarial Network

Authors: Anuj Rai, Ashutosh Anshul, Ashwini Jha, Prayag Jain, Ramprakash Sharma, Somnath Dey

Abstract: The low-cost, user-friendly, and convenient nature of Automatic Fingerprint Recognition Systems (AFRS) makes them suitable for a wide range of applications. This spreading use of AFRS also makes them vulnerable to various security threats. Presentation Attack (PA) or spoofing is one of the threats which is caused by presenting a spoof of a genuine fingerprint to the sensor of AFRS. Fingerprint Presentation Attack Detection (FPAD) is a countermeasure intended to protect AFRS against fake or spoof fingerprints created using various fabrication materials. In this paper, we have proposed a Convolutional Neural Network (CNN) based technique that uses a Generative Adversarial Network (GAN) to augment the dataset with spoof samples generated from the proposed Open Patch Generator (OPG). This OPG is capable of generating realistic fingerprint samples which have no resemblance to the existing spoof fingerprint samples generated with other materials. The augmented dataset is fed to the DenseNet classifier which helps in increasing the performance of the Presentation Attack Detection (PAD) module for the various real-world attacks possible with unknown spoof materials. Experimental evaluations of the proposed approach are carried out on the Liveness Detection (LivDet) 2015, 2017, and 2019 competition databases. An overall accuracy of 96.20%, 94.97%, and 92.90% has been achieved on the LivDet 2015, 2017, and 2019 databases, respectively under the LivDet protocol scenarios. The performance of the proposed PAD model is also validated in the cross-material and cross-sensor attack paradigm which further exhibits its capability to be used under real-world attack scenarios.

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2305.17492 [pdf, other]

Dynamic User Segmentation and Usage Profiling

Authors: Animesh Mitra, Saswata Sahoo, Soumyabrata Dey

Abstract: Usage data of a group of users distributed across a number of categories, such as songs, movies, webpages, links, regular household products, mobile apps, games, etc. can be ultra-high dimensional and massive in size. More often this kind of data is categorical and sparse in nature, making it even more difficult to interpret any underlying hidden patterns such as clusters of users. However, if this information can be estimated accurately, it will have huge impacts in different business areas such as user recommendations for apps, songs, movies, and other similar products, health analytics using electronic health record (EHR) data, and driver profiling for insurance premium estimation or fleet management. In this work, we propose a clustering strategy for such categorical big data, utilizing the hidden sparsity of the dataset. Most traditional clustering methods fail to give proper clusters for such data and end up giving one big cluster with small clusters around it irrespective of the true structure of the data clusters. We propose a feature transformation, which maps the binary-valued usage vector to a lower dimensional continuous feature space in terms of groups of usage categories, termed as covariate classes. The lower dimensional feature representations in terms of covariate classes can be used for clustering. We implemented the proposed strategy and applied it to a large, very high-dimensional song playlist dataset for performance validation.
The results are impressive as we achieved similar-sized user clusters with minimal between-cluster overlap in the feature space (8% on average). As the proposed strategy has a very generic framework, it can be utilized as the analytic engine of many of the above-mentioned business use cases, allowing an intelligent and dynamic personal recommendation system or a support system for smart business decision-making.

Submitted 27 May, 2023; originally announced May 2023.

arXiv:2305.10852 [pdf, other]

Q-SHED: Distributed Optimization at the Edge via Hessian Eigenvectors Quantization

Authors: Nicolò Dal Fabbro, Michele Rossi, Luca Schenato, Subhrakanti Dey

Abstract: Edge networks call for communication efficient (low overhead) and robust distributed optimization (DO) algorithms. These are, in fact, desirable qualities for DO frameworks, such as federated edge learning techniques, in the presence of data and system heterogeneity, and in scenarios where internode communication is the main bottleneck. Although computationally demanding, Newton-type (NT) methods have been recently advocated as enablers of robust convergence rates in challenging DO problems where edge devices have sufficient computational power. Along these lines, in this work we propose Q-SHED, an original NT algorithm for DO featuring a novel bit-allocation scheme based on incremental Hessian eigenvectors quantization. The proposed technique is integrated with the recent SHED algorithm, from which it inherits appealing features like the small number of required Hessian computations, while being bandwidth-versatile at a bit-resolution level. Our empirical evaluation against competing approaches shows that Q-SHED can reduce by up to 60% the number of communication rounds required for convergence.

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.09397 [pdf, other]

EXPRESSNET: An Explainable Residual Slim Network for Fingerprint Presentation Attack Detection

Authors: Anuj Rai, Somnath Dey

Abstract: Presentation attack is a challenging issue that persists in the security of automatic fingerprint recognition systems. This paper proposes a novel explainable residual slim network that detects the presentation attack by representing the visual features in the input fingerprint sample. The encoder-decoder of this network along with the channel attention block converts the input sample into its heatmap representation while the modified residual convolutional neural network classifier discriminates between live and spoof fingerprints. The entire architecture of the heatmap generator block and modified ResNet classifier works together in an end-to-end manner. The performance of the proposed model is validated on benchmark liveness detection competition databases, i.e., LivDet 2011, 2013, 2015, 2017, and 2019, and classification accuracies of 96.86%, 99.84%, 96.45%, 96.07%, and 96.27% are achieved on them, respectively. The performance of the proposed model is compared with the state-of-the-art techniques, and the proposed method outperforms state-of-the-art methods in benchmark protocols of presentation attack detection in terms of classification accuracy.

Submitted 6 June, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:2303.01465

arXiv:2305.07898 [pdf, other]

Network-GIANT: Fully distributed Newton-type optimization via harmonic Hessian consensus

Authors: Alessio Maritan, Ganesh Sharma, Luca Schenato, Subhrakanti Dey

Abstract: This paper considers the problem of distributed multi-agent learning, where the global aim is to minimize a sum of local objective (empirical loss) functions through local optimization and information exchange between neighbouring nodes. We introduce a Newton-type fully distributed optimization algorithm, Network-GIANT, which is based on GIANT, a Federated learning algorithm that relies on a centralized parameter server. The Network-GIANT algorithm is designed via a combination of gradient-tracking and a Newton-type iterative algorithm at each node with consensus-based averaging of local gradient and Newton updates. We prove that our algorithm guarantees semi-global and exponential convergence to the exact solution over the network assuming strongly convex and smooth loss functions. We provide empirical evidence of the superior convergence performance of Network-GIANT over other state-of-the-art distributed learning algorithms such as Network-DANE and Newton-Raphson Consensus.

Submitted 19 July, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

arXiv:2305.05274 [pdf, other]

doi 10.1109/IJCNN54540.2023.10191771

DietCNN: Multiplication-free Inference for Quantized CNNs

Authors: Swarnava Dey, Pallab Dasgupta, Partha P Chakrabarti

Abstract: The rising demand for networked embedded systems with machine intelligence has been a catalyst for sustained attempts by the research community to implement Convolutional Neural Networks (CNN) based inferencing on embedded resource-limited devices. Redesigning a CNN by removing costly multiplication operations has already shown promising results in terms of reducing inference energy usage. This paper proposes a new method for replacing multiplications in a CNN by table look-ups. Unlike existing methods that completely modify the CNN operations, the proposed methodology preserves the semantics of the major CNN operations. Conforming to the existing mechanism of the CNN layer operations ensures that the reliability of a standard CNN is preserved. It is shown that the proposed multiplication-free CNN, based on a single activation codebook, can achieve 4.7x, 5.6x, and 3.5x reduction in energy per inference in an FPGA implementation of MNIST-LeNet-5, CIFAR10-VGG-11, and Tiny ImageNet-ResNet-18 respectively. Our results show that the DietCNN approach significantly improves the resource consumption and latency of deep inference for smaller models, often used in embedded systems. Our code is available at: https://github.com/swadeykgp/DietCNN
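
The general idea of replacing multiplications with table look-ups can be illustrated with a small sketch. This is a generic illustration of lookup-based dot products over quantized values, not the DietCNN implementation: the codebook values, the function names `build_product_table` and `lookup_dot`, and the index vectors are all hypothetical and chosen only for the example.

```python
def build_product_table(act_codebook, weight_codebook):
    """Precompute every activation*weight product once (offline step)."""
    return [[a * w for w in weight_codebook] for a in act_codebook]


def lookup_dot(table, act_idx, weight_idx):
    """Dot product using only table look-ups and additions at inference time."""
    return sum(table[i][j] for i, j in zip(act_idx, weight_idx))


# Hypothetical codebooks: each tensor entry is stored as an index into these.
act_codebook = [0.0, 0.5, 1.0, 2.0]
weight_codebook = [-1.0, -0.5, 0.5, 1.0]
table = build_product_table(act_codebook, weight_codebook)

# Quantized activation and weight vectors, expressed as codebook indices.
act_idx = [1, 3, 2]
weight_idx = [0, 2, 3]

result = lookup_dot(table, act_idx, weight_idx)

# Reference value computed with ordinary multiplications, for comparison.
direct = sum(act_codebook[i] * weight_codebook[j]
             for i, j in zip(act_idx, weight_idx))
```

In this toy setting the table has only 16 entries, so the multiplications are paid once up front and inference reduces to indexing and addition.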

Submitted 17 August, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: Supplementary for S. Dey, P. Dasgupta and P. P. Chakrabarti, "DietCNN: Multiplication-free Inference for Quantized CNNs," 2023 International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 2023, pp. 1-8, doi: 10.1109/IJCNN54540.2023.10191771

arXiv:2304.02208 [pdf]

doi 10.1007/s42979-021-00871-7

PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data

Authors: A. Ravishankar Rao, Subrata Garai, Soumyabrata Dey, Hang Peng

Abstract: With calls for increasing transparency, governments are releasing greater amounts of data in multiple domains including finance, education and healthcare. The efficient exploratory analysis of healthcare data constitutes a significant challenge. Key concerns in public health include the quick identification and analysis of trends, and the detection of outliers. This allows policies to be rapidly adapted to changing circumstances. We present an efficient outlier detection technique, termed PIKS (Pruned iterative-k means searchlight), which combines an iterative k-means algorithm with a pruned searchlight based scan. We apply this technique to identify outliers in two publicly available healthcare datasets from the New York Statewide Planning and Research Cooperative System, and California's Office of Statewide Health Planning and Development. We provide a comparison of our technique with three other existing outlier detection techniques, consisting of auto-encoders, isolation forests and feature bagging. We identified outliers in conditions including suicide rates, immunity disorders, social admissions, cardiomyopathies, and pregnancy in the third trimester. We demonstrate that the PIKS technique produces results consistent with other techniques such as the auto-encoder. However, the auto-encoder needs to be trained, which requires several parameters to be tuned. In comparison, the PIKS technique has far fewer parameters to tune. This makes it advantageous for fast, "out-of-the-box" data exploration.
The PIKS technique is scalable and can readily ingest new datasets. Hence, it can provide valuable, up-to-date insights to citizens, patients and policy-makers. We have made our code open source, and with the availability of open data, other researchers can easily reproduce and extend our work. This will help promote a deeper understanding of healthcare policies and public health issues.

Submitted 4 April, 2023; originally announced April 2023.

Journal ref: SN COMPUT. SCI. 2, 477 (2021)

arXiv:2304.02191 [pdf]

doi 10.1109/ICHI48887.2020.9374348

Building predictive models of healthcare costs with open healthcare data

Authors: A. Ravishankar Rao, Subrata Garai, Soumyabrata Dey, Hang Peng

Abstract: Due to rapidly rising healthcare costs worldwide, there is significant interest in controlling them. An important aspect concerns price transparency, as preliminary efforts have demonstrated that patients will shop for lower costs, driving efficiency. This requires the data to be made available, and models that can predict healthcare costs for a wide range of patient demographics and conditions. We present an approach to this problem by developing a predictive model using machine-learning techniques. We analyzed de-identified patient data from New York State SPARCS (statewide planning and research cooperative system), consisting of 2.3 million records in 2016. We built models to predict costs from patient diagnoses and demographics. We investigated two model classes consisting of sparse regression and decision trees. We obtained the best performance by using a decision tree with depth 10. We obtained an R-square value of 0.76 which is better than the values reported in the literature for similar problems.

Submitted 4 April, 2023; originally announced April 2023.

Comments: 2020 IEEE International Conference on Healthcare Informatics (ICHI)

arXiv:2304.02189 [pdf]

doi 10.1109/IJCNN.2018.8489448

A system for exploring big data: an iterative k-means searchlight for outlier detection on open health data

Authors: A. Ravishankar Rao, Daniel Clarke, Subrata Garai, Soumyabrata Dey

Abstract: The interactive exploration of large and evolving datasets is challenging as relationships between underlying variables may not be fully understood. There may be hidden trends and patterns in the data that are worthy of further exploration and analysis. We present a system that methodically explores multiple combinations of variables using a searchlight technique and identifies outliers. An iterative k-means clustering algorithm is applied to features derived through a split-apply-combine paradigm used in the database literature. Outliers are identified as singleton or small clusters. This algorithm is swept across the dataset in a searchlight manner. The dimensions that contain outliers are combined in pairs with other dimensions using a subset scan technique to gain further insight into the outliers. We illustrate this system by analyzing open health care data released by New York State. We apply our iterative k-means searchlight followed by subset scanning. Several anomalous trends in the data are identified, including cost overruns at specific hospitals, and increases in diagnoses such as suicides. These constitute novel findings in the literature, and are of potential use to regulatory agencies, policy makers and concerned citizens.

Submitted 4 April, 2023; originally announced April 2023.

Comments: 2018 International Joint Conference on Neural Networks (IJCNN)

arXiv:2304.01488 [pdf, other]

doi 10.1109/MobileCloud55333.2022.00010

End-to-End Latency Optimization of Multi-view 3D Reconstruction for Disaster Response

Authors: Xiaojie Zhang, Mingjun Li, Andrew Hilton, Amitangshu Pal, Soumyabrata Dey, Saptarshi Debroy

Abstract: In order to plan rapid response during disasters, first responder agencies often adopt a 'bring your own device' (BYOD) model with inexpensive mobile edge devices (e.g., drones, robots, tablets) for complex video analytics applications, e.g., 3D reconstruction of a disaster scene. Unlike simpler video applications, widely used Multi-view Stereo (MVS) based 3D reconstruction applications (e.g., openMVG/openMVS) are exceedingly time consuming, especially when run on such computationally constrained mobile edge devices. Additionally, reducing the reconstruction latency of such inherently sequential algorithms is challenging as unintelligent, application-agnostic strategies can drastically degrade the reconstruction (i.e., application outcome) quality, making them useless. In this paper, we aim to design a latency optimized MVS algorithm pipeline, with the objective to best balance the end-to-end latency and reconstruction quality by running the pipeline on a collaborative mobile edge environment. The overall optimization approach is two-pronged where: (a) application optimizations introduce data-level parallelism by splitting the pipeline into high frequency and low frequency reconstruction components and (b) system optimizations incorporate task-level parallelism to the pipelines by running them opportunistically on available resources with online quality control in order to balance both latency and quality.
Our evaluation on a hardware testbed using publicly available datasets shows up to ~54% reduction in latency with negligible loss (~4-7%) in reconstruction quality.

Submitted 3 April, 2023; originally announced April 2023.

Comments: 2022 10th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud)

arXiv:2304.00677 [pdf, other]

doi 10.1109/WAMICON53991.2022.9786207

DNN-based Denial of Quality of Service Attack on Software-defined Hybrid Edge-Cloud Systems

Authors: Minh Nguyen, Jacob Gately, Swati Kar, Soumyabrata Dey, Saptarshi Debroy

Abstract: In order to satisfy diverse quality-of-service (QoS) requirements of complex real-time video applications, civilian and tactical use cases are employing software-defined hybrid edge-cloud systems. One of the primary QoS requirements of such applications is ultra-low end-to-end latency for video applications that necessitates rapid frame transfer between end-devices and edge servers using software-defined networking (SDN). Failing to guarantee such strict requirements leads to quality degradation of video applications and subsequently mission failure. In this paper, we show how a collaborative group of attackers can exploit SDN's control communications to launch a Denial of Quality of Service (DQoS) attack that artificially increases end-to-end latency of video frames and yet evades detection. In particular, we show how Deep Neural Network (DNN) model training on all or partial network state information can help predict network packet drop rates with reasonable accuracy. We also show how such predictions can help design an attack model that can inflict just the right amount of added latency to the end-to-end video processing that is enough to cause considerable QoS degradation but not too much to raise suspicion. We use a realistic edge-cloud testbed on the GENI platform for training data collection and demonstration of high model accuracy and attack success rate.

Submitted 2 April, 2023; originally announced April 2023.

Comments: WAMICON 2022

arXiv:2303.05459 [pdf, other]

Presentation Attack Detection with Advanced CNN Models for Noncontact-based Fingerprint Systems

Authors: Sandip Purnapatra, Conor Miller-Lynch, Stephen Miner, Yu Liu, Keivan Bahmani, Soumyabrata Dey, Stephanie Schuckers

Abstract: Touch-based fingerprint biometrics is one of the most popular biometric modalities with applications in several fields. Problems associated with touch-based techniques such as the presence of latent fingerprints and hygiene issues due to many people touching the same surface motivated the community to look for non-contact-based solutions. For the last few years, contactless fingerprint systems have been on the rise and in demand because of the ability to turn any device with a camera into a fingerprint reader. Yet, before we can fully utilize the benefit of noncontact-based methods, the biometric community needs to resolve a few concerns such as the resiliency of the system against presentation attacks. One of the major obstacles is the limited publicly available data sets with inadequate spoof and live data. In this publication, we have developed a Presentation attack detection (PAD) dataset of more than 7500 four-finger images and more than 14,000 manually segmented single-fingertip images, and 10,000 synthetic fingertips (deepfakes). The PAD dataset was collected from six different Presentation Attack Instruments (PAI) of three different difficulty levels according to FIDO protocols, with five different types of PAI materials, and different smartphone cameras with manual focusing. We have utilized DenseNet-121 and NasNetMobile models and our proposed dataset to develop PAD algorithms and achieved an Attack presentation classification error rate (APCER) of 0.14% and a Bonafide presentation classification error rate (BPCER) of 0.18%.
We have also reported the test results of the models against unseen spoof types to replicate uncertain real-world testing scenarios.
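
For readers unfamiliar with the two error rates quoted above, the sketch below shows how they are commonly computed from labelled predictions: APCER is the fraction of attack presentations accepted as bona fide, and BPCER is the fraction of bona fide presentations rejected as attacks. The function name `pad_error_rates`, the label strings, and the sample data are illustrative assumptions, and the sketch pools all attack types together rather than reporting APCER per attack instrument.

```python
def pad_error_rates(labels, predictions):
    """Return (APCER, BPCER) for parallel lists of true labels and predictions.

    Labels and predictions are the strings "attack" or "bona_fide"
    (a naming choice made for this sketch).
    """
    attack_preds = [p for l, p in zip(labels, predictions) if l == "attack"]
    bona_fide_preds = [p for l, p in zip(labels, predictions) if l == "bona_fide"]
    if not attack_preds or not bona_fide_preds:
        raise ValueError("need at least one attack and one bona fide sample")

    # Attack presentations wrongly classified as bona fide.
    apcer = sum(p == "bona_fide" for p in attack_preds) / len(attack_preds)
    # Bona fide presentations wrongly classified as attacks.
    bpcer = sum(p == "attack" for p in bona_fide_preds) / len(bona_fide_preds)
    return apcer, bpcer


# Made-up example: 4 attack samples (1 missed), 5 bona fide samples (1 rejected).
labels = ["attack"] * 4 + ["bona_fide"] * 5
predictions = ["attack", "attack", "bona_fide", "attack",
               "bona_fide", "bona_fide", "attack", "bona_fide", "bona_fide"]

apcer, bpcer = pad_error_rates(labels, predictions)
```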

Submitted 9 March, 2023; originally announced March 2023.

arXiv:2303.02321 [pdf, other]

doi 10.1007/978-3-031-06430-2_41

Real-Time Hand Gesture Identification in Thermal Images

Authors: James Ballow, Soumyabrata Dey

Abstract: Hand gesture-based human-computer interaction is an important problem that is well explored using color camera data. In this work we propose a hand gesture detection system using thermal images. Our system is capable of handling multiple hand regions in a frame and processing them fast enough for real-time applications. Our system performs a series of steps including background subtraction-based hand mask generation, k-means based hand region identification, hand segmentation to remove the forearm region, and a Convolutional Neural Network (CNN) based gesture classification. Our work introduces two novel algorithms, bubble growth and bubble search, for faster hand segmentation. We collected a new thermal image data set with 10 gestures and reported an end-to-end hand gesture recognition accuracy of 97%.

Submitted 4 March, 2023; originally announced March 2023.

Comments: 21st International Conference on Image Analysis and Processing

arXiv:2303.01614 [pdf, other]

doi 10.55417/fr.2024006

STEP: Stochastic Traversability Evaluation and Planning for Risk-Aware Off-road Navigation; Results from the DARPA Subterranean Challenge

Authors: Anushri Dixit, David D. Fan, Kyohei Otsu, Sharmita Dey, Ali-Akbar Agha-Mohammadi, Joel W. Burdick

Abstract: Although autonomy has gained widespread usage in structured and controlled environments, robotic autonomy in unknown and off-road terrain remains a difficult problem. Extreme, off-road, and unstructured environments such as undeveloped wilderness, caves, rubble, and other post-disaster sites pose unique and challenging problems for autonomous navigation. Based on our participation in the DARPA Subterranean Challenge, we propose an approach to improve autonomous traversal of robots in subterranean environments that are perceptually degraded and completely unknown through a traversability and planning framework called STEP (Stochastic Traversability Evaluation and Planning). We present 1) rapid uncertainty-aware mapping and traversability evaluation, 2) tail risk assessment using the Conditional Value-at-Risk (CVaR), 3) efficient risk and constraint-aware kinodynamic motion planning using sequential quadratic programming-based (SQP) model predictive control (MPC), 4) fast recovery behaviors to account for unexpected scenarios that may cause failure, and 5) risk-based gait adaptation for quadrupedal robots. We illustrate and validate extensive results from our experiments on wheeled and legged robotic platforms in field studies at the Valentine Cave, CA (cave environment), Kentucky Underground, KY (mine environment), and Louisville Mega Cavern, KY (final competition site for the DARPA Subterranean Challenge with tunnel, urban, and cave environments).
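
As background for the tail risk measure named in item 2, the sketch below computes an empirical Conditional Value-at-Risk: the mean of the worst (1 - alpha) fraction of sampled costs. It is a textbook sample estimator, not the STEP pipeline; the function name `empirical_cvar`, the confidence level, and the cost samples are assumptions made for illustration.

```python
import math


def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of the sampled costs.

    Higher cost is worse. alpha must lie in [0, 1); alpha = 0 gives the
    plain mean, and values near 1 focus on the extreme tail.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    n = len(costs)
    # Number of tail samples; rounding guards against floating-point noise
    # such as (1 - 0.7) * 10 evaluating slightly above 3.
    k = max(1, math.ceil(round((1.0 - alpha) * n, 9)))
    tail = sorted(costs, reverse=True)[:k]
    return sum(tail) / k


# Hypothetical traversal costs for ten sampled terrain cells.
costs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cvar_80 = empirical_cvar(costs, alpha=0.8)   # mean of the two worst costs
mean_cost = empirical_cvar(costs, alpha=0.0)  # reduces to the ordinary mean
```

Unlike the mean, this measure is driven by the worst outcomes, which is why it suits planning where rare failures are costly.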

Submitted 2 March, 2023; originally announced March 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2103.02828

Journal ref: Field Robotics, 4, 2024, 182-210

arXiv:2303.01547 [pdf, other]

Simultaneous prediction of hand gestures, handedness, and hand keypoints using thermal images

Authors: Sichao Li, Sean Banerjee, Natasha Kholgade Banerjee, Soumyabrata Dey

Abstract: Hand gesture detection is a well-explored area in computer vision with applications in various forms of Human-Computer Interactions. In this work, we propose a technique for simultaneous hand gesture classification, handedness detection, and hand keypoints localization using thermal data captured by an infrared camera. Our method uses a novel deep multi-task learning architecture that includes shared encoder-decoder layers followed by three branches, one dedicated to each of these tasks. We performed extensive experimental validation of our model on an in-house dataset consisting of 24 users' data. The results confirm higher than 98 percent accuracy for gesture classification, handedness detection, and fingertips localization, and more than 91 percent accuracy for wrist points localization.

Submitted 2 March, 2023; originally announced March 2023.

Comments: ICDEC 2022

arXiv:2303.01465 [pdf, other]

MoSFPAD: An end-to-end Ensemble of MobileNet and Support Vector Classifier for Fingerprint Presentation Attack Detection

Authors: Anuj Rai, Somnath Dey, Pradeep Patidar, Prakhar Rai

Abstract: Automatic fingerprint recognition systems are the most extensively used systems for person authentication although they are vulnerable to Presentation attacks. Artificial artifacts created with the help of various materials are used to deceive these systems causing a threat to the security of fingerprint-based applications. This paper proposes a novel end-to-end model to detect fingerprint Presentation attacks. The proposed model incorporates MobileNet as a feature extractor and a Support Vector Classifier as a classifier to detect presentation attacks in cross-material and cross-sensor paradigms. The feature extractor's parameters are learned with the loss generated by the support vector classifier. The proposed model eliminates the need for intermediary data preparation procedures, unlike other static hybrid architectures. The performance of the proposed model has been validated on benchmark LivDet 2011, 2013, 2015, 2017, and 2019 databases, and overall accuracy of 98.64%, 99.50%, 97.23%, 95.06%, and 95.20% is achieved on these databases, respectively. The performance of the proposed model is compared with state-of-the-art methods and the proposed method outperforms in cross-material and cross-sensor paradigms in terms of average classification error.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 12 pages, 3 figures

arXiv:2302.05110 [pdf, ps, other]

doi 10.1016/j.csl.2023.101489

Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization

Authors: Spandan Dey, Md Sahidullah, Goutam Saha

Abstract: This work addresses the cross-corpora generalization issue for the low-resourced spoken language identification (LID) problem. We have conducted the experiments in the context of Indian LID and identified strikingly poor cross-corpora generalization due to corpora-dependent non-lingual biases. Our contribution to this work is twofold. First, we propose domain diversification, which diversifies the limited training data using different audio data augmentation methods. We then propose the concept of maximally diversity-aware cascaded augmentations and optimize the augmentation fold-factor for effective diversification of the training data. Second, we introduce the idea of domain generalization considering the augmentation methods as pseudo-domains. Towards this, we investigate both domain-invariant and domain-aware approaches. Our LID system is based on the state-of-the-art emphasized channel attention, propagation, and aggregation based time delay neural network (ECAPA-TDNN) architecture. We have conducted extensive experiments with three widely used corpora for Indian LID research. In addition, we conduct a final blind evaluation of our proposed methods on the Indian subset of VoxLingua107 corpus collected in the wild. Our experiments demonstrate that the proposed domain diversification is more promising than commonly used simple augmentation methods. The study also reveals that domain generalization is a more effective solution than domain diversification.
We also notice that domain-aware learning performs better for same-corpora LID, whereas domain-invariant learning is more suitable for cross-corpora generalization. Compared to basic ECAPA-TDNN, its proposed domain-invariant extensions improve the cross-corpora EER up to 5.23%. In contrast, the proposed domain-aware extensions also improve performance for same-corpora test scenarios.

Submitted 10 February, 2023; originally announced February 2023.

Comments: Accepted for publication in Elsevier Computer Speech & Language

arXiv:2301.13308 [pdf, other]

Can't Touch This: Real-Time, Safe Motion Planning and Control for Manipulators Under Uncertainty

Authors: Jonathan Michaux, Patrick Holmes, Bohao Zhang, Che Chen, Baiyue Wang, Shrey Sahgal, Tiancheng Zhang, Sidhartha Dey, Shreyas Kousik, Ram Vasudevan

Abstract: Ensuring safe, real-time motion planning in arbitrary environments requires a robotic manipulator to avoid collisions, obey joint limits, and account for uncertainties in the mass and inertia of objects and the robot itself. This paper proposes Autonomous Robust Manipulation via Optimization with Uncertainty-aware Reachability (ARMOUR), a provably-safe, receding-horizon trajectory planner and tracking controller framework for robotic manipulators to address these challenges. ARMOUR first constructs a robust controller that tracks desired trajectories with bounded error despite uncertain dynamics. ARMOUR then uses a novel recursive Newton-Euler method to compute all inputs required to track any trajectory within a continuum of desired trajectories. Finally, ARMOUR over-approximates the swept volume of the manipulator; this enables one to formulate an optimization problem that can be solved in real-time to synthesize provably-safe motions. This paper compares ARMOUR to state of the art methods on a set of challenging manipulation examples in simulation and demonstrates its ability to ensure safety on real hardware in the presence of model uncertainty without sacrificing performance. Project page: https://roahmlab.github.io/armour/.

Submitted 1 November, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: 20 pages, 6 figures

Showing 1–50 of 207 results for author: Dey, S