Search | arXiv e-print repository

arXiv:2408.13945 [pdf, other]

Personalized Topology-Informed 12-Lead ECG Electrode Localization from Incomplete Cardiac MRIs for Efficient Cardiac Digital Twins

Authors: Lei Li, Hannah Smith, Yilin Lyu, Julia Camps, Blanca Rodriguez, Abhirup Banerjee, Vicente Grau

Abstract: Cardiac digital twins (CDTs) offer personalized in-silico cardiac representations for the inference of multi-scale properties tied to cardiac mechanisms. The creation of CDTs requires precise information about the electrode position on the torso, especially for the personalized electrocardiogram (ECG) calibration. However, current studies commonly rely on additional acquisition of torso imaging and manual/semi-automatic methods for ECG electrode localization. In this study, we propose a novel and efficient topology-informed model to fully automatically extract personalized ECG electrode locations from 2D clinically standard cardiac MRIs. Specifically, we obtain the sparse torso contours from the cardiac MRIs and then localize the electrodes from the contours. Cardiac MRIs aim at imaging of the heart instead of the torso, leading to incomplete torso geometry within the imaging. To tackle the missing topology, we incorporate the electrodes as a subset of the keypoints, which can be explicitly aligned with the 3D torso topology. The experimental results demonstrate that the proposed model outperforms the time-consuming conventional method in terms of accuracy (Euclidean distance: $1.24 \pm 0.293$ cm vs. $1.48 \pm 0.362$ cm) and efficiency ($2$ s vs. $30$-$35$ min). We further demonstrate the effectiveness of using the detected electrodes for in-silico ECG simulation, highlighting their potential for creating accurate and efficient CDT models. The code will be released publicly after the manuscript is accepted for publication.

Submitted 25 August, 2024; originally announced August 2024.

Comments: 12 pages
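
The accuracy figure in the abstract above is a Euclidean distance between detected and reference electrode positions, reported as mean ± standard deviation. A minimal sketch of that arithmetic follows. The function names, the toy coordinates, and the choice of population standard deviation are assumptions for illustration; the abstract does not say which estimator the authors use.

```python
import math
from statistics import mean, pstdev


def localization_errors(predicted, reference):
    """Per-electrode Euclidean distance between predicted and reference 3D points."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return [math.dist(p, r) for p, r in zip(predicted, reference)]


def summarize(errors):
    """Return (mean, population standard deviation) of the error list."""
    return mean(errors), pstdev(errors)


# Toy coordinates in cm (hypothetical values, not data from the paper).
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]
errors = localization_errors(predicted, reference)
avg, spread = summarize(errors)
```

With ten electrodes per subject, the same two calls would be applied to the ten predicted and ten reference positions.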

arXiv:2408.05950 [pdf, other]

Robust online reconstruction of continuous-time signals from a lean spike train ensemble code

Authors: Anik Chattopadhyay, Arunava Banerjee

Abstract: Sensory stimuli in animals are encoded into spike trains by neurons, offering advantages such as sparsity, energy efficiency, and high temporal resolution. This paper presents a signal processing framework that deterministically encodes continuous-time signals into biologically feasible spike trains, and addresses the questions about representable signal classes and reconstruction bounds. The framework considers encoding of a signal through spike trains generated by an ensemble of neurons using a convolve-then-threshold mechanism with various convolution kernels. A closed-form solution to the inverse problem, from spike trains to signal reconstruction, is derived in the Hilbert space of shifted kernel functions, ensuring sparse representation of a generalized Finite Rate of Innovation (FRI) class of signals. Additionally, inspired by real-time processing in biological systems, an efficient iterative version of the optimal reconstruction is formulated that considers only a finite window of past spikes, ensuring robustness of the technique to ill-conditioned encoding; convergence guarantees of the windowed reconstruction to the optimal solution are then provided. Experiments on a large audio dataset demonstrate excellent reconstruction accuracy at spike rates as low as one-fifth of the Nyquist rate, while showing clear competitive advantage in comparison to state-of-the-art sparse coding techniques in the low spike rate regime.

Submitted 14 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

Comments: 22 pages, including a 9-page appendix, 8 figures. A GitHub link to the project implementation is embedded in the paper
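
The convolve-then-threshold mechanism named in the abstract above can be illustrated with a small discrete-time sketch: each neuron filters the signal with its own kernel and emits a spike when the filtered value crosses a threshold from below. This is a simplification under stated assumptions, not the paper's formulation: the sampled signal, the kernel shapes, the fixed threshold, and all function names are hypothetical, and the published model is continuous-time and may differ in its threshold dynamics.

```python
def convolve(signal, kernel):
    """Causal discrete convolution: out[t] = sum over k of kernel[k] * signal[t - k]."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            if t - k >= 0:
                acc += weight * signal[t - k]
        out.append(acc)
    return out


def encode_spikes(signal, kernel, threshold):
    """Sample indices where the filtered signal crosses the threshold from below."""
    filtered = convolve(signal, kernel)
    spikes = []
    previous = 0.0  # assumption: the neuron starts at rest, below threshold
    for t, value in enumerate(filtered):
        if value >= threshold and previous < threshold:
            spikes.append(t)
        previous = value
    return spikes


def encode_ensemble(signal, kernels, threshold):
    """One spike train per neuron, keyed by an illustrative kernel name."""
    return {name: encode_spikes(signal, k, threshold) for name, k in kernels.items()}


signal = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
kernels = {"smooth": [0.5, 0.5], "identity": [1.0]}  # hypothetical kernels
trains = encode_ensemble(signal, kernels, threshold=0.75)
```

Different kernels fire at different times for the same input, which is what lets an ensemble carry more information than any single spike train.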

arXiv:2408.01996 [pdf, other]

Configuring Safe Spiking Neural Controllers for Cyber-Physical Systems through Formal Verification

Authors: Arkaprava Gupta, Sumana Ghosh, Ansuman Banerjee, Swarup Kumar Mohalik

Abstract: Spiking Neural Networks (SNNs) are a subclass of neuromorphic models that have great potential to be used as controllers in Cyber-Physical Systems (CPSs) due to their energy efficiency. They can benefit from the prevalent approach of first training an Artificial Neural Network (ANN) and then translating to an SNN with subsequent hyperparameter tuning. The tuning is required to ensure that the resulting SNN is accurate with respect to the ANN in terms of metrics like Mean Squared Error (MSE). However, SNN controllers for safety-critical CPSs must also satisfy safety specifications, which are not guaranteed by the conversion approach. In this paper, we propose a solution which tunes the temporal window hyperparameter of the translated SNN to ensure both accuracy and compliance with the safe range specification that requires the SNN outputs to remain within a safe range. The core verification problem is modelled using mixed-integer linear programming (MILP) and is solved with Gurobi. When the controller fails to meet the range specification, we compute tight bounds on the SNN outputs as feedback for the CPS developer. To mitigate the high computational cost of verification, we integrate data-driven steps to minimize verification calls. Our approach provides designers with the confidence to safely integrate energy-efficient SNN controllers into modern CPSs. We demonstrate our approach with experimental results on five different benchmark neural controllers.

Submitted 4 August, 2024; originally announced August 2024.

Comments: This is the complete version of a paper with the same title that appeared at MEMOCODE 2024

arXiv:2407.14616 [pdf, other]

Deep Learning-based 3D Coronary Tree Reconstruction from Two 2D Non-simultaneous X-ray Angiography Projections

Authors: Yiying Wang, Abhirup Banerjee, Robin P. Choudhury, Vicente Grau

Abstract: Cardiovascular diseases (CVDs) are the most common cause of death worldwide. Invasive x-ray coronary angiography (ICA) is one of the most important imaging modalities for the diagnosis of CVDs. ICA typically acquires only two 2D projections, which makes the 3D geometry of coronary vessels difficult to interpret, thus requiring 3D coronary tree reconstruction from two projections. State-of-the-art approaches require significant manual interactions and cannot correct the non-rigid cardiac and respiratory motions between non-simultaneous projections. In this study, we propose a novel deep learning pipeline. We leverage the Wasserstein conditional generative adversarial network with gradient penalty, latent convolutional transformer layers, and a dynamic snake convolutional critic to implicitly compensate for the non-rigid motion and provide 3D coronary tree reconstruction. Through simulating projections from coronary computed tomography angiography (CCTA), we achieve the generalisation of 3D coronary tree reconstruction on real non-simultaneous ICA projections. We incorporate an application-specific evaluation metric to validate our proposed model on both a CCTA dataset and a real ICA dataset, together with Chamfer L1 distance. The results demonstrate the good performance of our model in vessel topology preservation, recovery of missing features, and generalisation ability to real ICA data. To the best of our knowledge, this is the first study that leverages deep learning to achieve 3D coronary tree reconstruction from two real non-simultaneous x-ray angiography projections.

Submitted 19 July, 2024; originally announced July 2024.

Comments: 16 pages, 13 figures, 3 tables

arXiv:2407.06727 [pdf, other]

Towards Physics-informed Cyclic Adversarial Multi-PSF Lensless Imaging

Authors: Abeer Banerjee, Sanjay Singh

Abstract: Lensless imaging has emerged as a promising field within inverse imaging, offering compact, cost-effective solutions with the potential to revolutionize the computational camera market. By circumventing traditional optical components like lenses and mirrors, novel approaches like mask-based lensless imaging eliminate the need for conventional hardware. However, advancements in lensless image reconstruction, particularly those leveraging Generative Adversarial Networks (GANs), are hindered by the reliance on data-driven training processes, resulting in network specificity to the Point Spread Function (PSF) of the imaging system. This necessitates a complete retraining for minor PSF changes, limiting adaptability and generalizability across diverse imaging scenarios. In this paper, we introduce a novel approach to multi-PSF lensless imaging, employing a dual discriminator cyclic adversarial framework. We propose a unique generator architecture with a sparse convolutional PSF-aware auxiliary branch, coupled with a forward model integrated into the training loop to facilitate physics-informed learning to handle the substantial domain gap between lensless and lensed images. Comprehensive performance evaluation and ablation studies underscore the effectiveness of our model, offering robust and adaptable lensless image reconstruction capabilities. Our method achieves comparable performance to existing PSF-agnostic generative methods for single PSF cases and demonstrates resilience to PSF changes without the need for retraining.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2405.11458 [pdf, other]

CPS-LLM: Large Language Model based Safe Usage Plan Generator for Human-in-the-Loop Human-in-the-Plant Cyber-Physical System

Authors: Ayan Banerjee, Aranyak Maity, Payal Kamboj, Sandeep K. S. Gupta

Abstract: We explore the usage of large language models (LLM) in human-in-the-loop human-in-the-plant cyber-physical systems (CPS) to translate a high-level prompt into a personalized plan of actions, and subsequently convert that plan into a grounded inference of sequential decision-making automated by a real-world CPS controller to achieve a control goal. We show that it is relatively straightforward to contextualize an LLM so it can generate domain-specific plans. However, these plans may be infeasible for the physical system to execute or the plan may be unsafe for human users. To address this, we propose CPS-LLM, an LLM retrained using an instruction tuning framework, which ensures that generated plans not only align with the physical system dynamics of the CPS but are also safe for human users. The CPS-LLM consists of two innovative components: a) a liquid time constant neural network-based physical dynamics coefficient estimator that can derive coefficients of dynamical models with some unmeasured state variables; b) the model coefficients are then used to train an LLM with prompts embodied with traces from the dynamical system and the corresponding model coefficients. We show that when the CPS-LLM is integrated with a contextualized chatbot such as BARD it can generate feasible and safe plans to manage external events such as meals for automated insulin delivery systems used by Type 1 Diabetes subjects.

Submitted 19 May, 2024; originally announced May 2024.

Comments: Accepted for publication in AAAI 2024, Planning for Cyber Physical Systems

arXiv:2404.17045 [pdf, other]

Toward Automated Formation of Composite Micro-Structures Using Holographic Optical Tweezers

Authors: Tommy Zhang, Nicole Werner, Ashis G. Banerjee

Abstract: Holographic Optical Tweezers (HOT) are powerful tools that can manipulate micro and nano-scale objects with high accuracy and precision. They are most commonly used for biological applications, such as cellular studies, and more recently, micro-structure assemblies. Automation has been of significant interest in the HOT field, since human-run experiments are time-consuming and require skilled operator(s). Automated HOTs, however, commonly use point traps, which focus high intensity laser light at specific spots in fluid media to attract and move micro-objects. In this paper, we develop a novel automated system of tweezing multiple micro-objects more efficiently using multiplexed optical traps. Multiplexed traps enable the simultaneous trapping of multiple beads in various alternate multiplexing formations, such as annular rings and line patterns. Our automated system is realized by augmenting the capabilities of a commercially available HOT with real-time bead detection and tracking, and wavefront-based path planning. We demonstrate the usefulness of the system by assembling two different composite micro-structures, comprising 5 μm polystyrene beads, using both annular and line shaped traps in obstacle-rich environments.

Submitted 25 April, 2024; originally announced April 2024.

Comments: To appear in the Proceedings of the 2024 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS)

arXiv:2403.10581 [pdf, other]

Large Language Model-informed ECG Dual Attention Network for Heart Failure Risk Prediction

Authors: Chen Chen, Lei Li, Marcel Beetz, Abhirup Banerjee, Ramneek Gupta, Vicente Grau

Abstract: Heart failure (HF) poses a significant public health challenge, with a rising global mortality rate. Early detection and prevention of HF could significantly reduce its impact. We introduce a novel methodology for predicting HF risk using 12-lead electrocardiograms (ECGs). We present a novel, lightweight dual-attention ECG network designed to capture complex ECG features essential for early HF risk prediction, despite the notable imbalance between low and high-risk groups. This network incorporates a cross-lead attention module and twelve lead-specific temporal attention modules, focusing on cross-lead interactions and each lead's local dynamics. To further alleviate model overfitting, we leverage a large language model (LLM) with a public ECG-Report dataset for pretraining on an ECG-report alignment task. The network is then fine-tuned for HF risk prediction using two specific cohorts from the UK Biobank study, focusing on patients with hypertension (UKB-HYP) and those who have had a myocardial infarction (UKB-MI). The results reveal that LLM-informed pre-training substantially enhances HF risk prediction in these cohorts. The dual-attention design not only improves interpretability but also predictive accuracy, outperforming existing competitive methods with C-index scores of 0.6349 for UKB-HYP and 0.5805 for UKB-MI. This demonstrates our method's potential in advancing HF risk assessment with clinical complex ECG data.

Submitted 22 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: Under journal revision
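
The C-index scores quoted in the abstract above measure how well predicted risks rank subjects by time to event. As a reading aid, here is a minimal sketch of Harrell's concordance index, the usual definition of that metric. Whether the authors use exactly this variant (for example, how ties are handled) is an assumption, and the function name and toy values are illustrative only.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked consistently with outcomes.

    A pair (i, j) is comparable when subject i had an observed event strictly
    before subject j's follow-up time. It is concordant when subject i was
    assigned the higher risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (hypothetical values): follow-up time, event flag, predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported scores of 0.6349 and 0.5805.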

arXiv:2403.02909 [pdf, other]

Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks

Authors: Abeer Banerjee, Naval K. Mehta, Shyam S. Prasad, Himanshu, Sumeet Saurav, Sanjay Singh

Abstract: In this paper, we address the intricate challenge of gaze vector prediction, a pivotal task with applications ranging from human-computer interaction to driver monitoring systems. Our innovative approach is designed for the demanding setting of extremely low-light conditions, leveraging a novel temporal event encoding scheme, and a dedicated neural network architecture. The temporal encoding method seamlessly integrates Dynamic Vision Sensor (DVS) events with grayscale guide frames, generating consecutively encoded images for input into our neural network. This unique solution not only captures diverse gaze responses from participants within the active age group but also introduces a curated dataset tailored for low-light conditions. The encoded temporal frames paired with our network showcase impressive spatial localization and reliable gaze direction in their predictions. Achieving a remarkable 100-pixel accuracy of 100%, our research underscores the potency of our neural network to work with temporally consecutive encoded images for precise gaze vector predictions in challenging low-light videos, contributing to the advancement of gaze prediction technologies.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2401.13345 [pdf]

FPGA Implementation of an Intelligent Traffic Light Controller (I-TLC) in Verilog

Authors: Apoorva Banerjee

Abstract: The objective of this paper is to design and implement an intelligent Traffic Light Controller (TLC) system for a four-way road intersection. The design is carried out using Verilog, and the hardware is implemented on an FPGA. The chosen intersection involves a 'main road' (heavy traffic flow) and a 'side road' (less traffic flow), which is equipped with sensors to detect the presence of traffic or pedestrians. The functionality of the system has undergone thorough verification through simulations conducted in the Xilinx ISE Design Studio software environment. Furthermore, it has been physically deployed on a Xilinx Spartan-3E FPGA board (xc3s500e-4-fg320). A traffic light controller can be realized through the use of a microcontroller, Application-Specific Integrated Circuits (ASICs), or Field-Programmable Gate Arrays (FPGAs). FPGAs, however, offer significant advantages in terms of re-programmability, speed, and parallel processing capabilities, making them well suited for implementing the complex, adaptive logic required by smart traffic management systems. This makes the proposed TLC model both adaptive and cost-efficient compared with other existing models, with reduced hardware usage and delay constraints.

Submitted 23 February, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: The nature of the changes in the updated version involves incorporating synthesis work. Additionally, hardware implementation results on the FPGA board have been added. Moderate changes have been made, as they introduce new aspects related to synthesis and provide valuable insights into the hardware implementation, but they do not alter or affect the existing simulation results

arXiv:2312.14844 [pdf, other]

doi 10.1007/s10162-024-00927-4

An Implantable Piezofilm Middle Ear Microphone: Performance in Human Cadaveric Temporal Bones

Authors: John Z. Zhang, Lukas Graf, Annesya Banerjee, Aaron Yeiser, Christopher I. McHugh, Ioannis Kymissis, Jeffrey H. Lang, Elizabeth S. Olson, Hideko Heidi Nakajima

Abstract: Purpose: One of the major reasons that totally implantable cochlear microphones are not readily available is the lack of good implantable microphones. An implantable microphone has the potential to provide a range of benefits over external microphones for cochlear implant users including the filtering ability of the outer ear, cosmetics, and usability in all situations. This paper presents results from experiments in human cadaveric ears of a piezofilm microphone concept under development as a possible component of a future implantable microphone system for use with cochlear implants. This microphone is referred to here as a drum microphone (DrumMic) that senses the robust and predictable motion of the umbo, the tip of the malleus. Methods: The performance of five DrumMics inserted in four different human cadaveric temporal bones was measured. Sensitivity, linearity, bandwidth, and equivalent input noise were measured during these experiments using a sound stimulus and measurement setup. Results: The sensitivity of the DrumMics was found to be tightly clustered across different microphones and ears despite differences in umbo and middle ear anatomy. The DrumMics were shown to behave linearly across a large dynamic range (46 dB SPL to 100 dB SPL) across a wide bandwidth (100 Hz to 8 kHz). The equivalent input noise (0.1-10 kHz) of the DrumMic and amplifier referenced to the ear canal was measured to be 54 dB SPL and estimated to be 46 dB SPL after accounting for the pressure gain of the outer ear. Conclusion: The results demonstrate that the DrumMic behaves robustly across ears and fabrication. The equivalent input noise performance was shown to approach that of commercial hearing aid microphones. To advance this demonstration of the DrumMic concept to a future prototype implantable in humans, work on encapsulation, biocompatibility, and connectorization will be required.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.13976 [pdf]

Anatomical basis of human sex differences in ECG identified by automated torso-cardiac three-dimensional reconstruction

Authors: Hannah J. Smith, Blanca Rodriguez, Yuling Sang, Marcel Beetz, Robin Choudhury, Vicente Grau, Abhirup Banerjee

Abstract: Background and Aims: The electrocardiogram (ECG) is routinely used for diagnosis and risk stratification following myocardial infarction (MI), though its interpretation is confounded by anatomical variability and sex differences. Women have a higher incidence of missed MI diagnosis and poorer outcomes following infarction. Sex differences in ECG biomarkers and torso-ventricular anatomy have not been well characterised, largely due to the absence of high-throughput torso reconstruction methods. Methods: This work presents quantification of sex differences in ECG versus anatomical biomarkers in healthy and post-MI subjects, enabled by a novel, end-to-end automated pipeline for torso-ventricular anatomical reconstruction from clinically standard cardiac magnetic resonance imaging. Personalised 3D torso-ventricular reconstructions were generated for 425 post-MI subjects and 1051 healthy controls from the UK Biobank. Regression models were created relating the extracted torso-ventricular and ECG parameters. Results: Half the sex difference in QRS durations is explained by smaller ventricles in women both in healthy ($3.4 \pm 1.3$ ms of $6.0 \pm 1.5$ ms) and post-MI ($4.5 \pm 1.4$ ms of $8.3 \pm 2.5$ ms) subjects. Lower baseline STj amplitude in women is also associated with smaller ventricles, and more superior and posterior cardiac position. Post-MI T wave amplitude and R axis deviations are more strongly associated with a more posterior and horizontal cardiac position in women rather than electrophysiology as in men. Conclusion: A novel computational pipeline enables the three-dimensional reconstruction of 1476 torso-cardiac geometries of healthy and post-myocardial infarction subjects, quantification of sex and BMI-related differences and association with ECG biomarkers. Any ECG-based tool should be reviewed considering anatomical sex differences to avoid sex-biased outcomes.

Submitted 17 July, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Paper under revision

arXiv:2312.13752 [pdf]

doi 10.1016/j.media.2024.103253

Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

Authors: Yang Nan, Xiaodan Xing, Shiyi Wang, Zeyu Tang, Federico N Felder, Sheng Zhang, Roberta Eufrasia Ledda, Xiaoliu Ding, Ruiqi Yu, Weiping Liu, Feng Shi, Tianyang Sun, Zehong Cao, Minghui Zhang, Yun Gu, Hanxiao Zhang, Jian Gao, Pingyu Wang, Wen Tang, Pengxin Yu, Han Kang, Junqiang Chen, Xing Lu, Boyu Zhang, Michail Mamalakis, et al. (16 additional authors not shown)

Abstract: Airway-related quantitative imaging biomarkers (QIBs) are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current publicly available datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns present in the lung tissues of fibrotic lung disease patients exacerbate the challenges, often leading to various prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the official 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization abilities, followed by exploring the QIB most correlated with mortality prediction. A training set of 120 high-resolution computerised tomography (HRCT) scans was publicly released with expert annotations and mortality status. The online validation set incorporated 52 HRCT scans from patients with fibrotic lung disease and the offline test set included 140 cases from fibrosis and COVID-19 patients. The results have shown that the capacity of extracting airway trees from patients with fibrotic lung disease could be enhanced by introducing voxel-wise weighted general union loss and continuity loss. In addition to the competitive image biomarkers for prognosis, a strong airway-derived biomarker (Hazard ratio>1.5, p<0.0001) was revealed for survival prognostication compared with existing clinical measurements, clinician assessment and AI-based biomarkers.

Submitted 16 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 19 pages

arXiv:2309.06558 [pdf, other]

High Fidelity Fast Simulation of Human in the Loop Human in the Plant (HIL-HIP) Systems

Authors: Ayan Banerjee, Payal Kamboj, Aranyak Maity, Riya Sudhakar Salian, Sandeep K. S. Gupta

Abstract: Non-linearities in simulation arise from the time variance in wireless mobile networks when integrated with human in the loop, human in the plant (HIL-HIP) physical systems under dynamic contexts, leading to simulation slowdown. Time variance is handled by deriving a series of piecewise linear time invariant simulations (PLIS) in intervals, which are then concatenated in the time domain. In this paper, we conduct a formal analysis of the impact of discretizing time-varying components in wireless network-controlled HIL-HIP systems on simulation accuracy and speedup, and evaluate trade-offs with reliable guarantees. We develop an accurate simulation framework for an artificial pancreas wireless network system that controls blood glucose in Type 1 Diabetes patients with time-varying properties such as physiological changes associated with psychological stress and meal patterns. The PLIS approach achieves accurate simulation with a speedup of more than 2.1 times over a non-linear system simulation for the given dataset.

Submitted 10 September, 2023; originally announced September 2023.

Comments: To appear in ACM MSWIM 2023

arXiv:2309.04856 [pdf, other]

AmbientFlow: Invertible generative models from incomplete, noisy measurements

Authors: Varun A. Kelkar, Rucha Deshpande, Arindam Banerjee, Mark A. Anastasio

Abstract: Generative models have gained popularity for their potential applications in imaging science, such as image reconstruction, posterior sampling and data sharing. Flow-based generative models are particularly attractive due to their ability to tractably provide exact density estimates along with fast, inexpensive and diverse samples. Training such models, however, requires a large, high quality dataset of objects. In applications such as computed imaging, it is often difficult to acquire such data due to requirements such as long acquisition time or high radiation dose, while acquiring noisy or partially observed measurements of these objects is more feasible. In this work, we propose AmbientFlow, a framework for learning flow-based generative models directly from noisy and incomplete data. Using variational Bayesian methods, a novel framework for establishing flow-based generative models from noisy, incomplete data is proposed. Extensive numerical studies demonstrate the effectiveness of AmbientFlow in learning the object distribution. The utility of AmbientFlow in a downstream inference task of image reconstruction is demonstrated.

Submitted 13 December, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

Comments: Accepted to Transactions on Machine Learning Research (TMLR). OpenReview: https://openreview.net/forum?id=txpYITR8oa

arXiv:2309.02603 [pdf, other]

Detection of Unknown-Unknowns in Human-in-Plant Human-in-Loop Systems Using Physics Guided Process Models

Authors: Aranyak Maity, Ayan Banerjee, Sandeep Gupta

Abstract: Unknown-unknowns are operational scenarios in systems that are not accounted for in the design and test phase. In such scenarios, the operational behavior of the Human-in-loop (HIL) Human-in-Plant (HIP) systems is not guaranteed to meet requirements such as safety and efficacy. We propose a novel framework for analyzing the operational output characteristics of safety-critical HIL-HIP systems that can discover unknown-unknown scenarios and evaluate potential safety hazards. We propose dynamics-induced hybrid recurrent neural networks (DiH-RNN) to mine a physics-guided surrogate model (PGSM) that checks for deviation of the cyber-physical system (CPS) from safety-certified operational characteristics. The PGSM enables early detection of unknown-unknowns based on the physical laws governing the system. We demonstrate the detection of operational changes in an Artificial Pancreas (AP) due to unknown insulin cartridge errors.

Submitted 12 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.06382 [pdf, other]

Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion

Authors: Siyuan Shan, Yang Li, Amartya Banerjee, Junier B. Oliva

Abstract: Voice conversion (VC) aims at altering a person's voice to make it sound similar to the voice of another person while preserving linguistic content. Existing methods suffer from a dilemma between content intelligibility and speaker similarity; i.e., methods with higher intelligibility usually have a lower speaker similarity, while methods with higher speaker similarity usually require plenty of target speaker voice data to achieve high intelligibility. In this work, we propose a novel method \textit{Phoneme Hallucinator} that achieves the best of both worlds. Phoneme Hallucinator is a one-shot VC model; it adopts a novel model to hallucinate diversified and high-fidelity target speaker phonemes based just on a short target speaker voice (e.g. 3 seconds). The hallucinated phonemes are then exploited to perform neighbor-based voice conversion. Our model is a text-free, any-to-any VC model that requires no text annotations and supports conversion to any unseen speaker. Objective and subjective evaluations show that \textit{Phoneme Hallucinator} outperforms existing VC methods for both intelligibility and speaker similarity.

Submitted 30 December, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

Comments: AAAI 2024 Demo, Codes: https://phonemehallucinator.github.io/

arXiv:2307.11017 [pdf, other]

Multi-objective point cloud autoencoders for explainable myocardial infarction prediction

Authors: Marcel Beetz, Abhirup Banerjee, Vicente Grau

Abstract: Myocardial infarction (MI) is one of the most common causes of death in the world. Image-based biomarkers commonly used in the clinic, such as ejection fraction, fail to capture more complex patterns in the heart's 3D anatomy and thus limit diagnostic accuracy. In this work, we present the multi-objective point cloud autoencoder as a novel geometric deep learning approach for explainable infarction prediction, based on multi-class 3D point cloud representations of cardiac anatomy and function. Its architecture consists of multiple task-specific branches connected by a low-dimensional latent space to allow for effective multi-objective learning of both reconstruction and MI prediction, while capturing pathology-specific 3D shape information in an interpretable latent space. Furthermore, its hierarchical branch design with point cloud-based deep learning operations enables efficient multi-scale feature learning directly on high-resolution anatomy point clouds. In our experiments on a large UK Biobank dataset, the multi-objective point cloud autoencoder is able to accurately reconstruct multi-temporal 3D shapes with Chamfer distances between predicted and input anatomies below the underlying images' pixel resolution. Our method outperforms multiple machine learning and deep learning benchmarks for the task of incident MI prediction by 19% in terms of Area Under the Receiver Operating Characteristic curve. In addition, its task-specific compact latent space exhibits easily separable control and MI clusters with clinically plausible associations between subject encodings and corresponding 3D shapes, thus demonstrating the explainability of the prediction.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.10927 [pdf, other]

Modeling 3D cardiac contraction and relaxation with point cloud deformation networks

Authors: Marcel Beetz, Abhirup Banerjee, Vicente Grau

Abstract: Global single-valued biomarkers of cardiac function typically used in clinical practice, such as ejection fraction, provide limited insight on the true 3D cardiac deformation process and hence, limit the understanding of both healthy and pathological cardiac mechanics. In this work, we propose the Point Cloud Deformation Network (PCD-Net) as a novel geometric deep learning approach to model 3D cardiac contraction and relaxation between the extreme ends of the cardiac cycle. It employs the recent advances in point cloud-based deep learning into an encoder-decoder structure, in order to enable efficient multi-scale feature learning directly on multi-class 3D point cloud representations of the cardiac anatomy. We evaluate our approach on a large dataset of over 10,000 cases from the UK Biobank study and find average Chamfer distances between the predicted and ground truth anatomies below the pixel resolution of the underlying image acquisition. Furthermore, we observe similar clinical metrics between predicted and ground truth populations and show that the PCD-Net can successfully capture subpopulation-specific differences between normal subjects and myocardial infarction (MI) patients. We then demonstrate that the learned 3D deformation patterns outperform multiple clinical benchmarks by 13% and 7% in terms of area under the receiver operating characteristic curve for the tasks of prevalent MI detection and incident MI prediction and by 7% in terms of Harrell's concordance index for MI survival analysis.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.08535 [pdf, other]

Multi-class point cloud completion networks for 3D cardiac anatomy reconstruction from cine magnetic resonance images

Authors: Marcel Beetz, Abhirup Banerjee, Julius Ossenberg-Engels, Vicente Grau

Abstract: Cine magnetic resonance imaging (MRI) is the current gold standard for the assessment of cardiac anatomy and function. However, it typically only acquires a set of two-dimensional (2D) slices of the underlying three-dimensional (3D) anatomy of the heart, thus limiting the understanding and analysis of both healthy and pathological cardiac morphology and physiology. In this paper, we propose a novel fully automatic surface reconstruction pipeline capable of reconstructing multi-class 3D cardiac anatomy meshes from raw cine MRI acquisitions. Its key component is a multi-class point cloud completion network (PCCN) capable of correcting both the sparsity and misalignment issues of the 3D reconstruction task in a unified model. We first evaluate the PCCN on a large synthetic dataset of biventricular anatomies and observe Chamfer distances between reconstructed and gold standard anatomies below or similar to the underlying image resolution for multiple levels of slice misalignment. Furthermore, we find a reduction in reconstruction error compared to a benchmark 3D U-Net by 32% and 24% in terms of Hausdorff distance and mean surface distance, respectively. We then apply the PCCN as part of our automated reconstruction pipeline to 1000 subjects from the UK Biobank study in a cross-domain transfer setting and demonstrate its ability to reconstruct accurate and topologically plausible biventricular heart meshes with clinical metrics comparable to the previous literature. Finally, we investigate the robustness of our proposed approach and observe its capacity to successfully handle multiple common outlier conditions.

Submitted 18 July, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

arXiv:2307.07298 [pdf, other]

3D Shape-Based Myocardial Infarction Prediction Using Point Cloud Classification Networks

Authors: Marcel Beetz, Yilong Yang, Abhirup Banerjee, Lei Li, Vicente Grau

Abstract: Myocardial infarction (MI) is one of the most prevalent cardiovascular diseases with associated clinical decision-making typically based on single-valued imaging biomarkers. However, such metrics only approximate the complex 3D structure and physiology of the heart and hence hinder a better understanding and prediction of MI outcomes. In this work, we investigate the utility of complete 3D cardiac shapes in the form of point clouds for an improved detection of MI events. To this end, we propose a fully automatic multi-step pipeline consisting of a 3D cardiac surface reconstruction step followed by a point cloud classification network. Our method utilizes recent advances in geometric deep learning on point clouds to enable direct and efficient multi-scale learning on high-resolution surface models of the cardiac anatomy. We evaluate our approach on 1068 UK Biobank subjects for the tasks of prevalent MI detection and incident MI prediction and find improvements of ~13% and ~5% respectively over clinical benchmarks. Furthermore, we analyze the role of each ventricle and cardiac phase for 3D shape-based MI detection and conduct a visual analysis of the morphological and physiological patterns typically associated with MI outcomes.

Submitted 14 July, 2023; originally announced July 2023.

Comments: Accepted at EMBC 2023

arXiv:2307.04421 [pdf, other]

Towards Enabling Cardiac Digital Twins of Myocardial Infarction Using Deep Computational Models for Inverse Inference

Authors: Lei Li, Julia Camps, Zhinuo Wang, Abhirup Banerjee, Marcel Beetz, Blanca Rodriguez, Vicente Grau

Abstract: Cardiac digital twins (CDTs) have the potential to offer individualized evaluation of cardiac function in a non-invasive manner, making them a promising approach for personalized diagnosis and treatment planning of myocardial infarction (MI). The inference of accurate myocardial tissue properties is crucial in creating a reliable CDT of MI. In this work, we investigate the feasibility of inferring myocardial tissue properties from the electrocardiogram (ECG) within a CDT platform. The platform integrates multi-modal data, such as cardiac MRI and ECG, to enhance the accuracy and reliability of the inferred tissue properties. We perform a sensitivity analysis based on computer simulations, systematically exploring the effects of infarct location, size, degree of transmurality, and electrical activity alteration on the simulated QRS complex of ECG, to establish the limits of the approach. We subsequently present a novel deep computational model, comprising a dual-branch variational autoencoder and an inference model, to infer infarct location and distribution from the simulated QRS. The proposed model achieves mean Dice scores of $0.457 \pm 0.317$ and $0.302 \pm 0.273$ for the inference of left ventricle scars and border zone, respectively. The sensitivity analysis enhances our understanding of the complex relationship between infarct characteristics and electrophysiological features. The in silico experimental results show that the model can effectively capture the relationship for the inverse inference, with promising potential for clinical application in the future. The code will be released publicly once the manuscript is accepted for publication.

Submitted 14 February, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: Cardiac digital twins; Inverse inference; Myocardial infarction

MSC Class: N/A

arXiv:2306.09424 [pdf, other]

SSL4EO-L: Datasets and Foundation Models for Landsat Imagery

Authors: Adam J. Stewart, Nils Lehmann, Isaac A. Corley, Yi Wang, Yi-Chia Chang, Nassim Ait Ali Braham, Shradha Sehgal, Caleb Robinson, Arindam Banerjee

Abstract: The Landsat program is the longest-running Earth observation program in history, with 50+ years of data acquisition by 8 satellites. The multispectral imagery captured by sensors onboard these satellites is critical for a wide range of scientific fields. Despite the increasing popularity of deep learning and remote sensing, the majority of researchers still use decision trees and random forests for Landsat image analysis due to the prevalence of small labeled datasets and lack of foundation models. In this paper, we introduce SSL4EO-L, the first ever dataset designed for Self-Supervised Learning for Earth Observation for the Landsat family of satellites (including 3 sensors and 2 product levels) and the largest Landsat dataset in history (5M image patches). Additionally, we modernize and re-release the L7 Irish and L8 Biome cloud detection datasets, and introduce the first ML benchmark datasets for Landsats 4-5 TM and Landsat 7 ETM+ SR. Finally, we pre-train the first foundation models for Landsat imagery using SSL4EO-L and evaluate their performance on multiple semantic segmentation tasks. All datasets and model weights are available via the TorchGeo (https://github.com/microsoft/torchgeo) library, making reproducibility and experimentation easy, and enabling scientific advancements in the burgeoning field of remote sensing for a multitude of downstream applications.

Submitted 22 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.02680 [pdf, other]

BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion

Authors: Ahana Deb, Sayan Nag, Ayan Mahapatra, Soumitri Chattopadhyay, Aritra Marik, Pijush Kanti Gayen, Shankha Sanyal, Archi Banerjee, Samir Karmakar

Abstract: Spoken languages often utilise intonation, rhythm, intensity, and structure, to communicate intention, which can be interpreted differently depending on the rhythm of speech of their utterance. These speech acts provide the foundation of communication and are unique in expression to the language. Recent advancements in attention-based models, demonstrating their ability to learn powerful representations from multilingual datasets, have performed well in speech tasks and are ideal to model specific tasks in low resource languages. Here, we develop a novel multimodal approach combining two models, wav2vec2.0 for audio and MarianMT for text translation, by using multimodal attention fusion to predict speech acts in our prepared Bengali speech corpus. We also show that our model BeAts ($\underline{\textbf{Be}}$ngali speech acts recognition using Multimodal $\underline{\textbf{At}}$tention Fu$\underline{\textbf{s}}$ion) significantly outperforms both the unimodal baseline using only speech data and a simpler bimodal fusion using both speech and text data. Project page: https://soumitri2001.github.io/BeAts

Submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted at INTERSPEECH 2023

arXiv:2211.03209 [pdf, other]

Robust Decentralized Secondary Control Scheme for Inverter-based Power Networks

Authors: Siddharth Bhela, Abhishek Banerjee, Ulrich Muenz, Joachim Bamberger

Abstract: Inverter-dominated microgrids are quickly becoming a key building block of future power systems. They rely on centralized controllers that can provide reliability and resiliency in extreme events. Nonetheless, communication failures due to cyber-physical attacks or natural disasters can make autonomous operation of islanded microgrids challenging. This paper examines a unified decentralized secondary control scheme that is robust to inverter clock synchronization errors and can be seamlessly applied to grid-following or grid-forming control architectures. The proposed scheme overcomes the well-known stability problem that arises from parallel operation of local integral controllers. Theoretical guarantees for stability are provided along with criteria to appropriately tune the secondary control gains to achieve good frequency regulation performance while ensuring fair power sharing. The efficacy of our approach in eliminating the steady-state frequency deviation is demonstrated through simulations on a 5-bus microgrid with four grid-forming inverters.

Submitted 9 July, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

Comments: 7 pages, 9 figures

arXiv:2209.06618 [pdf, ps, other]

Safe Autonomous Docking Maneuvers for a Floating Platform based on Input Sharing Control Barrier Functions

Authors: Akshit Saradagi, Avijit Banerjee, Sumeet Satpute, George Nikolakopoulos

Abstract: In this article, we present a control strategy for the problem of safe autonomous docking for a planar floating platform (Slider) that emulates the movement of a satellite. Employing the proposed strategy, Slider approaches a docking port with the right orientation, maintaining a safe distance, while always keeping a visual lock on the docking port throughout the docking maneuver. Control barrier functions are designed to impose the safety, direction of approach and visual locking constraints. Three control inputs of the Slider are shared among three barrier functions in enforcing the constraints. It is proved that the control inputs are shared in a conflict-free manner in rendering the sets defining safety and visual locking constraints forward invariant and in establishing finite-time convergence to the visual locking mode. The conflict-free input-sharing ensures the feasibility of a quadratic program that generates minimally-invasive corrections for a nominal controller, that is designed to track the docking port, so that the barrier constraints are respected throughout the docking maneuver. The efficacy of the proposed control design approach is validated through various simulations.

Submitted 14 September, 2022; originally announced September 2022.

Comments: 8 pages, 5 figures. Accepted for presentation at the 61st IEEE Conference on Decision and Control, Dec. 6-9, 2022, in Cancun, Mexico

arXiv:2106.02348 [pdf]

doi 10.1088/978-0-7503-3795-3ch6

A Residual Network based Deep Learning Model for Detection of COVID-19 from Cough Sounds

Authors: Annesya Banerjee, Achal Nilhani

Abstract: The present work proposes a deep-learning-based approach for the classification of COVID-19 coughs from non-COVID-19 coughs and that can be used as a low-resource-based tool for early detection of the onset of such respiratory diseases. The proposed system uses the ResNet-50 architecture, a popularly known Convolutional Neural Network (CNN) for image recognition tasks, fed with the log-Mel spectrums of the audio data to discriminate between the two types of coughs. For the training and validation of the proposed deep learning model, this work utilizes the Track-1 dataset provided by the DiCOVA Challenge 2021 organizers. Additionally, to increase the number of COVID-positive samples and to enhance variability in the training data, it has also utilized a large open-source database of COVID-19 coughs collected by the EPFL CoughVid team. Our developed model has achieved an average validation AUC of 98.88%. Also, applying this model on the Blind Test Set released by the DiCOVA Challenge, the system has achieved a Test AUC of 75.91%, Test Specificity of 62.50%, and Test Sensitivity of 80.49%. Consequently, this submission has secured 16th position in the DiCOVA Challenge 2021 leader-board.

Submitted 4 June, 2021; originally announced June 2021.

arXiv:2105.02819 [pdf, other]

doi 10.1109/MM.2021.3137401

Evaluating Sensor Data Quality in Internet of Things Smart Agriculture Applications

Authors: Kaneez Fizza, Prem Prakash Jayaraman, Abhik Banerjee, Dimitrios Georgakopoulos, Rajiv Ranjan

Abstract: The unprecedented growth of Internet of Things (IoT) and its applications in areas such as Smart Agriculture compels the need to devise newer ways for evaluating the quality of such applications. While existing models for application quality focus on the quality experienced by the end-user (captured using a Likert scale), IoT applications have minimal human involvement and rely on machine-to-machine communication and analytics to drive decisions via actuations. In this paper, we first present a conceptual framework for the evaluation of IoT application quality. Subsequently, we propose, develop and validate via empirical evaluations a novel model for evaluating sensor data quality that is a key component in assessing IoT application quality. We present an implementation of the sensor data quality model and demonstrate how the IoT sensor data quality can be integrated with a Smart Agriculture application. Results of experimental evaluations conducted using data from a real-world testbed conclude the paper.

Submitted 28 April, 2021; originally announced May 2021.

Comments: Technical Report under review with IEEE Micro

Report number: 1937-4143

Journal ref: IEEE Micro 21 December 2021

arXiv:2104.04006 [pdf, other]

doi 10.1016/j.compmedimag.2021.102008

DenResCov-19: A deep transfer learning network for robust automatic classification of COVID-19, pneumonia, and tuberculosis from X-rays

Authors: Michail Mamalakis, Andrew J. Swift, Bart Vorselaars, Surajit Ray, Simonne Weeks, Weiping Ding, Richard H. Clayton, Louise S. Mackenzie, Abhirup Banerjee

Abstract: The global pandemic of COVID-19 is continuing to have a significant effect on the well-being of the global population, increasing the demand for rapid testing, diagnosis, and treatment. Along with COVID-19, other etiologies of pneumonia and tuberculosis constitute additional challenges to the medical system. In this regard, the objective of this work is to develop a new deep transfer learning pipeline to diagnose patients with COVID-19, pneumonia, and tuberculosis, based on chest x-ray images. We observed that in some instances DenseNet and ResNet have orthogonal performances. In our proposed model, we have created an extra layer with convolutional neural network blocks to combine these two models to establish superior performance over either model. The same strategy can be useful in other applications where two competing networks with complementary performance are observed. We have tested the performance of our proposed network on two-class (pneumonia vs healthy), three-class (including COVID-19), and four-class (including tuberculosis) classification problems. The proposed network has been able to successfully classify these lung diseases in all four datasets and has provided significant improvement over the benchmark networks of DenseNet, ResNet, and Inception-V3. These novel findings can deliver a state-of-the-art pre-screening fast-track decision network to detect COVID-19 and other lung pathologies.

Submitted 8 April, 2021; originally announced April 2021.

Report number: 102008, 0895-6111

Journal ref: 2021, Computerized Medical Imaging and Graphics

arXiv:2102.06038 [pdf]

A Fractal Approach to Characterize Emotions in Audio and Visual Domain: A Study on Cross-Modal Interaction

Authors: Sayan Nag, Uddalok Sarkar, Shankha Sanyal, Archi Banerjee, Souparno Roy, Samir Karmakar, Ranjan Sengupta, Dipak Ghosh

Abstract: It is already known that both auditory and visual stimuli are able to convey emotions in the human mind to different extents. The strength or intensity of the emotional arousal varies depending on the type of stimulus chosen. In this study, we try to investigate the emotional arousal in a cross-modal scenario involving both auditory and visual stimuli while studying their source characteristics. A robust fractal analytic technique called Detrended Fluctuation Analysis (DFA) and its 2D analogue have been used to characterize three (3) standardized audio and video signals, quantifying their scaling exponents corresponding to positive and negative valence. It was found that there is a significant difference in scaling exponents corresponding to the two different modalities. Detrended Cross Correlation Analysis (DCCA) has also been applied to decipher the degree of cross-correlation among the individual audio and visual stimuli. This is the first study of its kind, proposing a novel algorithm with which emotional arousal can be classified in a cross-modal scenario using only the source audio and visual signals while also attempting a correlation between them.

Submitted 11 February, 2021; originally announced February 2021.

arXiv:2102.06003 [pdf]

Language Independent Emotion Quantification using Non linear Modelling of Speech

Authors: Uddalok Sarkar, Sayan Nag, Chirayata Bhattacharya, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

Abstract: At present emotion extraction from speech is a very important issue due to its diverse applications. Hence, it becomes absolutely necessary to obtain models that take into consideration the speaking styles of a person, vocal tract information, timbral qualities and other congenital information regarding his voice. Our speech production system is a nonlinear system like most other real world systems. Hence the need arises for modelling our speech information using nonlinear techniques. In this work we have modelled our articulation system using nonlinear multifractal analysis. The multifractal spectral width and scaling exponents reveals essentially the complexity associated with the speech signals taken. The multifractal spectrums are well distinguishable the in low fluctuation region in case of different emotions. The source characteristics have been quantified with the help of different non-linear models like Multi-Fractal Detrended Fluctuation Analysis, Wavelet Transform Modulus Maxima. The Results obtained from this study gives a very good result in emotion clustering.

Submitted 11 February, 2021; originally announced February 2021.

arXiv:2102.00616 [pdf]

Neural Network architectures to classify emotions in Indian Classical Music

Authors: Uddalok Sarkar, Sayan Nag, Medha Basu, Archi Banerjee, Shankha Sanyal, Ranjan Sengupta, Dipak Ghosh

Abstract: Music is often considered as the language of emotions. It has long been known to elicit emotions in human being and thus categorizing music based on the type of emotions they induce in human being is a very intriguing topic of research. When the task comes to classify emotions elicited by Indian Classical Music (ICM), it becomes much more challenging because of the inherent ambiguity associated with ICM. The fact that a single musical performance can evoke a variety of emotional response in the audience is implicit to the nature of ICM renditions. With the rapid advancements in the field of Deep Learning, this Music Emotion Recognition (MER) task is becoming more and more relevant and robust, hence can be applied to one of the most challenging test case i.e. classifying emotions elicited from ICM. In this paper we present a new dataset called JUMusEmoDB which presently has 400 audio clips (30 seconds each) where 200 clips correspond to happy emotions and the remaining 200 clips correspond to sad emotion. For supervised classification purposes, we have used 4 existing deep Convolutional Neural Network (CNN) based architectures (resnet18, mobilenet v2.0, squeezenet v1.0 and vgg16) on corresponding music spectrograms of the 2000 sub-clips (where every clip was segmented into 5 sub-clips of about 5 seconds each) which contain both time as well as frequency domain information. The initial results are quite inspiring, and we look forward to setting the baseline values for the dataset using this architecture.
This type of CNN based classification algorithm using a rich corpus of Indian Classical Music is unique even in the global perspective and can be replicated in other modalities of music also. This dataset is still under development and we plan to include more data containing other emotional features as well. We plan to make the dataset publicly available soon.

Submitted 31 January, 2021; originally announced February 2021.

arXiv:2101.06335 [pdf, other]

Slider: On the Design and Modeling of a 2D Floating Satellite Platform

Authors: Avijit Banerjee, Jakub Haluska, Sumeet G. Satpute, Dariusz Kominiak, George Nikolakopoulos

Abstract: In this article, a floating robotic emulation platform for a virtual demonstration of satellite motion in space is presented. The robotic platform design is characterized by its friction-less, levitating, yet planar motion over a hyper-smooth surface. The robotic platform, integrated with sensor and actuator units, is fully designed and manufactured from the Robotics and Artificial Intelligence Team at Luleå University of Technology. A detailed design description along with the mathematical modeling describing the platform's dynamic motion is formulated. Finally, the proposed design is validated in extensive simulation studies, while the overall test bed experimental setup, as well as the vehicle hardware and software architectures, are discussed in detail. Furthermore, the entire design, including 3D printing CAD model and different testbed elements, is provided in an open-source repository and a test campaign is used to showcase its capabilities and illustrate its operations.

Submitted 15 January, 2021; originally announced January 2021.

arXiv:2008.00247 [pdf, other]

Meta-DRN: Meta-Learning for 1-Shot Image Segmentation

Authors: Atmadeep Banerjee

Abstract: Modern deep learning models have revolutionized the field of computer vision. But, a significant drawback of most of these models is that they require a large number of labelled examples to generalize properly. Recent developments in few-shot learning aim to alleviate this requirement. In this paper, we propose a novel lightweight CNN architecture for 1-shot image segmentation. The proposed model is created by taking inspiration from well-performing architectures for semantic segmentation and adapting it to the 1-shot domain. We train our model using 4 meta-learning algorithms that have worked well for image classification and compare the results. For the chosen dataset, our proposed model has a 70% lower parameter count than the benchmark, while having better or comparable mean IoU scores using all 4 of the meta-learning algorithms.

Submitted 1 August, 2020; originally announced August 2020.

arXiv:2006.14718 [pdf, other]

Asynchronous Multi Agent Active Search

Authors: Ramina Ghods, Arundhati Banerjee, Jeff Schneider

Abstract: Active search refers to the problem of efficiently locating targets in an unknown environment by actively making data-collection decisions, and has many applications including detecting gas leaks, radiation sources or human survivors of disasters using aerial and/or ground robots (agents). Existing active search methods are in general only amenable to a single agent, or if they extend to multi agent they require a central control system to coordinate the actions of all agents. However, such control systems are often impractical in robotics applications. In this paper, we propose two distinct active search algorithms called SPATS (Sparse Parallel Asynchronous Thompson Sampling) and LATSI (LAplace Thompson Sampling with Information gain) that allow for multiple agents to independently make data-collection decisions without a central coordinator. Throughout we consider that targets are sparsely located around the environment in keeping with compressive sensing assumptions and its applicability in real world scenarios. Additionally, while most common search algorithms assume that agents can sense the entire environment (e.g. compressive sensing) or sense point-wise (e.g. Bayesian Optimization) at all times, we make a realistic assumption that each agent can only sense a contiguous region of space at a time. We provide simulation results as well as theoretical analysis to demonstrate the efficacy of our proposed algorithms.

Submitted 25 June, 2020; originally announced June 2020.

Comments: Preprint under review

arXiv:2004.08248 [pdf]

Acoustical classification of different speech acts using nonlinear methods

Authors: Chirayata Bhattacharyya, Sourya Sengupta, Sayan Nag, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

Abstract: A recitation is a way of combining the words together so that they have a sense of rhythm and thus an emotional content is imbibed within. In this study we envisaged to answer these questions in a scientific manner taking into consideration 5 (five) well known Bengali recitations of different poets conveying a variety of moods ranging from joy to sorrow. The clips were recited as well as read (in the form of flat speech without any rhythm) by the same person to avoid any perceptual difference arising out of timbre variation. Next, the emotional content from the 5 recitations were standardized with the help of listening test conducted on a pool of 50 participants. The recitations as well as the speech were analyzed with the help of a latest non linear technique called Detrended Fluctuation Analysis (DFA) that gives a scaling exponent α, which is essentially the measure of long range correlations present in the signal. Similar pieces (the parts which have the exact lyrical content in speech as well as in the recital) were extracted from the complete signal and analyzed with the help of DFA technique. Our analysis shows that the scaling exponent for all parts of recitation were much higher in general as compared to their counterparts in speech. We have also established a critical value from our analysis, above which a mere speech may become a recitation. The case may be similar to the conventional phase transition, wherein the measurement of external condition at which the transformation occurs (generally temperature) is called phase transition.
Further, we have also categorized the 5 recitations on the basis of their emotional content with the help of the same DFA technique. Analysis with a greater variety of recitations is being carried out to yield more interesting results.

Submitted 5 August, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

Comments: 6 pages, 2 figures; Proceedings of WESPAC 2018, New Delhi, India, November 11-15, 2018

arXiv:2004.07820 [pdf]

Speaker Recognition in Bengali Language from Nonlinear Features

Authors: Uddalok Sarkar, Soumyadeep Pal, Sayan Nag, Chirayata Bhattacharya, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

Abstract: At present Automatic Speaker Recognition system is a very important issue due to its diverse applications. Hence, it becomes absolutely necessary to obtain models that take into consideration the speaking style of a person, vocal tract information, timbral qualities of his voice and other congenital information regarding his voice. The study of Bengali speech recognition and speaker identification is scarce in the literature. Hence the need arises for involving Bengali subjects in modelling our speaker identification engine. In this work, we have extracted some acoustic features of speech using non linear multifractal analysis. The Multifractal Detrended Fluctuation Analysis reveals essentially the complexity associated with the speech signals taken. The source characteristics have been quantified with the help of different techniques like Correlation Matrix, skewness of MFDFA spectrum etc. The Results obtained from this study gives a good recognition rate for Bengali Speakers.

Submitted 15 April, 2020; originally announced April 2020.

Comments: arXiv admin note: text overlap with arXiv:1612.00171, arXiv:1601.07709

arXiv:2004.07003 [pdf, other]

MXR-U-Nets for Real Time Hyperspectral Reconstruction

Authors: Atmadeep Banerjee, Akash Palrecha

Abstract: In recent times, CNNs have made significant contributions to applications in image generation, super-resolution and style transfer. In this paper, we build upon the work of Howard and Gugger, He et al. and Misra, D. and propose a CNN architecture that accurately reconstructs hyperspectral images from their RGB counterparts. We also propose a much shallower version of our best model with a 10% relative memory footprint and 3x faster inference, thus enabling real-time video applications while still experiencing only about a 0.5% decrease in performance.

Submitted 15 April, 2020; originally announced April 2020.

ACM Class: I.4.5; I.4.10

arXiv:1910.11090 [pdf, other]

Emotion Generation and Recognition: A StarGAN Approach

Authors: Aritra Banerjee, Dimitrios Kollias

Abstract: The main idea of this ISO is to use StarGAN (A type of GAN model) to perform training and testing on an emotion dataset resulting in a emotion recognition which can be generated by the valence arousal score of the 7 basic expressions. We have created an entirely new dataset consisting of 4K videos. This dataset consists of all the basic 7 types of emotions: Happy, Sad, Angry, Surprised, Fear, Disgust, Neutral. We have performed face detection and alignment followed by annotating basic valence arousal values to the frames/images in the dataset depending on the emotions manually. Then the existing StarGAN model is trained on our created dataset after which some manual subjects were chosen to test the efficiency of the trained StarGAN model.

Submitted 12 October, 2019; originally announced October 2019.

arXiv:1907.03898 [pdf]

doi 10.3390/s19183954

Parametrically Amplified Low-Power MEMS Capacitive Humidity Sensor

Authors: Rugved Likhite, Aishwaryadev Banerjee, Apratim Majumder, Hanseup Kim, Carlos H. Mastrangelo

Abstract: We present the design, fabrication, and response of a polymer-based Laterally Amplified Chemo-Mechanical (LACM) humidity sensor based on mechanical leveraging and parametric amplification. The device consists of a sense cantilever asymmetrically patterned with a polymer and flanked by two stationary electrodes on the sides. When exposed to a humidity change, the polymer swells after absorbing the analyte and causes the central cantilever to bend laterally towards one side, causing a change in the measured capacitance. The device features an intrinsic gain due to parametric amplification resulting in an enhanced signal-to-noise ratio (SNR). 11-fold magnification in sensor response was observed via voltage biasing of the side electrodes without the use of conventional electronic amplifiers. The sensor showed a repeatable and recoverable capacitance change of 11% when exposed to a change in relative humidity from 25-85%. The dynamic characterization of the device also revealed a response time ~1s and demonstrated a competitive response with respect to a commercially available reference chip.

Submitted 8 July, 2019; originally announced July 2019.

arXiv:1907.03576 [pdf, other]

Deep Learning-Based Semantic Segmentation of Microscale Objects

Authors: Ekta U. Samani, Wei Guo, Ashis G. Banerjee

Abstract: Accurate estimation of the positions and shapes of microscale objects is crucial for automated imaging-guided manipulation using a non-contact technique such as optical tweezers. Perception methods that use traditional computer vision algorithms tend to fail when the manipulation environments are crowded. In this paper, we present a deep learning model for semantic segmentation of the images representing such environments. Our model successfully performs segmentation with a high mean Intersection Over Union score of 0.91.

Submitted 3 July, 2019; originally announced July 2019.

Comments: A condensed version of the paper is published in the Proceedings of the 2019 International Conference on Manipulation, Automation and Robotics at Small Scales

arXiv:1805.08865 [pdf]

Speaker Recognition using Deep Belief Networks

Authors: Adrish Banerjee, Akash Dubey, Abhishek Menon, Shubham Nanda, Gora Chand Nandi

Abstract: Short time spectral features such as mel frequency cepstral coefficients(MFCCs) have been previously deployed in state of the art speaker recognition systems, however lesser heed has been paid to short term spectral features that can be learned by generative learning models from speech signals. Higher dimensional encoders such as deep belief networks (DBNs) could improve performance in speaker recognition tasks by better modelling the statistical structure of sound waves. In this paper, we use short term spectral features learnt from the DBN augmented with MFCC features to perform the task of speaker recognition. Using our features, we achieved a recognition accuracy of 0.95 as compared to 0.90 when using standalone MFCC features on the ELSDSR dataset.

Submitted 9 May, 2018; originally announced May 2018.

arXiv:1712.08336 [pdf]

Music of Brain and Music on Brain: A Novel EEG Sonification approach

Authors: Sayan Nag, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

Abstract: Can we hear the sound of our brain? Is there any technique which can enable us to hear the neuro-electrical impulses originating from the different lobes of brain? The answer to all these questions is YES. In this paper we present a novel method with which we can sonify the Electroencephalogram (EEG) data recorded in rest state as well as under the influence of a simplest acoustical stimuli - a tanpura drone. The tanpura drone has a very simple yet very complex acoustic features, which is generally used for creation of an ambiance during a musical performance. Hence, for this pilot project we chose to study the correlation between a simple acoustic stimuli (tanpura drone) and sonified EEG data. Till date, there have been no study which deals with the direct correlation between a bio-signal and its acoustic counterpart and how that correlation varies under the influence of different types of stimuli. This is the first of its kind study which bridges this gap and looks for a direct correlation between music signal and EEG data using a robust mathematical microscope called Multifractal Detrended Cross Correlation Analysis (MFDXA). For this, we took EEG data of 10 participants in 2 min 'rest state' (i.e. with white noise) and in 2 min 'tanpura drone' (musical stimulus) listening condition. Next, the EEG signals from different electrodes were sonified and MFDXA technique was used to assess the degree of correlation (or the cross correlation coefficient) between tanpura signal and EEG signals.
The variation of γx for different lobes during the course of the experiment also provides major interesting new information. Only music stimuli has the ability to engage several areas of the brain significantly unlike other stimuli (which engages specific domains only).

Submitted 22 December, 2017; originally announced December 2017.

Comments: 6 pages, 4 figures; Presented in the International Symposium on Frontiers of Research in speech and Music (FRSM)-2017, held at NIT, Rourkela in 15-16 December 2017

Showing 1–43 of 43 results for author: Banerjee, A