Search | arXiv e-print repository

Size biased Multinomial Modelling of detection data in Software testing

Authors: Pallabi Ghosh, Ashis Kr. Chakraborty, Soumen Dey

Abstract: Estimation of software reliability often poses a considerable challenge, particularly for critical softwares. Several methods of estimation of reliability of software are already available in the literature. But, so far almost nobody used the concept of size of a bug for estimating software reliability. In this article we make used of the bug size or the eventual bug size which helps us to determi… ▽ More Estimation of software reliability often poses a considerable challenge, particularly for critical softwares. Several methods of estimation of reliability of software are already available in the literature. But, so far almost nobody used the concept of size of a bug for estimating software reliability. In this article we make used of the bug size or the eventual bug size which helps us to determine reliability of software more precisely. The size-biased model developed here can also be used for similar fields like hydrocarbon exploration. The model has been validated through simulation and subsequently used for a critical space application software testing data. The estimated results match the actual observations to a large extent. △ Less

Submitted 24 May, 2024; originally announced June 2024.

Comments: Submitted to OPSEARCH

arXiv:2405.14835 [pdf, other]

Polynomial Pass Semi-Streaming Lower Bounds for K-Cores and Degeneracy

Authors: Sepehr Assadi, Prantar Ghosh, Bruno Loff, Parth Mittal, Sagnik Mukhopadhyay

Abstract: The following question arises naturally in the study of graph streaming algorithms: "Is there any graph problem which is "not too hard", in that it can be solved efficiently with total communication (nearly) linear in the number $n$ of vertices, and for which, nonetheless, any streaming algorithm with $\tilde{O}(n)$ space (i.e., a semi-streaming algorithm) needs a polynomial $n^{Ω(1)}$ number of… ▽ More The following question arises naturally in the study of graph streaming algorithms: "Is there any graph problem which is "not too hard", in that it can be solved efficiently with total communication (nearly) linear in the number $n$ of vertices, and for which, nonetheless, any streaming algorithm with $\tilde{O}(n)$ space (i.e., a semi-streaming algorithm) needs a polynomial $n^{Ω(1)}$ number of passes?" Assadi, Chen, and Khanna [STOC 2019] were the first to prove that this is indeed the case. However, the lower bounds that they obtained are for rather non-standard graph problems. Our first main contribution is to present the first polynomial-pass lower bounds for natural "not too hard" graph problems studied previously in the streaming model: $k$-cores and degeneracy. We devise a novel communication protocol for both problems with near-linear communication, thus showing that $k$-cores and degeneracy are natural examples of "not too hard" problems. Indeed, previous work have developed single-pass semi-streaming algorithms for approximating these problems. In contrast, we prove that any semi-streaming algorithm for exactly solving these problems requires (almost) $Ω(n^{1/3})$ passes. Our second main contribution is improved round-communication lower bounds for the underlying communication problems at the basis of these reductions: * We improve the previous lower bound of Assadi, Chen, and Khanna for hidden pointer chasing (HPC) to achieve optimal bounds. * We observe that all current reductions from HPC can also work with a generalized version of this problem that we call MultiHPC, and prove an even stronger and optimal lower bound for this generalization. These two results collectively allow us to improve the resulting pass lower bounds for semi-streaming algorithms by a polynomial factor, namely, from $n^{1/5}$ to $n^{1/3}$ passes. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Accepted at CCC 2024

arXiv:2405.05952 [pdf, other]

New Algorithms and Lower Bounds for Streaming Tournaments

Authors: Prantar Ghosh, Sahil Kuchlous

Abstract: We study fundamental directed graph (digraph) problems in the streaming model. An initial investigation by Chakrabarti, Ghosh, McGregor, and Vorotnikova [SODA'20] on streaming digraphs showed that while most of these problems are provably hard in general, some of them become tractable when restricted to the well-studied class of tournament graphs where every pair of nodes shares exactly one direct… ▽ More We study fundamental directed graph (digraph) problems in the streaming model. An initial investigation by Chakrabarti, Ghosh, McGregor, and Vorotnikova [SODA'20] on streaming digraphs showed that while most of these problems are provably hard in general, some of them become tractable when restricted to the well-studied class of tournament graphs where every pair of nodes shares exactly one directed edge. Thus, we focus on tournaments and improve the state of the art for multiple problems in terms of both upper and lower bounds. Our primary upper bound is a deterministic single-pass semi-streaming algorithm (using $\tilde{O}(n)$ space for $n$-node graphs, where $\tilde{O}(.)$ hides polylog$(n)$ factors) for decomposing a tournament into strongly connected components (SCC). it improves upon the previously best-known algorithm by Baweja, Jia, and Woodruff [ITCS'22] in terms of both space and passes: for $p\geq 1$, they used $(p+1)$-passes and $\tilde{O}(n^{1+1/p})$-space. We further extend our algorithm to digraphs that are close to tournaments and establish tight bounds demonstrating that the problem's complexity grows smoothly with the "distance" from tournaments. Applying our framework, we obtain improved tournament algorithms for $s,t$-reachability, strong connectivity, Hamiltonian paths and cycles, and feedback arc set. On the other hand, we prove the first $Ω(n^2)$-space lower bounds for this class, exhibiting that some well-studied problems -- such as (exact) feedback arc set on tournaments (FAST) and $s,t$-distance -- remain hard here. We obtain a generalized lower bound on space-approximation tradeoffs for FAST: any single-pass $(1\pm \varepsilon)$-approximation algorithm requires $Ω(n/\sqrt{\varepsilon})$ space. As a whole, our collection of results contributes significantly to the growing literature on streaming digraphs. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.04603 [pdf, other]

doi 10.1016/j.jsv.2024.118476

Neural network based approach for solving problems in plane wave duct acoustics

Authors: D. Veerababu, Prasanta K. Ghosh

Abstract: Neural networks have emerged as a tool for solving differential equations in many branches of engineering and science. But their progress in frequency domain acoustics is limited by the vanishing gradient problem that occurs at higher frequencies. This paper discusses a formulation that can address this issue. The problem of solving the governing differential equation along with the boundary condi… ▽ More Neural networks have emerged as a tool for solving differential equations in many branches of engineering and science. But their progress in frequency domain acoustics is limited by the vanishing gradient problem that occurs at higher frequencies. This paper discusses a formulation that can address this issue. The problem of solving the governing differential equation along with the boundary conditions is posed as an unconstrained optimization problem. The acoustic field is approximated to the output of a neural network which is constructed in such a way that it always satisfies the boundary conditions. The applicability of the formulation is demonstrated on popular problems in plane wave acoustic theory. The predicted solution from the neural network formulation is compared with those obtained from the analytical solution. A good agreement is observed between the two solutions. The method of transfer learning to calculate the particle velocity from the existing acoustic pressure field is demonstrated with and without mean flow effects. The sensitivity of the training process to the choice of the activation function and the number of collocation points is studied. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: Published Journal Article

ACM Class: G.1.6; I.6.4; J.2

Journal ref: Journal of Sound and Vibration, 585, 2024:118476

arXiv:2404.04530 [pdf, other]

A Morphology-Based Investigation of Positional Encodings

Authors: Poulami Ghosh, Shikhar Vashishth, Raj Dabre, Pushpak Bhattacharyya

Abstract: Contemporary deep learning models effectively handle languages with diverse morphology despite not being directly integrated into them. Morphology and word order are closely linked, with the latter incorporated into transformer-based models through positional encodings. This prompts a fundamental inquiry: Is there a correlation between the morphological complexity of a language and the utilization… ▽ More Contemporary deep learning models effectively handle languages with diverse morphology despite not being directly integrated into them. Morphology and word order are closely linked, with the latter incorporated into transformer-based models through positional encodings. This prompts a fundamental inquiry: Is there a correlation between the morphological complexity of a language and the utilization of positional encoding in pre-trained language models? In pursuit of an answer, we present the first study addressing this question, encompassing 22 languages and 5 downstream tasks. Our findings reveal that the importance of positional encoding diminishes with increasing morphological complexity in languages. Our study motivates the need for a deeper understanding of positional encoding, augmenting them to better reflect the different languages under consideration. △ Less

Submitted 30 May, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: Work in Progress

arXiv:2402.14252 [pdf, other]

doi 10.1145/3613904.3642013

Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology

Authors: Nur Yildirim, Hannah Richardson, Maria T. Wetscherek, Junaid Bajwa, Joseph Jacob, Mark A. Pinnock, Stephen Harris, Daniel Coelho de Castro, Shruthi Bannur, Stephanie L. Hyland, Pratik Ghosh, Mercy Ranjit, Kenza Bouzid, Anton Schwaighofer, Fernando Pérez-García, Harshita Sharma, Ozan Oktay, Matthew Lungren, Javier Alvarez-Valle, Aditya Nori, Anja Thieme

Abstract: Recent advances in AI combine large language models (LLMs) with vision encoders that bring forward unprecedented technical capabilities to leverage for a wide range of healthcare applications. Focusing on the domain of radiology, vision-language models (VLMs) achieve good performance results for tasks such as generating radiology findings based on a patient's medical image, or answering visual que… ▽ More Recent advances in AI combine large language models (LLMs) with vision encoders that bring forward unprecedented technical capabilities to leverage for a wide range of healthcare applications. Focusing on the domain of radiology, vision-language models (VLMs) achieve good performance results for tasks such as generating radiology findings based on a patient's medical image, or answering visual questions (e.g., 'Where are the nodules in this chest X-ray?'). However, the clinical utility of potential applications of these capabilities is currently underexplored. We engaged in an iterative, multidisciplinary design process to envision clinically relevant VLM interactions, and co-designed four VLM use concepts: Draft Report Generation, Augmented Report Review, Visual Search and Querying, and Patient Imaging History Highlights. We studied these concepts with 13 radiologists and clinicians who assessed the VLM concepts as valuable, yet articulated many design considerations. Reflecting on our findings, we discuss implications for integrating VLM capabilities in radiology, and for healthcare AI more generally. △ Less

Submitted 21 February, 2024; originally announced February 2024.

Comments: to appear at CHI 2024

arXiv:2402.02052 [pdf]

doi 10.5121/ijcnc.2024.16104

Feature Selection using the concept of Peafowl Mating in IDS

Authors: Partha Ghosh, Joy Sharma, Nilesh Pandey

Abstract: Cloud computing has high applicability as an Internet based service that relies on sharing computing resources. Cloud computing provides services that are Infrastructure based, Platform based and Software based. The popularity of this technology is due to its superb performance, high level of computing ability, low cost of services, scalability, availability and flexibility. The obtainability and… ▽ More Cloud computing has high applicability as an Internet based service that relies on sharing computing resources. Cloud computing provides services that are Infrastructure based, Platform based and Software based. The popularity of this technology is due to its superb performance, high level of computing ability, low cost of services, scalability, availability and flexibility. The obtainability and openness of data in cloud environment make it vulnerable to the world of cyber-attacks. To detect the attacks Intrusion Detection System is used, that can identify the attacks and ensure information security. Such a coherent and proficient Intrusion Detection System is proposed in this paper to achieve higher certainty levels regarding safety in cloud environment. In this paper, the mating behavior of peafowl is incorporated into an optimization algorithm which in turn is used as a feature selection algorithm. The algorithm is used to reduce the huge size of cloud data so that the IDS can work efficiently on the cloud to detect intrusions. The proposed model has been experimented with NSL-KDD dataset as well as Kyoto dataset and have proved to be a better as well as an efficient IDS. △ Less

Submitted 3 February, 2024; originally announced February 2024.

Journal ref: International Journal of Computer Networks & Communications (IJCNC) Vol.16, No.1, January 2024

arXiv:2401.06378 [pdf, ps, other]

New Lower Bounds in Merlin-Arthur Communication and Graph Streaming Verification

Authors: Prantar Ghosh, Vihan Shah

Abstract: We show new lower bounds in the \emph{Merlin-Arthur} (MA) communication model and the related \emph{annotated streaming} or stream verification model. The MA communication model is an enhancement of the classical communication model, where in addition to the usual players Alice and Bob, there is an all-powerful but untrusted player Merlin who knows their inputs and tries to convince them about the… ▽ More We show new lower bounds in the \emph{Merlin-Arthur} (MA) communication model and the related \emph{annotated streaming} or stream verification model. The MA communication model is an enhancement of the classical communication model, where in addition to the usual players Alice and Bob, there is an all-powerful but untrusted player Merlin who knows their inputs and tries to convince them about the output. Most functions have MA protocols with total communication significantly smaller than what would be needed without Merlin. We focus on the online MA (OMA) model, which is the MA analogue of one-way communication, and introduce the notion of \emph{non-trivial-OMA} complexity of a function. This is the minimum total communication needed by any non-trivial OMA protocol computing that function, where a trivial OMA protocol is one where Alice sends Bob roughly as many bits as she would have sent without Merlin. We prove a lower bound on the non-trivial-OMA complexity of a natural function \emph{Equals-Index} (basically the well-known Index problem on large domains) and identify it as a canonical problem for proving strong lower bounds on this complexity: reductions from it (i) reproduce and/or improve upon the lower bounds for all functions that were previously known to have large non-trivial-OMA complexity, (ii) exhibit the first explicit functions whose non-trivial-OMA complexity is superlinear, and even exponential, in their classical one-way complexity, and (iii) show functions on input size $n$ for which this complexity is as large as $n/\log n$. While exhibiting a function with $ω(\sqrt{n})$ (standard) OMA complexity is a longstanding open problem, we did not even know of any function with $ω(\sqrt{n})$ non-trivial-OMA complexity. We further extend the lower bounds to a related streaming model called annotated streaming. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Comments: To appear in ITCS 2024

arXiv:2401.06035 [pdf, other]

RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks

Authors: Partha Ghosh, Soubhik Sanyal, Cordelia Schmid, Bernhard Schölkopf

Abstract: We present a novel unconditional video generative model designed to address long-term spatial and temporal dependencies, with attention to computational and dataset efficiency. To capture long spatio-temporal dependencies, our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks developed for three-dimensional object representation an… ▽ More We present a novel unconditional video generative model designed to address long-term spatial and temporal dependencies, with attention to computational and dataset efficiency. To capture long spatio-temporal dependencies, our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks developed for three-dimensional object representation and employs a single latent code to model an entire video clip. Individual video frames are then synthesized from an intermediate tri-plane representation, which itself is derived from the primary latent code. This novel strategy more than halves the computational complexity measured in FLOPs compared to the most efficient state-of-the-art methods. Consequently, our approach facilitates the efficient and temporally coherent generation of videos. Moreover, our joint frame modeling approach, in contrast to autoregressive methods, mitigates the generation of visual artifacts. We further enhance the model's capabilities by integrating an optical flow-based module within our Generative Adversarial Network (GAN) based generator architecture, thereby compensating for the constraints imposed by a smaller generator size. As a result, our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps. The efficacy and versatility of our approach are empirically validated through qualitative and quantitative assessments across three different datasets comprising both synthetic and real video clips. We will make our training and inference code public. △ Less

Submitted 11 August, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

arXiv:2312.01275 [pdf, other]

A Review of Link Prediction Applications in Network Biology

Authors: Ahmad F. Al Musawi, Satyaki Roy, Preetam Ghosh

Abstract: In the domain of network biology, the interactions among heterogeneous genomic and molecular entities are represented through networks. Link prediction (LP) methodologies are instrumental in inferring missing or prospective associations within these biological networks. In this review, we systematically dissect the attributes of local, centrality, and embedding-based LP approaches, applied to stat… ▽ More In the domain of network biology, the interactions among heterogeneous genomic and molecular entities are represented through networks. Link prediction (LP) methodologies are instrumental in inferring missing or prospective associations within these biological networks. In this review, we systematically dissect the attributes of local, centrality, and embedding-based LP approaches, applied to static and dynamic biological networks. We undertake an examination of the current applications of LP metrics for predicting links between diseases, genes, proteins, RNA, microbiomes, drugs, and neurons. We carry out comprehensive performance evaluations on established biological network datasets to show the practical applications of standard LP models. Moreover, we compare the similarity in prediction trends among the models and the specific network attributes that contribute to effective link prediction, before underscoring the role of LP in addressing the formidable challenges prevalent in biological systems, ranging from noise, bias, and data sparseness to interpretability. We conclude the review with an exploration of the essential characteristics expected from future LP models, poised to advance our comprehension of the intricate interactions governing biological systems. △ Less

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.05435 [pdf]

Parkinson's Disease Detection through Vocal Biomarkers and Advanced Machine Learning Algorithms

Authors: Md Abu Sayed, Maliha Tayaba, MD Tanvir Islam, Md Eyasin Ul Islam Pavel, Md Tuhin Mia, Eftekhar Hossain Ayon, Nur Nob, Bishnu Padh Ghosh

Abstract: Parkinson's disease (PD) is a prevalent neurodegenerative disorder known for its impact on motor neurons, causing symptoms like tremors, stiffness, and gait difficulties. This study explores the potential of vocal feature alterations in PD patients as a means of early disease prediction. This research aims to predict the onset of Parkinson's disease. Utilizing a variety of advanced machine-learnin… ▽ More Parkinson's disease (PD) is a prevalent neurodegenerative disorder known for its impact on motor neurons, causing symptoms like tremors, stiffness, and gait difficulties. This study explores the potential of vocal feature alterations in PD patients as a means of early disease prediction. This research aims to predict the onset of Parkinson's disease. Utilizing a variety of advanced machine-learning algorithms, including XGBoost, LightGBM, Bagging, AdaBoost, and Support Vector Machine, among others, the study evaluates the predictive performance of these models using metrics such as accuracy, area under the curve (AUC), sensitivity, and specificity. The findings of this comprehensive analysis highlight LightGBM as the most effective model, achieving an impressive accuracy rate of 96% alongside a matching AUC of 96%. LightGBM exhibited a remarkable sensitivity of 100% and specificity of 94.43%, surpassing other machine learning algorithms in accuracy and AUC scores. Given the complexities of Parkinson's disease and its challenges in early diagnosis, this study underscores the significance of leveraging vocal biomarkers coupled with advanced machine-learning techniques for precise and timely PD detection. △ Less

Submitted 2 December, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

arXiv:2309.11805 [pdf, other]

JobRecoGPT -- Explainable job recommendations using LLMs

Authors: Preetam Ghosh, Vaishali Sadaphal

Abstract: In today's rapidly evolving job market, finding the right opportunity can be a daunting challenge. With advancements in the field of AI, computers can now recommend suitable jobs to candidates. However, the task of recommending jobs is not same as recommending movies to viewers. Apart from must-have criteria, like skills and experience, there are many subtle aspects to a job which can decide if it… ▽ More In today's rapidly evolving job market, finding the right opportunity can be a daunting challenge. With advancements in the field of AI, computers can now recommend suitable jobs to candidates. However, the task of recommending jobs is not same as recommending movies to viewers. Apart from must-have criteria, like skills and experience, there are many subtle aspects to a job which can decide if it is a good fit or not for a given candidate. Traditional approaches can capture the quantifiable aspects of jobs and candidates, but a substantial portion of the data that is present in unstructured form in the job descriptions and resumes is lost in the process of conversion to structured format. As of late, Large Language Models (LLMs) have taken over the AI field by storm with extraordinary performance in fields where text-based data is available. Inspired by the superior performance of LLMs, we leverage their capability to understand natural language for capturing the information that was previously getting lost during the conversion of unstructured data to structured form. To this end, we compare performance of four different approaches for job recommendations namely, (i) Content based deterministic, (ii) LLM guided, (iii) LLM unguided, and (iv) Hybrid. In this study, we present advantages and limitations of each method and evaluate their performance in terms of time requirements. △ Less

Submitted 21 September, 2023; originally announced September 2023.

Comments: 10 pages, 29 figures

arXiv:2308.10638 [pdf, other]

SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes

Authors: Soubhik Sanyal, Partha Ghosh, Jinlong Yang, Michael J. Black, Justus Thies, Timo Bolkart

Abstract: We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans. Specifically, we devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies. Training such a model is challenging, as datasets of textured 3D meshes for humans are limited in size and accessibility. Our key observation is that there exist medium-s… ▽ More We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans. Specifically, we devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies. Training such a model is challenging, as datasets of textured 3D meshes for humans are limited in size and accessibility. Our key observation is that there exist medium-sized 3D scan datasets like CAPE, as well as large-scale 2D image datasets of clothed humans and multiple appearances can be mapped to a single geometry. To effectively learn from the two data modalities, we propose an unpaired learning procedure for pose-dependent clothed and textured human meshes. Specifically, we learn a pose-dependent geometry space from 3D scan data. We represent this as per vertex displacements w.r.t. the SMPL model. Next, we train a geometry conditioned texture generator in an unsupervised way using the 2D image data. We use intermediate activations of the learned geometry model to condition our texture generator. To alleviate entanglement between pose and clothing type, and pose and clothing appearance, we condition both the texture and geometry generators with attribute labels such as clothing types for the geometry, and clothing colors for the texture generator. We automatically generated these conditioning labels for the 2D images based on the visual question answering model BLIP and CLIP. We validate our method on the SCULPT dataset, and compare to state-of-the-art 3D generative models for clothed human bodies. Our code and data can be found at https://sculpt.is.tue.mpg.de. △ Less

Submitted 6 May, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: Updated to camera ready version of CVPR 2024

arXiv:2307.09882 [pdf, other]

Adversarial Likelihood Estimation With One-Way Flows

Authors: Omri Ben-Dov, Pravir Singh Gupta, Victoria Abrevaya, Michael J. Black, Partha Ghosh

Abstract: Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incor… ▽ More Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incorporate importance sampling, and show that 1) Wasserstein GAN performs a biased estimate of the partition function, and we propose instead to use an unbiased estimator; and 2) when optimizing for likelihood, one must maximize generator entropy. This is hypothesized to provide a better mode coverage. Different from previous works, we explicitly compute the density of the generated samples. This is the key enabler to designing an unbiased estimator of the partition function and computation of the generator entropy term. The generator density is obtained via a new type of flow network, called one-way flow network, that is less constrained in terms of architecture, as it does not require a tractable inverse function. Our experimental results show that our method converges faster, produces comparable sample quality to GANs with similar architecture, successfully avoids over-fitting to commonly used datasets and produces smooth low-dimensional latent representations of the training data. △ Less

Submitted 2 October, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.07948 [pdf, ps, other]

Model Adaptation for ASR in low-resource Indian Languages

Authors: Abhayjeet Singh, Arjun Singh Mehta, Ashish Khuraishi K S, Deekshitha G, Gauri Date, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, Karthika P, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Savitha, Prasanta Kumar Ghosh, Prashanthi V, Priyanka Pai, Raoul Nanavati, Rohan Saxena, Sai Praneeth Reddy Mora, Srinivasa Raghavan

Abstract: Automatic speech recognition (ASR) performance has improved drastically in recent years, mainly enabled by self-supervised learning (SSL) based acoustic models such as wav2vec2 and large-scale multi-lingual training like Whisper. A huge challenge still exists for low-resource languages where the availability of both audio and text is limited. This is further complicated by the presence of multiple… ▽ More Automatic speech recognition (ASR) performance has improved drastically in recent years, mainly enabled by self-supervised learning (SSL) based acoustic models such as wav2vec2 and large-scale multi-lingual training like Whisper. A huge challenge still exists for low-resource languages where the availability of both audio and text is limited. This is further complicated by the presence of multiple dialects like in Indian languages. However, many Indian languages can be grouped into the same families and share the same script and grammatical structure. This is where a lot of adaptation and fine-tuning techniques can be applied to overcome the low-resource nature of the data by utilising well-resourced similar languages. In such scenarios, it is important to understand the extent to which each modality, like acoustics and text, is important in building a reliable ASR. It could be the case that an abundance of acoustic data in a language reduces the need for large text-only corpora. Or, due to the availability of various pretrained acoustic models, the vice-versa could also be true. In this proposed special session, we encourage the community to explore these ideas with the data in two low-resource Indian languages of Bengali and Bhojpuri. These approaches are not limited to Indian languages, the solutions are potentially applicable to various languages spoken around the world. △ Less

Submitted 16 July, 2023; originally announced July 2023.

Comments: ASRU Special session overview paper

arXiv:2304.14807 [pdf, other]

Deep Learning assisted microwave-plasma interaction based technique for plasma density estimation

Authors: Pratik Ghosh, Bhaskar Chaudhury, Shishir Purohit, Vishv Joshi, Ashray Kothari, Devdeep Shetranjiwala

Abstract: The electron density is a key parameter to characterize any plasma. Most of the plasma applications and research in the area of low-temperature plasmas (LTPs) are based on the accurate estimations of plasma density and plasma temperature. The conventional methods for electron density measurements offer axial and radial profiles for any given linear LTP device. These methods have major disadvantage… ▽ More The electron density is a key parameter to characterize any plasma. Most of the plasma applications and research in the area of low-temperature plasmas (LTPs) are based on the accurate estimations of plasma density and plasma temperature. The conventional methods for electron density measurements offer axial and radial profiles for any given linear LTP device. These methods have major disadvantages of operational range (not very wide), cumbersome instrumentation, and complicated data analysis procedures. The article proposes a Deep Learning (DL) assisted microwave-plasma interaction-based non-invasive strategy, which can be used as a new alternative approach to address some of the challenges associated with existing plasma density measurement techniques. The electric field pattern due to microwave scattering from plasma is utilized to estimate the density profile. The proof of concept is tested for a simulated training data set comprising a low-temperature, unmagnetized, collisional plasma. Different types of symmetric (Gaussian-shaped) and asymmetrical density profiles, in the range $10^{16}-10^{19}$ m$^{-3}$, addressing a range of experimental configurations have been considered in our study. Real-life experimental issues such as the presence of noise and the amount of measured data (dense vs sparse) have been taken into consideration while preparing the synthetic training data-sets. The DL-based technique has the capability to determine the electron density profile within the plasma. The performance of the proposed deep learning-based approach has been evaluated using three metrics- SSIM, RMSLE, and MAPE. The obtained results show promising performance in estimating the 2D radial profile of the density for the given linear plasma device and affirms the potential of the proposed ML-based approach in plasma diagnostics. △ Less

Submitted 28 June, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

arXiv:2304.12285 [pdf, ps, other]

Low-Memory Algorithms for Online and W-Streaming Edge Coloring

Authors: Prantar Ghosh, Manuel Stoeckl

Abstract: For edge coloring, the online and the W-streaming models seem somewhat orthogonal: the former needs edges to be assigned colors immediately after insertion, typically without any space restrictions, while the latter limits memory to sublinear in the input size but allows an edge's color to be announced any time after its insertion. We aim for the best of both worlds by designing small-space online… ▽ More For edge coloring, the online and the W-streaming models seem somewhat orthogonal: the former needs edges to be assigned colors immediately after insertion, typically without any space restrictions, while the latter limits memory to sublinear in the input size but allows an edge's color to be announced any time after its insertion. We aim for the best of both worlds by designing small-space online algorithms for edge coloring. We study the problem under both (adversarial) edge arrivals and vertex arrivals. Our results significantly improve upon the memory used by prior online algorithms while achieving an $O(1)$-competitive ratio. In particular, for $n$-node graphs with maximum vertex-degree $Δ$ under edge arrivals, we obtain an online $O(Δ)$-coloring in $\tilde{O}(n\sqrtΔ)$ space. This is also the first W-streaming edge-coloring algorithm using $O(Δ)$ colors (in sublinear memory). All prior works either used linear memory or $ω(Δ)$ colors. We also achieve a smooth color-space tradeoff: for any $t=O(Δ)$, we get an $O(Δt (\log^2 Δ))$-coloring in $\tilde{O}(n\sqrt{Δ/t})$ space, improving upon the state of the art that used $\tilde{O}(nΔ/t)$ space for the same number of colors (the $\tilde{O}(.)$ notation hides polylog$(n)$ factors). The improvements stem from extensive use of random permutations that enable us to avoid previously used colors. Most of our algorithms can be derandomized and extended to multigraphs, where edge coloring is known to be considerably harder than for simple graphs. △ Less

Submitted 31 May, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 36 pages, 1 figure; improvements to Thm 1.8 and minor edits since v1

arXiv:2303.12343 [pdf, other]

LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation

Authors: Koutilya Pnvr, Bharat Singh, Pallabi Ghosh, Behjat Siddiquie, David Jacobs

Abstract: Large-scale pre-training tasks like image classification, captioning, or self-supervised techniques do not incentivize learning the semantic boundaries of objects. However, recent generative foundation models built using text-based latent diffusion techniques may learn semantic boundaries. This is because they have to synthesize intricate details about all objects in an image based on a text descr… ▽ More Large-scale pre-training tasks like image classification, captioning, or self-supervised techniques do not incentivize learning the semantic boundaries of objects. However, recent generative foundation models built using text-based latent diffusion techniques may learn semantic boundaries. This is because they have to synthesize intricate details about all objects in an image based on a text description. Therefore, we present a technique for segmenting real and AI-generated images using latent diffusion models (LDMs) trained on internet-scale datasets. First, we show that the latent space of LDMs (z-space) is a better input representation compared to other feature representations like RGB images or CLIP encodings for text-based image segmentation. By training the segmentation models on the latent z-space, which creates a compressed representation across several domains like different forms of art, cartoons, illustrations, and photographs, we are also able to bridge the domain gap between real and AI-generated images. We show that the internal features of LDMs contain rich semantic information and present a technique in the form of LD-ZNet to further boost the performance of text-based segmentation. Overall, we show up to 6% improvement over standard baselines for text-to-image segmentation on natural images. For AI-generated imagery, we show close to 20% improvement compared to state-of-the-art techniques. The project is available at https://koutilya-pnvr.github.io/LD-ZNet/. △ Less

Submitted 23 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: Supplementary material is included in the paper following the references section

arXiv:2303.11434 [pdf, other]

ResDTA: Predicting Drug-Target Binding Affinity Using Residual Skip Connections

Authors: Partho Ghosh, Md. Aynal Haque

Abstract: The discovery of novel drug target (DT) interactions is an important step in the drug development process. The majority of computer techniques for predicting DT interactions have focused on binary classification, with the goal of determining whether or not a DT pair interacts. Protein ligand interactions, on the other hand, assume a continuous range of binding strength values, also known as bindin… ▽ More The discovery of novel drug target (DT) interactions is an important step in the drug development process. The majority of computer techniques for predicting DT interactions have focused on binary classification, with the goal of determining whether or not a DT pair interacts. Protein ligand interactions, on the other hand, assume a continuous range of binding strength values, also known as binding affinity, and forecasting this value remains a difficulty. As the amount of affinity data in DT knowledge-bases grows, advanced learning techniques such as deep learning architectures can be used to predict binding affinities. In this paper, we present a deep-learning-based methodology for predicting DT binding affinities using just sequencing information from both targets and drugs. The results show that the proposed deep learning-based model that uses the 1D representations of targets and drugs is an effective approach for drug target binding affinity prediction and it does not require additional chemical domain knowledge to work with. The model in which high-level representations of a drug and a target are constructed via CNNs that uses residual skip connections and also with an additional stream to create a high-level combined representation of the drug-target pair achieved the best Concordance Index (CI) performance in one of the largest benchmark datasets, outperforming the recent state-of-the-art method AttentionDTA and many other machine-learning and deep-learning based baseline methods for DT binding affinity prediction that uses the 1D representations of targets and drugs. △ Less

Submitted 20 March, 2023; originally announced March 2023.

Comments: 40 pages, 10 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:1801.10193, arXiv:1902.04166 by other authors

arXiv:2302.01374 [pdf, other]

Neural Network Architecture for Database Augmentation Using Shared Features

Authors: William C. Sleeman IV, Rishabh Kapoor, Preetam Ghosh

Abstract: The popularity of learning from data with machine learning and neural networks has lead to the creation of many new datasets for almost every problem domain. However, even within a single domain, these datasets are often collected with disparate features, sampled from different sub-populations, and recorded at different time points. Even with the plethora of individual datasets, large data science… ▽ More The popularity of learning from data with machine learning and neural networks has lead to the creation of many new datasets for almost every problem domain. However, even within a single domain, these datasets are often collected with disparate features, sampled from different sub-populations, and recorded at different time points. Even with the plethora of individual datasets, large data science projects can be difficult as it is often not trivial to merge these smaller datasets. Inherent challenges in some domains such as medicine also makes it very difficult to create large single source datasets or multi-source datasets with identical features. Instead of trying to merge these non-matching datasets directly, we propose a neural network architecture that can provide data augmentation using features common between these datasets. Our results show that this style of data augmentation can work for both image and tabular data. △ Less

Submitted 2 February, 2023; originally announced February 2023.

Comments: 22 pages, 8 figures, 4 tables

ACM Class: I.5.1; I.5.2

arXiv:2212.10641 [pdf, ps, other]

Coloring in Graph Streams via Deterministic and Adversarially Robust Algorithms

Authors: Sepehr Assadi, Amit Chakrabarti, Prantar Ghosh, Manuel Stoeckl

Abstract: In recent years, there has been a growing interest in solving various graph coloring problems in the streaming model. The initial algorithms in this line of work are all crucially randomized, raising natural questions about how important a role randomization plays in streaming graph coloring. A couple of very recent works have made progress on this question: they prove that deterministic or even a… ▽ More In recent years, there has been a growing interest in solving various graph coloring problems in the streaming model. The initial algorithms in this line of work are all crucially randomized, raising natural questions about how important a role randomization plays in streaming graph coloring. A couple of very recent works have made progress on this question: they prove that deterministic or even adversarially robust coloring algorithms (that work on streams whose updates may depend on the algorithm's past outputs) are considerably weaker than standard randomized ones. However, there is still a significant gap between the upper and lower bounds for the number of colors needed (as a function of the maximum degree $Δ$) for robust coloring and multipass deterministic coloring. We contribute to this line of work by proving the following results. In the deterministic semi-streaming (i.e., $O(n \cdot \text{polylog } n)$ space) regime, we present an algorithm that achieves a combinatorially optimal $(Δ+1)$-coloring using $O(\logΔ \log\logΔ)$ passes. This improves upon the prior $O(Δ)$-coloring algorithm of Assadi, Chen, and Sun (STOC 2022) at the cost of only an $O(\log\logΔ)$ factor in the number of passes. In the adversarially robust semi-streaming regime, we design an $O(Δ^{5/2})$-coloring algorithm that improves upon the previously best $O(Δ^{3})$-coloring algorithm of Chakrabarti, Ghosh, and Stoeckl (ITCS 2022). Further, we obtain a smooth colors/space tradeoff that improves upon another algorithm of the said work: whereas their algorithm uses $O(Δ^2)$ colors and $O(nΔ^{1/2})$ space, ours, in particular, achieves (i)~$O(Δ^2)$ colors in $O(nΔ^{1/3})$ space, and (ii)~$O(Δ^{7/4})$ colors in $O(nΔ^{1/2})$ space. △ Less

Submitted 20 December, 2022; originally announced December 2022.

Comments: 29 pages

arXiv:2212.09284 [pdf, other]

An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations

Authors: Shelly Jain, Priyanshi Pal, Anil Vuppala, Prasanta Ghosh, Chiranjeevi Yarra

Abstract: Speech systems are sensitive to accent variations. This is especially challenging in the Indian context, with an abundance of languages but a dearth of linguistic studies characterising pronunciation variations. The growing number of L2 English speakers in India reinforces the need to study accents and L1-L2 interactions. We investigate the accents of Indian English (IE) speakers and report in det… ▽ More Speech systems are sensitive to accent variations. This is especially challenging in the Indian context, with an abundance of languages but a dearth of linguistic studies characterising pronunciation variations. The growing number of L2 English speakers in India reinforces the need to study accents and L1-L2 interactions. We investigate the accents of Indian English (IE) speakers and report in detail our observations, both specific and common to all regions. In particular, we observe the phonemic variations and phonotactics occurring in the speakers' native languages and apply this to their English pronunciations. We demonstrate the influence of 18 Indian languages on IE by comparing the native language pronunciations with IE pronunciations obtained jointly from existing literature studies and phonetically annotated speech of 80 speakers. Consequently, we are able to validate the intuitions of Indian language influences on IE pronunciations by justifying pronunciation rules from the perspective of Indian language phonology. We obtain a comprehensive description in terms of universal and region-specific characteristics of IE, which facilitates accent conversion and adaptation of existing ASR and TTS systems to different Indian accents. △ Less

Submitted 19 December, 2022; originally announced December 2022.

Comments: 9 pages, 1 figure

arXiv:2211.03109 [pdf, other]

A Sequence Agnostic Multimodal Preprocessing for Clogged Blood Vessel Detection in Alzheimer's Diagnosis

Authors: Partho Ghosh, Md. Abrar Istiak, Mir Sayeed Mohammad, Swapnil Saha, Uday Kamal

Abstract: Successful identification of blood vessel blockage is a crucial step for Alzheimer's disease diagnosis. These blocks can be identified from the spatial and time-depth variable Two-Photon Excitation Microscopy (TPEF) images of the brain blood vessels using machine learning methods. In this study, we propose several preprocessing schemes to improve the performance of these methods. Our method includ… ▽ More Successful identification of blood vessel blockage is a crucial step for Alzheimer's disease diagnosis. These blocks can be identified from the spatial and time-depth variable Two-Photon Excitation Microscopy (TPEF) images of the brain blood vessels using machine learning methods. In this study, we propose several preprocessing schemes to improve the performance of these methods. Our method includes 3D-point cloud data extraction from image modality and their feature-space fusion to leverage complementary information inherent in different modalities. We also enforce the learned representation to be sequence-order invariant by utilizing bi-direction dataflow. Experimental results on The Clog Loss dataset show that our proposed method consistently outperforms the state-of-the-art preprocessing methods in stalled and non-stalled vessel classification. △ Less

Submitted 6 November, 2022; originally announced November 2022.

Comments: 5 pages, 4 figures

arXiv:2210.16881 [pdf, other]

Real-Time MRI Video synthesis from time aligned phonemes with sequence-to-sequence networks

Authors: Sathvik Udupa, Prasanta Kumar Ghosh

Abstract: Real-Time Magnetic resonance imaging (rtMRI) of the midsagittal plane of the mouth is of interest for speech production research. In this work, we focus on estimating utterance level rtMRI video from the spoken phoneme sequence. We obtain time-aligned phonemes from forced alignment, to obtain frame-level phoneme sequences which are aligned with rtMRI frames. We propose a sequence-to-sequence learn… ▽ More Real-Time Magnetic resonance imaging (rtMRI) of the midsagittal plane of the mouth is of interest for speech production research. In this work, we focus on estimating utterance level rtMRI video from the spoken phoneme sequence. We obtain time-aligned phonemes from forced alignment, to obtain frame-level phoneme sequences which are aligned with rtMRI frames. We propose a sequence-to-sequence learning model with a transformer phoneme encoder and convolutional frame decoder. We then modify the learning by using intermediary features obtained from sampling from a pretrained phoneme-conditioned variational autoencoder (CVAE). We train on 8 subjects in a subject-specific manner and demonstrate the performance with a subjective test. We also use an auxiliary task of air tissue boundary (ATB) segmentation to obtain the objective scores on the proposed models. We show that the proposed method is able to generate realistic rtMRI video for unseen utterances, and adding CVAE is beneficial for learning the sequence-to-sequence mapping for subjects where the mapping is hard to learn. △ Less

Submitted 30 October, 2022; originally announced October 2022.

Comments: submitted to ICASSP 2023

arXiv:2210.16871 [pdf, other]

Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

Authors: Sathvik Udupa, Siddarth C, Prasanta Kumar Ghosh

Abstract: In this work, we investigate the effectiveness of pretrained Self-Supervised Learning (SSL) features for learning the mapping for acoustic to articulatory inversion (AAI). Signal processing-based acoustic features such as MFCCs have been predominantly used for the AAI task with deep neural networks. With SSL features working well for various other speech tasks such as speech recognition, emotion c… ▽ More In this work, we investigate the effectiveness of pretrained Self-Supervised Learning (SSL) features for learning the mapping for acoustic to articulatory inversion (AAI). Signal processing-based acoustic features such as MFCCs have been predominantly used for the AAI task with deep neural networks. With SSL features working well for various other speech tasks such as speech recognition, emotion classification, etc., we experiment with its efficacy for AAI. We train on SSL features with transformer neural networks-based AAI models of 3 different model complexities and compare its performance with MFCCs in subject-specific (SS), pooled and fine-tuned (FT) configurations with data from 10 subjects, and evaluate with correlation coefficient (CC) score on the unseen sentence test set. We find that acoustic feature reconstruction objective-based SSL features such as TERA and DeCoAR work well for AAI, with SS CCs of these SSL features reaching close to the best FT CCs of MFCC. We also find the results consistent across different model sizes. △ Less

Submitted 30 October, 2022; originally announced October 2022.

Comments: submitted to ICASSP 2023

arXiv:2210.12921 [pdf]

Investigating self-supervised, weakly supervised and fully supervised training approaches for multi-domain automatic speech recognition: a study on Bangladeshi Bangla

Authors: Ahnaf Mozib Samin, M. Humayon Kobir, Md. Mushtaq Shahriyar Rafee, M. Firoz Ahmed, Mehedi Hasan, Partha Ghosh, Shafkat Kibria, M. Shahidur Rahman

Abstract: Despite huge improvements in automatic speech recognition (ASR) employing neural networks, ASR systems still suffer from a lack of robustness and generalizability issues due to domain shifting. This is mainly because principal corpus design criteria are often not identified and examined adequately while compiling ASR datasets. In this study, we investigate the robustness of the state-of-the-art tr… ▽ More Despite huge improvements in automatic speech recognition (ASR) employing neural networks, ASR systems still suffer from a lack of robustness and generalizability issues due to domain shifting. This is mainly because principal corpus design criteria are often not identified and examined adequately while compiling ASR datasets. In this study, we investigate the robustness of the state-of-the-art transfer learning approaches such as self-supervised wav2vec 2.0 and weakly supervised Whisper as well as fully supervised convolutional neural networks (CNNs) for multi-domain ASR. We also demonstrate the significance of domain selection while building a corpus by assessing these models on a novel multi-domain Bangladeshi Bangla ASR evaluation benchmark - BanSpeech, which contains approximately 6.52 hours of human-annotated speech and 8085 utterances from 13 distinct domains. SUBAK.KO, a mostly read speech corpus for the morphologically rich language Bangla, has been used to train the ASR systems. Experimental evaluation reveals that self-supervised cross-lingual pre-training is the best strategy compared to weak supervision and full supervision to tackle the multi-domain ASR task. Moreover, the ASR models trained on SUBAK.KO face difficulty recognizing speech from domains with mostly spontaneous speech. The BanSpeech will be publicly available to meet the need for a challenging evaluation benchmark for Bangla ASR. △ Less

Submitted 10 May, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

arXiv:2207.08930 [pdf, other]

Cooperative Infrastructure Perception

Authors: Fawad Ahmad, Christina Suyong Shin, Weiwu Pang, Branden Leong, Pradipta Ghosh, Ramesh Govindan

Abstract: Recent works have considered two qualitatively different approaches to overcome line-of-sight limitations of 3D sensors used for perception: cooperative perception and infrastructure-augmented perception. In this paper, motivated by increasing deployments of infrastructure LiDARs, we explore a third approach, cooperative infrastructure perception. This approach generates perception outputs by fusi… ▽ More Recent works have considered two qualitatively different approaches to overcome line-of-sight limitations of 3D sensors used for perception: cooperative perception and infrastructure-augmented perception. In this paper, motivated by increasing deployments of infrastructure LiDARs, we explore a third approach, cooperative infrastructure perception. This approach generates perception outputs by fusing outputs of multiple infrastructure sensors, but, to be useful, must do so quickly and accurately. We describe the design, implementation and evaluation of Cooperative Infrastructure Perception (CIP), which uses a combination of novel algorithms and systems optimizations. It produces perception outputs within 100 ms using modest computing resources and with accuracy comparable to the state-of-the-art. CIP, when used to augment vehicle perception, can improve safety. When used in conjunction with offloaded planning, CIP can increase traffic throughput at intersections. △ Less

Submitted 26 June, 2024; v1 submitted 18 July, 2022; originally announced July 2022.

arXiv:2206.13000 [pdf, other]

doi 10.1109/COINS54846.2022.9854934

An Automated Deployment and Testing Framework for Resilient Distributed Smart Grid Applications

Authors: Purboday Ghosh, Hao Tu, Timothy Krentz, Gabor Karsai, Srdjan Lukic

Abstract: Executing distributed cyber-physical software processes on edge devices that maintains the resiliency of the overall system while adhering to resource constraints is quite a challenging trade-off to consider for developers. Current approaches do not solve this problem of deploying software components to devices in a way that satisfies different resilience requirements that can be encoded by develo… ▽ More Executing distributed cyber-physical software processes on edge devices that maintains the resiliency of the overall system while adhering to resource constraints is quite a challenging trade-off to consider for developers. Current approaches do not solve this problem of deploying software components to devices in a way that satisfies different resilience requirements that can be encoded by developers at design time. This paper introduces a resilient deployment framework that can achieve that by accepting user-defined constraints to optimize redundancy or cost for a given application deployment. Experiments with a microgrid energy management application developed using a decentralized software platform show that the deployment configuration can play an important role in enhancing the resilience capabilities of distributed applications as well as reducing the resource demands on individual nodes even without modifying the control logic. △ Less

Submitted 26 June, 2022; originally announced June 2022.

Comments: accepted, pending publication

arXiv:2206.12952 [pdf, other]

Nonwatertight Mesh Reconstruction

Authors: Partha Ghosh

Abstract: Reconstructing 3D non-watertight mesh from an unoriented point cloud is an unexplored area in computer vision and computer graphics. In this project, we tried to tackle this problem by extending the learning-based watertight mesh reconstruction pipeline presented in the paper 'Shape as Points'. The core of our approach is to cast the problem as a semantic segmentation problem that identifies the r… ▽ More Reconstructing 3D non-watertight mesh from an unoriented point cloud is an unexplored area in computer vision and computer graphics. In this project, we tried to tackle this problem by extending the learning-based watertight mesh reconstruction pipeline presented in the paper 'Shape as Points'. The core of our approach is to cast the problem as a semantic segmentation problem that identifies the region in the 3D volume where the mesh surface lies and extracts the surfaces from the detected regions. Our approach achieves compelling results compared to the baseline techniques. △ Less

Submitted 26 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2106.03452 by other authors

arXiv:2206.11563 [pdf, other]

LED: Latent Variable-based Estimation of Density

Authors: Omri Ben-Dov, Pravir Singh Gupta, Victoria Fernandez Abrevaya, Michael J. Black, Partha Ghosh

Abstract: Modern generative models are roughly divided into two main categories: (1) models that can produce high-quality random samples, but cannot estimate the exact density of new data points and (2) those that provide exact density estimation, at the expense of sample quality and compactness of the latent space. In this work we propose LED, a new generative model closely related to GANs, that allows not… ▽ More Modern generative models are roughly divided into two main categories: (1) models that can produce high-quality random samples, but cannot estimate the exact density of new data points and (2) those that provide exact density estimation, at the expense of sample quality and compactness of the latent space. In this work we propose LED, a new generative model closely related to GANs, that allows not only efficient sampling but also efficient density estimation. By maximizing log-likelihood on the output of the discriminator, we arrive at an alternative adversarial optimization objective that encourages generated data diversity. This formulation provides insights into the relationships between several popular generative models. Additionally, we construct a flow-based generator that can compute exact probabilities for generated samples, while allowing low-dimensional latent variables as input. Our experimental results, on various datasets, show that our density estimator produces accurate estimates, while retaining good quality in the generated samples. △ Less

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2206.01263 [pdf, other]

doi 10.1109/TMTT.2022.3217138

Deep Learning Architecture Based Approach For 2D-Simulation of Microwave Plasma Interaction

Authors: Mihir Desai, Pratik Ghosh, Ahlad Kumar, Bhaskar Chaudhury

Abstract: This paper presents a convolutional neural network (CNN)-based deep learning model, inspired from UNet with series of encoder and decoder units with skip connections, for the simulation of microwave-plasma interaction. The microwave propagation characteristics in complex plasma medium pertaining to transmission, absorption and reflection primarily depends on the ratio of electromagnetic (EM) wave… ▽ More This paper presents a convolutional neural network (CNN)-based deep learning model, inspired from UNet with series of encoder and decoder units with skip connections, for the simulation of microwave-plasma interaction. The microwave propagation characteristics in complex plasma medium pertaining to transmission, absorption and reflection primarily depends on the ratio of electromagnetic (EM) wave frequency and electron plasma frequency, and the plasma density profile. The scattering of a plane EM wave with fixed frequency (1 GHz) and amplitude incident on a plasma medium with different gaussian density profiles (in the range of $1\times 10^{17}-1\times 10^{22}{m^{-3}}$) have been considered. The training data associated with microwave-plasma interaction has been generated using 2D-FDTD (Finite Difference Time Domain) based simulations. The trained deep learning model is then used to reproduce the scattered electric field values for the 1GHz incident microwave on different plasma profiles with error margin of less than 2\%. We propose a complete deep learning (DL) based pipeline to train, validate and evaluate the model. We compare the results of the network, using various metrics like SSIM index, average percent error and mean square error, with the physical data obtained from well-established FDTD based EM solvers. To the best of our knowledge, this is the first effort towards exploring a DL based approach for the simulation of complex microwave plasma interaction. The deep learning technique proposed in this work is significantly fast as compared to the existing computational techniques, and can be used as a new, prospective and alternative computational approach for investigating microwave-plasma interaction in a real time scenario. △ Less

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2204.08106 [pdf, other]

A New Dynamic Algorithm for Densest Subhypergraphs

Authors: Suman K. Bera, Sayan Bhattacharya, Jayesh Choudhari, Prantar Ghosh

Abstract: Computing a dense subgraph is a fundamental problem in graph mining, with a diverse set of applications ranging from electronic commerce to community detection in social networks. In many of these applications, the underlying context is better modelled as a weighted hypergraph that keeps evolving with time. This motivates the problem of maintaining the densest subhypergraph of a weighted hypergr… ▽ More Computing a dense subgraph is a fundamental problem in graph mining, with a diverse set of applications ranging from electronic commerce to community detection in social networks. In many of these applications, the underlying context is better modelled as a weighted hypergraph that keeps evolving with time. This motivates the problem of maintaining the densest subhypergraph of a weighted hypergraph in a {\em dynamic setting}, where the input keeps changing via a sequence of updates (hyperedge insertions/deletions). Previously, the only known algorithm for this problem was due to Hu et al. [HWC17]. This algorithm worked only on unweighted hypergraphs, and had an approximation ratio of $(1+ε)r^2$ and an update time of $O(\text{poly} (r, \log n))$, where $r$ denotes the maximum rank of the input across all the updates. We obtain a new algorithm for this problem, which works even when the input hypergraph is weighted. Our algorithm has a significantly improved (near-optimal) approximation ratio of $(1+ε)$ that is independent of $r$, and a similar update time of $O(\text{poly} (r, \log n))$. It is the first $(1+ε)$-approximation algorithm even for the special case of weighted simple graphs. To complement our theoretical analysis, we perform experiments with our dynamic algorithm on large-scale, real-world data-sets. Our algorithm significantly outperforms the state of the art [HWC17] both in terms of accuracy and efficiency. △ Less

Submitted 17 April, 2022; originally announced April 2022.

Comments: Extended abstract appears in TheWebConf (previously WWW) 2022

arXiv:2204.06502 [pdf, ps, other]

Study of Indian English Pronunciation Variabilities relative to Received Pronunciation

Authors: Priyanshi Pal, Shelly Jain, Anil Vuppala, Chiranjeevi Yarra, Prasanta Ghosh

Abstract: Analysis of Indian English (IE) pronunciation variabilities are useful in building systems for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) synthesis in the Indian context. Typically, these pronunciation variabilities have been explored by comparing IE pronunciation with Received Pronunciation (RP). However, to explore these variabilities, it is required to have labelled pronunciati… ▽ More Analysis of Indian English (IE) pronunciation variabilities are useful in building systems for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) synthesis in the Indian context. Typically, these pronunciation variabilities have been explored by comparing IE pronunciation with Received Pronunciation (RP). However, to explore these variabilities, it is required to have labelled pronunciation data at the phonetic level, which is scarce for IE. Moreover, versatility of IE stems from the influence of a large diversity of the speakers' mother tongues and demographic region differences. Prior linguistic works have characterised features of IE variabilities qualitatively by reporting phonetic rules that represent such variations relative to RP. The qualitative descriptions often lack quantitative descriptors and data-driven analysis of diverse IE pronunciation data to characterise IE on the phonetic level. To address these issues, in this work, we consider a corpus, Indic TIMIT, containing a large set of IE varieties from 80 speakers from various regions of India. We present an analysis to obtain the new set of phonetic rules representing IE pronunciation variabilities relative to RP in a data-driven manner. We do this using 15,974 phonetic transcriptions, of which 13,632 were obtained manually in addition to those part of the corpus. Furthermore, we validate the rules obtained from the analysis against the existing phonetic rules to identify the relevance of the obtained phonetic rules and test the efficacy of Grapheme-to-Phoneme (G2P) conversion developed based on the obtained rules considering Phoneme Error Rate (PER) as the metric for performance. △ Less

Submitted 9 December, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

arXiv:2203.17076 [pdf, other]

doi 10.1109/TGRS.2022.3196057

Deep Hyperspectral Unmixing using Transformer Network

Authors: Preetam Ghosh, Swalpa Kumar Roy, Bikram Koirala, Behnood Rasti, Paul Scheunders

Abstract: Currently, this paper is under review in IEEE. Transformers have intrigued the vision research community with their state-of-the-art performance in natural language processing. With their superior performance, transformers have found their way in the field of hyperspectral image classification and achieved promising results. In this article, we harness the power of transformers to conquer the task… ▽ More Currently, this paper is under review in IEEE. Transformers have intrigued the vision research community with their state-of-the-art performance in natural language processing. With their superior performance, transformers have found their way in the field of hyperspectral image classification and achieved promising results. In this article, we harness the power of transformers to conquer the task of hyperspectral unmixing and propose a novel deep unmixing model with transformers. We aim to utilize the ability of transformers to better capture the global feature dependencies in order to enhance the quality of the endmember spectra and the abundance maps. The proposed model is a combination of a convolutional autoencoder and a transformer. The hyperspectral data is encoded by the convolutional encoder. The transformer captures long-range dependencies between the representations derived from the encoder. The data are reconstructed using a convolutional decoder. We applied the proposed unmixing model to three widely used unmixing datasets, i.e., Samson, Apex, and Washington DC mall and compared it with the state-of-the-art in terms of root mean squared error and spectral angle distance. The source code for the proposed model will be made publicly available at \url{https://github.com/preetam22n/DeepTrans-HSU}. △ Less

Submitted 31 March, 2022; originally announced March 2022.

Comments: Currently, this paper is under review in IEEE

arXiv:2203.06004 [pdf, other]

An error correction scheme for improved air-tissue boundary in real-time MRI video for speech production

Authors: Anwesha Roy, Varun Belagali, Prasanta Kumar Ghosh

Abstract: The best performance in Air-tissue boundary (ATB) segmentation of real-time Magnetic Resonance Imaging (rtMRI) videos in speech production is known to be achieved by a 3-dimensional convolutional neural network (3D-CNN) model. However, the evaluation of this model, as well as other ATB segmentation techniques reported in the literature, is done using Dynamic Time Warping (DTW) distance between the… ▽ More The best performance in Air-tissue boundary (ATB) segmentation of real-time Magnetic Resonance Imaging (rtMRI) videos in speech production is known to be achieved by a 3-dimensional convolutional neural network (3D-CNN) model. However, the evaluation of this model, as well as other ATB segmentation techniques reported in the literature, is done using Dynamic Time Warping (DTW) distance between the entire original and predicted contours. Such an evaluation measure may not capture local errors in the predicted contour. Careful analysis of predicted contours reveals errors in regions like the velum part of contour1 (ATB comprising of upper lip, hard palate, and velum) and tongue base section of contour2 (ATB covering jawline, lower lip, tongue base, and epiglottis), which are not captured in a global evaluation metric like DTW distance. In this work, we automatically detect such errors and propose a correction scheme for the same. We also propose two new evaluation metrics for ATB segmentation separately in contour1 and contour2 to explicitly capture two types of errors in these contours. The proposed detection and correction strategies result in an improvement of these two evaluation metrics by 61.8% and 61.4% for contour1 and by 67.8% and 28.4% for contour2. Traditional DTW distance, on the other hand, improves by 44.6% for contour1 and 4.0% for contour2. △ Less

Submitted 8 March, 2022; originally announced March 2022.

Comments: accepted for ICASSP 2022

arXiv:2202.12440 [pdf, other]

On Learning and Testing of Counterfactual Fairness through Data Preprocessing

Authors: Haoyu Chen, Wenbin Lu, Rui Song, Pulak Ghosh

Abstract: Machine learning has become more important in real-life decision-making but people are concerned about the ethical problems it may bring when used improperly. Recent work brings the discussion of machine learning fairness into the causal framework and elaborates on the concept of Counterfactual Fairness. In this paper, we develop the Fair Learning through dAta Preprocessing (FLAP) algorithm to lea… ▽ More Machine learning has become more important in real-life decision-making but people are concerned about the ethical problems it may bring when used improperly. Recent work brings the discussion of machine learning fairness into the causal framework and elaborates on the concept of Counterfactual Fairness. In this paper, we develop the Fair Learning through dAta Preprocessing (FLAP) algorithm to learn counterfactually fair decisions from biased training data and formalize the conditions where different data preprocessing procedures should be used to guarantee counterfactual fairness. We also show that Counterfactual Fairness is equivalent to the conditional independence of the decisions and the sensitive attributes given the processed non-sensitive attributes, which enables us to detect discrimination in the original decision using the processed data. The performance of our algorithm is illustrated using simulated data and real-world applications. △ Less

Submitted 24 February, 2022; originally announced February 2022.

arXiv:2202.07448 [pdf, other]

Towards a Unified Pandemic Management Architecture: Survey, Challenges and Future Directions

Authors: Satyaki Roy, Nirnay Ghosh, Nitish Uplavikar, Preetam Ghosh

Abstract: The pandemic caused by SARS-CoV-2 has left an unprecedented impact on health, economy and society worldwide. Emerging strains are making pandemic management increasingly challenging. There is an urge to collect epidemiological, clinical, and physiological data to make an informed decision on mitigation measures. Advances in the Internet of Things (IoT) and edge computing provide solutions for pand… ▽ More The pandemic caused by SARS-CoV-2 has left an unprecedented impact on health, economy and society worldwide. Emerging strains are making pandemic management increasingly challenging. There is an urge to collect epidemiological, clinical, and physiological data to make an informed decision on mitigation measures. Advances in the Internet of Things (IoT) and edge computing provide solutions for pandemic management through data collection and intelligent computation. While existing data-driven architectures attempt to automate decision-making, they do not capture the multifaceted interaction among computational models, communication infrastructure, and the generated data. In this paper, we perform a survey of the existing approaches for pandemic management, including online data repositories and contact-tracing applications. We then envision a unified pandemic management architecture that leverages the IoT and edge computing to automate recommendations on vaccine distribution, dynamic lockdown, mobility scheduling and pandemic prediction. We elucidate the flow of data among the layers of the architecture, namely, cloud, edge and end device layers. Moreover, we address the privacy implications, threats, regulations, and existing solutions that may be adapted to optimize the utility of health data with security guarantees. The paper ends with a lowdown on the limitations of the architecture and research directions to enhance its practicality. △ Less

Submitted 3 February, 2022; originally announced February 2022.

Comments: 30 pages and 10 figures

arXiv:2112.06848 [pdf, ps, other]

Peer-to-Peer Communication Trade-Offs for Smart Grid Applications

Authors: Purboday Ghosh, Shashank Shekhar, Yashen Lin, Ulrich Muenz, Gabor Karsai

Abstract: Virtual topologies in peer-to-peer networks can reduce the traffic consumed by altering the logical connectivity of peers without altering the underlying network. However, such sparsely connected virtual topologies do not focus on the needs for smart grid applications, which is information dissemination throughout the network, and in turn degrade the performance of distributed control algorithms r… ▽ More Virtual topologies in peer-to-peer networks can reduce the traffic consumed by altering the logical connectivity of peers without altering the underlying network. However, such sparsely connected virtual topologies do not focus on the needs for smart grid applications, which is information dissemination throughout the network, and in turn degrade the performance of distributed control algorithms running on peer-to-peer networks. This paper provides a flexible solution for application developers to prototype and deploy different virtual topologies that balances these trade-offs. First, it introduces a configurable virtual communication topology framework, TopLinkMgr, which enables users to specify any chosen connectivity configuration and deploy peer-to-peer applications using it. Second, it proposes a novel fault-tolerant self-adaptive virtual topology management algorithm, Bounded Path Dissemination, that can ensure the dissemination of information to all peers within a specified threshold. Experiments show that the algorithm improves on convergence speed and accuracy over state-of-the-art methods and is also robust against node failures while consuming significantly less communication bandwidth. △ Less

Submitted 4 May, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: 10 pages, 6 figures

arXiv:2112.04598 [pdf, other]

InvGAN: Invertible GANs

Authors: Partha Ghosh, Dominik Zietlow, Michael J. Black, Larry S. Davis, Xiaochen Hu

Abstract: Generation of photo-realistic images, semantic editing and representation learning are a few of many potential applications of high resolution generative models. Recent progress in GANs have established them as an excellent choice for such tasks. However, since they do not provide an inference model, image editing or downstream tasks such as classification can not be done on real images using the… ▽ More Generation of photo-realistic images, semantic editing and representation learning are a few of many potential applications of high resolution generative models. Recent progress in GANs have established them as an excellent choice for such tasks. However, since they do not provide an inference model, image editing or downstream tasks such as classification can not be done on real images using the GAN latent space. Despite numerous efforts to train an inference model or design an iterative method to invert a pre-trained generator, previous methods are dataset (e.g. human face images) and architecture (e.g. StyleGAN) specific. These methods are nontrivial to extend to novel datasets or architectures. We propose a general framework that is agnostic to architecture and datasets. Our key insight is that, by training the inference and the generative model together, we allow them to adapt to each other and to converge to a better quality model. Our \textbf{InvGAN}, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model. This allows us to perform image inpainting, merging, interpolation and online data augmentation. We demonstrate this with extensive qualitative and quantitative experiments. △ Less

Submitted 10 December, 2021; v1 submitted 8 December, 2021; originally announced December 2021.

arXiv:2112.04151 [pdf, ps, other]

A study on native American English speech recognition by Indian listeners with varying word familiarity level

Authors: Abhayjeet Singh, Achuth Rao MV, Rakesh Vaideeswaran, Chiranjeevi Yarra, Prasanta Kumar Ghosh

Abstract: In this study, listeners of varied Indian nativities are asked to listen and recognize TIMIT utterances spoken by American speakers. We have three kinds of responses from each listener while they recognize an utterance: 1. Sentence difficulty ratings, 2. Speaker difficulty ratings, and 3. Transcription of the utterance. From these transcriptions, word error rate (WER) is calculated and used as a m… ▽ More In this study, listeners of varied Indian nativities are asked to listen and recognize TIMIT utterances spoken by American speakers. We have three kinds of responses from each listener while they recognize an utterance: 1. Sentence difficulty ratings, 2. Speaker difficulty ratings, and 3. Transcription of the utterance. From these transcriptions, word error rate (WER) is calculated and used as a metric to evaluate the similarity between the recognized and the original sentences.The sentences selected in this study are categorized into three groups: Easy, Medium and Hard, based on the frequency ofoccurrence of the words in them. We observe that the sentence, speaker difficulty ratings and the WERs increase from easy to hard categories of sentences. We also compare the human speech recognition performance with that using three automatic speech recognition (ASR) under following three combinations of acoustic model (AM) and language model(LM): ASR1) AM trained with recordings from speakers of Indian origin and LM built on TIMIT text, ASR2) AM using recordings from native American speakers and LM built ontext from LIBRI speech corpus, and ASR3) AM using recordings from native American speakers and LM build on LIBRI speech and TIMIT text. We observe that HSR performance is similar to that of ASR1 whereas ASR3 achieves the best performance. Speaker nativity wise analysis shows that utterances from speakers of some nativity are more difficult to recognize by Indian listeners compared to few other nativities △ Less

Submitted 8 December, 2021; originally announced December 2021.

Comments: 6 pages, 5 figues, COCOSDA 2021

arXiv:2109.11130 [pdf, other]

Adversarially Robust Coloring for Graph Streams

Authors: Amit Chakrabarti, Prantar Ghosh, Manuel Stoeckl

Abstract: A streaming algorithm is considered to be adversarially robust if it provides correct outputs with high probability even when the stream updates are chosen by an adversary who may observe and react to the past outputs of the algorithm. We grow the burgeoning body of work on such algorithms in a new direction by studying robust algorithms for the problem of maintaining a valid vertex coloring of an… ▽ More A streaming algorithm is considered to be adversarially robust if it provides correct outputs with high probability even when the stream updates are chosen by an adversary who may observe and react to the past outputs of the algorithm. We grow the burgeoning body of work on such algorithms in a new direction by studying robust algorithms for the problem of maintaining a valid vertex coloring of an $n$-vertex graph given as a stream of edges. Following standard practice, we focus on graphs with maximum degree at most $Δ$ and aim for colorings using a small number $f(Δ)$ of colors. A recent breakthrough (Assadi, Chen, and Khanna; SODA~2019) shows that in the standard, non-robust, streaming setting, $(Δ+1)$-colorings can be obtained while using only $\widetilde{O}(n)$ space. Here, we prove that an adversarially robust algorithm running under a similar space bound must spend almost $Ω(Δ^2)$ colors and that robust $O(Δ)$-coloring requires a linear amount of space, namely $Ω(nΔ)$. We in fact obtain a more general lower bound, trading off the space usage against the number of colors used. From a complexity-theoretic standpoint, these lower bounds provide (i)~the first significant separation between adversarially robust algorithms and ordinary randomized algorithms for a natural problem on insertion-only streams and (ii)~the first significant separation between randomized and deterministic coloring algorithms for graph streams, since deterministic streaming algorithms are automatically robust. We complement our lower bounds with a suite of positive results, giving adversarially robust coloring algorithms using sublinear space. In particular, we can maintain an $O(Δ^2)$-coloring using $\widetilde{O}(n \sqrtΔ)$ space and an $O(Δ^3)$-coloring using $\widetilde{O}(n)$ space. △ Less

Submitted 22 September, 2021; originally announced September 2021.

arXiv:2109.09020 [pdf, other]

Multimodal Classification: Current Landscape, Taxonomy and Future Directions

Authors: William C. Sleeman IV, Rishabh Kapoor, Preetam Ghosh

Abstract: Multimodal classification research has been gaining popularity in many domains that collect more data from multiple sources including satellite imagery, biometrics, and medicine. However, the lack of consistent terminology and architectural descriptions makes it difficult to compare different existing solutions. We address these challenges by proposing a new taxonomy for describing such systems ba… ▽ More Multimodal classification research has been gaining popularity in many domains that collect more data from multiple sources including satellite imagery, biometrics, and medicine. However, the lack of consistent terminology and architectural descriptions makes it difficult to compare different existing solutions. We address these challenges by proposing a new taxonomy for describing such systems based on trends found in recent publications on multimodal classification. Many of the most difficult aspects of unimodal classification have not yet been fully addressed for multimodal datasets including big data, class imbalance, and instance level difficulty. We also provide a discussion of these challenges and future directions. △ Less

Submitted 18 September, 2021; originally announced September 2021.

Comments: 24 pages, 3 tables, 7 figures

ACM Class: I.5.2

arXiv:2107.07875 [pdf, other]

A Penalized Shared-parameter Algorithm for Estimating Optimal Dynamic Treatment Regimens

Authors: Palash Ghosh, Trikay Nalamada, Shruti Agarwal, Maria Jahja, Bibhas Chakraborty

Abstract: A dynamic treatment regimen (DTR) is a set of decision rules to personalize treatments for an individual using their medical history. The Q-learning based Q-shared algorithm has been used to develop DTRs that involve decision rules shared across multiple stages of intervention. We show that the existing Q-shared algorithm can suffer from non-convergence due to the use of linear models in the Q-lea… ▽ More A dynamic treatment regimen (DTR) is a set of decision rules to personalize treatments for an individual using their medical history. The Q-learning based Q-shared algorithm has been used to develop DTRs that involve decision rules shared across multiple stages of intervention. We show that the existing Q-shared algorithm can suffer from non-convergence due to the use of linear models in the Q-learning setup, and identify the condition in which Q-shared fails. Leveraging properties from expansion-constrained ordinary least-squares, we give a penalized Q-shared algorithm that not only converges in settings that violate the condition, but can outperform the original Q-shared algorithm even when the condition is satisfied. We give evidence for the proposed method in a real-world application and several synthetic simulations. △ Less

Submitted 26 May, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

arXiv:2106.14292 [pdf, other]

Knee Osteoarthritis Severity Prediction using an Attentive Multi-Scale Deep Convolutional Neural Network

Authors: Rohit Kumar Jain, Prasen Kumar Sharma, Sibaji Gaj, Arijit Sur, Palash Ghosh

Abstract: Knee Osteoarthritis (OA) is a destructive joint disease identified by joint stiffness, pain, and functional disability concerning millions of lives across the globe. It is generally assessed by evaluating physical symptoms, medical history, and other joint screening tests like radiographs, Magnetic Resonance Imaging (MRI), and Computed Tomography (CT) scans. Unfortunately, the conventional methods… ▽ More Knee Osteoarthritis (OA) is a destructive joint disease identified by joint stiffness, pain, and functional disability concerning millions of lives across the globe. It is generally assessed by evaluating physical symptoms, medical history, and other joint screening tests like radiographs, Magnetic Resonance Imaging (MRI), and Computed Tomography (CT) scans. Unfortunately, the conventional methods are very subjective, which forms a barrier in detecting the disease progression at an early stage. This paper presents a deep learning-based framework, namely OsteoHRNet, that automatically assesses the Knee OA severity in terms of Kellgren and Lawrence (KL) grade classification from X-rays. As a primary novelty, the proposed approach is built upon one of the most recent deep models, called the High-Resolution Network (HRNet), to capture the multi-scale features of knee X-rays. In addition, we have also incorporated an attention mechanism to filter out the counterproductive features and boost the performance further. Our proposed model has achieved the best multiclass accuracy of 71.74% and MAE of 0.311 on the baseline cohort of the OAI dataset, which is a remarkable gain over the existing best-published works. We have also employed the Gradient-based Class Activation Maps (Grad-CAMs) visualization to justify the proposed network learning. △ Less

Submitted 27 June, 2021; originally announced June 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2106.00639 [pdf, other]

Multi-modal Point-of-Care Diagnostics for COVID-19 Based On Acoustics and Symptoms

Authors: Srikanth Raj Chetupalli, Prashant Krishnan, Neeraj Sharma, Ananya Muguli, Rohit Kumar, Viral Nanda, Lancelot Mark Pinto, Prasanta Kumar Ghosh, Sriram Ganapathy

Abstract: The research direction of identifying acoustic bio-markers of respiratory diseases has received renewed interest following the onset of COVID-19 pandemic. In this paper, we design an approach to COVID-19 diagnostic using crowd-sourced multi-modal data. The data resource, consisting of acoustic signals like cough, breathing, and speech signals, along with the data of symptoms, are recorded using a… ▽ More The research direction of identifying acoustic bio-markers of respiratory diseases has received renewed interest following the onset of COVID-19 pandemic. In this paper, we design an approach to COVID-19 diagnostic using crowd-sourced multi-modal data. The data resource, consisting of acoustic signals like cough, breathing, and speech signals, along with the data of symptoms, are recorded using a web-application over a period of ten months. We investigate the use of statistical descriptors of simple time-frequency features for acoustic signals and binary features for the presence of symptoms. Unlike previous works, we primarily focus on the application of simple linear classifiers like logistic regression and support vector machines for acoustic data while decision tree models are employed on the symptoms data. We show that a multi-modal integration of acoustics and symptoms classifiers achieves an area-under-curve (AUC) of 92.40, a significant improvement over any individual modality. Several ablation experiments are also provided which highlight the acoustic and symptom dimensions that are important for the task of COVID-19 diagnostics. △ Less

Submitted 5 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: The Manuscript is submitted to IEEE-EMBS Journal of Biomedical and Health Informatics on June 1, 2021

arXiv:2105.08215 [pdf, ps, other]

Vertex Ordering Problems in Directed Graph Streams

Authors: Amit Chakrabarti, Prantar Ghosh, Andrew McGregor, Sofya Vorotnikova

Abstract: We consider directed graph algorithms in a streaming setting, focusing on problems concerning orderings of the vertices. This includes such fundamental problems as topological sorting and acyclicity testing. We also study the related problems of finding a minimum feedback arc set (edges whose removal yields an acyclic graph), and finding a sink vertex. We are interested in both adversarially-order… ▽ More We consider directed graph algorithms in a streaming setting, focusing on problems concerning orderings of the vertices. This includes such fundamental problems as topological sorting and acyclicity testing. We also study the related problems of finding a minimum feedback arc set (edges whose removal yields an acyclic graph), and finding a sink vertex. We are interested in both adversarially-ordered and randomly-ordered streams. For arbitrary input graphs with edges ordered adversarially, we show that most of these problems have high space complexity, precluding sublinear-space solutions. Some lower bounds also apply when the stream is randomly ordered: e.g., in our most technical result we show that testing acyclicity in the $p$-pass random-order model requires roughly $n^{1+1/p}$ space. For other problems, random ordering can make a dramatic difference: e.g., it is possible to find a sink in an acyclic tournament in the one-pass random-order model using polylog$(n)$ space whereas under adversarial ordering roughly $n^{1/p}$ space is necessary and sufficient given $Θ(p)$ passes. We also design sublinear algorithms for the feedback arc set problem in tournament graphs; for random graphs; and for randomly ordered streams. In some cases, we give lower bounds establishing that our algorithms are essentially space-optimal. Together, our results complement the much maturer body of work on algorithms for undirected graph streams. △ Less

Submitted 17 May, 2021; originally announced May 2021.

Comments: Appeared in SODA 2020

arXiv:2104.05017 [pdf, other]

Estimating articulatory movements in speech production with transformer networks

Authors: Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh

Abstract: We estimate articulatory movements in speech production from different modalities - acoustics and phonemes. Acoustic-to articulatory inversion (AAI) is a sequence-to-sequence task. On the other hand, phoneme to articulatory (PTA) motion estimation faces a key challenge in reliably aligning the text and the articulatory movements. To address this challenge, we explore the use of a transformer archi… ▽ More We estimate articulatory movements in speech production from different modalities - acoustics and phonemes. Acoustic-to articulatory inversion (AAI) is a sequence-to-sequence task. On the other hand, phoneme to articulatory (PTA) motion estimation faces a key challenge in reliably aligning the text and the articulatory movements. To address this challenge, we explore the use of a transformer architecture - FastSpeech, with explicit duration modelling to learn hard alignments between the phonemes and articulatory movements. We also train a transformer model on AAI. We use correlation coefficient (CC) and root mean squared error (rMSE) to assess the estimation performance in comparison to existing methods on both tasks. We observe 154%, 11.8% & 4.8% relative improvement in CC with subject-dependent, pooled and fine-tuning strategies, respectively, for PTA estimation. Additionally, on the AAI task, we obtain 1.5%, 3% and 3.1% relative gain in CC on the same setups compared to the state-of-the-art baseline. We further present the computational benefits of having transformer architecture as representation blocks. △ Less

Submitted 12 June, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

Comments: accepted for oral presentation at INTERSPEECH 2021

arXiv:2104.00235 [pdf, ps, other]

doi 10.21437/Interspeech.2021-1339

Multilingual and code-switching ASR challenges for low resource Indian languages

Authors: Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham

Abstract: Recently, there is increasing interest in multilingual automatic speech recognition (ASR) where a speech recognition system caters to multiple low resource languages by taking advantage of low amounts of labeled corpora in multiple languages. With multilingualism becoming common in today's world, there has been increasing interest in code-switching ASR as well. In code-switching, multiple language… ▽ More Recently, there is increasing interest in multilingual automatic speech recognition (ASR) where a speech recognition system caters to multiple low resource languages by taking advantage of low amounts of labeled corpora in multiple languages. With multilingualism becoming common in today's world, there has been increasing interest in code-switching ASR as well. In code-switching, multiple languages are freely interchanged within a single sentence or between sentences. The success of low-resource multilingual and code-switching ASR often depends on the variety of languages in terms of their acoustics, linguistic characteristics as well as the amount of data available and how these are carefully considered in building the ASR system. In this challenge, we would like to focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages, namely Hindi, Marathi, Odia, Tamil, Telugu, Gujarati and Bengali. For this purpose, we provide a total of ~600 hours of transcribed speech data, comprising train and test sets, in these languages including two code-switched language pairs, Hindi-English and Bengali-English. We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively. △ Less

Submitted 31 March, 2021; originally announced April 2021.

Comments: 6 pages

arXiv:2103.09148 [pdf, other]

DiCOVA Challenge: Dataset, task, and baseline system for COVID-19 diagnosis using acoustics

Authors: Ananya Muguli, Lancelot Pinto, Nirmala R., Neeraj Sharma, Prashant Krishnan, Prasanta Kumar Ghosh, Rohit Kumar, Shrirama Bhat, Srikanth Raj Chetupalli, Sriram Ganapathy, Shreyas Ramoji, Viral Nanda

Abstract: The DiCOVA challenge aims at accelerating research in diagnosing COVID-19 using acoustics (DiCOVA), a topic at the intersection of speech and audio processing, respiratory health diagnosis, and machine learning. This challenge is an open call for researchers to analyze a dataset of sound recordings collected from COVID-19 infected and non-COVID-19 individuals for a two-class classification. These… ▽ More The DiCOVA challenge aims at accelerating research in diagnosing COVID-19 using acoustics (DiCOVA), a topic at the intersection of speech and audio processing, respiratory health diagnosis, and machine learning. This challenge is an open call for researchers to analyze a dataset of sound recordings collected from COVID-19 infected and non-COVID-19 individuals for a two-class classification. These recordings were collected via crowdsourcing from multiple countries, through a website application. The challenge features two tracks, one focusing on cough sounds, and the other on using a collection of breath, sustained vowel phonation, and number counting speech recordings. In this paper, we introduce the challenge and provide a detailed description of the task, and present a baseline system for the task. △ Less

Submitted 17 June, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: To appear in Proceedings of Interspeech, 2021

arXiv:2012.11581 [pdf, other]

Populating 3D Scenes by Learning Human-Scene Interaction

Authors: Mohamed Hassan, Partha Ghosh, Joachim Tesch, Dimitrios Tzionas, Michael J. Black

Abstract: Humans live within a 3D space and constantly interact with it to perform tasks. Such interactions involve physical contact between surfaces that is semantically meaningful. Our goal is to learn how humans interact with scenes and leverage this to enable virtual characters to do the same. To that end, we introduce a novel Human-Scene Interaction (HSI) model that encodes proximal relationships, call… ▽ More Humans live within a 3D space and constantly interact with it to perform tasks. Such interactions involve physical contact between surfaces that is semantically meaningful. Our goal is to learn how humans interact with scenes and leverage this to enable virtual characters to do the same. To that end, we introduce a novel Human-Scene Interaction (HSI) model that encodes proximal relationships, called POSA for "Pose with prOximitieS and contActs". The representation of interaction is body-centric, which enables it to generalize to new scenes. Specifically, POSA augments the SMPL-X parametric human body model such that, for every mesh vertex, it encodes (a) the contact probability with the scene surface and (b) the corresponding semantic scene label. We learn POSA with a VAE conditioned on the SMPL-X vertices, and train on the PROX dataset, which contains SMPL-X meshes of people interacting with 3D scenes, and the corresponding scene semantics from the PROX-E dataset. We demonstrate the value of POSA with two applications. First, we automatically place 3D scans of people in scenes. We use a SMPL-X model fit to the scan as a proxy and then find its most likely placement in 3D. POSA provides an effective representation to search for "affordances" in the scene that match the likely contact relationships for that pose. We perform a perceptual study that shows significant improvement over the state of the art on this task. Second, we show that POSA's learned representation of body-scene interaction supports monocular human pose estimation that is consistent with a 3D scene, improving on the state of the art. Our model and code are available for research purposes at https://posa.is.tue.mpg.de. △ Less

Submitted 5 April, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

Journal ref: CVPR2021

Showing 1–50 of 100 results for author: Ghosh, P