Search | arXiv e-print repository

arXiv:2405.19675 [pdf, other]

Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training

Authors: Aisha Urooj Khan, John Garrett, Tyler Bradshaw, Lonie Salkowski, Jiwoong Jason Jeong, Amara Tariq, Imon Banerjee

Abstract: A visual-language model (VLM) pre-trained on natural images and text pairs poses a significant barrier when applied to medical contexts due to domain shift. Yet, adapting or fine-tuning these VLMs for medical use presents considerable hurdles, including domain misalignment, limited access to extensive datasets, and high class imbalance. Hence, there is a pressing need for strategies to effectively adapt these VLMs to the medical domain, as such adaptations would prove immensely valuable in healthcare applications. In this study, we propose a framework designed to adeptly tailor VLMs to the medical domain, employing selective sampling and hard-negative mining techniques for enhanced performance in retrieval tasks. We validate the efficacy of our proposed approach by implementing it across two distinct VLMs: an in-domain VLM (MedCLIP) and an out-of-domain VLM (ALBEF). We assess the performance of these models both in their original off-the-shelf state and after undergoing our proposed training strategies, using two extensive datasets containing mammograms and their corresponding reports. Our evaluation spans zero-shot, few-shot, and supervised scenarios. Through our approach, we observe a notable enhancement in Recall@K performance for the image-text retrieval task.

Submitted 30 May, 2024; originally announced May 2024.
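
The Recall@K metric named in the abstract above can be illustrated with a minimal sketch. It assumes a square similarity matrix in which row i is an image query, column j is a candidate report, and the true match for query i sits at index i. The function name `recall_at_k` and the toy scores are hypothetical and are not taken from the paper.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose true match (assumed at index i) ranks in the top k."""
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices from highest to lowest similarity score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy image-to-report similarity scores (illustrative values only).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]

print(recall_at_k(sim, 1))
print(recall_at_k(sim, 2))
```

In this toy matrix only the first query ranks its own report first, while every query has its report within the top two, so the score rises as K grows.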

arXiv:2405.16402 [pdf, other]

Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions

Authors: Man Luo, Christopher J. Warren, Lu Cheng, Haidar M. Abdul-Muhsin, Imon Banerjee

Abstract: The integration of Large Language Models (LLMs) into the healthcare domain has the potential to significantly enhance patient care and support through the development of empathetic, patient-facing chatbots. This study investigates an intriguing question: Can ChatGPT respond with a greater degree of empathy than that typically offered by physicians? To answer this question, we collect a de-identified dataset of patient messages and physician responses from Mayo Clinic and generate alternative replies using ChatGPT. Our analyses incorporate a novel empathy ranking evaluation (EMRank) involving both automated metrics and human assessments to gauge the empathy level of responses. Our findings indicate that LLM-powered chatbots have the potential to surpass human physicians in delivering empathetic communication, suggesting a promising avenue for enhancing patient care and reducing professional burnout. The study not only highlights the importance of empathy in patient interactions but also proposes a set of effective automatic empathy ranking metrics, paving the way for the broader adoption of LLMs in healthcare.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2312.12442 [pdf]

Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC) -- an end-to-end model for characterizing severity and diagnosis

Authors: Thiago Santos, Harish Kamath, Christopher R. McAdams, Mary S. Newell, Marina Mosunjac, Gabriela Oprea-Ilies, Geoffrey Smith, Constance Lehman, Judy Gichoya, Imon Banerjee, Hari Trivedi

Abstract: Automated classification of cancer pathology reports can extract information from unstructured reports and categorize each report into structured diagnosis and severity categories. Thus, such a system can reduce the burden of populating tumor registries, help registration for clinical trials, and support the development of large datasets for deep learning model development using true pathologic ground truth. However, the content of breast pathology reports can be difficult to categorize due to the high linguistic variability in content and the wide variety of potential diagnoses (>50). Existing NLP models are primarily focused on developing classifiers for primary breast cancer types (e.g. IDC, DCIS, ILC) and tumor characteristics, and ignore the rare diagnoses of cancer subtypes. We developed a hierarchical hybrid transformer-based pipeline (59 labels) - Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC), which utilizes the potential of the transformer context-preserving NLP technique, and compared our model to several state-of-the-art ML and DL models. We trained the model on the EUH data and evaluated our model's performance on two external datasets - MGH and Mayo Clinic. We publicly release the code and a live application under the Huggingface spaces repository.

Submitted 2 November, 2023; originally announced December 2023.

arXiv:2305.04422 [pdf]

Multivariate Analysis on Performance Gaps of Artificial Intelligence Models in Screening Mammography

Authors: Linglin Zhang, Beatrice Brown-Mulry, Vineela Nalla, InChan Hwang, Judy Wawira Gichoya, Aimilia Gastounioti, Imon Banerjee, Laleh Seyyed-Kalantari, MinJae Woo, Hari Trivedi

Abstract: Although deep learning models for abnormality classification can perform well in screening mammography, the demographic, imaging, and clinical characteristics associated with increased risk of model failure remain unclear. This retrospective study uses the Emory BrEast Imaging Dataset (EMBED) containing mammograms from 115,931 patients imaged at Emory Healthcare between 2013 and 2020, with BI-RADS assessment, region of interest coordinates for abnormalities, imaging features, pathologic outcomes, and patient demographics. Multiple deep learning models were trained to distinguish between abnormal tissue patches and randomly selected normal tissue patches from screening mammograms. We assessed model performance by subgroups defined by age, race, pathologic outcome, tissue density, and imaging characteristics and investigated their associations with false negatives (FN) and false positives (FP). We also performed multivariate logistic regression to control for confounding between subgroups. The top-performing model, ResNet152V2, achieved accuracy of 92.6% (95% CI = 92.0-93.2%) and AUC of 0.975 (95% CI = 0.972-0.978). Before controlling for confounding, nearly all subgroups showed statistically significant differences in model performance. However, after controlling for confounding, we found that lower FN risk is associated with Other race (RR=0.828; p=.050), biopsy-proven benign lesions (RR=0.927; p=.011), and mass (RR=0.921; p=.010) or asymmetry (RR=0.854; p=.040); higher FN risk is associated with architectural distortion (RR=1.037; p<.001). Higher FP risk is associated with BI-RADS density C (RR=1.891; p<.001) and D (RR=2.486; p<.001). Our results demonstrate that subgroup analysis is important in mammogram classifier performance evaluation, and that controlling for confounding between subgroups elucidates the true associations between variables and model failure. These results can help guide the development of future breast cancer detection models.

Submitted 19 October, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

Comments: 29 pages, 6 tables, 7 figures, 2 supplemental tables

arXiv:2302.01061 [pdf]

MLOps with enhanced performance control and observability

Authors: Indradumna Banerjee, Dinesh Ghanta, Girish Nautiyal, Pradeep Sanchana, Prateek Katageri, Atin Modi

Abstract: The explosion of data and its ever-increasing complexity in the last few years has made MLOps systems more prone to failure, and new tools need to be embedded in such systems to avoid such failure. In this demo, we will introduce crucial tools in the observability module of an MLOps system that target difficult issues like data drift and model version control for optimum model selection. We believe integrating these features in our MLOps pipeline would go a long way in building a robust system immune to early-stage ML system failures.

Submitted 2 February, 2023; originally announced February 2023.

Comments: SECOND INTERNATIONAL CONFERENCE ON AI-ML SYSTEMS

arXiv:2302.00651 [pdf]

Ngram-LSTM Open Rate Prediction Model (NLORP) and Error_accuracy@C metric: Simple effective, and easy to implement approach to predict open rates for marketing email

Authors: Shubham Joshi, Indradumna Banerjee

Abstract: Our generation has seen an exponential increase in digital tools adoption. One of the unique areas where digital tools have made an exponential foray is in the sphere of digital marketing, where goods and services have been extensively promoted through the use of digital advertisements. Following this growth, multiple companies have leveraged multiple apps and channels to display their brand identities to a significantly larger user base. This has resulted in products worth billions of dollars being sold online. Emails and push notifications have become critical channels to publish advertisement content and to proactively engage with contacts. Several marketing tools provide a user interface for marketers to design Email and Push messages for digital marketing campaigns. Marketers are also given a predicted open rate for the entered subject line. To enable marketers to generate targeted subject lines, multiple machine learning techniques have been used in the recent past. In particular, deep learning techniques have established good effectiveness and efficiency. However, these techniques require a sizable amount of labelled training data in order to get good results. The creation of such datasets, particularly those with subject lines that have a specific theme, is a challenging and time-consuming task. In this paper, we propose a novel Ngram and LSTM-based modeling approach (NLORPM) to predict open rates of entered subject lines that is easier to implement, has low prediction latency, and performs extremely well for sparse data. To assess the performance of this model, we also devise a new metric called 'Error_accuracy@C' which is simple to grasp and fully comprehensible to marketers.

Submitted 14 February, 2023; v1 submitted 25 January, 2023; originally announced February 2023.

arXiv:2212.12454 [pdf]

Generalizable Natural Language Processing Framework for Migraine Reporting from Social Media

Authors: Yuting Guo, Swati Rajwal, Sahithi Lakamana, Chia-Chun Chiang, Paul C. Menell, Adnan H. Shahid, Yi-Chieh Chen, Nikita Chhabra, Wan-Ju Chao, Chieh-Ju Chao, Todd J. Schwedt, Imon Banerjee, Abeed Sarker

Abstract: Migraine is a high-prevalence and disabling neurological disorder. However, information on migraine management in real-world settings could be limited to traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts; and (iii) conduct analyses of the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of information posted by our 'migraine cohort' revealed the presence of a plethora of relevant information about migraine therapies and patient sentiments associated with them. Our study forms the foundation for conducting an in-depth analysis of migraine-related information using social media data.

Submitted 23 December, 2022; originally announced December 2022.

Comments: Accepted by AMIA 2023 Informatics Summit

arXiv:2211.07092 [pdf, ps, other]

Offline Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity

Authors: Imon Banerjee, Harsha Honnappa, Vinayak Rao

Abstract: In this work, we study a natural nonparametric estimator of the transition probability matrices of a finite controlled Markov chain. We consider an offline setting with a fixed dataset, collected using a so-called logging policy. We develop sample complexity bounds for the estimator and establish conditions for minimaxity. Our statistical bounds depend on the logging policy through its mixing properties. We show that achieving a particular statistical risk bound involves a subtle and interesting trade-off between the strength of the mixing properties and the number of samples. We demonstrate the validity of our results under various examples, such as ergodic Markov chains, weakly ergodic inhomogeneous Markov chains, and controlled Markov chains with non-stationary Markov, episodic, and greedy controls. Lastly, we use these sample complexity bounds to establish concomitant ones for offline evaluation of stationary Markov control policies.

Submitted 26 January, 2024; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: 71 pages, 23 main
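
The abstract above does not spell out the estimator it studies. As a rough illustration only, the sketch below assumes the usual empirical-frequency estimator for a finite controlled Markov chain: the estimated probability of moving from state s to s' under action a is the number of observed (s, a, s') transitions divided by the number of visits to (s, a). The function name `estimate_transitions` and the toy log are hypothetical.

```python
from collections import defaultdict


def estimate_transitions(logged):
    """Empirical transition probabilities from (state, action, next_state) triples.

    Assumption: count-based estimator N(s, a, s') / N(s, a); pairs (s, a) never
    visited by the logging policy simply do not appear in the result.
    """
    pair_counts = defaultdict(int)
    triple_counts = defaultdict(int)
    for s, a, s_next in logged:
        pair_counts[(s, a)] += 1
        triple_counts[(s, a, s_next)] += 1
    return {
        (s, a, s_next): count / pair_counts[(s, a)]
        for (s, a, s_next), count in triple_counts.items()
    }


# Toy offline dataset collected by some logging policy (illustrative only).
logged = [(0, "a", 1), (0, "a", 1), (0, "a", 0), (1, "b", 0)]
p_hat = estimate_transitions(logged)
print(p_hat)
```

State-action pairs that the logging policy rarely visits get noisy estimates, which is one intuitive reading of why the bounds in the abstract depend on the logging policy's mixing properties.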

arXiv:2208.08938 [pdf, other]

Meta Sparse Principal Component Analysis

Authors: Imon Banerjee, Jean Honorio

Abstract: We study the meta-learning for support (i.e. the set of non-zero entries) recovery in high-dimensional Principal Component Analysis. We reduce the sufficient sample complexity in a novel task with the information that is learned from auxiliary tasks. We assume each task to be a different random Principal Component (PC) matrix with a possibly different support and that the support union of the PC matrices is small. We then pool the data from all the tasks to execute an improper estimation of a single PC matrix by maximising the $l_1$-regularised predictive covariance to establish that with high probability the true support union can be recovered provided a sufficient number of tasks $m$ and a sufficient number of samples $O\left(\frac{\log(p)}{m}\right)$ for each task, for $p$-dimensional vectors. Then, for a novel task, we prove that the maximisation of the $l_1$-regularised predictive covariance with the additional constraint that the support is a subset of the estimated support union could reduce the sufficient sample complexity of successful support recovery to $O(\log |J|)$, where $J$ is the support union recovered from the auxiliary tasks. Typically, $|J|$ would be much less than $p$ for sparse matrices. Finally, we demonstrate the validity of our experiments through numerical simulations.

Submitted 19 August, 2022; v1 submitted 18 August, 2022; originally announced August 2022.

Comments: 29 pages, 7 figures

arXiv:2208.00475 [pdf, other]

Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics

Authors: Xiaoyuan Guo, Jiali Duan, C.-C. Jay Kuo, Judy Wawira Gichoya, Imon Banerjee

Abstract: Language modality within the vision language pretraining framework is innately discretized, endowing each word in the language vocabulary with a semantic meaning. In contrast, visual modality is inherently continuous and high-dimensional, which potentially prohibits the alignment as well as fusion between vision and language modalities. We therefore propose to "discretize" the visual representation by jointly learning a codebook that imbues each visual token with a semantic meaning. We then utilize these discretized visual semantics as self-supervised ground-truths for building our Masked Image Modeling objective, a counterpart of Masked Language Modeling, which has proven successful for language models. To optimize the codebook, we extend the formulation of VQ-VAE, which gives a theoretical guarantee. Experiments validate the effectiveness of our approach across common vision-language benchmarks.

Submitted 31 July, 2022; originally announced August 2022.

Comments: 7 pages, 4 figures, ICPR2022. arXiv admin note: text overlap with arXiv:2203.00048

arXiv:2207.04846 [pdf]

Fitness Dependent Optimizer for IoT Healthcare using Adapted Parameters: A Case Study Implementation

Authors: Aso M. Aladdin, Jaza M. Abdullah, Kazhan Othman Mohammed Salih, Tarik A. Rashid, Rafid Sagban, Abeer Alsaddon, Nebojsa Bacanin, Amit Chhabra, S. Vimal, Indradip Banerjee

Abstract: This chapter discusses a case study on the Fitness Dependent Optimizer (FDO) and on adapting its parameters to Internet of Things (IoT) healthcare. FDO is inspired by the reproductive process of bee swarms and their collaborative decision-making. However, this algorithm has no connection to the honey bee or artificial bee colony algorithms. In FDO, the search agent's position is updated using speed or velocity, but it is done differently. It creates weights based on the fitness function value of the problem, which help lead the agents through the exploration and exploitation processes. Other algorithms, such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), are evaluated and compared to FDO in the original work. The key current algorithms, the Salp-Swarm Algorithm (SSA), Dragonfly Algorithm (DA), and Whale Optimization Algorithm (WOA), have been evaluated against FDO in terms of their results. Using these FDO experimental findings, we may conclude that FDO outperforms the other techniques stated. There are two primary goals for this chapter: first, the implementation of FDO will be shown step-by-step so that readers can better comprehend the algorithm and apply FDO to solve real-world applications quickly. The second goal deals with how to tweak the FDO settings to make the meta-heuristic evolutionary algorithm better in the IoT health service system at evaluating big quantities of information. Ultimately, the target of this chapter's enhancement is to adapt the IoT healthcare framework based on FDO to spawn effective IoT healthcare applications for reasoning out real-world optimization, aggregation, prediction, segmentation, and other technological problems.

Submitted 18 May, 2022; originally announced July 2022.

Comments: 17 pages


arXiv:2207.00066 [pdf]

Advances in Prediction of Readmission Rates Using Long Term Short Term Memory Networks on Healthcare Insurance Data

Authors: Shuja Khalid, Francisco Matos, Ayman Abunimer, Joel Bartlett, Richard Duszak, Michal Horny, Judy Gichoya, Imon Banerjee, Hari Trivedi

Abstract: 30-day hospital readmission is a long-standing medical problem that affects patients' morbidity and mortality and costs billions of dollars annually. Recently, machine learning models have been created to predict risk of inpatient readmission for patients with specific diseases; however, no model exists to predict this risk across all patients. We developed a bi-directional Long Short Term Memory (LSTM) Network that is able to use readily available insurance data (inpatient visits, outpatient visits, and drug prescriptions) to predict 30-day readmission for any admitted patient, regardless of reason. The top-performing model achieved an ROC AUC of 0.763 (0.011) when using historical, inpatient, and post-discharge data. The LSTM model significantly outperformed a baseline random forest classifier, indicating that understanding the sequence of events is important for model prediction. Incorporation of 30 days of historical data also significantly improved model performance compared to inpatient data alone, indicating that a patient's clinical history prior to admission, including outpatient visits and pharmacy data, is a strong contributor to readmission. Our results demonstrate that a machine learning model is able to predict risk of inpatient readmission with reasonable accuracy for all patients using structured insurance billing data. Because billing data or equivalent surrogates can be extracted from sites, such a model could be deployed to identify patients at risk for readmission before they are discharged, or to assign more robust follow-up (closer follow-up, home health, mailed medications) to at-risk patients after discharge.

Submitted 30 June, 2022; originally announced July 2022.

Comments: 7 pages, 3 figures, 3 tables

arXiv:2205.06885 [pdf]

PathologyBERT -- Pre-trained Vs. A New Transformer Language Model for Pathology Domain

Authors: Thiago Santos, Amara Tariq, Susmita Das, Kavyasree Vayalpati, Geoffrey H. Smith, Hari Trivedi, Imon Banerjee

Abstract: Pathology text mining is a challenging task given the reporting variability and constant new findings in cancer sub-type definitions. However, successful text mining of a large pathology database can play a critical role in advancing 'big data' cancer research like similarity-based treatment selection, case identification, prognostication, surveillance, clinical trial screening, risk stratification, and many others. While there is a growing interest in developing language models for more specific clinical domains, no pathology-specific language space exists to support the rapid data-mining development in the pathology space. In the literature, a few approaches fine-tuned general transformer models on specialized corpora while maintaining the original tokenizer, but in fields requiring specialized terminology, these models often fail to perform adequately. We propose PathologyBERT - a pre-trained masked language model which was trained on 347,173 histopathology specimen reports and publicly released in the Huggingface repository. Our comprehensive experiments demonstrate that pre-training of a transformer model on pathology corpora yields performance improvements on Natural Language Understanding (NLU) and Breast Cancer Diagnose Classification when compared to nonspecific language models.

Submitted 13 May, 2022; originally announced May 2022.

Comments: submitted to "American Medical Informatics Association (AMIA)" 2022 Annual Symposium

arXiv:2204.06766 [pdf]

doi 10.1109/JBHI.2023.3236888

Multimodal spatiotemporal graph neural networks for improved prediction of 30-day all-cause hospital readmission

Authors: Siyi Tang, Amara Tariq, Jared Dunnmon, Umesh Sharma, Praneetha Elugunti, Daniel Rubin, Bhavik N. Patel, Imon Banerjee

Abstract: Measures to predict 30-day readmission are considered an important quality factor for hospitals as accurate predictions can reduce the overall cost of care by identifying high risk patients before they are discharged. While recent deep learning-based studies have shown promising empirical results on readmission prediction, several limitations exist that may hinder widespread clinical utility, such as (a) only patients with certain conditions are considered, (b) existing approaches do not leverage data temporality, (c) individual admissions are assumed independent of each other, which is unrealistic, (d) prior studies are usually limited to a single source of data and single-center data. To address these limitations, we propose a multimodal, modality-agnostic spatiotemporal graph neural network (MM-STGNN) for prediction of 30-day all-cause hospital readmission that fuses multimodal in-patient longitudinal data. By training and evaluating our methods using longitudinal chest radiographs and electronic health records from two independent centers, we demonstrate that MM-STGNN achieves AUROC of 0.79 on both primary and external datasets. Furthermore, MM-STGNN significantly outperforms the current clinical reference standard, LACE+ score (AUROC=0.61), on the primary dataset. For subset populations of patients with heart and vascular disease, our model also outperforms baselines on predicting 30-day readmission (e.g., 3.7 point improvement in AUROC in patients with heart disease). Lastly, qualitative model interpretability analysis indicates that while patients' primary diagnoses were not explicitly used to train the model, node features crucial for model prediction directly reflect patients' primary diagnoses. Importantly, our MM-STGNN is agnostic to node feature modalities and could be utilized to integrate multimodal data for triaging patients in various downstream resource allocation tasks.

Submitted 14 April, 2022; originally announced April 2022.

Journal ref: IEEE Journal of Biomedical and Health Informatics, vol. 27, no. 4, pp. 2071-2082, April 2023

arXiv:2204.03074 [pdf, other]

OSCARS: An Outlier-Sensitive Content-Based Radiography Retrieval System

Authors: Xiaoyuan Guo, Jiali Duan, Saptarshi Purkayastha, Hari Trivedi, Judy Wawira Gichoya, Imon Banerjee

Abstract: Improving the retrieval relevance on noisy datasets is an emerging need for the curation of a large-scale clean dataset in the medical domain. While existing methods can be applied for class-wise retrieval (a.k.a. inter-class), they cannot distinguish the granularity of likeness within the same class (a.k.a. intra-class). The problem is exacerbated on medical external datasets, where noisy samples of the same class are treated equally during training. Our goal is to identify both intra/inter-class similarities for fine-grained retrieval. To achieve this, we propose an Outlier-Sensitive Content-based rAdiography Retrieval System (OSCARS), consisting of two steps. First, we train an outlier detector on a clean internal dataset in an unsupervised manner. Then we use the trained detector to generate the anomaly scores on the external dataset, whose distribution will be used to bin intra-class variations. Second, we propose a quadruplet $(a, p, n_{intra}, n_{inter})$ sampling strategy, where intra-class negatives $n_{intra}$ are sampled from bins of the same class other than the bin the anchor $a$ belongs to, while $n_{inter}$ are randomly sampled from inter-classes. We suggest a weighted metric learning objective to balance the intra and inter-class feature learning. We experimented on two representative public radiography datasets. Experiments show the effectiveness of our approach. The training and evaluation code can be found in https://github.com/XiaoyuanGuo/oscars.

Submitted 6 April, 2022; originally announced April 2022.

Comments: 12 pages, 6 figures, 2 tables
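
The quadruplet sampling described in the OSCARS abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the nested dictionary layout `{class: {bin: [sample ids]}}`, the function name `sample_quadruplet`, and the choice to draw the positive from the anchor's own bin are all hypothetical. The abstract only specifies where the two kinds of negatives come from.

```python
import random


def sample_quadruplet(bins, rng):
    """Draw (a, p, n_intra, n_inter) from samples grouped by class and anomaly-score bin.

    Assumptions: every class has a bin with at least two samples and at least
    one other non-empty bin, and there are at least two classes.
    """
    classes = sorted(bins)
    anchor_cls = rng.choice(classes)
    # Anchor bin must hold two samples so the anchor and positive can differ.
    eligible = [b for b, items in bins[anchor_cls].items() if len(items) >= 2]
    anchor_bin = rng.choice(eligible)
    a, p = rng.sample(bins[anchor_cls][anchor_bin], 2)
    # Intra-class negative: same class, but a different bin than the anchor.
    other_bins = [b for b in bins[anchor_cls] if b != anchor_bin]
    n_intra = rng.choice(bins[anchor_cls][rng.choice(other_bins)])
    # Inter-class negative: any sample from a different class.
    other_cls = rng.choice([c for c in classes if c != anchor_cls])
    n_inter = rng.choice([x for items in bins[other_cls].values() for x in items])
    return a, p, n_intra, n_inter


# Toy binning of sample ids (illustrative only).
toy_bins = {
    "normal": {0: ["n1", "n2"], 1: ["n3", "n4"]},
    "abnormal": {0: ["x1", "x2"], 1: ["x3", "x4"]},
}
print(sample_quadruplet(toy_bins, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which makes it easier to check that each negative really comes from the intended bin or class.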

arXiv:2202.04073 [pdf]

The EMory BrEast imaging Dataset (EMBED): A Racially Diverse, Granular Dataset of 3.5M Screening and Diagnostic Mammograms

Authors: Jiwoong J. Jeong, Brianna L. Vey, Ananth Reddy, Thomas Kim, Thiago Santos, Ramon Correa, Raman Dutt, Marina Mosunjac, Gabriela Oprea-Ilies, Geoffrey Smith, Minjae Woo, Christopher R. McAdams, Mary S. Newell, Imon Banerjee, Judy Gichoya, Hari Trivedi

Abstract: Developing and validating artificial intelligence models in medical imaging requires datasets that are large, granular, and diverse. To date, the majority of publicly available breast imaging datasets are lacking in one or more of these areas. Models trained on these data may therefore underperform on patient populations or pathologies that have not previously been encountered. The EMory BrEast imaging Dataset (EMBED) addresses these gaps by providing 3,650,000 2D and DBT screening and diagnostic mammograms for 116,000 women divided equally between White and African American patients. The dataset also contains 40,000 annotated lesions linked to structured imaging descriptors and 61 ground truth pathologic outcomes grouped into six severity classes. Our goal is to share this dataset with research partners to aid in development and validation of breast AI models that will serve all patients fairly and help decrease bias in medical AI.

Submitted 8 February, 2022; originally announced February 2022.

arXiv:2112.13885 [pdf, other]

MedShift: identifying shift data for medical dataset curation

Authors: Xiaoyuan Guo, Judy Wawira Gichoya, Hari Trivedi, Saptarshi Purkayastha, Imon Banerjee

Abstract: To curate a high-quality dataset, identifying data variance between the internal and external sources is a fundamental and crucial step. However, methods to detect shift or variance in data have not been significantly researched. Challenges to this are the lack of effective approaches to learn dense representation of a dataset and difficulties of sharing private data across medical institutions. To overcome the problems, we propose a unified pipeline called MedShift to detect the top-level shift samples and thus facilitate the medical curation. Given an internal dataset A as the base source, we first train anomaly detectors for each class of dataset A to learn internal distributions in an unsupervised way. Second, without exchanging data across sources, we run the trained anomaly detectors on an external dataset B for each class. The data samples with high anomaly scores are identified as shift data. To quantify the shiftness of the external dataset, we cluster B's data into groups class-wise based on the obtained scores. We then train a multi-class classifier on A and measure the shiftness with the classifier's performance variance on B by gradually dropping the group with the largest anomaly score for each class. Additionally, we adapt a dataset quality metric to help inspect the distribution differences for multiple medical sources. We verify the efficacy of MedShift with musculoskeletal radiographs (MURA) and chest X-rays datasets from more than one external source. Experiments show our proposed shift data detection pipeline can be beneficial for medical centers to curate high-quality datasets more efficiently. An interface introduction video to visualize our results is available at https://youtu.be/V3BF0P1sxQE.

Submitted 27 December, 2021; originally announced December 2021.

Comments: 35 pages, 28 figures, 2 tables
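
The "gradually dropping the group with the largest anomaly score" step in the abstract above can be illustrated with a short sketch. It assumes, hypothetically, that each external sample already carries a per-class group rank (0 for the lowest anomaly scores) and a flag saying whether the classifier trained on A predicted it correctly. The function name and inputs are illustrative only, and accuracy stands in for whatever performance measure the paper actually uses.

```python
def accuracy_after_dropping(groups, correct, num_groups):
    """Accuracy on the external set as the most anomalous groups are removed.

    groups[i]  : per-class group rank of sample i (0 = least anomalous)
    correct[i] : True if the classifier's prediction for sample i was right
    Returns a list whose k-th entry is the accuracy after dropping the k
    highest-ranked groups of every class.
    """
    results = []
    for groups_kept in range(num_groups, 0, -1):
        kept = [c for g, c in zip(groups, correct) if g < groups_kept]
        # None marks the degenerate case where no samples remain.
        results.append(sum(kept) / len(kept) if kept else None)
    return results
```

A large change across the returned list would suggest that the dropped groups behave differently from the internal data, which is the intuition the abstract describes.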

arXiv:2111.11665 [pdf, other]

RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR

Authors: Yuyin Zhou, Shih-Cheng Huang, Jason Alan Fries, Alaa Youssef, Timothy J. Amrhein, Marcello Chang, Imon Banerjee, Daniel Rubin, Lei Xing, Nigam Shah, Matthew P. Lungren

Abstract: Despite the routine use of electronic health record (EHR) data by radiologists to contextualize clinical history and inform image interpretation, the majority of deep learning architectures for medical imaging are unimodal, i.e., they only learn features from pixel-level information. Recent research revealing how race can be recovered from pixel data alone highlights the potential for serious biases in models which fail to account for demographics and other key patient attributes. Yet the lack of imaging datasets which capture clinical context, inclusive of demographics and longitudinal medical history, has left multimodal medical imaging underexplored. To better assess these challenges, we present RadFusion, a multimodal, benchmark dataset of 1794 patients with corresponding EHR data and high-resolution computed tomography (CT) scans labeled for pulmonary embolism. We evaluate several representative multimodal fusion models and benchmark their fairness properties across protected subgroups, e.g., gender, race/ethnicity, age. Our results suggest that integrating imaging and EHR data can improve classification performance and robustness without introducing large disparities in the true positive rate between population groups.

Submitted 26 November, 2021; v1 submitted 23 November, 2021; originally announced November 2021.

Comments: RadFusion dataset: https://stanfordaimi.azurewebsites.net/datasets/3a7548a4-8f65-4ab7-85fa-3d68c9efc1bd

arXiv:2111.08711 [pdf, other]

Two-step adversarial debiasing with partial learning -- medical image case-studies

Authors: Ramon Correa, Jiwoong Jason Jeong, Bhavik Patel, Hari Trivedi, Judy W. Gichoya, Imon Banerjee

Abstract: The use of artificial intelligence (AI) in healthcare has become a very active research area in the last few years. While significant progress has been made in image classification tasks, only a few AI methods are actually being deployed in hospitals. A major hurdle in actively using clinical AI models currently is the trustworthiness of these models. More often than not, these complex models are black boxes in which promising results are generated. However, when scrutinized, these models begin to reveal implicit biases during the decision making, such as detecting race and having bias towards ethnic groups and subpopulations. In our ongoing study, we develop a two-step adversarial debiasing approach with partial learning that can reduce the racial disparity while preserving the performance of the targeted task. The methodology has been evaluated on two independent medical image case-studies - chest X-ray and mammograms, and showed promise in bias reduction while preserving the targeted performance.

Submitted 16 November, 2021; originally announced November 2021.

arXiv:2110.15811 [pdf, other]

CVAD: A generic medical anomaly detector based on Cascade VAE

Authors: Xiaoyuan Guo, Judy Wawira Gichoya, Saptarshi Purkayastha, Imon Banerjee

Abstract: Detecting out-of-distribution (OOD) samples in medical imaging plays an important role for downstream medical diagnosis. However, existing OOD detectors are demonstrated on natural images composed of inter-classes and have difficulty generalizing to medical images. The key issue is the granularity of OOD data in the medical domain, where intra-class OOD samples are predominant. We focus on the generalizability of OOD detection for medical images and propose a self-supervised Cascade Variational autoencoder-based Anomaly Detector (CVAD). We use a variational autoencoders' cascade architecture, which combines latent representation at multiple scales, before being fed to a discriminator to distinguish the OOD data from the in-distribution (ID) data. Finally, both the reconstruction error and the OOD probability predicted by the binary discriminator are used to determine the anomalies. We compare the performance with the state-of-the-art deep learning models to demonstrate our model's efficacy on various open-access medical imaging datasets for both intra- and inter-class OOD. Further extensive results on datasets including common natural datasets show our model's effectiveness and generalizability. The code is available at https://github.com/XiaoyuanGuo/CVAD.

Submitted 26 January, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

Comments: 6 pages, 4 figures, 4 tables

arXiv:2108.00117 [pdf, other]

Margin-Aware Intra-Class Novelty Identification for Medical Images

Authors: Xiaoyuan Guo, Judy Wawira Gichoya, Saptarshi Purkayastha, Imon Banerjee

Abstract: Traditional anomaly detection methods focus on detecting inter-class variations while medical image novelty identification is inherently an intra-class detection problem. For example, a machine learning model trained with normal chest X-ray and common lung abnormalities is expected to discover and flag idiopathic pulmonary fibrosis, which is a rare lung disease unseen by the model during training. The nuances from intra-class variations and lack of relevant training data in medical image analysis pose great challenges for existing anomaly detection methods. To tackle the challenges, we propose a hybrid model - Transformation-based Embedding learning for Novelty Detection (TEND), which, without any out-of-distribution training data, performs novelty identification by combining both autoencoder-based and classifier-based methods. With a pre-trained autoencoder as image feature extractor, TEND learns to discriminate the feature embeddings of in-distribution data from the transformed counterparts as fake out-of-distribution inputs. To enhance the separation, a distance objective is optimized to enforce a margin between the two classes. Extensive experimental results on both natural image datasets and medical image datasets are presented and our method outperforms state-of-the-art approaches.

Submitted 22 January, 2022; v1 submitted 30 July, 2021; originally announced August 2021.

Comments: 35 pages, 8 figures

Journal ref: Journal of Medical Imaging 2022

arXiv:2107.10356 [pdf]

doi 10.1016/S2589-7500(22)00063-2

Reading Race: AI Recognises Patient's Racial Identity In Medical Images

Authors: Imon Banerjee, Ananth Reddy Bhimireddy, John L. Burns, Leo Anthony Celi, Li-Ching Chen, Ramon Correa, Natalie Dullerud, Marzyeh Ghassemi, Shih-Cheng Huang, Po-Chih Kuo, Matthew P Lungren, Lyle Palmer, Brandon J Price, Saptarshi Purkayastha, Ayis Pyrros, Luke Oakden-Rayner, Chima Okechukwu, Laleh Seyyed-Kalantari, Hari Trivedi, Ryan Wang, Zachary Zaiman, Haoran Zhang, Judy W Gichoya

Abstract: Background: In medical imaging, prior studies have demonstrated disparate AI performance by race, yet there is no known correlation for race on medical imaging that would be obvious to the human expert interpreting the images. Methods: Using private and public datasets we evaluate: A) performance quantification of deep learning models to detect race from medical images, including the ability of these models to generalize to external environments and across multiple imaging modalities, B) assessment of possible confounding anatomic and phenotype population features, such as disease distribution and body habitus as predictors of race, and C) investigation into the underlying mechanism by which AI models can recognize race. Findings: Standard deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities. Our findings hold under external validation conditions, as well as when models are optimized to perform clinically motivated tasks. We demonstrate this detection is not due to trivial proxies or imaging-related surrogate covariates for race, such as underlying disease distribution. Finally, we show that performance persists over all anatomical regions and frequency spectrum of the images suggesting that mitigation efforts will be challenging and demand further study. Interpretation: We emphasize that model ability to predict self-reported race is itself not the issue of importance. However, our findings that AI can trivially predict self-reported race -- even from corrupted, cropped, and noised medical images -- in a setting where clinical experts cannot, creates an enormous risk for all model deployments in medical imaging: if an AI model secretly used its knowledge of self-reported race to misclassify all Black patients, radiologists would not be able to tell using the same data the model has access to.

Submitted 21 July, 2021; originally announced July 2021.

MSC Class: 68-XX ACM Class: I.2

arXiv:2012.09135 [pdf]

doi 10.3390/math8122171

An Improved Simulation Model for Pedestrian Crowd Evacuation

Authors: Danial A. Muhammed, Tarik A. Rashid, Abeer Alsadoon, Nebojsa Bacanin, Polla Fattah, Mokhtar Mohammadi, Indradip Banerjee

Abstract: This paper works on one of the most recent pedestrian crowd evacuation models, i.e., "a simulation model for pedestrian crowd evacuation based on various AI techniques", developed in late 2019. This study adds a new feature to the developed model by proposing a new method and integrating it with the model. This method enables the developed model to find a more appropriate evacuation area design, among others regarding safety due to selecting the best exit door location among many suggested locations. This method is completely dependent on the selected model's output, i.e., the evacuation time for each individual within the evacuation process. The new method finds an average of the evacuees' evacuation times of each exit door location; then, based on the average evacuation time, it decides which exit door location would be the best exit door to be used for evacuation by the evacuees. To validate the method, various designs for the evacuation area with various written scenarios were used. The results showed that the model with this new method could predict a proper exit door location among many suggested locations. Lastly, from the results of this research using the integration of this newly proposed method, a new capability for the selected model in terms of safety allowed the right decision in selecting the finest design for the evacuation area among other designs.

Submitted 4 December, 2020; originally announced December 2020.

Comments: 15 pages, accepted in Mathematics, MDPI, 2020
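
The selection rule described in the abstract above (average the evacuees' evacuation times per candidate exit door location, then choose based on that average) could look something like the sketch below. The function name, the dictionary input, and the assumption that the lowest average time is "best" are illustrative choices, not details taken from the paper.

```python
from statistics import mean


def best_exit_location(times_by_location):
    """Pick the candidate exit location with the lowest mean evacuation time.

    times_by_location maps a location name to the list of per-evacuee
    evacuation times produced by a simulation run with the exit placed there.
    Returns (best_location, averages).
    """
    averages = {loc: mean(times) for loc, times in times_by_location.items()}
    # Assumption: a shorter average evacuation time means a better location.
    best = min(averages, key=averages.get)
    return best, averages
```

For example, given simulated times for a "north" and an "east" door, the function returns whichever location has the smaller average along with all the averages for inspection.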

arXiv:2007.05786 [pdf, other]

Generalization of Deep Convolutional Neural Networks -- A Case-study on Open-source Chest Radiographs

Authors: Nazanin Mashhaditafreshi, Amara Tariq, Judy Wawira Gichoya, Imon Banerjee

Abstract: Deep Convolutional Neural Networks (DCNNs) have attracted extensive attention and been applied in many areas, including medical image analysis and clinical diagnosis. One major challenge is to conceive a DCNN model with remarkable performance on both internal and external data. We demonstrate that DCNNs may not generalize to new data, but increasing the quality and heterogeneity of the training data helps to improve generalizability. We use InceptionResNetV2 and DenseNet121 architectures to predict the risk of 5 common chest pathologies. The experiments were conducted on three publicly available databases: CheXpert, ChestX-ray14, and MIMIC Chest Xray JPG. The results show that internal performance for each of the 5 pathologies exceeded external performance on both models. Moreover, our strategy of exposing the models to a mix of different datasets during the training phase helps to improve model performance on the external dataset.

Submitted 11 July, 2020; originally announced July 2020.

arXiv:2006.13262 [pdf]

Was there COVID-19 back in 2012? Challenge for AI in Diagnosis with Similar Indications

Authors: Imon Banerjee, Priyanshu Sinha, Saptarshi Purkayastha, Nazanin Mashhaditafreshi, Amara Tariq, Jiwoong Jeong, Hari Trivedi, Judy W. Gichoya

Abstract: Purpose: Since the recent COVID-19 outbreak, there has been an avalanche of research papers applying deep learning based image processing to chest radiographs for detection of the disease. We test the performance of the two top models for CXR COVID-19 diagnosis on external datasets to assess model generalizability. Methods: In this paper, we present our argument regarding the efficiency and applicability of existing deep learning models for COVID-19 diagnosis. We provide results from two popular models - COVID-Net and CoroNet evaluated on three publicly available datasets and an additional institutional dataset collected from EMORY Hospital between January and May 2020, containing patients tested for COVID-19 infection using RT-PCR. Results: There is a large false positive rate (FPR) for COVID-Net on both the CheXpert (55.3%) and MIMIC-CXR (23.4%) datasets. On the EMORY Dataset, COVID-Net has 61.4% sensitivity, 0.54 F1-score and 0.49 precision value. The FPR of the CoroNet model is significantly lower across all the datasets as compared to COVID-Net - EMORY (9.1%), CheXpert (1.3%), ChestX-ray14 (0.02%), MIMIC-CXR (0.06%). Conclusion: The models reported good to excellent performance on their internal datasets; however, we observed from our testing that their performance dramatically worsened on external data. This is likely from several causes including overfitting models due to lack of appropriate control patients and ground truth labels. The fourth institutional dataset was labeled using RT-PCR, which could be positive without radiographic findings and vice versa. Therefore, a fusion model of both clinical and radiographic data may have better performance and generalization.

Submitted 23 June, 2020; originally announced June 2020.

arXiv:2006.02825 [pdf, other]

SOS -- Self-Organization for Survival: Introducing fairness in emergency communication to save lives

Authors: Indushree Banerjee, Martijn Warnier, Frances M. T. Brazier, Dirk Helbing

Abstract: Communication is crucial when disasters isolate communities of people and rescue is delayed. Such delays force citizens to be first responders and form small rescue teams. Rescue teams require reliable communication, particularly in the first 72 hours, which is challenging due to damaged infrastructure and electrical blackouts. We design a peer-to-peer communication network that meets these challenges. We introduce the concept of participatory fairness: equal communication opportunities for all citizens regardless of initial inequality in phone battery charge. Our value-sensitive design approach achieves an even battery charge distribution across phones over time and enables citizens to communicate over 72 hours. We apply the fairness principle to communication in an adapted standard Barabasi-Albert model of a scale-free network that automatically (i) assigns high-battery phones as hubs, (ii) adapts the network topology to the spatio-temporal battery charge distribution, and (iii) self-organizes to remain robust and reliable when links fail or phones leave the network. While the Barabasi-Albert model has become a widespread descriptive model, we demonstrate its use as a design principle to meet values such as fairness and systemic efficiency. Our results demonstrate that, compared to a generic peer-to-peer mesh network, the new protocol achieves (i) a longer network lifetime, (ii) an adaptive information flow, (iii) a fair distribution of battery charge, and (iv) higher participation rates. Hence, our protocol, Self-Organization for Survival ('SOS'), provides fair communication opportunities to all citizens during a disaster through self-organization. SOS enables participatory resilience and sustainability, empowering citizens to communicate when they need it most.

Submitted 4 June, 2020; originally announced June 2020.

arXiv:2004.07965 [pdf, other]

doi 10.1007/s10278-021-00491-w

A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology Images

Authors: Pradeeban Kathiravelu, Puneet Sharma, Ashish Sharma, Imon Banerjee, Hari Trivedi, Saptarshi Purkayastha, Priyanshu Sinha, Alexandre Cadrin-Chenevert, Nabile Safdar, Judy Wawira Gichoya

Abstract: Executing machine learning (ML) pipelines in real-time on radiology images is hard due to the limited computing resources in clinical environments and the lack of efficient data transfer capabilities to run them on research clusters. We propose Niffler, an integrated framework that enables the execution of ML pipelines at research clusters by efficiently querying and retrieving radiology images from the Picture Archiving and Communication Systems (PACS) of the hospitals. Niffler uses the Digital Imaging and Communications in Medicine (DICOM) protocol to fetch and store imaging data and provides metadata extraction capabilities and application programming interfaces (APIs) to apply filters on the images. Niffler further enables the sharing of the outcomes from the ML pipelines in a de-identified manner. Niffler has been running stably for more than 19 months and has supported several research projects at the department. In this paper, we present its architecture and three of its use cases: an inferior vena cava (IVC) filter detection from the images in real-time, identification of scanner utilization, and scanner clock calibration. Evaluations on the Niffler prototype highlight its feasibility and efficiency in facilitating the ML pipelines on the images and metadata in real-time and retrospectively.

Submitted 5 August, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

Comments: Preprint

Journal ref: Journal of Digital Imaging (JDI), 2021

arXiv:1902.10700 [pdf]

A Deep-learning Approach for Prognosis of Age-Related Macular Degeneration Disease using SD-OCT Imaging Biomarkers

Authors: Imon Banerjee, Luis de Sisternes, Joelle Hallak, Theodore Leng, Aaron Osborne, Mary Durbin, Daniel Rubin

Abstract: We propose a hybrid sequential deep learning model to predict the risk of AMD progression in non-exudative AMD eyes at multiple timepoints, starting from short-term progression (3-months) up to long-term progression (21-months). The proposed model combines radiomics and deep learning to handle challenges related to imperfect ratio of OCT scan dimension and training cohort size. We considered a retrospective clinical trial dataset that includes 671 fellow eyes with 13,954 dry AMD observations for training and validating the machine learning models on a 10-fold cross validation setting. The proposed RNN model achieved high accuracy (0.96 AUCROC) for the prediction of both short-term and long-term AMD progression, and outperformed the traditional random forest model. High accuracy achieved by the RNN establishes the ability to identify AMD patients at risk of progressing to advanced AMD at an early stage, which could have a high clinical impact as it allows for optimal clinical follow-up, with more frequent screening and potential earlier treatment for those patients at high risk.

Submitted 27 February, 2019; originally announced February 2019.

arXiv:1806.07346 [pdf]

A Scalable Machine Learning Approach for Inferring Probabilistic US-LI-RADS Categorization

Authors: Imon Banerjee, Hailey H. Choi, Terry Desser, Daniel L. Rubin

Abstract: We propose a scalable computerized approach for large-scale inference of Liver Imaging Reporting and Data System (LI-RADS) final assessment categories in narrative ultrasound (US) reports. Although our model was trained on reports created using a LI-RADS template, it was also able to infer LI-RADS scoring for unstructured reports that were created before the LI-RADS guidelines were established. No human-labelled data was required in any step of this study; for training, LI-RADS scores were automatically extracted from those reports that contained structured LI-RADS scores, and the model translated the derived knowledge to reasoning on unstructured radiology reports. By providing automated LI-RADS categorization, our approach may enable standardizing screening recommendations and treatment planning of patients at risk for hepatocellular carcinoma, and it may facilitate AI-based healthcare research with US images by offering large scale text mining and data gathering opportunities from standard hospital clinical data repositories.

Submitted 15 June, 2018; originally announced June 2018.

Comments: AMIA Annual Symposium 2018 (accepted)

arXiv:1801.03058 [pdf]

Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients

Authors: Imon Banerjee, Michael Francis Gensheimer, Douglas J. Wood, Solomon Henry, Daniel Chang, Daniel L. Rubin

Abstract: We propose a deep learning model - Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met) for estimating short-term life expectancy (3 months) of the patients by analyzing free-text clinical notes in the electronic medical record, while maintaining the temporal visit sequence. In a single framework, we integrated semantic data mapping and neural embedding technique to produce a text processing method that extracts relevant information from heterogeneous types of clinical notes in an unsupervised manner, and we designed a recurrent neural network to model the temporal dependency of the patient visits. The model was trained on a large dataset (10,293 patients) and validated on a separate dataset (1818 patients). Our method achieved an area under the ROC curve (AUC) of 0.89. To provide explainability, we developed an interactive graphical tool that may improve physician understanding of the basis for the model's predictions. The high accuracy and explainability of the PPES-Met model may enable our model to be used as a decision support tool to personalize metastatic cancer treatment and provide valuable assistance to the physicians.

Submitted 13 July, 2018; v1 submitted 9 January, 2018; originally announced January 2018.

Journal ref: AMIA Informatics Conference 2018

arXiv:1711.06968 [pdf, other]

Intelligent Word Embeddings of Free-Text Radiology Reports

Authors: Imon Banerjee, Sriraman Madhavan, Roger Eric Goldman, Daniel L. Rubin

Abstract: Radiology reports are a rich resource for advancing deep learning applications in medicine by leveraging the large volume of data continuously being updated, integrated, and shared. However, there are significant challenges as well, largely due to the ambiguity and subtlety of natural language. We propose a hybrid strategy that combines semantic-dictionary mapping and word2vec modeling for creating dense vector embeddings of free-text radiology reports. Our method leverages the benefits of both semantic-dictionary mapping as well as unsupervised learning. Using the vector representation, we automatically classify the radiology reports into three classes denoting confidence in the diagnosis of intracranial hemorrhage by the interpreting radiologist. We performed experiments with varying hyperparameter settings of the word embeddings and a range of different classifiers. Best performance achieved was a weighted precision of 88% and weighted recall of 90%. Our work offers the potential to leverage unstructured electronic health record data by allowing direct analysis of narrative clinical notes.

Submitted 19 November, 2017; originally announced November 2017.

Comments: AMIA Annual Symposium 2017
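
The abstract above reports weighted precision and weighted recall for a three-class problem. As a reminder of what those metrics mean, here is a minimal sketch in plain Python; the function name `weighted_precision_recall` and the toy labels are illustrative assumptions, not taken from the paper. Each per-class score is weighted by the class's share of the true labels.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    """Support-weighted precision and recall for a multi-class problem.

    Hypothetical helper for illustration only; not the paper's code.
    """
    support = Counter(y_true)          # number of true examples per class
    n = len(y_true)
    precision = 0.0
    recall = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        class_precision = tp / predicted if predicted else 0.0
        class_recall = tp / count
        weight = count / n             # class share of the true labels
        precision += weight * class_precision
        recall += weight * class_recall
    return precision, recall


# Toy three-class example (labels 0, 1, 2 are placeholders).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision, recall = weighted_precision_recall(y_true, y_pred)
```

With this weighting, weighted recall coincides with plain accuracy, because each class's recall is scaled by exactly the number of true examples in that class.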

arXiv:1709.02477 [pdf, other]

Inferring Generative Model Structure with Static Analysis

Authors: Paroma Varma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, Christopher Ré

Abstract: Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects training label quality, but is difficult to learn without any ground truth labels. We instead rely on these weak supervision sources having some structure by virtue of being encoded programmatically. We present Coral, a paradigm that infers generative model structure by statically analyzing the code for these heuristics, thus reducing the data required to learn structure significantly. We prove that Coral's sample complexity scales quasilinearly with the number of heuristics and number of relations found, improving over the standard sample complexity, which is exponential in $n$ for identifying $n^{\textrm{th}}$ degree relations. Experimentally, Coral matches or outperforms traditional structure learning approaches by up to 3.81 F1 points. Using Coral to model dependencies instead of assuming independence results in better performance than a fully supervised model by 3.07 accuracy points when heuristics are used to label radiology data without ground truth labels.

Submitted 7 September, 2017; originally announced September 2017.

Comments: NIPS 2017

arXiv:1706.09355 [pdf, other]

New Results On Routing Via Matchings On Graphs

Authors: Indranil Banerjee, Dana Richards

Abstract: In this paper we present some new complexity results on the routing time of a graph under the \textit{routing via matching} model. This is a parallel routing model which was introduced by Alon et al\cite{alon1994routing}. The model can be viewed as a communication scheme on a distributed network. The nodes in the network can communicate via matchings (a step), where a node exchanges data (pebbles) with its matched partner. Let $G$ be a connected graph with vertices labeled from $\{1,...,n\}$ and the destination vertices of the pebbles are given by a permutation $π$. The problem is to find a minimum step routing scheme for the input permutation $π$. This is denoted as the routing time $rt(G,π)$ of $G$ given $π$. In this paper we characterize the complexity of some known problems under the routing via matching model and discuss their relationship to graph connectivity and clique number. We also introduce some new problems in this domain, which may be of independent interest.

Submitted 18 March, 2022; v1 submitted 28 June, 2017; originally announced June 2017.

Comments: 15 Pages, 5 Figures, 21st International Symposium on Fundamentals of Computation Theory. arXiv admin note: text overlap with arXiv:1604.04978
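
To make the routing via matching model in the abstract above concrete, the following is a minimal sketch for the simplest case, where $G$ is a path. It uses the classical odd-even transposition schedule, which is an illustration chosen here and not a construction from the paper. The function name `route_on_path` is hypothetical, and vertices are labeled from 0 rather than 1. Each step swaps pebbles across a set of disjoint path edges (a matching), so the number of steps returned is an upper bound on $rt(G,π)$ for the path, not necessarily the minimum.

```python
def route_on_path(perm):
    """Route pebbles on a path graph with odd-even transposition steps.

    perm[i] is the destination vertex of the pebble that starts at vertex i.
    Returns a list of steps; each step is a list of disjoint edges (i, i + 1)
    whose endpoints exchange pebbles. Idle steps are not recorded.
    """
    pebbles = list(perm)
    n = len(pebbles)
    steps = []
    for t in range(n):                       # n alternating phases suffice
        if all(pebbles[i] == i for i in range(n)):
            break
        matching = []
        for i in range(t % 2, n - 1, 2):     # even edges, then odd edges
            if pebbles[i] > pebbles[i + 1]:
                pebbles[i], pebbles[i + 1] = pebbles[i + 1], pebbles[i]
                matching.append((i, i + 1))
        if matching:
            steps.append(matching)
    assert pebbles == list(range(n)), "every pebble should be at its destination"
    return steps


# The pebble at vertex 0 wants to reach vertex 2, and so on.
schedule = route_on_path([2, 0, 1])
```

For the input `[2, 0, 1]` the schedule has two steps: first the edge (0, 1) is used, then the edge (1, 2).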

arXiv:1612.08178 [pdf]

JU_KS_Group@FIRE 2016: Consumer Health Information Search

Authors: Kamal Sarkar, Debanjan Das, Indra Banerjee, Mamta Kumari, Prasenjit Biswas

Abstract: In this paper, we describe the methodology used and the results obtained by us for completing the tasks given under the shared task on Consumer Health Information Search (CHIS) collocated with the Forum for Information Retrieval Evaluation (FIRE) 2016, ISI Kolkata. The shared task consists of two sub-tasks: (1) task 1: given a query and a document/set of documents associated with that query, the task is to classify the sentences in the document as relevant to the query or not, and (2) task 2: the relevant sentences need to be further classified as supporting or opposing the claim made in the query. We participated in both sub-tasks. The percentage accuracy obtained by our system for task 1 was 73.39, which is the third highest among the 9 teams that participated in the shared task.

Submitted 24 December, 2016; originally announced December 2016.

Comments: 8th meeting of Forum for Information Retrieval Evaluation 2016, 2016

arXiv:1612.06473 [pdf, other]

Sorting Networks On Restricted Topologies

Authors: Indranil Banerjee, Dana Richards, Igor Shinkar

Abstract: The sorting number of a graph with $n$ vertices is the minimum depth of a sorting network with $n$ inputs and outputs that uses only the edges of the graph to perform comparisons. Many known results on sorting networks can be stated in terms of sorting numbers of different classes of graphs. In this paper we show the following general results about the sorting number of graphs. Any $n$-vertex graph that contains a simple path of length $d$ has a sorting network of depth $O(n \log(n/d))$. Any $n$-vertex graph with maximal degree $Δ$ has a sorting network of depth $O(Δn)$. We also provide several results that relate the sorting number of a graph with its routing number, size of its maximal matching, and other well known graph properties. Additionally, we give some new bounds on the sorting number for some typical graphs.

Submitted 18 March, 2022; v1 submitted 19 December, 2016; originally announced December 2016.

Comments: 16 pages, 3 figures
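
The definition of a sorting network restricted to the edges of a graph can be checked mechanically on small cases. The sketch below is an illustration under stated assumptions, not code from the paper: it takes the path on $n$ vertices as the graph, builds the classical odd-even transposition network (depth $n$), confirms that every comparator lies on a path edge, and verifies correctness with the 0-1 principle. All function names are hypothetical.

```python
from itertools import product


def odd_even_transposition(n):
    """Depth-n network whose comparators are edges (i, i + 1) of a path."""
    return [[(i, i + 1) for i in range(t % 2, n - 1, 2)] for t in range(n)]


def apply_network(layers, values):
    """Run the network: each comparator (i, j) moves the smaller value to i."""
    out = list(values)
    for layer in layers:
        for i, j in layer:
            if out[i] > out[j]:
                out[i], out[j] = out[j], out[i]
    return out


def uses_only_edges(layers, edges):
    """True if every comparator compares the endpoints of a graph edge."""
    allowed = {frozenset(e) for e in edges}
    return all(frozenset(c) in allowed for layer in layers for c in layer)


def is_sorting_network(layers, n):
    """0-1 principle: a network sorts all inputs iff it sorts all 0/1 inputs."""
    return all(
        apply_network(layers, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )


n = 5
path_edges = [(i, i + 1) for i in range(n - 1)]
network = odd_even_transposition(n)
ok = uses_only_edges(network, path_edges) and is_sorting_network(network, n)
```

This only shows that depth $n$ is achievable on a path. The sorting number is the minimum such depth, which an exhaustive check like this does not determine.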

arXiv:1612.03361 [pdf, ps, other]

An Energy-Efficient VCO-Based Matrix Multiplier Block to Support On-Chip Image Analysis

Authors: Imon Banerjee, Arindam Sanyal

Abstract: Images typically are represented as uniformly sampled data in the form of a matrix of pixels/voxels. Therefore, matrix multiply-and-accumulate (MAC) forms the core of most state-of-the-art image analysis algorithms. While digital implementation of MAC has generally been the preferred approach, high power consumption is an impediment to adopting it for medical image analysis. In this work, we present a time-domain signal processing architecture which performs MAC operations with 7-bit accuracy while consuming 400X lower energy than digital implementation. The proposed architecture performs analog computation using mostly digital circuits and is suitable for scaled CMOS technologies. The proposed time-domain MAC architecture is expected to play a central role in empowering the advancement of various on-chip image analysis operations.

Submitted 10 December, 2016; originally announced December 2016.

arXiv:1612.00408 [pdf, other]

Computerized Multiparametric MR image Analysis for Prostate Cancer Aggressiveness-Assessment

Authors: Imon Banerjee, Lewis Hahn, Geoffrey Sonn, Richard Fan, Daniel L. Rubin

Abstract: We propose an automated method for detecting aggressive prostate cancer (CaP) (Gleason score >=7) based on a comprehensive analysis of the lesion and the surrounding normal prostate tissue which has been simultaneously captured in T2-weighted MR images, diffusion-weighted images (DWI) and apparent diffusion coefficient maps (ADC). The proposed methodology was tested on a dataset of 79 patients (40 aggressive, 39 non-aggressive). We evaluated the performance of a wide range of popular quantitative imaging features on the characterization of aggressive versus non-aggressive CaP. We found that a group of 44 discriminative predictors among 1464 quantitative imaging features can be used to produce an area under the ROC curve of 0.73.

Submitted 1 December, 2016; originally announced December 2016.

Comments: NIPS 2016 Workshop on Machine Learning for Health (NIPS ML4HC)

arXiv:1611.07933 [pdf, other]

Routing Number Of A Pyramid

Authors: Indranil Banerjee, Dana Richards

Abstract: In this short note we give the routing number of a pyramid graph under the \textit{routing via matching} model introduced by Alon et al\cite{5}. This model can be viewed as a communication scheme on a distributed network. The nodes in the network can communicate via matchings (a step), where a node exchanges data with its partner. Formally, given a connected graph $G$ with vertices labeled from $[1,...,n]$ and a permutation $π$ giving the destination of pebbles on the vertices, the problem is to find a minimum step routing scheme. This is denoted as the routing time $rt(G,π)$ of $G$ given $π$. We show that a $d$-dimensional pyramid with $m$ levels has a routing number of $O(dN^{1/d})$.

Submitted 23 November, 2016; originally announced November 2016.

Comments: 3 pages, 2 figures

arXiv:1604.04978 [pdf, other]

Routing and Sorting Via Matchings On Graphs

Authors: Indranil Banerjee, Dana Richards

Abstract: The paper is divided into two parts. In the first part we present some new results for the \textit{routing via matching} model introduced by Alon et al\cite{5}. This model can be viewed as a communication scheme on a distributed network. The nodes in the network can communicate via matchings (a step), where a node exchanges data with its partner. Formally, given a connected graph $G$ with vertices labeled from $[1,...,n]$ and a permutation $π$ giving the destination of pebbles on the vertices, the problem is to find a minimum step routing scheme. This is denoted as the routing time $rt(G,π)$ of $G$ given $π$. In this paper we present the following new results, which answer one of the open problems posed in \cite{5}: 1) Determining whether $rt(G,π)$ is $\le 2$ can be done in $O(n^{2.5})$ deterministic time for any arbitrary connected graph $G$. 2) Determining whether $rt(G,π)$ is $\le k$ for any $k \ge 3$ is NP-Complete. In the second part we study a related property of graphs, which measures how easy it is to design sorting networks using only the edges of a given graph. Informally, the \textit{sorting number} of a graph is the minimum depth of a sorting network that only uses edges of the graph. Many of the classical results on sorting networks can be represented in this framework. We show that a tree with maximum degree $Δ$ can accommodate a $O(\min(nΔ^2,n^2))$ depth sorting network. Additionally, we give two instances of trees for which this bound is tight.

Submitted 27 April, 2016; v1 submitted 17 April, 2016; originally announced April 2016.

Comments: 14 pages, submitted to ESA 2016

arXiv:1508.03698 [pdf, ps, other]

Sorting Under 1-$\infty$ Cost Model

Authors: Indranil Banerjee, Dana Richards

Abstract: In this paper we study the problem of sorting under non-uniform comparison costs, where costs are either 1 or $\infty$. If comparing a pair has an associated cost of $\infty$ then we say that such a pair cannot be compared (forbidden pairs). Along with the set of elements $V$ the input to our problem is a graph $G(V, E)$, whose edges represent the pairs that we can compare incurring a unit of cost. Given a graph with $n$ vertices and $q$ forbidden edges we propose the first non-trivial deterministic algorithm which makes $O((q + n)\log{n})$ comparisons with a total complexity of $O(n^2 + q^{ω/2})$, where $ω$ is the exponent in the complexity of matrix multiplication. We also propose a simple randomized algorithm for the problem which makes $\widetilde{O}(n^2/\sqrt{q + n} + n\sqrt{q})$ probes with high probability. When the input graph is random we show that $\widetilde{O}(\min{(n^{3/2}, pn^2)})$ probes suffice, where $p$ is the edge probability.

Submitted 10 November, 2015; v1 submitted 15 August, 2015; originally announced August 2015.

Comments: 12 pages, 1 figure, submitted to STOC 2016

arXiv:1508.02477 [pdf, ps, other]

Computing Maximal Layers Of Points in $E^{f(n)}$

Authors: Indranil Banerjee, Dana Richards

Abstract: In this paper we present a randomized algorithm for computing the collection of maximal layers for a point set in $E^{k}$ ($k = f(n)$). The input to our algorithm is a point set $P = \{p_1,...,p_n\}$ with $p_i \in E^{k}$. The proposed algorithm achieves a runtime of $O\left(kn^{2 - {1 \over \log{k}} + \log_k{\left(1 + {2 \over {k+1}}\right)}}\log{n}\right)$ when $P$ is a random order and a runtime of $O(k^2 n^{3/2 + (\log_{k}{(k-1)})/2}\log{n})$ for an arbitrary $P$. Both bounds hold in expectation. Additionally, the run time is bounded by $O(kn^2)$ in the worst case. This is the first non-trivial algorithm whose run-time remains polynomial whenever $f(n)$ is bounded by some polynomial in $n$ while remaining sub-quadratic in $n$ for constant $k$. The algorithm is implemented using a new data-structure for storing and answering dominance queries over the set of incomparable points.

Submitted 10 November, 2015; v1 submitted 10 August, 2015; originally announced August 2015.

Comments: 13 pages, submitted to LATIN 2016
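
For readers unfamiliar with maximal layers, the sketch below shows the naive peeling definition: the first layer is the set of points not dominated by any other point, the second layer is the maximal set of what remains, and so on. This is a brute-force baseline written for illustration, not the randomized algorithm or the data structure from the paper. Each peel costs $O(kn^2)$ comparisons and there can be up to $n$ layers, so it is far slower than the bounds quoted in the abstract. The names `dominates` and `maximal_layers` are hypothetical.

```python
def dominates(p, q):
    """p dominates q if p is at least q in every coordinate and p != q."""
    return p != q and all(a >= b for a, b in zip(p, q))


def maximal_layers(points):
    """Peel maximal layers from a list of k-dimensional points (tuples)."""
    remaining = list(points)
    layers = []
    while remaining:
        # A point is maximal if no remaining point dominates it.
        layer = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


# Small 2-dimensional example.
layers = maximal_layers([(1, 1), (2, 2), (3, 1), (1, 3), (0, 0)])
```

In the example, the points (2, 2), (3, 1) and (1, 3) are mutually incomparable and form the first layer, (1, 1) forms the second, and (0, 0) forms the third.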

arXiv:1305.7103 [pdf]

Fault-tolerant multipath routing scheme for energy efficient wireless sensor networks

Authors: Prasenjit Chanak, Tuhina Samanta, Indrajit Banerjee

Abstract: The main challenge in wireless sensor networks is to improve the fault tolerance of each node and also provide an energy efficient fast data routing service. In this paper we propose an energy efficient node fault diagnosis and recovery scheme for wireless sensor networks, referred to as the fault tolerant multipath routing scheme for energy efficient wireless sensor network (FTMRS). The FTMRS is based on a multipath data routing scheme. One shortest path is used for main data routing in the FTMRS technique, and two other backup paths are used as alternative paths for a faulty network and to handle the overloaded traffic on the main channel. Shortest path data routing ensures energy efficient data routing. The performance analysis of FTMRS shows better results compared to other popular fault tolerant techniques in wireless sensor networks.

Submitted 30 May, 2013; originally announced May 2013.

Journal ref: International Journal of Wireless & Mobile Networks (IJWMN) Vol. 5, No. 2, April 2013

arXiv:1209.0286 [pdf]

CAWS - Security Algorithms for Wireless Sensor Networks: A Cellular Automata Based Approach

Authors: Nilanjan Sen, Indrajit Banerjee

Abstract: Security in Wireless Sensor Networks (WSN) is a very challenging task because of their dissimilarities with conventional wireless networks. The related works done so far have tried to solve the problem keeping in mind the constraints of WSNs. In this paper we propose a set of cellular automata based security algorithms (CAWS), which consists of CAKD, a Cellular Automata (CA) based key management algorithm, and CASC, a CA based secure data communication algorithm, which require a very small amount of memory as well as simple computation.

Submitted 3 September, 2012; originally announced September 2012.

Comments: Proceedings of "All India Seminar on Role of ICT in Improving Quality of Life" on March 26-27, 2010 organized by The Institution of Engineers (India) and Bengal Engineering and Science University, Shibpur

Journal ref: Proceedings of "All India Seminar on Role of ICT in Improving Quality of Life", Dated on March 26-27, 2010; pp: 81-88

arXiv:1205.4928 [pdf, ps, other]

Grey-box GUI Testing: Efficient Generation of Event Sequences

Authors: Stephan Arlt, Ishan Banerjee, Cristiano Bertolini, Atif M. Memon, Martin Schäf

Abstract: Graphical user interfaces (GUIs), due to their event driven nature, present a potentially unbounded space of all possible ways to interact with software. During testing it becomes necessary to effectively sample this space. In this paper we develop algorithms that sample the GUI's input space by only generating sequences that (1) are allowed by the GUI's structure, and (2) chain together only those events that have data dependencies between their event handlers. We create a new abstraction, called an event-dependency graph (EDG) of the GUI, that captures data dependencies between event handler code. We develop a mapping between EDGs and an existing black-box user-level model of the GUI's workflow, called an event-flow graph (EFG). We have implemented automated EDG construction in a tool that analyzes the bytecode of each event handler. We evaluate our "grey-box" approach using four open-source applications and compare it with the current state-of-the-art EFG approach. Our results show that using the EDG reduces the number of test cases while still achieving at least the same coverage. Furthermore, we were able to detect 2 new bugs in the subject applications.

Submitted 22 May, 2012; originally announced May 2012.

Comments: 11 pages

MSC Class: 68N30

arXiv:1109.2430 [pdf]

CCABC: Cyclic Cellular Automata Based Clustering For Energy Conservation in Sensor Networks

Authors: Indrajit Banerjee, Prasenjit Chanak, Hafizur Rahaman

Abstract: Sensor networks have been recognized as the most significant technology for the next century. Despite their potential applications, wireless sensor networks encounter resource restrictions such as low power, reduced bandwidth and especially limited power sources. This work proposes an efficient technique for the conservation of energy in a wireless sensor network (WSN) by forming an effective cluster of the network nodes distributed over a wide geographical area. The clustering scheme is developed around a specified class of cellular automata (CA) referred to as the modified cyclic cellular automata (mCCA). It sets a number of nodes in stand-by mode at an instance of time without compromising the area of network coverage and thereby conserves battery power. The proposed scheme also determines an effective cluster size where the inter-cluster and intra-cluster communication cost is minimum. The simulation results establish that the cyclic cellular automata based clustering for energy conservation in sensor networks (CCABC) is more reliable than the existing schemes where clustering and CA based energy saving techniques are used.

Submitted 12 September, 2011; originally announced September 2011.

Showing 1–45 of 45 results for author: Banerjee, I