-
TIER: Text-Image Entropy Regularization for CLIP-style models
Authors:
Anil Palepu,
Andrew L. Beam
Abstract:
In this paper, we introduce a novel regularization scheme on contrastive language-image pre-trained (CLIP) medical vision models. Our approach is based on the observation that on many medical imaging tasks text tokens should only describe a small number of image regions and, likewise, each image region should correspond to only a few text tokens. In CLIP-style models, this implies that text-token embeddings should have high similarity to only a small number of image-patch embeddings for a given image-text pair. We formalize this observation using a novel regularization scheme that penalizes the entropy of the text-token to image-patch similarity scores. We qualitatively and quantitatively demonstrate that the proposed regularization scheme shrinks most of the pairwise text-token and image-patch similarity scores towards zero, thus achieving the desired effect. We demonstrate the promise of our approach in an important medical context, chest x-rays, where this underlying sparsity hypothesis naturally arises. Using our proposed approach, we achieve state of the art (SOTA) average zero-shot performance on the CheXpert and Padchest chest x-ray datasets, outperforming an unregularized version of the model and several recently published self-supervised models.
Submitted 27 February, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Self-Supervision on Images and Text Reduces Reliance on Visual Shortcut Features
Authors:
Anil Palepu,
Andrew L Beam
Abstract:
Deep learning models trained in a fully supervised manner have been shown to rely on so-called "shortcut" features. Shortcut features are inputs that are associated with the outcome of interest in the training data, but are either no longer associated or not present in testing or deployment settings. Here we provide experiments that show recent self-supervised models trained on images and text provide more robust image representations and reduce the model's reliance on visual shortcut features on a realistic medical imaging example. Additionally, we find that these self-supervised models "forget" shortcut features more quickly than fully supervised ones when fine-tuned on labeled data. Though not a complete solution, our experiments provide compelling evidence that self-supervised models trained on images and text provide some resilience to visual shortcut features.
Submitted 10 July, 2022; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Deep Learning Methods for Proximal Inference via Maximum Moment Restriction
Authors:
Benjamin Kompa,
David R. Bellamy,
Thomas Kolokotrones,
James M. Robins,
Andrew L. Beam
Abstract:
The No Unmeasured Confounding Assumption is widely used to identify causal effects in observational studies. Recent work on proximal inference has provided alternative identification results that succeed even in the presence of unobserved confounders, provided that one has measured a sufficiently rich set of proxy variables, satisfying specific structural conditions. However, proximal inference requires solving an ill-posed integral equation. Previous approaches have used a variety of machine learning techniques to estimate a solution to this integral equation, commonly referred to as the bridge function. However, prior work has often been limited by relying on pre-specified kernel functions, which are not data adaptive and struggle to scale to large datasets. In this work, we introduce a flexible and scalable method based on a deep neural network to estimate causal effects in the presence of unmeasured confounding using proximal inference. Our method achieves state of the art performance on two well-established proximal inference benchmarks. Finally, we provide theoretical consistency guarantees for our method.
Submitted 12 October, 2022; v1 submitted 19 May, 2022;
originally announced May 2022.
-
MedSelect: Selective Labeling for Medical Image Classification Combining Meta-Learning with Deep Reinforcement Learning
Authors:
Akshay Smit,
Damir Vrabac,
Yujie He,
Andrew Y. Ng,
Andrew L. Beam,
Pranav Rajpurkar
Abstract:
We propose a selective learning method using meta-learning and deep reinforcement learning for medical image interpretation in the setting of limited labeling resources. Our method, MedSelect, consists of a trainable deep learning selector that uses image embeddings obtained from contrastive pretraining for determining which images to label, and a non-parametric selector that uses cosine similarity to classify unseen images. We demonstrate that MedSelect learns an effective selection strategy outperforming baseline selection strategies across seen and unseen medical conditions for chest X-ray interpretation. We also perform an analysis of the selections performed by MedSelect comparing the distribution of latent embeddings and clinical features, and find significant differences compared to the strongest performing baseline. We believe that our method may be broadly applicable across medical imaging settings where labels are expensive to acquire.
Submitted 26 March, 2021;
originally announced March 2021.
-
Evaluating Progress on Machine Learning for Longitudinal Electronic Healthcare Data
Authors:
David Bellamy,
Leo Celi,
Andrew L. Beam
Abstract:
The Large Scale Visual Recognition Challenge based on the well-known ImageNet dataset catalyzed an intense flurry of progress in computer vision. Benchmark tasks have propelled other sub-fields of machine learning forward at an equally impressive pace, but in healthcare it has primarily been image processing tasks, such as in dermatology and radiology, that have experienced similar benchmark-driven progress. In the present study, we performed a comprehensive review of benchmarks in medical machine learning for structured data, identifying one based on the Medical Information Mart for Intensive Care (MIMIC-III) that allows the first direct comparison of predictive performance and thus the evaluation of progress on four clinical prediction tasks: mortality, length of stay, phenotyping, and patient decompensation. We find that little meaningful progress has been made over a 3-year period on these tasks, despite significant community engagement. Through our meta-analysis, we find that the performance of deep recurrent models is only superior to logistic regression on certain tasks. We conclude with a synthesis of these results, possible explanations, and a list of desirable qualities for future benchmarks in medical machine learning.
Submitted 2 October, 2020;
originally announced October 2020.
-
Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
Authors:
Natalia Antropova,
Andrew L. Beam,
Brett K. Beaulieu-Jones,
Irene Chen,
Corey Chivers,
Adrian Dalca,
Sam Finlayson,
Madalina Fiterau,
Jason Alan Fries,
Marzyeh Ghassemi,
Mike Hughes,
Bruno Jedynak,
Jasvinder S. Kandola,
Matthew McDermott,
Tristan Naumann,
Peter Schulam,
Farah Shamout,
Alexandre Yahi
Abstract:
This volume represents the accepted submissions from the Machine Learning for Health (ML4H) workshop at the conference on Neural Information Processing Systems (NeurIPS) 2018, held on December 8, 2018 in Montreal, Canada.
Submitted 24 November, 2018; v1 submitted 17 November, 2018;
originally announced November 2018.
-
Learning Contextual Hierarchical Structure of Medical Concepts with Poincairé Embeddings to Clarify Phenotypes
Authors:
Brett K. Beaulieu-Jones,
Isaac S. Kohane,
Andrew L. Beam
Abstract:
Biomedical association studies are increasingly done using clinical concepts, and in particular diagnostic codes from clinical data repositories, as phenotypes. Clinical concepts can be represented in a meaningful vector space using word embedding models. These embeddings allow for comparison between clinical concepts or for straightforward input to machine learning models. Using traditional approaches, good representations require high dimensionality, making downstream tasks such as visualization more difficult. We applied Poincaré embeddings in a 2-dimensional hyperbolic space to a large-scale administrative claims database and show performance comparable to 100-dimensional embeddings in a Euclidean space. We then examine disease relationships under different disease contexts to better understand potential phenotypes.
Submitted 3 November, 2018;
originally announced November 2018.
-
A Review of Challenges and Opportunities in Machine Learning for Health
Authors:
Marzyeh Ghassemi,
Tristan Naumann,
Peter Schulam,
Andrew L. Beam,
Irene Y. Chen,
Rajesh Ranganath
Abstract:
Modern electronic health records (EHRs) provide data to answer clinically meaningful questions. The growing data in EHRs makes healthcare ripe for the use of machine learning. However, learning in a clinical setting presents unique challenges that complicate the use of common machine learning methodologies. For example, diseases in EHRs are poorly labeled, conditions can encompass multiple underlying endotypes, and healthy individuals are underrepresented. This article serves as a primer to illuminate these challenges and highlights opportunities for members of the machine learning community to contribute to healthcare.
Submitted 5 December, 2019; v1 submitted 1 June, 2018;
originally announced June 2018.
-
Adversarial Attacks Against Medical Deep Learning Systems
Authors:
Samuel G. Finlayson,
Hyung Won Chung,
Isaac S. Kohane,
Andrew L. Beam
Abstract:
The discovery of adversarial examples has raised concerns about the practical deployment of deep learning systems. In this paper, we demonstrate that adversarial examples are capable of manipulating deep learning systems across three clinical domains. For each of our representative medical deep learning classifiers, both white and black box attacks were highly successful. Our models are representative of the current state of the art in medical computer vision and, in some cases, directly reflect architectures already seeing deployment in real world clinical settings. In addition to the technical contribution of our paper, we synthesize a large body of knowledge about the healthcare system to argue that medicine may be uniquely susceptible to adversarial attacks, both in terms of monetary incentives and technical vulnerability. To this end, we outline the healthcare economy and the incentives it creates for fraud and provide concrete examples of how and why such attacks could be realistically carried out. We urge practitioners to be aware of current vulnerabilities when deploying deep learning systems in clinical settings, and encourage the machine learning community to further investigate the domain-specific characteristics of medical learning systems.
Submitted 4 February, 2019; v1 submitted 14 April, 2018;
originally announced April 2018.
-
Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data
Authors:
Andrew L. Beam,
Benjamin Kompa,
Allen Schmaltz,
Inbar Fried,
Griffin Weber,
Nathan P. Palmer,
Xu Shi,
Tianxi Cai,
Isaac S. Kohane
Abstract:
Word embeddings are a popular approach to unsupervised learning of word relationships that are widely used in natural language processing. In this article, we present a new set of embeddings for medical concepts learned using an extremely large collection of multimodal medical data. Leaning on recent theoretical insights, we demonstrate how an insurance claims database of 60 million members, a collection of 20 million clinical notes, and 1.7 million full text biomedical journal articles can be combined to embed concepts into a common space, resulting in the largest ever set of embeddings for 108,477 medical concepts. To evaluate our approach, we present a new benchmark methodology based on statistical power specifically designed to test embeddings of medical concepts. Our approach, called cui2vec, attains state-of-the-art performance relative to previous methods in most instances. Finally, we provide a downloadable set of pre-trained embeddings for other researchers to use, as well as an online tool for interactive exploration of the cui2vec embeddings.
Submitted 19 August, 2019; v1 submitted 4 April, 2018;
originally announced April 2018.
-
Auditory Brainstem Response in Infants and Children with Autism: A Meta-Analysis
Authors:
Oren Miron,
Andrew L. Beam,
Isaac S. Kohane
Abstract:
Infants with autism were recently found to have prolonged Auditory Brainstem Response (ABR); however, at older ages, findings are contradictory. We compared ABR differences between participants with autism and controls with respect to age using a meta-analysis. Data sources included MEDLINE, EMBASE, Web of Science, Google Scholar, HOLLIS and ScienceDirect from their inception to June 2016. The 25 studies that were included had a total of 1349 participants (727 participants with autism and 622 controls) and an age range of 0-40 years. Prolongation of wave V in autism had a significant negative correlation with age (R²=0.23; P=.01). The 22 studies below age 18 years showed a significantly prolonged wave V in autism (Standard Mean Difference=0.6 [95% CI, 0.5 to 0.8]; P<.001). The 3 studies above 18 years of age showed a significantly shorter wave V in autism (SMD=-0.6 [95% CI, -1.0 to -0.2]; P=.004). Prolonged ABR was consistent in infants and children with autism, suggesting it can serve as an autism biomarker at infancy. As the ABR is routinely used to screen infants for hearing impairment, the opportunity for replication studies is extensive.
Submitted 10 October, 2017;
originally announced October 2017.
-
Bayesian Neural Networks for Genetic Association Studies of Complex Disease
Authors:
Andrew L. Beam,
Alison Motsinger-Reif,
Jon Doyle
Abstract:
Discovering causal genetic variants from large genetic association studies poses many difficult challenges. Assessing which genetic markers are involved in determining trait status is a computationally demanding task, especially in the presence of gene-gene interactions. A non-parametric Bayesian approach in the form of a Bayesian neural network is proposed for use in analyzing genetic association studies. Demonstrations on synthetic and real data reveal they are able to efficiently and accurately determine which variants are involved in determining case-control status. Using graphics processing units (GPUs) the time needed to build these models is decreased by several orders of magnitude. In comparison with commonly used approaches for detecting genetic interactions, Bayesian neural networks perform very well across a broad spectrum of possible genetic relationships while having the computational efficiency needed to handle large datasets.
Submitted 15 April, 2014; v1 submitted 15 April, 2014;
originally announced April 2014.
-
Fast Hamiltonian Monte Carlo Using GPU Computing
Authors:
Andrew L. Beam,
Sujit K. Ghosh,
Jon Doyle
Abstract:
In recent years, the Hamiltonian Monte Carlo (HMC) algorithm has been found to work more efficiently compared to other popular Markov Chain Monte Carlo (MCMC) methods (such as random walk Metropolis-Hastings) in generating samples from a posterior distribution. A general framework for HMC based on the use of graphical processing units (GPUs) is shown to greatly reduce the computing time needed for Bayesian inference. The most expensive computational tasks in HMC are the evaluation of the posterior kernel and computing its gradient with respect to the parameters of interest. One of the primary goals of this article is to show that by expressing each of these tasks in terms of simple matrix or element-wise operations and maintaining persistent objects in GPU memory, the computational time can be drastically reduced. By using GPU objects to perform the entire HMC simulation, most of the latency penalties associated with transferring data from main to GPU memory can be avoided. Thus, the proposed computational framework is conceptually very simple, but is also general enough to be applied to most problems that use HMC sampling. For clarity of exposition, the effectiveness of the proposed approach is demonstrated in the high-dimensional setting on a standard statistical model: multinomial regression. Using GPUs, analyses of data sets that were previously intractable for fully Bayesian approaches due to the prohibitively high computational cost are now feasible.
Submitted 17 February, 2014;
originally announced February 2014.