Search | arXiv e-print repository

NeRF-US: Removing Ultrasound Imaging Artifacts from Neural Radiance Fields in the Wild

Authors: Rishit Dagli, Atsuhiro Hibi, Rahul G. Krishnan, Pascal N. Tyrrell

Abstract: Current methods for performing 3D reconstruction and novel view synthesis (NVS) in ultrasound imaging data often face severe artifacts when training NeRF-based approaches. The artifacts produced by current approaches differ from NeRF floaters in general scenes because of the unique nature of ultrasound capture. Furthermore, existing models fail to produce reasonable 3D reconstructions when ultraso… ▽ More Current methods for performing 3D reconstruction and novel view synthesis (NVS) in ultrasound imaging data often face severe artifacts when training NeRF-based approaches. The artifacts produced by current approaches differ from NeRF floaters in general scenes because of the unique nature of ultrasound capture. Furthermore, existing models fail to produce reasonable 3D reconstructions when ultrasound data is captured or obtained casually in uncontrolled environments, which is common in clinical settings. Consequently, existing reconstruction and NVS methods struggle to handle ultrasound motion, fail to capture intricate details, and cannot model transparent and reflective surfaces. In this work, we introduced NeRF-US, which incorporates 3D-geometry guidance for border probability and scattering density into NeRF training, while also utilizing ultrasound-specific rendering over traditional volume rendering. These 3D priors are learned through a diffusion model. Through experiments conducted on our new "Ultrasound in the Wild" dataset, we observed accurate, clinically plausible, artifact-free reconstructions. △ Less

Submitted 20 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.05437 [pdf, other]

Predicting Long-Term Allograft Survival in Liver Transplant Recipients

Authors: Xiang Gao, Michael Cooper, Maryam Naghibzadeh, Amirhossein Azhie, Mamatha Bhat, Rahul G. Krishnan

Abstract: Liver allograft failure occurs in approximately 20% of liver transplant recipients within five years post-transplant, leading to mortality or the need for retransplantation. Providing an accurate and interpretable model for individualized risk estimation of graft failure is essential for improving post-transplant care. To this end, we introduce the Model for Allograft Survival (MAS), a simple line… ▽ More Liver allograft failure occurs in approximately 20% of liver transplant recipients within five years post-transplant, leading to mortality or the need for retransplantation. Providing an accurate and interpretable model for individualized risk estimation of graft failure is essential for improving post-transplant care. To this end, we introduce the Model for Allograft Survival (MAS), a simple linear risk score that outperforms other advanced survival models. Using longitudinal patient follow-up data from the United States (U.S.), we develop our models on 82,959 liver transplant recipients and conduct multi-site evaluations on 11 regions. Additionally, by testing on a separate non-U.S. cohort, we explore the out-of-distribution generalization performance of various models without additional fine-tuning, a crucial property for clinical deployment. We find that the most complex models are also the ones most vulnerable to distribution shifts despite achieving the best in-distribution performance. Our findings not only provide a strong risk score for predicting long-term graft failure but also suggest that the routine machine learning pipeline with only in-distribution held-out validation could create harmful consequences for patients at deployment. △ Less

Submitted 10 August, 2024; originally announced August 2024.

Comments: Accepted at MLHC 2024

arXiv:2407.07018 [pdf, other]

End-To-End Causal Effect Estimation from Unstructured Natural Language Data

Authors: Nikita Dhawan, Leonardo Cotta, Karen Ullrich, Rahul G. Krishnan, Chris J. Maddison

Abstract: Knowing the effect of an intervention is critical for human decision-making, but current approaches for causal effect estimation rely on manual data collection and structuring, regardless of the causal assumptions. This increases both the cost and time-to-completion for studies. We show how large, diverse observational text data can be mined with large language models (LLMs) to produce inexpensive… ▽ More Knowing the effect of an intervention is critical for human decision-making, but current approaches for causal effect estimation rely on manual data collection and structuring, regardless of the causal assumptions. This increases both the cost and time-to-completion for studies. We show how large, diverse observational text data can be mined with large language models (LLMs) to produce inexpensive causal effect estimates under appropriate causal assumptions. We introduce NATURAL, a novel family of causal effect estimators built with LLMs that operate over datasets of unstructured text. Our estimators use LLM conditional distributions (over variables of interest, given the text data) to assist in the computation of classical estimators of causal effect. We overcome a number of technical challenges to realize this idea, such as automating data curation and using LLMs to impute missing information. We prepare six (two synthetic and four real) observational datasets, paired with corresponding ground truth in the form of randomized trials, which we used to systematically evaluate each step of our pipeline. NATURAL estimators demonstrate remarkable performance, yielding causal effect estimates that fall within 3 percentage points of their ground truth counterparts, including on real-world Phase 3/4 clinical trials. Our results suggest that unstructured text data is a rich source of causal effect information, and NATURAL is a first step towards an automated pipeline to tap this resource. △ Less

Submitted 23 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: 28 pages, 11 figures

arXiv:2406.00426 [pdf, other]

InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation

Authors: Jacob Si, Wendy Yusi Cheng, Michael Cooper, Rahul G. Krishnan

Abstract: Tabular data are omnipresent in various sectors of industries. Neural networks for tabular data such as TabNet have been proposed to make predictions while leveraging the attention mechanism for interpretability. However, the inferred attention masks are often dense, making it challenging to come up with rationales about the predictive signal. To remedy this, we propose InterpreTabNet, a variant o… ▽ More Tabular data are omnipresent in various sectors of industries. Neural networks for tabular data such as TabNet have been proposed to make predictions while leveraging the attention mechanism for interpretability. However, the inferred attention masks are often dense, making it challenging to come up with rationales about the predictive signal. To remedy this, we propose InterpreTabNet, a variant of the TabNet model that models the attention mechanism as a latent variable sampled from a Gumbel-Softmax distribution. This enables us to regularize the model to learn distinct concepts in the attention masks via a KL Divergence regularizer. It prevents overlapping feature selection by promoting sparsity which maximizes the model's efficacy and improves interpretability to determine the important features when predicting the outcome. To assist in the interpretation of feature interdependencies from our model, we employ a large language model (GPT-4) and use prompt engineering to map from the learned feature mask onto natural language text describing the learned signal. Through comprehensive experiments on real-world datasets, we demonstrate that InterpreTabNet outperforms previous methods for interpreting tabular data while attaining competitive accuracy. △ Less

Submitted 11 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

Comments: ICML 2024 Spotlight

arXiv:2404.07266 [pdf, other]

Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity

Authors: Vahid Balazadeh, Keertana Chidambaram, Viet Nguyen, Rahul G. Krishnan, Vasilis Syrgkanis

Abstract: We study the problem of online sequential decision-making given auxiliary demonstrations from experts who made their decisions based on unobserved contextual information. These demonstrations can be viewed as solving related but slightly different tasks than what the learner faces. This setting arises in many application domains, such as self-driving cars, healthcare, and finance, where expert dem… ▽ More We study the problem of online sequential decision-making given auxiliary demonstrations from experts who made their decisions based on unobserved contextual information. These demonstrations can be viewed as solving related but slightly different tasks than what the learner faces. This setting arises in many application domains, such as self-driving cars, healthcare, and finance, where expert demonstrations are made using contextual information, which is not recorded in the data available to the learning agent. We model the problem as a zero-shot meta-reinforcement learning setting with an unknown task distribution and a Bayesian regret minimization objective, where the unobserved tasks are encoded as parameters with an unknown prior. We propose the Experts-as-Priors algorithm (ExPerior), a non-parametric empirical Bayes approach that utilizes the principle of maximum entropy to establish an informative prior over the learner's decision-making problem. This prior enables the application of any Bayesian approach for online decision-making, such as posterior sampling. We demonstrate that our strategy surpasses existing behaviour cloning and online algorithms for multi-armed bandits and reinforcement learning, showcasing the utility of our approach in leveraging expert demonstrations across different decision-making setups. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.18910 [pdf, other]

A Geometric Explanation of the Likelihood OOD Detection Paradox

Authors: Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul G. Krishnan, Gabriel Loaiza-Ganem

Abstract: Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making l… ▽ More Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection. △ Less

Submitted 11 June, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: ICML 2024

arXiv:2402.07344 [pdf, other]

Measurement Scheduling for ICU Patients with Offline Reinforcement Learning

Authors: Zongliang Ji, Anna Goldenberg, Rahul G. Krishnan

Abstract: Scheduling laboratory tests for ICU patients presents a significant challenge. Studies show that 20-40% of lab tests ordered in the ICU are redundant and could be eliminated without compromising patient safety. Prior work has leveraged offline reinforcement learning (Offline-RL) to find optimal policies for ordering lab tests based on patient information. However, new ICU patient datasets have sin… ▽ More Scheduling laboratory tests for ICU patients presents a significant challenge. Studies show that 20-40% of lab tests ordered in the ICU are redundant and could be eliminated without compromising patient safety. Prior work has leveraged offline reinforcement learning (Offline-RL) to find optimal policies for ordering lab tests based on patient information. However, new ICU patient datasets have since been released, and various advancements have been made in Offline-RL methods. In this study, we first introduce a preprocessing pipeline for the newly-released MIMIC-IV dataset geared toward time-series tasks. We then explore the efficacy of state-of-the-art Offline-RL methods in identifying better policies for ICU patient lab test scheduling. Besides assessing methodological performance, we also discuss the overall suitability and practicality of using Offline-RL frameworks for scheduling laboratory tests in ICU settings. △ Less

Submitted 11 February, 2024; originally announced February 2024.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 11 pages

arXiv:2311.18780 [pdf, other]

MultiResFormer: Transformer with Adaptive Multi-Resolution Modeling for General Time Series Forecasting

Authors: Linfeng Du, Ji Xin, Alex Labach, Saba Zuberi, Maksims Volkovs, Rahul G. Krishnan

Abstract: Transformer-based models have greatly pushed the boundaries of time series forecasting recently. Existing methods typically encode time series data into $\textit{patches}$ using one or a fixed set of patch lengths. This, however, could result in a lack of ability to capture the variety of intricate temporal dependencies present in real-world multi-periodic time series. In this paper, we propose Mu… ▽ More Transformer-based models have greatly pushed the boundaries of time series forecasting recently. Existing methods typically encode time series data into $\textit{patches}$ using one or a fixed set of patch lengths. This, however, could result in a lack of ability to capture the variety of intricate temporal dependencies present in real-world multi-periodic time series. In this paper, we propose MultiResFormer, which dynamically models temporal variations by adaptively choosing optimal patch lengths. Concretely, at the beginning of each layer, time series data is encoded into several parallel branches, each using a detected periodicity, before going through the transformer encoder block. We conduct extensive evaluations on long- and short-term forecasting datasets comparing MultiResFormer with state-of-the-art baselines. MultiResFormer outperforms patch-based Transformer baselines on long-term forecasting tasks and also consistently outperforms CNN baselines by a large margin, while using much fewer parameters than these baselines. △ Less

Submitted 8 February, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.02221 [pdf, other]

Structured Neural Networks for Density Estimation and Causal Inference

Authors: Asic Q. Chen, Ruian Shi, Xiang Gao, Ricardo Baptista, Rahul G. Krishnan

Abstract: Injecting structure into neural networks enables learning functions that satisfy invariances with respect to subsets of inputs. For instance, when learning generative models using neural networks, it is advantageous to encode the conditional independence structure of observed variables, often in the form of Bayesian networks. We propose the Structured Neural Network (StrNN), which injects structur… ▽ More Injecting structure into neural networks enables learning functions that satisfy invariances with respect to subsets of inputs. For instance, when learning generative models using neural networks, it is advantageous to encode the conditional independence structure of observed variables, often in the form of Bayesian networks. We propose the Structured Neural Network (StrNN), which injects structure through masking pathways in a neural network. The masks are designed via a novel relationship we explore between neural network architectures and binary matrix factorization, to ensure that the desired independencies are respected. We devise and study practical algorithms for this otherwise NP-hard design problem based on novel objectives that control the model architecture. We demonstrate the utility of StrNN in three applications: (1) binary and Gaussian density estimation with StrNN, (2) real-valued density estimation with Structured Autoregressive Flows (StrAFs) and Structured Continuous Normalizing Flows (StrCNF), and (3) interventional and counterfactual analysis with StrAFs for causal inference. Our work opens up new avenues for learning neural networks that enable data-efficient generative modeling and the use of normalizing flows for causal effect estimation. △ Less

Submitted 3 November, 2023; originally announced November 2023.

Comments: 10 pages with 5 figures, to be published in Neural Information Processing Systems 2023

arXiv:2308.07480 [pdf, other]

Order-based Structure Learning with Normalizing Flows

Authors: Hamidreza Kamkari, Vahid Balazadeh, Vahid Zehtab, Rahul G. Krishnan

Abstract: Estimating the causal structure of observational data is a challenging combinatorial search problem that scales super-exponentially with graph size. Existing methods use continuous relaxations to make this problem computationally tractable but often restrict the data-generating process to additive noise models (ANMs) through explicit or implicit assumptions. We present Order-based Structure Learni… ▽ More Estimating the causal structure of observational data is a challenging combinatorial search problem that scales super-exponentially with graph size. Existing methods use continuous relaxations to make this problem computationally tractable but often restrict the data-generating process to additive noise models (ANMs) through explicit or implicit assumptions. We present Order-based Structure Learning with Normalizing Flows (OSLow), a framework that relaxes these assumptions using autoregressive normalizing flows. We leverage the insight that searching over topological orderings is a natural way to enforce acyclicity in structure discovery and propose a novel, differentiable permutation learning method to find such orderings. Through extensive experiments on synthetic and real-world data, we demonstrate that OSLow outperforms prior baselines and improves performance on the observational Sachs and SynTReN datasets as measured by structural hamming distance and structural intervention distance, highlighting the importance of relaxing the ANM assumption made by existing methods. △ Less

Submitted 17 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

arXiv:2306.11912 [pdf, other]

Copula-Based Deep Survival Models for Dependent Censoring

Authors: Ali Hossein Gharari Foomani, Michael Cooper, Russell Greiner, Rahul G. Krishnan

Abstract: A survival dataset describes a set of instances (e.g. patients) and provides, for each, either the time until an event (e.g. death), or the censoring time (e.g. when lost to follow-up - which is a lower bound on the time until the event). We consider the challenge of survival prediction: learning, from such data, a predictive model that can produce an individual survival distribution for a novel i… ▽ More A survival dataset describes a set of instances (e.g. patients) and provides, for each, either the time until an event (e.g. death), or the censoring time (e.g. when lost to follow-up - which is a lower bound on the time until the event). We consider the challenge of survival prediction: learning, from such data, a predictive model that can produce an individual survival distribution for a novel instance. Many contemporary methods of survival prediction implicitly assume that the event and censoring distributions are independent conditional on the instance's covariates - a strong assumption that is difficult to verify (as we observe only one outcome for each instance) and which can induce significant bias when it does not hold. This paper presents a parametric model of survival that extends modern non-linear survival analysis by relaxing the assumption of conditional independence. On synthetic and semi-synthetic data, our approach significantly improves estimates of survival distributions compared to the standard that assumes conditional independence in the data. △ Less

Submitted 20 June, 2023; originally announced June 2023.

Comments: 23 pages, 7 figures

arXiv:2305.12031 [pdf, other]

Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding

Authors: Augustin Toma, Patrick R. Lawler, Jimmy Ba, Rahul G. Krishnan, Barry B. Rubin, Bo Wang

Abstract: We present Clinical Camel, an open large language model (LLM) explicitly tailored for clinical research. Fine-tuned from LLaMA-2 using QLoRA, Clinical Camel achieves state-of-the-art performance across medical benchmarks among openly available medical LLMs. Leveraging efficient single-GPU training, Clinical Camel surpasses GPT-3.5 in five-shot evaluations on all assessed benchmarks, including 64.3… ▽ More We present Clinical Camel, an open large language model (LLM) explicitly tailored for clinical research. Fine-tuned from LLaMA-2 using QLoRA, Clinical Camel achieves state-of-the-art performance across medical benchmarks among openly available medical LLMs. Leveraging efficient single-GPU training, Clinical Camel surpasses GPT-3.5 in five-shot evaluations on all assessed benchmarks, including 64.3% on the USMLE Sample Exam (compared to 58.5% for GPT-3.5), 77.9% on PubMedQA (compared to 60.2%), 60.7% on MedQA (compared to 53.6%), and 54.2% on MedMCQA (compared to 51.0%). In addition to these benchmarks, Clinical Camel demonstrates its broader capabilities, such as synthesizing plausible clinical notes. This work introduces dialogue-based knowledge encoding, a novel method to synthesize conversational data from dense medical texts. While benchmark results are encouraging, extensive and rigorous human evaluation across diverse clinical scenarios is imperative to ascertain safety before implementation. By openly sharing Clinical Camel, we hope to foster transparent and collaborative research, working towards the safe integration of LLMs within the healthcare domain. Significant challenges concerning reliability, bias, and the potential for outdated knowledge persist. Nonetheless, the transparency provided by an open approach reinforces the scientific rigor essential for future clinical applications. △ Less

Submitted 17 August, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: for model weights, see https://huggingface.co/wanglab/

arXiv:2304.13017 [pdf, other]

DuETT: Dual Event Time Transformer for Electronic Health Records

Authors: Alex Labach, Aslesha Pokhrel, Xiao Shi Huang, Saba Zuberi, Seung Eun Yi, Maksims Volkovs, Tomi Poutanen, Rahul G. Krishnan

Abstract: Electronic health records (EHRs) recorded in hospital settings typically contain a wide range of numeric time series data that is characterized by high sparsity and irregular observations. Effective modelling for such data must exploit its time series nature, the semantic relationship between different types of observations, and information in the sparsity structure of the data. Self-supervised Tr… ▽ More Electronic health records (EHRs) recorded in hospital settings typically contain a wide range of numeric time series data that is characterized by high sparsity and irregular observations. Effective modelling for such data must exploit its time series nature, the semantic relationship between different types of observations, and information in the sparsity structure of the data. Self-supervised Transformers have shown outstanding performance in a variety of structured tasks in NLP and computer vision. But multivariate time series data contains structured relationships over two dimensions: time and recorded event type, and straightforward applications of Transformers to time series data do not leverage this distinct structure. The quadratic scaling of self-attention layers can also significantly limit the input sequence length without appropriate input engineering. We introduce the DuETT architecture, an extension of Transformers designed to attend over both time and event type dimensions, yielding robust representations from EHR data. DuETT uses an aggregated input where sparse time series are transformed into a regular sequence with fixed length; this lowers the computational complexity relative to previous EHR Transformer models and, more importantly, enables the use of larger and deeper neural networks. When trained with self-supervised prediction tasks, that provide rich and informative signals for model pre-training, our model outperforms state-of-the-art deep learning models on multiple downstream tasks from the MIMIC-IV and PhysioNet-2012 EHR datasets. △ Less

Submitted 15 August, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: Accepted at MLHC 2023, camera-ready version

arXiv:2303.01841 [pdf, other]

Anamnesic Neural Differential Equations with Orthogonal Polynomial Projections

Authors: Edward De Brouwer, Rahul G. Krishnan

Abstract: Neural ordinary differential equations (Neural ODEs) are an effective framework for learning dynamical systems from irregularly sampled time series data. These models provide a continuous-time latent representation of the underlying dynamical system where new observations at arbitrary time points can be used to update the latent representation of the dynamical system. Existing parameterizations fo… ▽ More Neural ordinary differential equations (Neural ODEs) are an effective framework for learning dynamical systems from irregularly sampled time series data. These models provide a continuous-time latent representation of the underlying dynamical system where new observations at arbitrary time points can be used to update the latent representation of the dynamical system. Existing parameterizations for the dynamics functions of Neural ODEs limit the ability of the model to retain global information about the time series; specifically, a piece-wise integration of the latent process between observations can result in a loss of memory on the dynamic patterns of previously observed data points. We propose PolyODE, a Neural ODE that models the latent continuous-time process as a projection onto a basis of orthogonal polynomials. This formulation enforces long-range memory and preserves a global representation of the underlying dynamical system. Our construction is backed by favourable theoretical guarantees and in a series of experiments, we demonstrate that it outperforms previous works in the reconstruction of past and future data, and in downstream prediction tasks. △ Less

Submitted 3 March, 2023; originally announced March 2023.

Comments: Accepted at ICLR 2023

Journal ref: International Conference on Learning Representations (ICLR) 2023

arXiv:2212.02742 [pdf, other]

A Learning Based Hypothesis Test for Harmful Covariate Shift

Authors: Tom Ginsberg, Zhongyuan Liang, Rahul G. Krishnan

Abstract: The ability to quickly and accurately identify covariate shift at test time is a critical and often overlooked component of safe machine learning systems deployed in high-risk domains. While methods exist for detecting when predictions should not be made on out-of-distribution test examples, identifying distributional level differences between training and test time can help determine when a model… ▽ More The ability to quickly and accurately identify covariate shift at test time is a critical and often overlooked component of safe machine learning systems deployed in high-risk domains. While methods exist for detecting when predictions should not be made on out-of-distribution test examples, identifying distributional level differences between training and test time can help determine when a model should be removed from the deployment setting and retrained. In this work, we define harmful covariate shift (HCS) as a change in distribution that may weaken the generalization of a predictive model. To detect HCS, we use the discordance between an ensemble of classifiers trained to agree on training data and disagree on test data. We derive a loss function for training this ensemble and show that the disagreement rate and entropy represent powerful discriminative statistics for HCS. Empirically, we demonstrate the ability of our method to detect harmful covariate shift with statistical certainty on a variety of high-dimensional datasets. Across numerous domains and modalities, we show state-of-the-art performance compared to existing methods, particularly when the number of observed test samples is small. △ Less

Submitted 1 March, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

arXiv:2211.07076 [pdf, other]

Learning predictive checklists from continuous medical data

Authors: Yukti Makhija, Edward De Brouwer, Rahul G. Krishnan

Abstract: Checklists, while being only recently introduced in the medical domain, have become highly popular in daily clinical practice due to their combined effectiveness and great interpretability. Checklists are usually designed by expert clinicians that manually collect and analyze available evidence. However, the increasing quantity of available medical data is calling for a partially automated checkli… ▽ More Checklists, while being only recently introduced in the medical domain, have become highly popular in daily clinical practice due to their combined effectiveness and great interpretability. Checklists are usually designed by expert clinicians that manually collect and analyze available evidence. However, the increasing quantity of available medical data is calling for a partially automated checklist design. Recent works have taken a step in that direction by learning predictive checklists from categorical data. In this work, we propose to extend this approach to accomodate learning checklists from continuous medical data using mixed-integer programming approach. We show that this extension outperforms a range of explainable machine learning baselines on the prediction of sepsis from intensive care clinical trajectories. △ Less

Submitted 13 November, 2022; originally announced November 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 7 pages

Journal ref: Machine Learning for Health (ML4H) symposium 2022

arXiv:2210.08139 [pdf, other]

Partial Identification of Treatment Effects with Implicit Generative Models

Authors: Vahid Balazadeh, Vasilis Syrgkanis, Rahul G. Krishnan

Abstract: We consider the problem of partial identification, the estimation of bounds on the treatment effects from observational data. Although studied using discrete treatment variables or in specific causal graphs (e.g., instrumental variables), partial identification has been recently explored using tools from deep generative modeling. We propose a new method for partial identification of average treatm… ▽ More We consider the problem of partial identification, the estimation of bounds on the treatment effects from observational data. Although studied using discrete treatment variables or in specific causal graphs (e.g., instrumental variables), partial identification has been recently explored using tools from deep generative modeling. We propose a new method for partial identification of average treatment effects(ATEs) in general causal graphs using implicit generative models comprising continuous and discrete random variables. Since ATE with continuous treatment is generally non-regular, we leverage the partial derivatives of response functions to define a regular approximation of ATE, a quantity we call uniform average treatment derivative (UATD). We prove that our algorithm converges to tight bounds on ATE in linear structural causal models (SCMs). For nonlinear SCMs, we empirically show that using UATD leads to tighter and more stable bounds than methods that directly optimize the ATE. △ Less

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2208.02301 [pdf, other]

HiCu: Leveraging Hierarchy for Curriculum Learning in Automated ICD Coding

Authors: Weiming Ren, Ruijing Zeng, Tongzi Wu, Tianshu Zhu, Rahul G. Krishnan

Abstract: There are several opportunities for automation in healthcare that can improve clinician throughput. One such example is assistive tools to document diagnosis codes when clinicians write notes. We study the automation of medical code prediction using curriculum learning, which is a training strategy for machine learning models that gradually increases the hardness of the learning tasks from easy to… ▽ More There are several opportunities for automation in healthcare that can improve clinician throughput. One such example is assistive tools to document diagnosis codes when clinicians write notes. We study the automation of medical code prediction using curriculum learning, which is a training strategy for machine learning models that gradually increases the hardness of the learning tasks from easy to difficult. One of the challenges in curriculum learning is the design of curricula -- i.e., in the sequential design of tasks that gradually increase in difficulty. We propose Hierarchical Curriculum Learning (HiCu), an algorithm that uses graph structure in the space of outputs to design curricula for multi-label classification. We create curricula for multi-label classification models that predict ICD diagnosis and procedure codes from natural language descriptions of patients. By leveraging the hierarchy of ICD codes, which groups diagnosis codes based on various organ systems in the human body, we find that our proposed curricula improve the generalization of neural network-based predictive models across recurrent, convolutional, and transformer-based architectures. Our code is available at https://github.com/wren93/HiCu-ICD. △ Less

Submitted 3 August, 2022; originally announced August 2022.

Comments: To appear at Machine Learning for Healthcare Conference (MLHC2022)

arXiv:2206.02647 [pdf, other]

Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning

Authors: Richard J. Chen, Chengkuan Chen, Yicong Li, Tiffany Y. Chen, Andrew D. Trister, Rahul G. Krishnan, Faisal Mahmood

Abstract: Vision Transformers (ViTs) and their multi-scale and hierarchical variations have been successful at capturing image representations but their use has been generally studied for low-resolution images (e.g. - 256x256, 384384). For gigapixel whole-slide imaging (WSI) in computational pathology, WSIs can be as large as 150000x150000 pixels at 20X magnification and exhibit a hierarchical structure of… ▽ More Vision Transformers (ViTs) and their multi-scale and hierarchical variations have been successful at capturing image representations but their use has been generally studied for low-resolution images (e.g. - 256x256, 384384). For gigapixel whole-slide imaging (WSI) in computational pathology, WSIs can be as large as 150000x150000 pixels at 20X magnification and exhibit a hierarchical structure of visual tokens across varying resolutions: from 16x16 images capture spatial patterns among cells, to 4096x4096 images characterizing interactions within the tissue microenvironment. We introduce a new ViT architecture called the Hierarchical Image Pyramid Transformer (HIPT), which leverages the natural hierarchical structure inherent in WSIs using two levels of self-supervised learning to learn high-resolution image representations. HIPT is pretrained across 33 cancer types using 10,678 gigapixel WSIs, 408,218 4096x4096 images, and 104M 256x256 images. We benchmark HIPT representations on 9 slide-level tasks, and demonstrate that: 1) HIPT with hierarchical pretraining outperforms current state-of-the-art methods for cancer subtyping and survival prediction, 2) self-supervised ViTs are able to model important inductive biases about the hierarchical structure of phenotypes in the tumor microenvironment. △ Less

Submitted 6 June, 2022; originally announced June 2022.

Comments: Accepted to CVPR 2022 (Oral)

arXiv:2204.08324 [pdf, other]

Hierarchical Optimal Transport for Comparing Histopathology Datasets

Authors: Anna Yeaton, Rahul G. Krishnan, Rebecca Mieloszyk, David Alvarez-Melis, Grace Huynh

Abstract: Scarcity of labeled histopathology data limits the applicability of deep learning methods to under-profiled cancer types and labels. Transfer learning allows researchers to overcome the limitations of small datasets by pre-training machine learning models on larger datasets similar to the small target dataset. However, similarity between datasets is often determined heuristically. In this paper, w… ▽ More Scarcity of labeled histopathology data limits the applicability of deep learning methods to under-profiled cancer types and labels. Transfer learning allows researchers to overcome the limitations of small datasets by pre-training machine learning models on larger datasets similar to the small target dataset. However, similarity between datasets is often determined heuristically. In this paper, we propose a principled notion of distance between histopathology datasets based on a hierarchical generalization of optimal transport distances. Our method does not require any training, is agnostic to model type, and preserves much of the hierarchical structure in histopathology datasets imposed by tiling. We apply our method to H&E stained slides from The Cancer Genome Atlas from six different cancer types. We show that our method outperforms a baseline distance in a cancer-type prediction task. Our results also show that our optimal transport distance predicts difficulty of transferability in a tumor vs.normal prediction setting. △ Less

Submitted 20 April, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

arXiv:2204.05229 [pdf, other]

Mixture-of-experts VAEs can disregard variation in surjective multimodal data

Authors: Jannik Wolff, Tassilo Klein, Moin Nabi, Rahul G. Krishnan, Shinichi Nakajima

Abstract: Machine learning systems are often deployed in domains that entail data from multiple modalities, for example, phenotypic and genotypic characteristics describe patients in healthcare. Previous works have developed multimodal variational autoencoders (VAEs) that generate several modalities. We consider subjective data, where single datapoints from one modality (such as class labels) describe multi… ▽ More Machine learning systems are often deployed in domains that entail data from multiple modalities, for example, phenotypic and genotypic characteristics describe patients in healthcare. Previous works have developed multimodal variational autoencoders (VAEs) that generate several modalities. We consider subjective data, where single datapoints from one modality (such as class labels) describe multiple datapoints from another modality (such as images). We theoretically and empirically demonstrate that multimodal VAEs with a mixture of experts posterior can struggle to capture variability in such surjective data. △ Less

Submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted at the NeurIPS 2021 workshop on Bayesian Deep Learning

arXiv:2203.00585 [pdf, other]

Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology

Authors: Richard J. Chen, Rahul G. Krishnan

Abstract: Tissue phenotyping is a fundamental task in learning objective characterizations of histopathologic biomarkers within the tumor-immune microenvironment in cancer pathology. However, whole-slide imaging (WSI) is a complex computer vision in which: 1) WSIs have enormous image resolutions with precludes large-scale pixel-level efforts in data curation, and 2) diversity of morphological phenotypes res… ▽ More Tissue phenotyping is a fundamental task in learning objective characterizations of histopathologic biomarkers within the tumor-immune microenvironment in cancer pathology. However, whole-slide imaging (WSI) is a complex computer vision in which: 1) WSIs have enormous image resolutions with precludes large-scale pixel-level efforts in data curation, and 2) diversity of morphological phenotypes results in inter- and intra-observer variability in tissue labeling. To address these limitations, current efforts have proposed using pretrained image encoders (transfer learning from ImageNet, self-supervised pretraining) in extracting morphological features from pathology, but have not been extensively validated. In this work, we conduct a search for good representations in pathology by training a variety of self-supervised models with validation on a variety of weakly-supervised and patch-level tasks. Our key finding is in discovering that Vision Transformers using DINO-based knowledge distillation are able to learn data-efficient and interpretable features in histology images wherein the different attention heads learn distinct morphological phenotypes. We make evaluation code and pretrained weights publicly-available at: https://github.com/Richarizardd/Self-Supervised-ViT-Path. △ Less

Submitted 1 March, 2022; originally announced March 2022.

Comments: Learning Meaningful Representations of Life (NeurIPS 2021)

arXiv:2110.14993 [pdf, other]

Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models

Authors: Rickard K. A. Karlsson, Martin Willbo, Zeshan Hussain, Rahul G. Krishnan, David Sontag, Fredrik D. Johansson

Abstract: We study prediction of future outcomes with supervised models that use privileged information during learning. The privileged information comprises samples of time series observed between the baseline time of prediction and the future outcome; this information is only available at training time which differs from the traditional supervised learning. Our question is when using this privileged data… ▽ More We study prediction of future outcomes with supervised models that use privileged information during learning. The privileged information comprises samples of time series observed between the baseline time of prediction and the future outcome; this information is only available at training time which differs from the traditional supervised learning. Our question is when using this privileged data leads to more sample-efficient learning of models that use only baseline data for predictions at test time. We give an algorithm for this setting and prove that when the time series are drawn from a non-stationary Gaussian-linear dynamical system of fixed horizon, learning with privileged information is more efficient than learning without it. On synthetic data, we test the limits of our algorithm and theory, both when our assumptions hold and when they are violated. On three diverse real-world datasets, we show that our approach is generally preferable to classical learning, particularly when data is scarce. Finally, we relate our estimator to a distillation approach both theoretically and empirically. △ Less

Submitted 5 May, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

Journal ref: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:5459-5484, 2022

arXiv:2102.11218 [pdf, other]

Neural Pharmacodynamic State Space Modeling

Authors: Zeshan Hussain, Rahul G. Krishnan, David Sontag

Abstract: Modeling the time-series of high-dimensional, longitudinal data is important for predicting patient disease progression. However, existing neural network based approaches that learn representations of patient state, while very flexible, are susceptible to overfitting. We propose a deep generative model that makes use of a novel attention-based neural architecture inspired by the physics of how tre… ▽ More Modeling the time-series of high-dimensional, longitudinal data is important for predicting patient disease progression. However, existing neural network based approaches that learn representations of patient state, while very flexible, are susceptible to overfitting. We propose a deep generative model that makes use of a novel attention-based neural architecture inspired by the physics of how treatments affect disease state. The result is a scalable and accurate model of high-dimensional patient biomarkers as they vary over time. Our proposed model yields significant improvements in generalization and, on real-world clinical data, provides interpretable insights into the dynamics of cancer progression. △ Less

Submitted 17 June, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

Comments: To appear at the International Conference on Machine Learning (ICML) 2021

arXiv:2102.07005 [pdf, other]

Clustering Interval-Censored Time-Series for Disease Phenotyping

Authors: Irene Y. Chen, Rahul G. Krishnan, David Sontag

Abstract: Unsupervised learning is often used to uncover clusters in data. However, different kinds of noise may impede the discovery of useful patterns from real-world time-series data. In this work, we focus on mitigating the interference of interval censoring in the task of clustering for disease phenotyping. We develop a deep generative, continuous-time model of time-series data that clusters time-serie… ▽ More Unsupervised learning is often used to uncover clusters in data. However, different kinds of noise may impede the discovery of useful patterns from real-world time-series data. In this work, we focus on mitigating the interference of interval censoring in the task of clustering for disease phenotyping. We develop a deep generative, continuous-time model of time-series data that clusters time-series while correcting for censorship time. We provide conditions under which clusters and the amount of delayed entry may be identified from data under a noiseless model. On synthetic data, we demonstrate accurate, stable, and interpretable results that outperform several benchmarks. On real-world clinical datasets of heart failure and Parkinson's disease patients, we study how interval censoring can adversely affect the task of disease phenotyping. Our model corrects for this source of error and recovers known clinical subtypes. △ Less

Submitted 5 December, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

Comments: AAAI 2022

arXiv:1802.05814 [pdf, other]

Variational Autoencoders for Collaborative Filtering

Authors: Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, Tony Jebara

Abstract: We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models which still largely dominate collaborative filtering research.We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread… ▽ More We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models which still largely dominate collaborative filtering research.We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm has information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently-proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements. △ Less

Submitted 15 February, 2018; originally announced February 2018.

Comments: 10 pages, 3 figures. WWW 2018

arXiv:1710.06085 [pdf, other]

On the challenges of learning with inference networks on sparse, high-dimensional data

Authors: Rahul G. Krishnan, Dawen Liang, Matthew Hoffman

Abstract: We study parameter estimation in Nonlinear Factor Analysis (NFA) where the generative model is parameterized by a deep neural network. Recent work has focused on learning such models using inference (or recognition) networks; we identify a crucial problem when modeling large, sparse, high-dimensional datasets -- underfitting. We study the extent of underfitting, highlighting that its severity incr… ▽ More We study parameter estimation in Nonlinear Factor Analysis (NFA) where the generative model is parameterized by a deep neural network. Recent work has focused on learning such models using inference (or recognition) networks; we identify a crucial problem when modeling large, sparse, high-dimensional datasets -- underfitting. We study the extent of underfitting, highlighting that its severity increases with the sparsity of the data. We propose methods to tackle it via iterative optimization inspired by stochastic variational inference \citep{hoffman2013stochastic} and improvements in the sparse data representation used for inference. The proposed techniques drastically improve the ability of these powerful models to fit sparse data, achieving state-of-the-art results on a benchmark text-count dataset and excellent results on the task of top-N recommendation. △ Less

Submitted 17 October, 2017; originally announced October 2017.

Comments: 14 pages, 3 tables, 11 figures

arXiv:1609.09869 [pdf, other]

Structured Inference Networks for Nonlinear State Space Models

Authors: Rahul G. Krishnan, Uri Shalit, David Sontag

Abstract: Gaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are mode… ▽ More Gaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks. Our learning algorithm simultaneously learns a compiled inference network and the generative model, leveraging a structured variational approximation parameterized by recurrent neural networks to mimic the posterior distribution. We apply the learning algorithm to both synthetic and real-world datasets, demonstrating its scalability and versatility. We find that using the structured approximation to the posterior results in models with significantly higher held-out likelihood. △ Less

Submitted 5 December, 2016; v1 submitted 30 September, 2016; originally announced September 2016.

Comments: To appear in the Thirty-First AAAI Conference on Artificial Intelligence, February 2017, 13 pages, 11 figures with supplement, changed to AAAI formatting style, added references

arXiv:1511.05121 [pdf, other]

Deep Kalman Filters

Authors: Rahul G. Krishnan, Uri Shalit, David Sontag

Abstract: Kalman Filters are one of the most influential models of time-varying phenomena. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption in a variety of disciplines. Motivated by recent variational methods for learning deep generative models, we introduce a unified algorithm to efficiently learn a broad spectrum of Kalman filters. Of parti… ▽ More Kalman Filters are one of the most influential models of time-varying phenomena. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption in a variety of disciplines. Motivated by recent variational methods for learning deep generative models, we introduce a unified algorithm to efficiently learn a broad spectrum of Kalman filters. Of particular interest is the use of temporal generative models for counterfactual inference. We investigate the efficacy of such models for counterfactual inference, and to that end we introduce the "Healing MNIST" dataset where long-term structure, noise and actions are applied to sequences of digits. We show the efficacy of our method for modeling this dataset. We further show how our model can be used for counterfactual inference for patients, based on electronic health record data of 8,000 patients over 4.5 years. △ Less

Submitted 25 November, 2015; v1 submitted 16 November, 2015; originally announced November 2015.

Comments: 17 pages, 14 figures: Fixed typo in Fig. 1(b) and added reference

arXiv:1511.02124 [pdf, other]

Barrier Frank-Wolfe for Marginal Inference

Authors: Rahul G. Krishnan, Simon Lacoste-Julien, David Sontag

Abstract: We introduce a globally-convergent algorithm for optimizing the tree-reweighted (TRW) variational objective over the marginal polytope. The algorithm is based on the conditional gradient method (Frank-Wolfe) and moves pseudomarginals within the marginal polytope through repeated maximum a posteriori (MAP) calls. This modular structure enables us to leverage black-box MAP solvers (both exact and ap… ▽ More We introduce a globally-convergent algorithm for optimizing the tree-reweighted (TRW) variational objective over the marginal polytope. The algorithm is based on the conditional gradient method (Frank-Wolfe) and moves pseudomarginals within the marginal polytope through repeated maximum a posteriori (MAP) calls. This modular structure enables us to leverage black-box MAP solvers (both exact and approximate) for variational inference, and obtains more accurate results than tree-reweighted algorithms that optimize over the local consistency relaxation. Theoretically, we bound the sub-optimality for the proposed algorithm despite the TRW objective having unbounded gradients at the boundary of the marginal polytope. Empirically, we demonstrate the increased quality of results found by tightening the relaxation over the marginal polytope as well as the spanning tree polytope on synthetic and real-world instances. △ Less

Submitted 25 November, 2015; v1 submitted 6 November, 2015; originally announced November 2015.

Comments: 25 pages, 12 figures, To appear in Neural Information Processing Systems (NIPS) 2015, Corrected reference and cleaned up bibliography

Showing 1–30 of 30 results for author: Krishnan, R G