Search | arXiv e-print repository

Query-Guided Self-Supervised Summarization of Nursing Notes

Authors: Ya Gao, Hans Moen, Saila Koivusalo, Miika Koskinen, Pekka Marttinen

Abstract: Nursing notes, an important component of Electronic Health Records (EHRs), keep track of the progression of a patient's health status during a care episode. Distilling the key information in nursing notes through text summarization techniques can improve clinicians' efficiency in understanding patients' conditions when reviewing nursing notes. However, existing abstractive summarization methods in… ▽ More Nursing notes, an important component of Electronic Health Records (EHRs), keep track of the progression of a patient's health status during a care episode. Distilling the key information in nursing notes through text summarization techniques can improve clinicians' efficiency in understanding patients' conditions when reviewing nursing notes. However, existing abstractive summarization methods in the clinical setting have often overlooked nursing notes and require the creation of reference summaries for supervision signals, which is time-consuming. In this work, we introduce QGSumm, a query-guided self-supervised domain adaptation framework for nursing note summarization. Using patient-related clinical queries as guidance, our approach generates high-quality, patient-centered summaries without relying on reference summaries for training. Through automatic and manual evaluation by an expert clinician, we demonstrate the strengths of our approach compared to the state-of-the-art Large Language Models (LLMs) in both zero-shot and few-shot settings. Ultimately, our approach provides a new perspective on conditional text summarization, tailored to the specific interests of clinical personnel. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.03337 [pdf, other]

Identifying latent state transition in non-linear dynamical systems

Authors: Çağlar Hızlı, Çağatay Yıldız, Matthias Bethge, ST John, Pekka Marttinen

Abstract: This work aims to improve generalization and interpretability of dynamical systems by recovering the underlying lower-dimensional latent states and their time evolutions. Previous work on disentangled representation learning within the realm of dynamical systems focused on the latent states, possibly with linear transition approximations. As such, they cannot identify nonlinear transition dynamics… ▽ More This work aims to improve generalization and interpretability of dynamical systems by recovering the underlying lower-dimensional latent states and their time evolutions. Previous work on disentangled representation learning within the realm of dynamical systems focused on the latent states, possibly with linear transition approximations. As such, they cannot identify nonlinear transition dynamics, and hence fail to reliably predict complex future behavior. Inspired by the advances in nonlinear ICA, we propose a state-space modeling framework in which we can identify not just the latent states but also the unknown transition function that maps the past states to the present. We introduce a practical algorithm based on variational auto-encoders and empirically demonstrate in realistic synthetic settings that we can (i) recover latent state dynamics with high accuracy, (ii) correspondingly achieve high future prediction accuracy, and (iii) adapt fast to new environments. △ Less

Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.20003 [pdf, other]

Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities

Authors: Alexander Nikitin, Jannik Kossen, Yarin Gal, Pekka Marttinen

Abstract: Uncertainty quantification in Large Language Models (LLMs) is crucial for applications where safety and reliability are important. In particular, uncertainty can be used to improve the trustworthiness of LLMs by detecting factually incorrect model responses, commonly called hallucinations. Critically, one should seek to capture the model's semantic uncertainty, i.e., the uncertainty over the meani… ▽ More Uncertainty quantification in Large Language Models (LLMs) is crucial for applications where safety and reliability are important. In particular, uncertainty can be used to improve the trustworthiness of LLMs by detecting factually incorrect model responses, commonly called hallucinations. Critically, one should seek to capture the model's semantic uncertainty, i.e., the uncertainty over the meanings of LLM outputs, rather than uncertainty over lexical or syntactic variations that do not affect answer correctness. To address this problem, we propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs. KLE defines positive semidefinite unit trace kernels to encode the semantic similarities of LLM outputs and quantifies uncertainty using the von Neumann entropy. It considers pairwise semantic dependencies between answers (or semantic clusters), providing more fine-grained uncertainty estimates than previous methods based on hard clustering of answers. We theoretically prove that KLE generalizes the previous state-of-the-art method called semantic entropy and empirically demonstrate that it improves uncertainty quantification performance across multiple natural language generation datasets and LLM architectures. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19988 [pdf, other]

Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics

Authors: Minttu Alakuijala, Reginald McLean, Isaac Woungang, Nariman Farsad, Samuel Kaski, Pekka Marttinen, Kai Yuan

Abstract: Natural language is often the easiest and most convenient modality for humans to specify tasks for robots. However, learning to ground language to behavior typically requires impractical amounts of diverse, language-annotated demonstrations collected on each target robot. In this work, we aim to separate the problem of what to accomplish from how to accomplish it, as the former can benefit from su… ▽ More Natural language is often the easiest and most convenient modality for humans to specify tasks for robots. However, learning to ground language to behavior typically requires impractical amounts of diverse, language-annotated demonstrations collected on each target robot. In this work, we aim to separate the problem of what to accomplish from how to accomplish it, as the former can benefit from substantial amounts of external observation-only data, and only the latter depends on a specific robot embodiment. To this end, we propose Video-Language Critic, a reward model that can be trained on readily available cross-embodiment data using contrastive learning and a temporal ranking objective, and use it to score behavior traces from a separate reinforcement learning actor. When trained on Open X-Embodiment data, our reward model enables 2x more sample-efficient policy training on Meta-World tasks than a sparse reward only, despite a significant domain gap. Using in-domain data but in a challenging task generalization setting on Meta-World, we further demonstrate more sample-efficient training than is possible with prior language-conditioned reward models that are either trained with binary classification, use static images, or do not leverage the temporal information present in video data. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: 10 pages in the main text, 16 pages including references and supplementary materials. 4 figures and 3 tables in the main text, 1 table in supplementary materials

arXiv:2405.15383 [pdf, other]

Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search

Authors: Nicola Dainese, Matteo Merler, Minttu Alakuijala, Pekka Marttinen

Abstract: In this work we consider Code World Models, world models generated by a Large Language Model (LLM) in the form of Python code for model-based Reinforcement Learning (RL). Calling code instead of LLMs for planning has the advantages of being precise, reliable, interpretable, and extremely efficient. However, writing appropriate Code World Models requires the ability to understand complex instructio… ▽ More In this work we consider Code World Models, world models generated by a Large Language Model (LLM) in the form of Python code for model-based Reinforcement Learning (RL). Calling code instead of LLMs for planning has the advantages of being precise, reliable, interpretable, and extremely efficient. However, writing appropriate Code World Models requires the ability to understand complex instructions, to generate exact code with non-trivial logic and to self-debug a long program with feedback from unit tests and environment trajectories. To address these challenges, we propose Generate, Improve and Fix with Monte Carlo Tree Search (GIF-MCTS), a new code generation strategy for LLMs. To test our approach, we introduce the Code World Models Benchmark (CWMB), a suite of program synthesis and planning tasks comprised of 18 diverse RL environments paired with corresponding textual descriptions and curated trajectories. GIF-MCTS surpasses all baselines on the CWMB and two other benchmarks, and we show that the Code World Models synthesized with it can be successfully used for planning, resulting in model-based RL agents with greatly improved sample efficiency and inference speed. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: 10 pages in main text, 24 pages including references and supplementary materials. 2 figures and 3 tables in the main text, 9 figures and 12 tables when including the supplementary materials

arXiv:2405.07097 [pdf, other]

Diffusion models as probabilistic neural operators for recovering unobserved states of dynamical systems

Authors: Katsiaryna Haitsiukevich, Onur Poyraz, Pekka Marttinen, Alexander Ilin

Abstract: This paper explores the efficacy of diffusion-based generative models as neural operators for partial differential equations (PDEs). Neural operators are neural networks that learn a mapping from the parameter space to the solution space of PDEs from data, and they can also solve the inverse problem of estimating the parameter from the solution. Diffusion models excel in many domains, but their po… ▽ More This paper explores the efficacy of diffusion-based generative models as neural operators for partial differential equations (PDEs). Neural operators are neural networks that learn a mapping from the parameter space to the solution space of PDEs from data, and they can also solve the inverse problem of estimating the parameter from the solution. Diffusion models excel in many domains, but their potential as neural operators has not been thoroughly explored. In this work, we show that diffusion-based generative models exhibit many properties favourable for neural operators, and they can effectively generate the solution of a PDE conditionally on the parameter or recover the unobserved parts of the system. We propose to train a single model adaptable to multiple tasks, by alternating between the tasks during training. In our experiments with multiple realistic dynamical systems, diffusion models outperform other neural operators. Furthermore, we demonstrate how the probabilistic diffusion model can elegantly deal with systems which are only partially identifiable, by producing samples corresponding to the different possible solutions. △ Less

Submitted 11 May, 2024; originally announced May 2024.

Comments: Preprint submitted to IEEE MLSP 2024

Journal ref: IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2024

arXiv:2404.19094 [pdf, other]

In-Context Symbolic Regression: Leveraging Large Language Models for Function Discovery

Authors: Matteo Merler, Katsiaryna Haitsiukevich, Nicola Dainese, Pekka Marttinen

Abstract: State of the art Symbolic Regression (SR) methods currently build specialized models, while the application of Large Language Models (LLMs) remains largely unexplored. In this work, we introduce the first comprehensive framework that utilizes LLMs for the task of SR. We propose In-Context Symbolic Regression (ICSR), an SR method which iteratively refines a functional form with an LLM and determine… ▽ More State of the art Symbolic Regression (SR) methods currently build specialized models, while the application of Large Language Models (LLMs) remains largely unexplored. In this work, we introduce the first comprehensive framework that utilizes LLMs for the task of SR. We propose In-Context Symbolic Regression (ICSR), an SR method which iteratively refines a functional form with an LLM and determines its coefficients with an external optimizer. ICSR leverages LLMs' strong mathematical prior both to propose an initial set of possible functions given the observations and to refine them based on their errors. Our findings reveal that LLMs are able to successfully find symbolic equations that fit the given data, matching or outperforming the overall performance of the best SR baselines on four popular benchmarks, while yielding simpler equations with better out of distribution generalization. △ Less

Submitted 17 July, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: 18 pages, 11 figures

Journal ref: ACL Student Research Workshop 2024

arXiv:2403.10153 [pdf, other]

Improving Medical Multi-modal Contrastive Learning with Expert Annotations

Authors: Yogesh Kumar, Pekka Marttinen

Abstract: We introduce eCLIP, an enhanced version of the CLIP model that integrates expert annotations in the form of radiologist eye-gaze heatmaps. It tackles key challenges in contrastive multi-modal medical imaging analysis, notably data scarcity and the "modality gap" -- a significant disparity between image and text embeddings that diminishes the quality of representations and hampers cross-modal inter… ▽ More We introduce eCLIP, an enhanced version of the CLIP model that integrates expert annotations in the form of radiologist eye-gaze heatmaps. It tackles key challenges in contrastive multi-modal medical imaging analysis, notably data scarcity and the "modality gap" -- a significant disparity between image and text embeddings that diminishes the quality of representations and hampers cross-modal interoperability. eCLIP integrates a heatmap processor and leverages mixup augmentation to efficiently utilize the scarce expert annotations, thus boosting the model's learning effectiveness. eCLIP is designed to be generally applicable to any variant of CLIP without requiring any modifications of the core architecture. Through detailed evaluations across several tasks, including zero-shot inference, linear probing, cross-modal retrieval, and Retrieval Augmented Generation (RAG) of radiology reports using a frozen Large Language Model, eCLIP showcases consistent improvements in embedding quality. The outcomes reveal enhanced alignment and uniformity, affirming eCLIP's capability to harness high-quality annotations for enriched multi-modal analysis in the medical imaging domain. △ Less

Submitted 15 July, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted to ECCV 2024

arXiv:2311.07867 [pdf, other]

Mixture of Coupled HMMs for Robust Modeling of Multivariate Healthcare Time Series

Authors: Onur Poyraz, Pekka Marttinen

Abstract: Analysis of multivariate healthcare time series data is inherently challenging: irregular sampling, noisy and missing values, and heterogeneous patient groups with different dynamics violating exchangeability. In addition, interpretability and quantification of uncertainty are critically important. Here, we propose a novel class of models, a mixture of coupled hidden Markov models (M-CHMM), and de… ▽ More Analysis of multivariate healthcare time series data is inherently challenging: irregular sampling, noisy and missing values, and heterogeneous patient groups with different dynamics violating exchangeability. In addition, interpretability and quantification of uncertainty are critically important. Here, we propose a novel class of models, a mixture of coupled hidden Markov models (M-CHMM), and demonstrate how it elegantly overcomes these challenges. To make the model learning feasible, we derive two algorithms to sample the sequences of the latent variables in the CHMM: samplers based on (i) particle filtering and (ii) factorized approximation. Compared to existing inference methods, our algorithms are computationally tractable, improve mixing, and allow for likelihood estimation, which is necessary to learn the mixture model. Experiments on challenging real-world epidemiological and semi-synthetic data demonstrate the advantages of the M-CHMM: improved data fit, capacity to efficiently handle missing and noisy measurements, improved prediction accuracy, and ability to identify interpretable subsets in the data. △ Less

Submitted 13 November, 2023; originally announced November 2023.

Comments: 9 pages, 7 figures, Proceedings of Machine Learning Research, Machine Learning for Health (ML4H) 2023

arXiv:2311.03129 [pdf, other]

Nonparametric modeling of the composite effect of multiple nutrients on blood glucose dynamics

Authors: Arina Odnoblyudova, Çağlar Hizli, ST John, Andrea Cognolato, Anne Juuti, Simo Särkkä, Kirsi Pietiläinen, Pekka Marttinen

Abstract: In biomedical applications it is often necessary to estimate a physiological response to a treatment consisting of multiple components, and learn the separate effects of the components in addition to the joint effect. Here, we extend existing probabilistic nonparametric approaches to explicitly address this problem. We also develop a new convolution-based model for composite treatment-response cur… ▽ More In biomedical applications it is often necessary to estimate a physiological response to a treatment consisting of multiple components, and learn the separate effects of the components in addition to the joint effect. Here, we extend existing probabilistic nonparametric approaches to explicitly address this problem. We also develop a new convolution-based model for composite treatment-response curves that is more biologically interpretable. We validate our models by estimating the impact of carbohydrate and fat in meals on blood glucose. By differentiating treatment components, incorporating their dosages, and sharing statistical information across patients via a hierarchical multi-output Gaussian process, our method improves prediction accuracy over existing approaches, and allows us to interpret the different effects of carbohydrates and fat on the overall glucose response. △ Less

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2309.07437 [pdf]

doi 10.2196/43979

Supporting Management of Gestational Diabetes with Comprehensive Self-Tracking: Mixed-Method Study of Wearable Sensors

Authors: Mikko Kytö, Saila Koivusalo, Heli Tuomonen, Lisbeth Strömberg, Antti Ruonala, Pekka Marttinen, Seppo Heinonen, Giulio Jacucci

Abstract: Gestational diabetes (GDM) poses a growing health risk to both pregnant women and their offspring. While telehealth interventions for GDM management have proven effective, they have traditionally relied on healthcare professionals for guidance and feedback. Our aim was to explore self-tracking in GDM with wearable sensors from self-discovery (i.e., learning associations between glucose levels and… ▽ More Gestational diabetes (GDM) poses a growing health risk to both pregnant women and their offspring. While telehealth interventions for GDM management have proven effective, they have traditionally relied on healthcare professionals for guidance and feedback. Our aim was to explore self-tracking in GDM with wearable sensors from self-discovery (i.e., learning associations between glucose levels and lifestyle) and user experience perspectives. We conducted a mixed-methods study with women diagnosed with GDM, utilizing continuous glucose monitor and three types of physical activity sensors (activity bracelet, hip-worn sensor, and electrocardiography sensor) for a week. Data from the sensors was collected, and participants were later interviewed about their experience with the wearable sensors. Additionally, we gathered maternal nutrition data through a 3-day food diary and recorded self-reported physical activity using a logbook. We discovered that continuous glucose monitors were especially valuable for self-discovery, particularly when establishing links between glucose levels and nutritional intake. Challenges associated with using wearable sensors data for self-discovery in GDM included: (1) Separation of glucose and physical activity data in different applications, (2) Missing key trackable features, such as light physical activity and non-walking activities, (3) Discrepancies in data, and (4) Differences in perceived versus measured physical activity. The placement of sensors on the body emerged as a critical factor influencing data quality and personal preferences. To conclude, an app where glucose, nutrition, and physical activity data are combined is needed to support self-discovery. This app should enable tracking of essential features for women with GDM, including light physical activity, with data originating from a single sensor to ensure consistency and eliminate redundancy. △ Less

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.06009 [pdf, other]

Content Reduction, Surprisal and Information Density Estimation for Long Documents

Authors: Shaoxiong Ji, Wei Sun, Pekka Marttinen

Abstract: Many computational linguistic methods have been proposed to study the information content of languages. We consider two interesting research questions: 1) how is information distributed over long documents, and 2) how does content reduction, such as token selection and text summarization, affect the information density in long documents. We present four criteria for information density estimation… ▽ More Many computational linguistic methods have been proposed to study the information content of languages. We consider two interesting research questions: 1) how is information distributed over long documents, and 2) how does content reduction, such as token selection and text summarization, affect the information density in long documents. We present four criteria for information density estimation for long documents, including surprisal, entropy, uniform information density, and lexical density. Among those criteria, the first three adopt the measures from information theory. We propose an attention-based word selection method for clinical notes and study machine summarization for multiple-domain documents. Our findings reveal the systematic difference in information density of long text in various domains. Empirical results on automated medical coding from long clinical notes show the effectiveness of the attention-based word selection method. △ Less

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2306.10614 [pdf, other]

Identifiable causal inference with noisy treatment and no side information

Authors: Antti Pöllänen, Pekka Marttinen

Abstract: In some causal inference scenarios, the treatment variable is measured inaccurately, for instance in epidemiology or econometrics. Failure to correct for the effect of this measurement error can lead to biased causal effect estimates. Previous research has not studied methods that address this issue from a causal viewpoint while allowing for complex nonlinear dependencies and without assuming acce… ▽ More In some causal inference scenarios, the treatment variable is measured inaccurately, for instance in epidemiology or econometrics. Failure to correct for the effect of this measurement error can lead to biased causal effect estimates. Previous research has not studied methods that address this issue from a causal viewpoint while allowing for complex nonlinear dependencies and without assuming access to side information. For such a scenario, this study proposes a model that assumes a continuous treatment variable that is inaccurately measured. Building on existing results for measurement error models, we prove that our model's causal effect estimates are identifiable, even without knowledge of the measurement error variance or other side information. Our method relies on a deep latent variable model in which Gaussian conditionals are parameterized by neural networks, and we develop an amortized importance-weighted variational objective for training the model. Empirical results demonstrate the method's good performance with unknown measurement error. More broadly, our work extends the range of applications in which reliable causal inference can be conducted. △ Less

Submitted 4 May, 2024; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: 18 pages, 10 figures. Changes consist of polishing the original version. The experiments and results remain the same

MSC Class: 68T37

arXiv:2306.09656 [pdf, other]

Temporal Causal Mediation through a Point Process: Direct and Indirect Effects of Healthcare Interventions

Authors: Çağlar Hızlı, ST John, Anne Juuti, Tuure Saarinen, Kirsi Pietiläinen, Pekka Marttinen

Abstract: Deciding on an appropriate intervention requires a causal model of a treatment, the outcome, and potential mediators. Causal mediation analysis lets us distinguish between direct and indirect effects of the intervention, but has mostly been studied in a static setting. In healthcare, data come in the form of complex, irregularly sampled time-series, with dynamic interdependencies between a treatme… ▽ More Deciding on an appropriate intervention requires a causal model of a treatment, the outcome, and potential mediators. Causal mediation analysis lets us distinguish between direct and indirect effects of the intervention, but has mostly been studied in a static setting. In healthcare, data come in the form of complex, irregularly sampled time-series, with dynamic interdependencies between a treatment, outcomes, and mediators across time. Existing approaches to dynamic causal mediation analysis are limited to regular measurement intervals, simple parametric models, and disregard long-range mediator--outcome interactions. To address these limitations, we propose a non-parametric mediator--outcome model where the mediator is assumed to be a temporal point process that interacts with the outcome process. With this model, we estimate the direct and indirect effects of an external intervention on the outcome, showing how each of these affects the whole future trajectory. We demonstrate on semi-synthetic data that our method can accurately estimate direct and indirect effects. On real-world healthcare data, our model infers clinically meaningful direct and indirect effect trajectories for blood glucose after a surgery. △ Less

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2305.01663 [pdf, other]

A Novel Deep Learning based Model for Erythrocytes Classification and Quantification in Sickle Cell Disease

Authors: Manish Bhatia, Balram Meena, Vipin Kumar Rathi, Prayag Tiwari, Amit Kumar Jaiswal, Shagaf M Ansari, Ajay Kumar, Pekka Marttinen

Abstract: The shape of erythrocytes or red blood cells is altered in several pathological conditions. Therefore, identifying and quantifying different erythrocyte shapes can help diagnose various diseases and assist in designing a treatment strategy. Machine Learning (ML) can be efficiently used to identify and quantify distorted erythrocyte morphologies. In this paper, we proposed a customized deep convolu… ▽ More The shape of erythrocytes or red blood cells is altered in several pathological conditions. Therefore, identifying and quantifying different erythrocyte shapes can help diagnose various diseases and assist in designing a treatment strategy. Machine Learning (ML) can be efficiently used to identify and quantify distorted erythrocyte morphologies. In this paper, we proposed a customized deep convolutional neural network (CNN) model to classify and quantify the distorted and normal morphology of erythrocytes from the images taken from the blood samples of patients suffering from Sickle cell disease ( SCD). We chose SCD as a model disease condition due to the presence of diverse erythrocyte morphologies in the blood samples of SCD patients. For the analysis, we used 428 raw microscopic images of SCD blood samples and generated the dataset consisting of 10, 377 single-cell images. We focused on three well-defined erythrocyte shapes, including discocytes, oval, and sickle. We used 18 layered deep CNN architecture to identify and quantify these shapes with 81% accuracy, outperforming other models. We also used SHAP and LIME for further interpretability. The proposed model can be helpful for the quick and accurate analysis of SCD blood samples by the clinicians and help them make the right decision for better management of SCD. △ Less

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2303.10211 [pdf, other]

SITReg: Multi-resolution architecture for symmetric, inverse consistent, and topology preserving image registration

Authors: Joel Honkamaa, Pekka Marttinen

Abstract: Deep learning has emerged as a strong alternative for classical iterative methods for deformable medical image registration, where the goal is to find a mapping between the coordinate systems of two images. Popular classical image registration methods enforce the useful inductive biases of symmetricity, inverse consistency, and topology preservation by construct. However, while many deep learning… ▽ More Deep learning has emerged as a strong alternative for classical iterative methods for deformable medical image registration, where the goal is to find a mapping between the coordinate systems of two images. Popular classical image registration methods enforce the useful inductive biases of symmetricity, inverse consistency, and topology preservation by construct. However, while many deep learning registration methods encourage these properties via loss functions, no earlier methods enforce all of them by construct. Here, we propose a novel registration architecture based on extracting multi-resolution feature representations which is by construct symmetric, inverse consistent, and topology preserving. We also develop an implicit layer for memory efficient inversion of the deformation fields. Our method achieves state-of-the-art registration accuracy on two datasets. △ Less

Submitted 30 January, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

arXiv:2301.10451 [pdf, other]

Knowledge-augmented Graph Neural Networks with Concept-aware Attention for Adverse Drug Event Detection

Authors: Shaoxiong Ji, Ya Gao, Pekka Marttinen

Abstract: Adverse drug events (ADEs) are an important aspect of drug safety. Various texts such as biomedical literature, drug reviews, and user posts on social media and medical forums contain a wealth of information about ADEs. Recent studies have applied word embedding and deep learning -based natural language processing to automate ADE detection from text. However, they did not explore incorporating exp… ▽ More Adverse drug events (ADEs) are an important aspect of drug safety. Various texts such as biomedical literature, drug reviews, and user posts on social media and medical forums contain a wealth of information about ADEs. Recent studies have applied word embedding and deep learning -based natural language processing to automate ADE detection from text. However, they did not explore incorporating explicit medical knowledge about drugs and adverse reactions or the corresponding feature learning. This paper adopts the heterogenous text graph which describes relationships between documents, words and concepts, augments it with medical knowledge from the Unified Medical Language System, and proposes a concept-aware attention mechanism which learns features differently for the different types of nodes in the graph. We further utilize contextualized embeddings from pretrained language models and convolutional graph neural networks for effective feature representation and relational learning. Experiments on four public datasets show that our model achieves performance competitive to the recent advances and the concept-aware attention consistently outperforms other attention mechanisms. △ Less

Submitted 18 May, 2024; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: LREC-COLING 2024

arXiv:2211.07413 [pdf, other]

doi 10.1098/rsif.2021.0916

Modeling MRSA decolonization: Interactions between body sites and the impact of site-specific clearance

Authors: Onur Poyraz, Mohamad R. A. Sater, Loren G. Miller, James A. Mckinnell, Susan S. Huang, Yonatan H. Grad, Pekka Marttinen

Abstract: MRSA colonization is a critical public health concern. Decolonization protocols have been designed for the clearance of MRSA. Successful decolonization protocols reduce disease incidence; however, multiple protocols exist, comprising diverse therapies targeting multiple body sites, and the optimal protocol is unclear. Here, we formulate a machine learning model using data from a randomized control… ▽ More MRSA colonization is a critical public health concern. Decolonization protocols have been designed for the clearance of MRSA. Successful decolonization protocols reduce disease incidence; however, multiple protocols exist, comprising diverse therapies targeting multiple body sites, and the optimal protocol is unclear. Here, we formulate a machine learning model using data from a randomized controlled trial (RCT) of MRSA decolonization, which estimates interactions between body sites, quantifies the contribution of each therapy to successful decolonization, and enables predictions of the efficacy of therapy combinations. This work shows how a machine learning model can help design and improve complex clinical protocols. △ Less

Submitted 14 November, 2022; originally announced November 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 12 pages

Journal ref: Journal of the Royal Society Interface 19 (2022) 191, 20210916

arXiv:2209.04142 [pdf, other]

Causal Modeling of Policy Interventions From Sequences of Treatments and Outcomes

Authors: Çağlar Hızlı, ST John, Anne Juuti, Tuure Saarinen, Kirsi Pietiläinen, Pekka Marttinen

Abstract: A treatment policy defines when and what treatments are applied to affect some outcome of interest. Data-driven decision-making requires the ability to predict what happens if a policy is changed. Existing methods that predict how the outcome evolves under different scenarios assume that the tentative sequences of future treatments are fixed in advance, while in practice the treatments are determi… ▽ More A treatment policy defines when and what treatments are applied to affect some outcome of interest. Data-driven decision-making requires the ability to predict what happens if a policy is changed. Existing methods that predict how the outcome evolves under different scenarios assume that the tentative sequences of future treatments are fixed in advance, while in practice the treatments are determined stochastically by a policy and may depend, for example, on the efficiency of previous treatments. Therefore, the current methods are not applicable if the treatment policy is unknown or a counterfactual analysis is needed. To handle these limitations, we model the treatments and outcomes jointly in continuous time, by combining Gaussian processes and point processes. Our model enables the estimation of a treatment policy from observational sequences of treatments and outcomes, and it can predict the interventional and counterfactual progression of the outcome after an intervention on the treatment policy (in contrast with the causal effect of a single treatment). We show with real-world and semi-synthetic data on blood glucose progression that our method can answer causal queries more accurately than existing alternatives. △ Less

Submitted 20 June, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

Comments: Accepted at ICML 2023

arXiv:2208.12491 [pdf, other]

doi 10.1016/j.media.2023.10294

Deformation equivariant cross-modality image synthesis with paired non-aligned training data

Authors: Joel Honkamaa, Umair Khan, Sonja Koivukoski, Mira Valkonen, Leena Latonen, Pekka Ruusuvuori, Pekka Marttinen

Abstract: Cross-modality image synthesis is an active research topic with multiple medical clinically relevant applications. Recently, methods allowing training with paired but misaligned data have started to emerge. However, no robust and well-performing methods applicable to a wide range of real world data sets exist. In this work, we propose a generic solution to the problem of cross-modality image synth… ▽ More Cross-modality image synthesis is an active research topic with multiple medical clinically relevant applications. Recently, methods allowing training with paired but misaligned data have started to emerge. However, no robust and well-performing methods applicable to a wide range of real world data sets exist. In this work, we propose a generic solution to the problem of cross-modality image synthesis with paired but non-aligned data by introducing new deformation equivariance encouraging loss functions. The method consists of joint training of an image synthesis network together with separate registration networks and allows adversarial training conditioned on the input even with misaligned data. The work lowers the bar for new clinical applications by allowing effortless training of cross-modality image synthesis networks for more difficult data sets. △ Less

Submitted 29 September, 2023; v1 submitted 26 August, 2022; originally announced August 2022.

Journal ref: Medical Image Analysis 90 (2023): 102940

arXiv:2207.01234 [pdf, other]

Incorporating functional summary information in Bayesian neural networks using a Dirichlet process likelihood approach

Authors: Vishnu Raj, Tianyu Cui, Markus Heinonen, Pekka Marttinen

Abstract: Bayesian neural networks (BNNs) can account for both aleatoric and epistemic uncertainty. However, in BNNs the priors are often specified over the weights which rarely reflects true prior knowledge in large and complex neural network architectures. We present a simple approach to incorporate prior knowledge in BNNs based on external summary information about the predicted classification probabilit… ▽ More Bayesian neural networks (BNNs) can account for both aleatoric and epistemic uncertainty. However, in BNNs the priors are often specified over the weights which rarely reflects true prior knowledge in large and complex neural network architectures. We present a simple approach to incorporate prior knowledge in BNNs based on external summary information about the predicted classification probabilities for a given dataset. The available summary information is incorporated as augmented data and modeled with a Dirichlet process, and we derive the corresponding \emph{Summary Evidence Lower BOund}. The approach is founded on Bayesian principles, and all hyperparameters have a proper probabilistic interpretation. We show how the method can inform the model about task difficulty and class imbalance. Extensive experiments show that, with negligible computational overhead, our method parallels and in many cases outperforms popular alternatives in accuracy, uncertainty calibration, and robustness against corruptions with both balanced and imbalanced data. △ Less

Submitted 24 January, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Accepted in AISTATS 2023

arXiv:2203.11279 [pdf, ps, other]

doi 10.1145/3524499

EEG based Emotion Recognition: A Tutorial and Review

Authors: Xiang Li, Yazhou Zhang, Prayag Tiwari, Dawei Song, Bin Hu, Meihong Yang, Zhigang Zhao, Neeraj Kumar, Pekka Marttinen

Abstract: Emotion recognition technology through analyzing the EEG signal is currently an essential concept in Artificial Intelligence and holds great potential in emotional health care, human-computer interaction, multimedia content recommendation, etc. Though there have been several works devoted to reviewing EEG-based emotion recognition, the content of these reviews needs to be updated. In addition, tho… ▽ More Emotion recognition technology through analyzing the EEG signal is currently an essential concept in Artificial Intelligence and holds great potential in emotional health care, human-computer interaction, multimedia content recommendation, etc. Though there have been several works devoted to reviewing EEG-based emotion recognition, the content of these reviews needs to be updated. In addition, those works are either fragmented in content or only focus on specific techniques adopted in this area but neglect the holistic perspective of the entire technical routes. Hence, in this paper, we review from the perspective of researchers who try to take the first step on this topic. We review the recent representative works in the EEG-based emotion recognition research and provide a tutorial to guide the researchers to start from the beginning. The scientific basis of EEG-based emotion recognition in the psychological and physiological levels is introduced. Further, we categorize these reviewed works into different technical routes and illustrate the theoretical basis and the research motivation, which will help the readers better understand why those techniques are studied and employed. At last, existing challenges and future investigations are also discussed in this paper, which guides the researchers to decide potential future research directions. △ Less

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2202.00095 [pdf, other]

Deconfounded Representation Similarity for Comparison of Neural Networks

Authors: Tianyu Cui, Yogesh Kumar, Pekka Marttinen, Samuel Kaski

Abstract: Similarity metrics such as representational similarity analysis (RSA) and centered kernel alignment (CKA) have been used to compare layer-wise representations between neural networks. However, these metrics are confounded by the population structure of data items in the input space, leading to spuriously high similarity for even completely random neural networks and inconsistent domain relations i… ▽ More Similarity metrics such as representational similarity analysis (RSA) and centered kernel alignment (CKA) have been used to compare layer-wise representations between neural networks. However, these metrics are confounded by the population structure of data items in the input space, leading to spuriously high similarity for even completely random neural networks and inconsistent domain relations in transfer learning. We introduce a simple and generally applicable fix to adjust for the confounder with covariate adjustment regression, which retains the intuitive invariance properties of the original similarity measures. We show that deconfounding the similarity metrics increases the resolution of detecting semantically similar neural networks. Moreover, in real-world applications, deconfounding improves the consistency of representation similarities with domain similarities in transfer learning, and increases correlation with out-of-distribution accuracy. △ Less

Submitted 31 January, 2022; originally announced February 2022.

arXiv:2201.02797 [pdf, other]

doi 10.1145/3664615

A Unified Review of Deep Learning for Automated Medical Coding

Authors: Shaoxiong Ji, Wei Sun, Xiaobo Li, Hang Dong, Ara Taalas, Yijia Zhang, Honghan Wu, Esa Pitkänen, Pekka Marttinen

Abstract: Automated medical coding, an essential task for healthcare operation and delivery, makes unstructured data manageable by predicting medical codes from clinical documents. Recent advances in deep learning and natural language processing have been widely applied to this task. However, deep learning-based medical coding lacks a unified view of the design of neural network architectures. This review p… ▽ More Automated medical coding, an essential task for healthcare operation and delivery, makes unstructured data manageable by predicting medical codes from clinical documents. Recent advances in deep learning and natural language processing have been widely applied to this task. However, deep learning-based medical coding lacks a unified view of the design of neural network architectures. This review proposes a unified framework to provide a general understanding of the building blocks of medical coding models and summarizes recent advanced models under the proposed framework. Our unified framework decomposes medical coding into four main components, i.e., encoder modules for text feature extraction, mechanisms for building deep encoder architectures, decoder modules for transforming hidden representations into medical codes, and the usage of auxiliary information. Finally, we introduce the benchmarks and real-world usage and discuss key research challenges and future directions. △ Less

Submitted 10 May, 2024; v1 submitted 8 January, 2022; originally announced January 2022.

Comments: ACM Computing Surveys

arXiv:2109.03062 [pdf, other]

Patient Outcome and Zero-shot Diagnosis Prediction with Hypernetwork-guided Multitask Learning

Authors: Shaoxiong Ji, Pekka Marttinen

Abstract: Multitask deep learning has been applied to patient outcome prediction from text, taking clinical notes as input and training deep neural networks with a joint loss function of multiple tasks. However, the joint training scheme of multitask learning suffers from inter-task interference, and diagnosis prediction among the multiple tasks has the generalizability issue due to rare diseases or unseen… ▽ More Multitask deep learning has been applied to patient outcome prediction from text, taking clinical notes as input and training deep neural networks with a joint loss function of multiple tasks. However, the joint training scheme of multitask learning suffers from inter-task interference, and diagnosis prediction among the multiple tasks has the generalizability issue due to rare diseases or unseen diagnoses. To solve these challenges, we propose a hypernetwork-based approach that generates task-conditioned parameters and coefficients of multitask prediction heads to learn task-specific prediction and balance the multitask learning. We also incorporate semantic task information to improves the generalizability of our task-conditioned multitask model. Experiments on early and discharge notes extracted from the real-world MIMIC database show our method can achieve better performance on multitask patient outcome prediction than strong baselines in most cases. Besides, our method can effectively handle the scenario with limited information and improve zero-shot prediction on unseen diagnosis categories. △ Less

Submitted 25 January, 2023; v1 submitted 7 September, 2021; originally announced September 2021.

Comments: EACL 2023

arXiv:2109.02418 [pdf, other]

doi 10.1145/3563041

Multitask Balanced and Recalibrated Network for Medical Code Prediction

Authors: Wei Sun, Shaoxiong Ji, Erik Cambria, Pekka Marttinen

Abstract: Human coders assign standardized medical codes to clinical documents generated during patients' hospitalization, which is error-prone and labor-intensive. Automated medical coding approaches have been developed using machine learning methods such as deep neural networks. Nevertheless, automated medical coding is still challenging because of the imbalanced class problem, complex code association, a… ▽ More Human coders assign standardized medical codes to clinical documents generated during patients' hospitalization, which is error-prone and labor-intensive. Automated medical coding approaches have been developed using machine learning methods such as deep neural networks. Nevertheless, automated medical coding is still challenging because of the imbalanced class problem, complex code association, and noise in lengthy documents. To solve these issues, we propose a novel neural network called Multitask Balanced and Recalibrated Neural Network. Significantly, the multitask learning scheme shares the relationship knowledge between different code branches to capture the code association. A recalibrated aggregation module is developed by cascading convolutional blocks to extract high-level semantic features that mitigate the impact of noise in documents. Also, the cascaded structure of the recalibrated module can benefit the learning from lengthy notes. To solve the class imbalanced problem, we deploy the focal loss to redistribute the attention of low and high-frequency medical codes. Experimental results show that our proposed model outperforms competitive baselines on a real-world clinical dataset MIMIC-III. △ Less

Submitted 6 September, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

Comments: Accepted by ACM Transactions on Intelligent Systems and Technology (TIST)

arXiv:2108.13672 [pdf, other]

SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models

Authors: Yogesh Kumar, Alexander Ilin, Henri Salo, Sangita Kulathinal, Maarit K. Leinonen, Pekka Marttinen

Abstract: Despite the proven effectiveness of Transformer neural networks across multiple domains, their performance with Electronic Health Records (EHR) can be nuanced. The unique, multidimensional sequential nature of EHR data can sometimes make even simple linear models with carefully engineered features more competitive. Thus, the advantages of Transformers, such as efficient transfer learning and impro… ▽ More Despite the proven effectiveness of Transformer neural networks across multiple domains, their performance with Electronic Health Records (EHR) can be nuanced. The unique, multidimensional sequential nature of EHR data can sometimes make even simple linear models with carefully engineered features more competitive. Thus, the advantages of Transformers, such as efficient transfer learning and improved scalability are not always fully exploited in EHR applications. Addressing these challenges, we introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data. In this work, we aim to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities. The challenge amplifies when dealing with divergent patient subgroups, like those with rare diseases, which are characterized by unique health trajectories and are typically smaller in size. To address this, we employ a self-supervised pretraining strategy, Generative Summary Pretraining (GSP), which predicts future summary statistics based on past health records of a patient. Our models are pretrained on a health registry of nearly one million patients, then fine-tuned for specific subgroup prediction tasks, showcasing the potential to handle the multifaceted nature of EHR data. In evaluation, SANSformer consistently surpasses robust EHR baselines, with our GSP pretraining method notably amplifying model performance, particularly within smaller patient subgroups. Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics. △ Less

Submitted 10 November, 2023; v1 submitted 31 August, 2021; originally announced August 2021.

Comments: 17 pages, 11 figures, 11 tables, Submitted to an IEEE journal

arXiv:2106.00610 [pdf, other]

Deep Learning for Depression Recognition with Audiovisual Cues: A Review

Authors: Lang He, Mingyue Niu, Prayag Tiwari, Pekka Marttinen, Rui Su, Jiewei Jiang, Chenguang Guo, Hongyu Wang, Songtao Ding, Zhongmin Wang, Wei Dang, Xiaoying Pan

Abstract: With the acceleration of the pace of work and life, people have to face more and more pressure, which increases the possibility of suffering from depression. However, many patients may fail to get a timely diagnosis due to the serious imbalance in the doctor-patient ratio in the world. Promisingly, physiological and psychological studies have indicated some differences in speech and facial express… ▽ More With the acceleration of the pace of work and life, people have to face more and more pressure, which increases the possibility of suffering from depression. However, many patients may fail to get a timely diagnosis due to the serious imbalance in the doctor-patient ratio in the world. Promisingly, physiological and psychological studies have indicated some differences in speech and facial expression between patients with depression and healthy individuals. Consequently, to improve current medical care, many scholars have used deep learning to extract a representation of depression cues in audio and video for automatic depression detection. To sort out and summarize these works, this review introduces the databases and describes objective markers for automatic depression estimation (ADE). Furthermore, we review the deep learning methods for automatic depression detection to extract the representation of depression from audio and video. Finally, this paper discusses challenges and promising directions related to automatic diagnosing of depression using deep learning technologies. △ Less

Submitted 27 May, 2021; originally announced June 2021.

arXiv:2104.00952 [pdf, other]

doi 10.1007/978-3-030-86514-6_23

Multitask Recalibrated Aggregation Network for Medical Code Prediction

Authors: Wei Sun, Shaoxiong Ji, Erik Cambria, Pekka Marttinen

Abstract: Medical coding translates professionally written medical reports into standardized codes, which is an essential part of medical information systems and health insurance reimbursement. Manual coding by trained human coders is time-consuming and error-prone. Thus, automated coding algorithms have been developed, building especially on the recent advances in machine learning and deep neural networks.… ▽ More Medical coding translates professionally written medical reports into standardized codes, which is an essential part of medical information systems and health insurance reimbursement. Manual coding by trained human coders is time-consuming and error-prone. Thus, automated coding algorithms have been developed, building especially on the recent advances in machine learning and deep neural networks. To solve the challenges of encoding lengthy and noisy clinical documents and capturing code associations, we propose a multitask recalibrated aggregation network. In particular, multitask learning shares information across different coding schemes and captures the dependencies between different medical codes. Feature recalibration and aggregation in shared modules enhance representation learning for lengthy notes. Experiments with a real-world MIMIC-III dataset show significantly improved predictive performance. △ Less

Submitted 29 June, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

Comments: ECML-PKDD 2021

arXiv:2103.06511 [pdf, other]

doi 10.1016/j.compbiomed.2021.104998

Does the Magic of BERT Apply to Medical Code Assignment? A Quantitative Study

Authors: Shaoxiong Ji, Matti Hölttä, Pekka Marttinen

Abstract: Unsupervised pretraining is an integral part of many natural language processing systems, and transfer learning with language models has achieved remarkable results in many downstream tasks. In the clinical application of medical code assignment, diagnosis and procedure codes are inferred from lengthy clinical notes such as hospital discharge summaries. However, it is not clear if pretrained model… ▽ More Unsupervised pretraining is an integral part of many natural language processing systems, and transfer learning with language models has achieved remarkable results in many downstream tasks. In the clinical application of medical code assignment, diagnosis and procedure codes are inferred from lengthy clinical notes such as hospital discharge summaries. However, it is not clear if pretrained models are useful for medical code prediction without further architecture engineering. This paper conducts a comprehensive quantitative analysis of various contextualized language models' performance, pretrained in different domains, for medical code assignment from clinical notes. We propose a hierarchical fine-tuning architecture to capture interactions between distant words and adopt label-wise attention to exploit label information. Contrary to current trends, we demonstrate that a carefully trained classical CNN outperforms attention-based models on a MIMIC-III subset with frequent codes. Our empirical findings suggest directions for improving the medical code assignment application. △ Less

Submitted 26 October, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

Journal ref: Computers in Biology and Medicine, 2021

arXiv:2102.06648 [pdf, other]

A Critical Look at the Consistency of Causal Estimation With Deep Latent Variable Models

Authors: Severi Rissanen, Pekka Marttinen

Abstract: Using deep latent variable models in causal inference has attracted considerable interest recently, but an essential open question is their ability to yield consistent causal estimates. While they have demonstrated promising results and theory exists on some simple model formulations, we also know that causal effects are not even identifiable in general with latent variables. We investigate this g… ▽ More Using deep latent variable models in causal inference has attracted considerable interest recently, but an essential open question is their ability to yield consistent causal estimates. While they have demonstrated promising results and theory exists on some simple model formulations, we also know that causal effects are not even identifiable in general with latent variables. We investigate this gap between theory and empirical results with analytical considerations and extensive experiments under multiple synthetic and real-world data sets, using the causal effect variational autoencoder (CEVAE) as a case study. While CEVAE seems to work reliably under some simple scenarios, it does not estimate the causal effect correctly with a misspecified latent variable or a complex data distribution, as opposed to its original motivation. Hence, our results show that more attention should be paid to ensuring the correctness of causal estimates with deep latent variable models. △ Less

Submitted 24 January, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: 10 pages for main text + 19 pages for references and supplementary. 18 Figures

Journal ref: Advances in Neural Information Processing Systems 34 (2021)

arXiv:2010.06975 [pdf, other]

doi 10.18653/v1/2021.findings-acl.89

Medical Code Assignment with Gated Convolution and Note-Code Interaction

Authors: Shaoxiong Ji, Shirui Pan, Pekka Marttinen

Abstract: Medical code assignment from clinical text is a fundamental task in clinical information system management. As medical notes are typically lengthy and the medical coding system's code space is large, this task is a long-standing challenge. Recent work applies deep neural network models to encode the medical notes and assign medical codes to clinical documents. However, these methods are still inef… ▽ More Medical code assignment from clinical text is a fundamental task in clinical information system management. As medical notes are typically lengthy and the medical coding system's code space is large, this task is a long-standing challenge. Recent work applies deep neural network models to encode the medical notes and assign medical codes to clinical documents. However, these methods are still ineffective as they do not fully encode and capture the lengthy and rich semantic information of medical notes nor explicitly exploit the interactions between the notes and codes. We propose a novel method, gated convolutional neural networks, and a note-code interaction (GatedCNN-NCI), for automatic medical code assignment to overcome these challenges. Our methods capture the rich semantic information of the lengthy clinical text for better representation by utilizing embedding injection and gated information propagation in the medical note encoding module. With a novel note-code interaction design and a graph message passing mechanism, we explicitly capture the underlying dependency between notes and codes, enabling effective code prediction. A weight sharing scheme is further designed to decrease the number of trainable parameters. Empirical experiments on real-world clinical datasets show that our proposed model outperforms state-of-the-art models in most cases, and our model size is on par with light-weighted baselines. △ Less

Submitted 15 March, 2022; v1 submitted 14 October, 2020; originally announced October 2020.

Comments: Findings of ACL-IJCNLP 2021

arXiv:2009.14578 [pdf, other]

doi 10.18653/v1/2020.clinicalnlp-1.8

Dilated Convolutional Attention Network for Medical Code Assignment from Clinical Text

Authors: Shaoxiong Ji, Erik Cambria, Pekka Marttinen

Abstract: Medical code assignment, which predicts medical codes from clinical texts, is a fundamental task of intelligent medical information systems. The emergence of deep models in natural language processing has boosted the development of automatic assignment methods. However, recent advanced neural architectures with flat convolutions or multi-channel feature concatenation ignore the sequential causal c… ▽ More Medical code assignment, which predicts medical codes from clinical texts, is a fundamental task of intelligent medical information systems. The emergence of deep models in natural language processing has boosted the development of automatic assignment methods. However, recent advanced neural architectures with flat convolutions or multi-channel feature concatenation ignore the sequential causal constraint within a text sequence and may not learn meaningful clinical text representations, especially for lengthy clinical notes with long-term sequential dependency. This paper proposes a Dilated Convolutional Attention Network (DCAN), integrating dilated convolutions, residual connections, and label attention, for medical code assignment. It adopts dilated convolutions to capture complex medical patterns with a receptive field which increases exponentially with dilation size. Experiments on a real-world clinical dataset empirically show that our model improves the state of the art. △ Less

Submitted 30 September, 2020; originally announced September 2020.

Comments: The 3rd Clinical Natural Language Processing Workshop at EMNLP 2020

arXiv:2002.10243 [pdf, other]

doi 10.1214/21-BA1291

Informative Bayesian Neural Network Priors for Weak Signals

Authors: Tianyu Cui, Aki Havulinna, Pekka Marttinen, Samuel Kaski

Abstract: Encoding domain knowledge into the prior over the high-dimensional weight space of a neural network is challenging but essential in applications with limited data and weak signals. Two types of domain knowledge are commonly available in scientific applications: 1. feature sparsity (fraction of features deemed relevant); 2. signal-to-noise ratio, quantified, for instance, as the proportion of varia… ▽ More Encoding domain knowledge into the prior over the high-dimensional weight space of a neural network is challenging but essential in applications with limited data and weak signals. Two types of domain knowledge are commonly available in scientific applications: 1. feature sparsity (fraction of features deemed relevant); 2. signal-to-noise ratio, quantified, for instance, as the proportion of variance explained (PVE). We show how to encode both types of domain knowledge into the widely used Gaussian scale mixture priors with Automatic Relevance Determination. Specifically, we propose a new joint prior over the local (i.e., feature-specific) scale parameters that encodes knowledge about feature sparsity, and a Stein gradient optimization to tune the hyperparameters in such a way that the distribution induced on the model's PVE matches the prior distribution. We show empirically that the new prior improves prediction accuracy, compared to existing neural network priors, on several publicly available datasets and in a genetics application where signals are weak and sparse, often outperforming even computationally intensive cross-validation for hyperparameter tuning. △ Less

Submitted 7 January, 2021; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: 25 pages, 8 figures, 4 tables

arXiv:2002.00388 [pdf, other]

doi 10.1109/TNNLS.2021.3070843

A Survey on Knowledge Graphs: Representation, Acquisition and Applications

Authors: Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, Philip S. Yu

Abstract: Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction towards cognition and human-level intelligence. In this survey, we provide a comprehensive review of knowledge graph covering overall research topics about 1) knowledge graph representation learning, 2) knowledge… ▽ More Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction towards cognition and human-level intelligence. In this survey, we provide a comprehensive review of knowledge graph covering overall research topics about 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph, and 4) knowledge-aware applications, and summarize recent breakthroughs and perspective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized from four aspects of representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference, and logical rule reasoning, are reviewed. We further explore several emerging topics, including meta relational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of datasets and open-source libraries on different tasks. In the end, we have a thorough outlook on several promising research directions. △ Less

Submitted 1 April, 2021; v1 submitted 2 February, 2020; originally announced February 2020.

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2021

arXiv:1910.06121 [pdf, other]

Batch simulations and uncertainty quantification in Gaussian process surrogate approximate Bayesian computation

Authors: Marko Järvenpää, Aki Vehtari, Pekka Marttinen

Abstract: The computational efficiency of approximate Bayesian computation (ABC) has been improved by using surrogate models such as Gaussian processes (GP). In one such promising framework the discrepancy between the simulated and observed data is modelled with a GP which is further used to form a model-based estimator for the intractable posterior. In this article we improve this approach in several ways.… ▽ More The computational efficiency of approximate Bayesian computation (ABC) has been improved by using surrogate models such as Gaussian processes (GP). In one such promising framework the discrepancy between the simulated and observed data is modelled with a GP which is further used to form a model-based estimator for the intractable posterior. In this article we improve this approach in several ways. We develop batch-sequential Bayesian experimental design strategies to parallellise the expensive simulations. In earlier work only sequential strategies have been used. Current surrogate-based ABC methods also do not fully account the uncertainty due to the limited budget of simulations as they output only a point estimate of the ABC posterior. We propose a numerical method to fully quantify the uncertainty in, for example, ABC posterior moments. We also provide some new analysis on the GP modelling assumptions in the resulting improved framework called Bayesian ABC and discuss its connection to Bayesian quadrature (BQ) and Bayesian optimisation (BO). Experiments with toy and real-world simulation models demonstrate advantages of the proposed techniques. △ Less

Submitted 6 August, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

Comments: Minor improvements and clarifications to the text over the previous version. 20 pages, 15 figures

arXiv:1906.03989 [pdf, other]

Errors-in-variables Modeling of Personalized Treatment-Response Trajectories

Authors: Guangyi Zhang, Reza Ashrafi, Anne Juuti, Kirsi Pietiläinen, Pekka Marttinen

Abstract: Estimating the effect of a treatment on a given outcome, conditioned on a vector of covariates, is central in many applications. However, learning the impact of a treatment on a continuous temporal response, when the covariates suffer extensively from measurement error and even the timing of the treatments is uncertain, has not been addressed. We introduce a novel data-driven method that can estim… ▽ More Estimating the effect of a treatment on a given outcome, conditioned on a vector of covariates, is central in many applications. However, learning the impact of a treatment on a continuous temporal response, when the covariates suffer extensively from measurement error and even the timing of the treatments is uncertain, has not been addressed. We introduce a novel data-driven method that can estimate treatment-response trajectories in this challenging scenario. We model personalized treatment-response curves as a combination of parametric response functions, hierarchically sharing information across individuals, and a sparse Gaussian process for the baseline trend. Importantly, our model considers measurement error not only in treatment covariates, but also in treatment times, a problem which arises in practice for example when treatment information is based on self-reporting. In a challenging and timely problem of estimating the impact of diet on continuous blood glucose measurements, our model leads to significant improvements in estimation accuracy and prediction. △ Less

Submitted 10 June, 2019; originally announced June 2019.

arXiv:1905.01252 [pdf, other]

Parallel Gaussian process surrogate Bayesian inference with noisy likelihood evaluations

Authors: Marko Järvenpää, Michael Gutmann, Aki Vehtari, Pekka Marttinen

Abstract: We consider Bayesian inference when only a limited number of noisy log-likelihood evaluations can be obtained. This occurs for example when complex simulator-based statistical models are fitted to data, and synthetic likelihood (SL) method is used to form the noisy log-likelihood estimates using computationally costly forward simulations. We frame the inference task as a sequential Bayesian experi… ▽ More We consider Bayesian inference when only a limited number of noisy log-likelihood evaluations can be obtained. This occurs for example when complex simulator-based statistical models are fitted to data, and synthetic likelihood (SL) method is used to form the noisy log-likelihood estimates using computationally costly forward simulations. We frame the inference task as a sequential Bayesian experimental design problem, where the log-likelihood function is modelled with a hierarchical Gaussian process (GP) surrogate model, which is used to efficiently select additional log-likelihood evaluation locations. Motivated by recent progress in the related problem of batch Bayesian optimisation, we develop various batch-sequential design strategies which allow to run some of the potentially costly simulations in parallel. We analyse the properties of the resulting method theoretically and empirically. Experiments with several toy problems and simulation models suggest that our method is robust, highly parallelisable, and sample-efficient. △ Less

Submitted 6 March, 2020; v1 submitted 3 May, 2019; originally announced May 2019.

Comments: Minor changes to the text. 37 pages, 18 figures

arXiv:1901.08361 [pdf, other]

Learning Global Pairwise Interactions with Bayesian Neural Networks

Authors: Tianyu Cui, Pekka Marttinen, Samuel Kaski

Abstract: Estimating global pairwise interaction effects, i.e., the difference between the joint effect and the sum of marginal effects of two input features, with uncertainty properly quantified, is centrally important in science applications. We propose a non-parametric probabilistic method for detecting interaction effects of unknown form. First, the relationship between the features and the output is mo… ▽ More Estimating global pairwise interaction effects, i.e., the difference between the joint effect and the sum of marginal effects of two input features, with uncertainty properly quantified, is centrally important in science applications. We propose a non-parametric probabilistic method for detecting interaction effects of unknown form. First, the relationship between the features and the output is modelled using a Bayesian neural network, capable of representing complex interactions and principled uncertainty. Second, interaction effects and their uncertainty are estimated from the trained model. For the second step, we propose an intuitive global interaction measure: Bayesian Group Expected Hessian (GEH), which aggregates information of local interactions as captured by the Hessian. GEH provides a natural trade-off between type I and type II error and, moreover, comes with theoretical guarantees ensuring that the estimated interaction effects and their uncertainty can be improved by training a more accurate BNN. The method empirically outperforms available non-probabilistic alternatives on simulated and real-world data. Finally, we demonstrate its ability to detect interpretable interactions between higher-level features (at deeper layers of the neural network). △ Less

Submitted 19 November, 2019; v1 submitted 24 January, 2019; originally announced January 2019.

Comments: 8 pages

Journal ref: Proceedings of the 24th European Conference on Artificial Intelligence (ECAI 2020)

arXiv:1811.10958 [pdf, other]

A Bayesian model of acquisition and clearance of bacterial colonization

Authors: Marko Järvenpää, Mohamad R. Abdul Sater, Georgia K. Lagoudas, Paul C. Blainey, Loren G. Miller, James A. McKinnell, Susan S. Huang, Yonatan H. Grad, Pekka Marttinen

Abstract: Bacterial populations that colonize a host play important roles in host health, including serving as a reservoir that transmits to other hosts and from which invasive strains emerge, thus emphasizing the importance of understanding rates of acquisition and clearance of colonizing populations. Studies of colonization dynamics have been based on assessment of whether serial samples represent a singl… ▽ More Bacterial populations that colonize a host play important roles in host health, including serving as a reservoir that transmits to other hosts and from which invasive strains emerge, thus emphasizing the importance of understanding rates of acquisition and clearance of colonizing populations. Studies of colonization dynamics have been based on assessment of whether serial samples represent a single population or distinct colonization events. A common solution to estimate acquisition and clearance rates is to use a fixed genetic distance threshold. However, this approach is often inadequate to account for the diversity of the underlying within-host evolving population, the time intervals between consecutive measurements, and the uncertainty in the estimated acquisition and clearance rates. Here, we summarize recently submitted work \cite{jarvenpaa2018named} and present a Bayesian model that provides probabilities of whether two strains should be considered the same, allowing to determine bacterial clearance and acquisition from genomes sampled over time. We explicitly model the within-host variation using population genetic simulation, and the inference is done by combining information from multiple data sources by using a combination of Approximate Bayesian Computation (ABC) and Markov Chain Monte Carlo (MCMC). We use the method to analyse a collection of methicillin resistant Staphylococcus aureus (MRSA) isolates. △ Less

Submitted 27 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/87

arXiv:1708.00707 [pdf, other]

ELFI: Engine for Likelihood-Free Inference

Authors: Jarno Lintusaari, Henri Vuollekoski, Antti Kangasrääsiö, Kusti Skytén, Marko Järvenpää, Pekka Marttinen, Michael U. Gutmann, Aki Vehtari, Jukka Corander, Samuel Kaski

Abstract: Engine for Likelihood-Free Inference (ELFI) is a Python software library for performing likelihood-free inference (LFI). ELFI provides a convenient syntax for arranging components in LFI, such as priors, simulators, summaries or distances, to a network called ELFI graph. The components can be implemented in a wide variety of languages. The stand-alone ELFI graph can be used with any of the availab… ▽ More Engine for Likelihood-Free Inference (ELFI) is a Python software library for performing likelihood-free inference (LFI). ELFI provides a convenient syntax for arranging components in LFI, such as priors, simulators, summaries or distances, to a network called ELFI graph. The components can be implemented in a wide variety of languages. The stand-alone ELFI graph can be used with any of the available inference methods without modifications. A central method implemented in ELFI is Bayesian Optimization for Likelihood-Free Inference (BOLFI), which has recently been shown to accelerate likelihood-free inference up to several orders of magnitude by surrogate-modelling the distance. ELFI also has an inbuilt support for output data storing for reuse and analysis, and supports parallelization of computation from multiple cores up to a cluster environment. ELFI is designed to be extensible and provides interfaces for widening its functionality. This makes the adding of new inference methods to ELFI straightforward and automatically compatible with the inbuilt features. △ Less

Submitted 5 July, 2018; v1 submitted 2 August, 2017; originally announced August 2017.

Journal ref: Journal of Machine Learning Research, 19(16):1-7, 2018. http://jmlr.org/papers/v19/17-374.html

arXiv:1705.03290 [pdf, other]

doi 10.1093/bioinformatics/bty257

Improving drug sensitivity predictions in precision medicine through active expert knowledge elicitation

Authors: Iiris Sundin, Tomi Peltola, Muntasir Mamun Majumder, Pedram Daee, Marta Soare, Homayun Afrabandpey, Caroline Heckman, Samuel Kaski, Pekka Marttinen

Abstract: Predicting the efficacy of a drug for a given individual, using high-dimensional genomic measurements, is at the core of precision medicine. However, identifying features on which to base the predictions remains a challenge, especially when the sample size is small. Incorporating expert knowledge offers a promising alternative to improve a prediction model, but collecting such knowledge is laborio… ▽ More Predicting the efficacy of a drug for a given individual, using high-dimensional genomic measurements, is at the core of precision medicine. However, identifying features on which to base the predictions remains a challenge, especially when the sample size is small. Incorporating expert knowledge offers a promising alternative to improve a prediction model, but collecting such knowledge is laborious to the expert if the number of candidate features is very large. We introduce a probabilistic model that can incorporate expert feedback about the impact of genomic measurements on the sensitivity of a cancer cell for a given drug. We also present two methods to intelligently collect this feedback from the expert, using experimental design and multi-armed bandit models. In a multiple myeloma blood cancer data set (n=51), expert knowledge decreased the prediction error by 8%. Furthermore, the intelligent approaches can be used to reduce the workload of feedback collection to less than 30% on average compared to a naive approach. △ Less

Submitted 9 May, 2017; originally announced May 2017.

Comments: 24 pages, 3 figures

arXiv:1704.00520 [pdf, other]

doi 10.1214/18-BA1121

Efficient acquisition rules for model-based approximate Bayesian computation

Authors: Marko Järvenpää, Michael U. Gutmann, Arijus Pleska, Aki Vehtari, Pekka Marttinen

Abstract: Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables… ▽ More Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables one to intelligently decide where to evaluate the model next but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimise the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared to common BO strategies. △ Less

Submitted 8 August, 2018; v1 submitted 3 April, 2017; originally announced April 2017.

Comments: 30 pages, 10 figures

arXiv:1612.02487 [pdf, other]

Interactive Elicitation of Knowledge on Feature Relevance Improves Predictions in Small Data Sets

Authors: Luana Micallef, Iiris Sundin, Pekka Marttinen, Muhammad Ammad-ud-din, Tomi Peltola, Marta Soare, Giulio Jacucci, Samuel Kaski

Abstract: Providing accurate predictions is challenging for machine learning algorithms when the number of features is larger than the number of samples in the data. Prior knowledge can improve machine learning models by indicating relevant variables and parameter values. Yet, this prior knowledge is often tacit and only available from domain experts. We present a novel approach that uses interactive visual… ▽ More Providing accurate predictions is challenging for machine learning algorithms when the number of features is larger than the number of samples in the data. Prior knowledge can improve machine learning models by indicating relevant variables and parameter values. Yet, this prior knowledge is often tacit and only available from domain experts. We present a novel approach that uses interactive visualization to elicit the tacit prior knowledge and uses it to improve the accuracy of prediction models. The main component of our approach is a user model that models the domain expert's knowledge of the relevance of different features for a prediction task. In particular, based on the expert's earlier input, the user model guides the selection of the features on which to elicit user's knowledge next. The results of a controlled user study show that the user model significantly improves prior knowledge elicitation and prediction accuracy, when predicting the relative citation counts of scientific documents in a specific domain. △ Less

Submitted 16 January, 2017; v1 submitted 7 December, 2016; originally announced December 2016.

Comments: in Proceedings of the 22nd International Conference on Intelligent User Interfaces (IUI 2017)

arXiv:1610.06462 [pdf, other]

Gaussian process modeling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria

Authors: Marko Järvenpää, Michael Gutmann, Aki Vehtari, Pekka Marttinen

Abstract: Approximate Bayesian computation (ABC) can be used for model fitting when the likelihood function is intractable but simulating from the model is feasible. However, even a single evaluation of a complex model may take several hours, limiting the number of model evaluations available. Modelling the discrepancy between the simulated and observed data using a Gaussian process (GP) can be used to redu… ▽ More Approximate Bayesian computation (ABC) can be used for model fitting when the likelihood function is intractable but simulating from the model is feasible. However, even a single evaluation of a complex model may take several hours, limiting the number of model evaluations available. Modelling the discrepancy between the simulated and observed data using a Gaussian process (GP) can be used to reduce the number of model evaluations required by ABC, but the sensitivity of this approach to a specific GP formulation has not yet been thoroughly investigated. We begin with a comprehensive empirical evaluation of using GPs in ABC, including various transformations of the discrepancies and two novel GP formulations. Our results indicate the choice of GP may significantly affect the accuracy of the estimated posterior distribution. Selection of an appropriate GP model is thus important. We formulate expected utility to measure the accuracy of classifying discrepancies below or above the ABC threshold, and show that it can be used to automate the GP model selection step. Finally, based on the understanding gained with toy examples, we fit a population genetic model for bacteria, providing insight into horizontal gene transfer events within the population and from external origins. △ Less

Submitted 16 February, 2018; v1 submitted 20 October, 2016; originally announced October 2016.

Comments: 25 pages, 11 figures

arXiv:1410.7365 [pdf, other]

Multiple Output Regression with Latent Noise

Authors: Jussi Gillberg, Pekka Marttinen, Matti Pirinen, Antti J. Kangas, Pasi Soininen, Mehreen Ali, Aki S. Havulinna, Marjo-Riitta Marjo-Riitta Järvelin, Mika Ala-Korpela, Samuel Kaski

Abstract: In high-dimensional data, structured noise caused by observed and unobserved factors affecting multiple target variables simultaneously, imposes a serious challenge for modeling, by masking the often weak signal. Therefore, (1) explaining away the structured noise in multiple-output regression is of paramount importance. Additionally, (2) assumptions about the correlation structure of the regressi… ▽ More In high-dimensional data, structured noise caused by observed and unobserved factors affecting multiple target variables simultaneously, imposes a serious challenge for modeling, by masking the often weak signal. Therefore, (1) explaining away the structured noise in multiple-output regression is of paramount importance. Additionally, (2) assumptions about the correlation structure of the regression weights are needed. We note that both can be formulated in a natural way in a latent variable model, in which both the interesting signal and the noise are mediated through the same latent factors. Under this assumption, the signal model then borrows strength from the noise model by encouraging similar effects on correlated targets. We introduce a hyperparameter for the \emph{latent signal-to-noise ratio} which turns out to be important for modelling weak signals, and an ordered infinite-dimensional shrinkage prior that resolves the rotational unidentifiability in reduced-rank regression models. Simulations and prediction experiments with metabolite, gene expression, FMRI measurement, and macroeconomic time series data show that our model equals or exceeds the state-of-the-art performance and, in particular, outperforms the standard approach of assuming independent noise and signal models. △ Less

Submitted 3 February, 2016; v1 submitted 27 October, 2014; originally announced October 2014.

arXiv:1310.4362 [pdf, other]

Bayesian Information Sharing Between Noise And Regression Models Improves Prediction of Weak Effects

Authors: Jussi Gillberg, Pekka Marttinen, Matti Pirinen, Antti J Kangas, Pasi Soininen, Marjo-Riitta Järvelin, Mika Ala-Korpela, Samuel Kaski

Abstract: We consider the prediction of weak effects in a multiple-output regression setup, when covariates are expected to explain a small amount, less than $\approx 1%$, of the variance of the target variables. To facilitate the prediction of the weak effects, we constrain our model structure by introducing a novel Bayesian approach of sharing information between the regression model and the noise model.… ▽ More We consider the prediction of weak effects in a multiple-output regression setup, when covariates are expected to explain a small amount, less than $\approx 1%$, of the variance of the target variables. To facilitate the prediction of the weak effects, we constrain our model structure by introducing a novel Bayesian approach of sharing information between the regression model and the noise model. Further reduction of the effective number of parameters is achieved by introducing an infinite shrinkage prior and group sparsity in the context of the Bayesian reduced rank regression, and using the Bayesian infinite factor model as a flexible low-rank noise model. In our experiments the model incorporating the novelties outperformed alternatives in genomic prediction of rich phenotype data. In particular, the information sharing between the noise and regression models led to significant improvement in prediction accuracy. △ Less

Submitted 16 October, 2013; originally announced October 2013.

arXiv:1211.1144 [pdf, ps, other]

Genome-wide association studies with high-dimensional phenotypes

Authors: Pekka Marttinen, Jussi Gillberg, Aki Havulinna, Jukka Corander, Samuel Kaski

Abstract: High-dimensional phenotypes hold promise for richer findings in association studies, but testing of several phenotype traits aggravates the grand challenge of association studies, that of multiple testing. Several methods have recently been proposed for testing jointly all traits in a high-dimensional vector of phenotypes, with prospect of increased power to detect small effects that would be miss… ▽ More High-dimensional phenotypes hold promise for richer findings in association studies, but testing of several phenotype traits aggravates the grand challenge of association studies, that of multiple testing. Several methods have recently been proposed for testing jointly all traits in a high-dimensional vector of phenotypes, with prospect of increased power to detect small effects that would be missed if tested individually. However, the methods have rarely been compared to the extent of enabling assessment of their relative merits and setting up guidelines on which method to use, and how to use it. We compare the methods on simulated data and with a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires division of the problem into manageable parts facilitating parallel processing, parts corresponding to individual genetic variants, pathways, or genes, for example. Here we utilize a straightforward formulation according to which the genome is divided into blocks of nearby correlated genetic markers, tested jointly for association with the phenotypes. This formulation is computationally feasible, reduces the number of tests, and lets the methods take advantage of combining information over several correlated variables not only on the phenotype side, but also on the genotype side. Our experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested. Sparse canonical correlation analysis and regression models with latent confounding factors show promising performance when the number of samples is small. △ Less

Submitted 13 May, 2013; v1 submitted 6 November, 2012; originally announced November 2012.

Comments: 33 pages, 11 figures

Showing 1–48 of 48 results for author: Marttinen, P