-
With a Little Help from my (Linguistic) Friends: Topic Segmentation of Multi-party Casual Conversations
Authors:
Amandine Decker,
Maxime Amblard
Abstract:
Topics play an important role in the global organisation of a conversation as what is currently discussed constrains the possible contributions of the participant. Understanding the way topics are organised in interaction would provide insight on the structure of dialogue beyond the sequence of utterances. However, studying this high-level structure is a complex task that we try to approach by fir…
▽ More
Topics play an important role in the global organisation of a conversation as what is currently discussed constrains the possible contributions of the participant. Understanding the way topics are organised in interaction would provide insight on the structure of dialogue beyond the sequence of utterances. However, studying this high-level structure is a complex task that we try to approach by first segmenting dialogues into smaller topically coherent sets of utterances. Understanding the interactions between these segments would then enable us to propose a model of topic organisation at a dialogue level. In this paper we work with open-domain conversations and try to reach a comparable level of accuracy as recent machine learning based topic segmentation models but with a formal approach. The features we identify as meaningful for this task help us understand better the topical structure of a conversation.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Topic and genre in dialogue
Authors:
Amandine Decker,
Ellen Breitholtz,
Christine Howes,
Staffan Larsson
Abstract:
In this paper we argue that topic plays a fundamental role in conversations, and that the concept is needed in addition to that of genre to define interactions. In particular, the concepts of genre and topic need to be separated and orthogonally defined. This would enable modular, reliable and controllable flexible-domain dialogue systems.
In this paper we argue that topic plays a fundamental role in conversations, and that the concept is needed in addition to that of genre to define interactions. In particular, the concepts of genre and topic need to be separated and orthogonally defined. This would enable modular, reliable and controllable flexible-domain dialogue systems.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
Semantic Decomposition Improves Learning of Large Language Models on EHR Data
Authors:
David A. Bloore,
Romane Gauriau,
Anna L. Decker,
Jacob Oppenheim
Abstract:
Electronic health records (EHR) are widely believed to hold a profusion of actionable insights, encrypted in an irregular, semi-structured format, amidst a loud noise background. To simplify learning patterns of health and disease, medical codes in EHR can be decomposed into semantic units connected by hierarchical graphs. Building on earlier synergy between Bidirectional Encoder Representations f…
▽ More
Electronic health records (EHR) are widely believed to hold a profusion of actionable insights, encrypted in an irregular, semi-structured format, amidst a loud noise background. To simplify learning patterns of health and disease, medical codes in EHR can be decomposed into semantic units connected by hierarchical graphs. Building on earlier synergy between Bidirectional Encoder Representations from Transformers (BERT) and Graph Attention Networks (GAT), we present H-BERT, which ingests complete graph tree expansions of hierarchical medical codes as opposed to only ingesting the leaves and pushes patient-level labels down to each visit. This methodology significantly improves prediction of patient membership in over 500 medical diagnosis classes as measured by aggregated AUC and APS, and creates distinct representations of patients in closely related but clinically distinct phenotypes.
△ Less
Submitted 14 November, 2022;
originally announced December 2022.
-
Phenotype Detection in Real World Data via Online MixEHR Algorithm
Authors:
Ying Xu,
Romane Gauriau,
Anna Decker,
Jacob Oppenheim
Abstract:
Understanding patterns of diagnoses, medications, procedures, and laboratory tests from electronic health records (EHRs) and health insurer claims is important for understanding disease risk and for efficient clinical development, which often require rules-based curation in collaboration with clinicians. We extended an unsupervised phenotyping algorithm, mixEHR, to an online version allowing us to…
▽ More
Understanding patterns of diagnoses, medications, procedures, and laboratory tests from electronic health records (EHRs) and health insurer claims is important for understanding disease risk and for efficient clinical development, which often require rules-based curation in collaboration with clinicians. We extended an unsupervised phenotyping algorithm, mixEHR, to an online version allowing us to use it on order of magnitude larger datasets including a large, US-based claims dataset and a rich regional EHR dataset. In addition to recapitulating previously observed disease groups, we discovered clinically meaningful disease subtypes and comorbidities. This work scaled up an effective unsupervised learning method, reinforced existing clinical knowledge, and is a promising approach for efficient collaboration with clinicians.
△ Less
Submitted 15 November, 2022; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Tackling Morphological Analogies Using Deep Learning -- Extended Version
Authors:
Safa Alsaidi,
Amandine Decker,
Esteban Marquer,
Pierre-Alexandre Murena,
Miguel Couceiro
Abstract:
Analogical proportions are statements of the form "A is to B as C is to D". They constitute an inference tool that provides a logical framework to address learning, transfer, and explainability concerns and that finds useful applications in artificial intelligence and natural language processing. In this paper, we address two problems, namely, analogy detection and resolution in morphology. Multip…
▽ More
Analogical proportions are statements of the form "A is to B as C is to D". They constitute an inference tool that provides a logical framework to address learning, transfer, and explainability concerns and that finds useful applications in artificial intelligence and natural language processing. In this paper, we address two problems, namely, analogy detection and resolution in morphology. Multiple symbolic approaches tackle the problem of analogies in morphology and achieve competitive performance. We show that it is possible to use a data-driven strategy to outperform those models. We propose an approach using deep learning to detect and solve morphological analogies. It encodes structural properties of analogical proportions and relies on a specifically designed embedding model capturing morphological characteristics of words. We demonstrate our model's competitive performance on analogy detection and resolution over multiple languages. We provide an empirical study to analyze the impact of balancing training data and evaluate the robustness of our approach to input perturbation.
△ Less
Submitted 9 November, 2021;
originally announced November 2021.
-
A Neural Approach for Detecting Morphological Analogies
Authors:
Safa Alsaidi,
Amandine Decker,
Puthineath Lay,
Esteban Marquer,
Pierre-Alexandre Murena,
Miguel Couceiro
Abstract:
Analogical proportions are statements of the form "A is to B as C is to D" that are used for several reasoning and classification tasks in artificial intelligence and natural language processing (NLP). For instance, there are analogy based approaches to semantics as well as to morphology. In fact, symbolic approaches were developed to solve or to detect analogies between character strings, e.g., t…
▽ More
Analogical proportions are statements of the form "A is to B as C is to D" that are used for several reasoning and classification tasks in artificial intelligence and natural language processing (NLP). For instance, there are analogy based approaches to semantics as well as to morphology. In fact, symbolic approaches were developed to solve or to detect analogies between character strings, e.g., the axiomatic approach as well as that based on Kolmogorov complexity. In this paper, we propose a deep learning approach to detect morphological analogies, for instance, with reinflexion or conjugation. We present empirical results that show that our framework is competitive with the above-mentioned state of the art symbolic approaches. We also explore empirically its transferability capacity across languages, which highlights interesting similarities between them.
△ Less
Submitted 9 August, 2021;
originally announced August 2021.
-
On the Transferability of Neural Models of Morphological Analogies
Authors:
Safa Alsaidi,
Amandine Decker,
Puthineath Lay,
Esteban Marquer,
Pierre-Alexandre Murena,
Miguel Couceiro
Abstract:
Analogical proportions are statements expressed in the form "A is to B as C is to D" and are used for several reasoning and classification tasks in artificial intelligence and natural language processing (NLP). In this paper, we focus on morphological tasks and we propose a deep learning approach to detect morphological analogies. We present an empirical study to see how our framework transfers ac…
▽ More
Analogical proportions are statements expressed in the form "A is to B as C is to D" and are used for several reasoning and classification tasks in artificial intelligence and natural language processing (NLP). In this paper, we focus on morphological tasks and we propose a deep learning approach to detect morphological analogies. We present an empirical study to see how our framework transfers across languages, and that highlights interesting similarities and differences between these languages. In view of these results, we also discuss the possibility of building a multilingual morphological model.
△ Less
Submitted 9 August, 2021;
originally announced August 2021.
-
Identification of Latent Variables From Graphical Model Residuals
Authors:
Boris Hayete,
Fred Gruber,
Anna Decker,
Raymond Yan
Abstract:
Graph-based causal discovery methods aim to capture conditional independencies consistent with the observed data and differentiate causal relationships from indirect or induced ones. Successful construction of graphical models of data depends on the assumption of causal sufficiency: that is, that all confounding variables are measured. When this assumption is not met, learned graphical structures…
▽ More
Graph-based causal discovery methods aim to capture conditional independencies consistent with the observed data and differentiate causal relationships from indirect or induced ones. Successful construction of graphical models of data depends on the assumption of causal sufficiency: that is, that all confounding variables are measured. When this assumption is not met, learned graphical structures may become arbitrarily incorrect and effects implied by such models may be wrongly attributed, carry the wrong magnitude, or mis-represent direction of correlation. Wide application of graphical models to increasingly less curated "big data" draws renewed attention to the unobserved confounder problem.
We present a novel method that aims to control for the latent space when estimating a DAG by iteratively deriving proxies for the latent space from the residuals of the inferred model. Under mild assumptions, our method improves structural inference of Gaussian graphical models and enhances identifiability of the causal effect. In addition, when the model is being used to predict outcomes, it un-confounds the coefficients on the parents of the outcomes and leads to improved predictive performance when out-of-sample regime is very different from the training data. We show that any improvement of prediction of an outcome is intrinsically capped and cannot rise beyond a certain limit as compared to the confounded model. We extend our methodology beyond GGMs to ordinal variables and nonlinear cases. Our R package provides both PCA and autoencoder implementations of the methodology, suitable for GGMs with some guarantees and for better performance in general cases but without such guarantees.
△ Less
Submitted 6 January, 2021;
originally announced January 2021.