Search | arXiv e-print repository

With a Little Help from my (Linguistic) Friends: Topic Segmentation of Multi-party Casual Conversations

Authors: Amandine Decker, Maxime Amblard

Abstract: Topics play an important role in the global organisation of a conversation as what is currently discussed constrains the possible contributions of the participant. Understanding the way topics are organised in interaction would provide insight on the structure of dialogue beyond the sequence of utterances. However, studying this high-level structure is a complex task that we try to approach by fir… ▽ More Topics play an important role in the global organisation of a conversation as what is currently discussed constrains the possible contributions of the participant. Understanding the way topics are organised in interaction would provide insight on the structure of dialogue beyond the sequence of utterances. However, studying this high-level structure is a complex task that we try to approach by first segmenting dialogues into smaller topically coherent sets of utterances. Understanding the interactions between these segments would then enable us to propose a model of topic organisation at a dialogue level. In this paper we work with open-domain conversations and try to reach a comparable level of accuracy as recent machine learning based topic segmentation models but with a formal approach. The features we identify as meaningful for this task help us understand better the topical structure of a conversation. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Journal ref: CODI 2024 - 5th workshop on Computational Approaches to Discourse, Mar 2024, Malta, Malta

arXiv:2312.03342 [pdf, other]

Topic and genre in dialogue

Authors: Amandine Decker, Ellen Breitholtz, Christine Howes, Staffan Larsson

Abstract: In this paper we argue that topic plays a fundamental role in conversations, and that the concept is needed in addition to that of genre to define interactions. In particular, the concepts of genre and topic need to be separated and orthogonally defined. This would enable modular, reliable and controllable flexible-domain dialogue systems. In this paper we argue that topic plays a fundamental role in conversations, and that the concept is needed in addition to that of genre to define interactions. In particular, the concepts of genre and topic need to be separated and orthogonally defined. This would enable modular, reliable and controllable flexible-domain dialogue systems. △ Less

Submitted 6 December, 2023; originally announced December 2023.

Journal ref: SEMDIAL 2023, Aug 2023, Maribor, Slovenia. pp.143-145

arXiv:2212.06040 [pdf, other]

Semantic Decomposition Improves Learning of Large Language Models on EHR Data

Authors: David A. Bloore, Romane Gauriau, Anna L. Decker, Jacob Oppenheim

Abstract: Electronic health records (EHR) are widely believed to hold a profusion of actionable insights, encrypted in an irregular, semi-structured format, amidst a loud noise background. To simplify learning patterns of health and disease, medical codes in EHR can be decomposed into semantic units connected by hierarchical graphs. Building on earlier synergy between Bidirectional Encoder Representations f… ▽ More Electronic health records (EHR) are widely believed to hold a profusion of actionable insights, encrypted in an irregular, semi-structured format, amidst a loud noise background. To simplify learning patterns of health and disease, medical codes in EHR can be decomposed into semantic units connected by hierarchical graphs. Building on earlier synergy between Bidirectional Encoder Representations from Transformers (BERT) and Graph Attention Networks (GAT), we present H-BERT, which ingests complete graph tree expansions of hierarchical medical codes as opposed to only ingesting the leaves and pushes patient-level labels down to each visit. This methodology significantly improves prediction of patient membership in over 500 medical diagnosis classes as measured by aggregated AUC and APS, and creates distinct representations of patients in closely related but clinically distinct phenotypes. △ Less

Submitted 14 November, 2022; originally announced December 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 9 pages

arXiv:2211.07549 [pdf, other]

Phenotype Detection in Real World Data via Online MixEHR Algorithm

Authors: Ying Xu, Romane Gauriau, Anna Decker, Jacob Oppenheim

Abstract: Understanding patterns of diagnoses, medications, procedures, and laboratory tests from electronic health records (EHRs) and health insurer claims is important for understanding disease risk and for efficient clinical development, which often require rules-based curation in collaboration with clinicians. We extended an unsupervised phenotyping algorithm, mixEHR, to an online version allowing us to… ▽ More Understanding patterns of diagnoses, medications, procedures, and laboratory tests from electronic health records (EHRs) and health insurer claims is important for understanding disease risk and for efficient clinical development, which often require rules-based curation in collaboration with clinicians. We extended an unsupervised phenotyping algorithm, mixEHR, to an online version allowing us to use it on order of magnitude larger datasets including a large, US-based claims dataset and a rich regional EHR dataset. In addition to recapitulating previously observed disease groups, we discovered clinically meaningful disease subtypes and comorbidities. This work scaled up an effective unsupervised learning method, reinforced existing clinical knowledge, and is a promising approach for efficient collaboration with clinicians. △ Less

Submitted 15 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 6 pages

arXiv:2111.05147 [pdf, other]

Tackling Morphological Analogies Using Deep Learning -- Extended Version

Authors: Safa Alsaidi, Amandine Decker, Esteban Marquer, Pierre-Alexandre Murena, Miguel Couceiro

Abstract: Analogical proportions are statements of the form "A is to B as C is to D". They constitute an inference tool that provides a logical framework to address learning, transfer, and explainability concerns and that finds useful applications in artificial intelligence and natural language processing. In this paper, we address two problems, namely, analogy detection and resolution in morphology. Multip… ▽ More Analogical proportions are statements of the form "A is to B as C is to D". They constitute an inference tool that provides a logical framework to address learning, transfer, and explainability concerns and that finds useful applications in artificial intelligence and natural language processing. In this paper, we address two problems, namely, analogy detection and resolution in morphology. Multiple symbolic approaches tackle the problem of analogies in morphology and achieve competitive performance. We show that it is possible to use a data-driven strategy to outperform those models. We propose an approach using deep learning to detect and solve morphological analogies. It encodes structural properties of analogical proportions and relies on a specifically designed embedding model capturing morphological characteristics of words. We demonstrate our model's competitive performance on analogy detection and resolution over multiple languages. We provide an empirical study to analyze the impact of balancing training data and evaluate the robustness of our approach to input perturbation. △ Less

Submitted 9 November, 2021; originally announced November 2021.

arXiv:2108.03945 [pdf, other]

A Neural Approach for Detecting Morphological Analogies

Authors: Safa Alsaidi, Amandine Decker, Puthineath Lay, Esteban Marquer, Pierre-Alexandre Murena, Miguel Couceiro

Abstract: Analogical proportions are statements of the form "A is to B as C is to D" that are used for several reasoning and classification tasks in artificial intelligence and natural language processing (NLP). For instance, there are analogy based approaches to semantics as well as to morphology. In fact, symbolic approaches were developed to solve or to detect analogies between character strings, e.g., t… ▽ More Analogical proportions are statements of the form "A is to B as C is to D" that are used for several reasoning and classification tasks in artificial intelligence and natural language processing (NLP). For instance, there are analogy based approaches to semantics as well as to morphology. In fact, symbolic approaches were developed to solve or to detect analogies between character strings, e.g., the axiomatic approach as well as that based on Kolmogorov complexity. In this paper, we propose a deep learning approach to detect morphological analogies, for instance, with reinflexion or conjugation. We present empirical results that show that our framework is competitive with the above-mentioned state of the art symbolic approaches. We also explore empirically its transferability capacity across languages, which highlights interesting similarities between them. △ Less

Submitted 9 August, 2021; originally announced August 2021.

Comments: Submitted and accepted by the 8th IEEE International Conference on Data Science and Advanced Analytics (DSAA)

arXiv:2108.03938 [pdf, other]

On the Transferability of Neural Models of Morphological Analogies

Authors: Safa Alsaidi, Amandine Decker, Puthineath Lay, Esteban Marquer, Pierre-Alexandre Murena, Miguel Couceiro

Abstract: Analogical proportions are statements expressed in the form "A is to B as C is to D" and are used for several reasoning and classification tasks in artificial intelligence and natural language processing (NLP). In this paper, we focus on morphological tasks and we propose a deep learning approach to detect morphological analogies. We present an empirical study to see how our framework transfers ac… ▽ More Analogical proportions are statements expressed in the form "A is to B as C is to D" and are used for several reasoning and classification tasks in artificial intelligence and natural language processing (NLP). In this paper, we focus on morphological tasks and we propose a deep learning approach to detect morphological analogies. We present an empirical study to see how our framework transfers across languages, and that highlights interesting similarities and differences between these languages. In view of these results, we also discuss the possibility of building a multilingual morphological model. △ Less

Submitted 9 August, 2021; originally announced August 2021.

Comments: Submitted and accepted by AIMLAI, ECML PKDD 2021: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

arXiv:2101.02332 [pdf, other]

Identification of Latent Variables From Graphical Model Residuals

Authors: Boris Hayete, Fred Gruber, Anna Decker, Raymond Yan

Abstract: Graph-based causal discovery methods aim to capture conditional independencies consistent with the observed data and differentiate causal relationships from indirect or induced ones. Successful construction of graphical models of data depends on the assumption of causal sufficiency: that is, that all confounding variables are measured. When this assumption is not met, learned graphical structures… ▽ More Graph-based causal discovery methods aim to capture conditional independencies consistent with the observed data and differentiate causal relationships from indirect or induced ones. Successful construction of graphical models of data depends on the assumption of causal sufficiency: that is, that all confounding variables are measured. When this assumption is not met, learned graphical structures may become arbitrarily incorrect and effects implied by such models may be wrongly attributed, carry the wrong magnitude, or mis-represent direction of correlation. Wide application of graphical models to increasingly less curated "big data" draws renewed attention to the unobserved confounder problem. We present a novel method that aims to control for the latent space when estimating a DAG by iteratively deriving proxies for the latent space from the residuals of the inferred model. Under mild assumptions, our method improves structural inference of Gaussian graphical models and enhances identifiability of the causal effect. In addition, when the model is being used to predict outcomes, it un-confounds the coefficients on the parents of the outcomes and leads to improved predictive performance when out-of-sample regime is very different from the training data. We show that any improvement of prediction of an outcome is intrinsically capped and cannot rise beyond a certain limit as compared to the confounded model. We extend our methodology beyond GGMs to ordinal variables and nonlinear cases. Our R package provides both PCA and autoencoder implementations of the methodology, suitable for GGMs with some guarantees and for better performance in general cases but without such guarantees. △ Less

Submitted 6 January, 2021; originally announced January 2021.

Showing 1–8 of 8 results for author: Decker, A