Zum Hauptinhalt springen

Showing 1–8 of 8 results for author: Le-Duc, K

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.04174  [pdf, other

    cs.CL cs.AI cs.IR cs.LG cs.SD eess.AS

    wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech

    Authors: Khai Le-Duc, Quy-Anh Dang, Tan-Hanh Pham, Truong-Son Hy

    Abstract: Knowledge graphs (KGs) enhance the performance of large language models (LLMs) and search engines by providing structured, interconnected data that improves reasoning and context-awareness. However, KGs only focus on text data, thereby neglecting other modalities such as speech. In this work, we introduce wav2graph, the first framework for supervised learning knowledge graph from speech data. Our… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Preprint, 32 pages

  2. arXiv:2407.21054  [pdf, other

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Sentiment Reasoning for Healthcare

    Authors: Khai Le-Duc, Khai-Nguyen Nguyen, Bach Phan Tat, Duy Le, Jerry Ngo, Long Vo-Dang, Anh Totti Nguyen, Truong-Son Hy

    Abstract: Transparency in AI decision-making is crucial in healthcare due to the severe consequences of errors, and this is important for building trust among AI and users in sentiment analysis task. Incorporating reasoning capabilities helps Large Language Models (LLMs) understand human emotions within broader contexts, handle nuanced and ambiguous language, and infer underlying sentiments that may not be… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Preprint, 18 pages

  3. arXiv:2407.12064  [pdf, other

    eess.IV cs.CL cs.CV cs.LG cs.MM

    LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task

    Authors: Khai Le-Duc, Ryan Zhang, Ngoc Son Nguyen, Tan-Hanh Pham, Anh Dao, Ba Hung Ngo, Anh Totti Nguyen, Truong-Son Hy

    Abstract: Vision-language models have been extensively explored across a wide range of tasks, achieving satisfactory performance; however, their application in medical imaging remains underexplored. In this work, we propose a unified framework - LiteGPT - for the medical imaging. We leverage multiple pre-trained visual encoders to enrich information and enhance the performance of vision-language models. To… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Preprint, 19 pages

  4. arXiv:2406.15888  [pdf, other

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Real-time Speech Summarization for Medical Conversations

    Authors: Khai Le-Duc, Khai-Nguyen Nguyen, Long Vo-Dang, Truong-Son Hy

    Abstract: In doctor-patient conversations, identifying medically relevant information is crucial, posing the need for conversation summarization. In this work, we propose the first deployable real-time speech summarization system for real-world applications in industry, which generates a local summary after every N speech utterances within a conversation and a global summary after the end of a conversation.… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  5. arXiv:2406.13337  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    Medical Spoken Named Entity Recognition

    Authors: Khai Le-Duc, David Thulke, Hung-Phong Tran, Long Vo-Dang, Khai-Nguyen Nguyen, Truong-Son Hy, Ralf Schlüter

    Abstract: Spoken Named Entity Recognition (NER) aims to extracting named entities from speech and categorizing them into types like person, location, organization, etc. In this work, we present VietMed-NER - the first spoken NER dataset in the medical domain. To our best knowledge, our real-world dataset is the largest spoken NER dataset in the world in terms of the number of entity types, featuring 18 dist… ▽ More

    Submitted 20 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Preprint, 41 pages

  6. arXiv:2404.05659  [pdf

    cs.CL cs.AI eess.AS

    VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

    Authors: Khai Le-Duc

    Abstract: Due to privacy restrictions, there's a shortage of publicly available speech recognition datasets in the medical domain. In this work, we present VietMed - a Vietnamese speech recognition dataset in the medical domain comprising 16h of labeled medical speech, 1000h of unlabeled medical speech and 1200h of unlabeled general-domain speech. To our best knowledge, VietMed is by far the world's largest… ▽ More

    Submitted 28 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: LREC-COLING 2024, 27 pages

  7. arXiv:2309.15869  [pdf, other

    cs.CL cs.SD eess.AS

    Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in the HYKIST Project

    Authors: Khai Le-Duc

    Abstract: In today's interconnected globe, moving abroad is more and more prevalent, whether it's for employment, refugee resettlement, or other causes. Language difficulties between natives and immigrants present a common issue on a daily basis, especially in medical domain. This can make it difficult for patients and doctors to communicate during anamnesis or in the emergency room, which compromises patie… ▽ More

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Bachelor Thesis

    Journal ref: FH Aachen University of Applied Sciences (2023)

  8. arXiv:2210.13397  [pdf, ps, other

    cs.CL cs.SD eess.AS

    Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

    Authors: Christoph Lüscher, Mohammad Zeineldeen, Zijian Yang, Tina Raissi, Peter Vieting, Khai Le-Duc, Weiyue Wang, Ralf Schlüter, Hermann Ney

    Abstract: Language barriers present a great challenge in our increasingly connected and global world. Especially within the medical domain, e.g. hospital or emergency room, communication difficulties and delays may lead to malpractice and non-optimal patient care. In the HYKIST project, we consider patient-physician communication, more specifically between a German-speaking physician and an Arabic- or Vietn… ▽ More

    Submitted 22 September, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: ASR System Paper for HYKIST project