
Showing 1–5 of 5 results for author: Hy, T

Searching in archive eess.
  1. arXiv:2408.04174  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG cs.SD eess.AS

    wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech

    Authors: Khai Le-Duc, Quy-Anh Dang, Tan-Hanh Pham, Truong-Son Hy

    Abstract: Knowledge graphs (KGs) enhance the performance of large language models (LLMs) and search engines by providing structured, interconnected data that improves reasoning and context-awareness. However, KGs only focus on text data, thereby neglecting other modalities such as speech. In this work, we introduce wav2graph, the first framework for supervised learning knowledge graph from speech data. Our…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Preprint, 32 pages

  2. arXiv:2407.21054  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Sentiment Reasoning for Healthcare

    Authors: Khai Le-Duc, Khai-Nguyen Nguyen, Bach Phan Tat, Duy Le, Jerry Ngo, Long Vo-Dang, Anh Totti Nguyen, Truong-Son Hy

    Abstract: Transparency in AI decision-making is crucial in healthcare due to the severe consequences of errors, and it is important for building trust between AI and users in sentiment analysis tasks. Incorporating reasoning capabilities helps Large Language Models (LLMs) understand human emotions within broader contexts, handle nuanced and ambiguous language, and infer underlying sentiments that may not be…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Preprint, 18 pages

  3. arXiv:2407.12064  [pdf, other]

    eess.IV cs.CL cs.CV cs.LG cs.MM

    LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task

    Authors: Khai Le-Duc, Ryan Zhang, Ngoc Son Nguyen, Tan-Hanh Pham, Anh Dao, Ba Hung Ngo, Anh Totti Nguyen, Truong-Son Hy

    Abstract: Vision-language models have been extensively explored across a wide range of tasks, achieving satisfactory performance; however, their application in medical imaging remains underexplored. In this work, we propose a unified framework, LiteGPT, for medical imaging. We leverage multiple pre-trained visual encoders to enrich information and enhance the performance of vision-language models. To…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Preprint, 19 pages

  4. arXiv:2406.15888  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Real-time Speech Summarization for Medical Conversations

    Authors: Khai Le-Duc, Khai-Nguyen Nguyen, Long Vo-Dang, Truong-Son Hy

    Abstract: In doctor-patient conversations, identifying medically relevant information is crucial, posing the need for conversation summarization. In this work, we propose the first deployable real-time speech summarization system for real-world applications in industry, which generates a local summary after every N speech utterances within a conversation and a global summary after the end of a conversation.…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  5. arXiv:2406.13337  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Medical Spoken Named Entity Recognition

    Authors: Khai Le-Duc, David Thulke, Hung-Phong Tran, Long Vo-Dang, Khai-Nguyen Nguyen, Truong-Son Hy, Ralf Schlüter

    Abstract: Spoken Named Entity Recognition (NER) aims to extract named entities from speech and categorize them into types like person, location, organization, etc. In this work, we present VietMed-NER, the first spoken NER dataset in the medical domain. To the best of our knowledge, our real-world dataset is the largest spoken NER dataset in the world in terms of the number of entity types, featuring 18 dist…

    Submitted 20 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Preprint, 41 pages