
Showing 1–3 of 3 results for author: Muraoka, M

Searching in archive cs.
  1. arXiv:2407.13300

    cs.CL eess.AS

    Robust ASR Error Correction with Conservative Data Filtering

    Authors: Takuma Udagawa, Masayuki Suzuki, Masayasu Muraoka, Gakuto Kurata

    Abstract: Error correction (EC) based on large language models is an emerging technology to enhance the performance of automatic speech recognition (ASR) systems. Generally, training data for EC are collected by automatically pairing a large set of ASR hypotheses (as sources) and their gold references (as targets). However, the quality of such pairs is not guaranteed, and we observed various types of noise…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2405.10725

    cs.CL cs.IR

    INDUS: Effective and Efficient Language Models for Scientific Applications

    Authors: Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey, Kaylin Bugbee, Mike Little, Elizabeth Fancher, Lauren Sanders, Sylvain Costes, Sergi Blanco-Cuaresma, Kelly Lockhart, Thomas Allen, Felix Grezes, Megan Ansdell, Alberto Accomazzi, Yousef El-Kurdi, Davis Wertheimer, Birgit Pfitzmann, Cesar Berrospi Ramis, et al. (9 additional authors not shown)

    Abstract: Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics,…

    Submitted 20 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  3. arXiv:2309.04031

    cs.CL cs.SD eess.AS

    Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

    Authors: Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Masayasu Muraoka, George Saon

    Abstract: Transferring the knowledge of large language models (LLMs) is a promising technique to incorporate linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works only transfer a single representation of LLM (e.g. the last layer of pretrained BERT), while the representation of a text is inherently non-unique and can be obtained variously from different laye…

    Submitted 25 December, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024