Showing 1–33 of 33 results for author: Sudoh, K

  1. arXiv:2407.06650  [pdf, other]

    cs.CL

    A Word Order Synchronization Metric for Evaluating Simultaneous Interpretation and Translation

    Authors: Mana Makinae, Katsuhito Sudoh, Masaru Yamada, Satoshi Nakamura

    Abstract: Simultaneous interpretation (SI), the translation of one language to another in real time, starts translation before the original speech has finished. Its evaluation needs to consider both latency and quality. This trade-off is challenging especially for distant word order language pairs such as English and Japanese. To handle this word order gap, interpreters maintain the word order of the source…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2407.00826  [pdf, other]

    cs.CL cs.SD eess.AS

    NAIST Simultaneous Speech Translation System for IWSLT 2024

    Authors: Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Tomoya Yanagita, Kosuke Doi, Mana Makinae, Haotian Tan, Makoto Sakai, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign: English-to-{German, Japanese, Chinese} speech-to-text translation and English-to-Japanese speech-to-speech translation. We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART. We trained this model with two decoding poli…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: IWSLT 2024 system paper

  3. arXiv:2406.13476  [pdf, other]

    cs.CL

    LLMs Are Zero-Shot Context-Aware Simultaneous Translators

    Authors: Roman Koshkin, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: The advent of transformers has fueled progress in machine translation. More recently large language models (LLMs) have come to the spotlight thanks to their generality and strong performance in a wide range of language tasks, including translation. Here we show that open-source LLMs perform on par with or better than some state-of-the-art baselines in simultaneous machine translation (SiMT) tasks,…

    Submitted 25 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2406.08940  [pdf, other]

    cs.CL

    Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation

    Authors: Kosuke Doi, Yuka Ko, Mana Makinae, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: This paper analyzes the features of monotonic translations, which follow the word order of the source language, in simultaneous interpreting (SI). Word order differences are one of the biggest challenges in SI, especially for language pairs with significant structural differences like English and Japanese. We analyzed the characteristics of chunk-wise monotonic translation (CMT) sentences using th…

    Submitted 15 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to IWSLT2024

  5. arXiv:2406.08817  [pdf, other]

    cs.CL

    Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory

    Authors: Kosuke Doi, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: This study examines the effect of grammatical features in automatic essay scoring (AES). We use two kinds of grammatical features as input to an AES model: (1) grammatical items that writers used correctly in essays, and (2) the number of grammatical errors. Experimental results show that grammatical features improve the performance of AES models that predict the holistic scores of essays. Multi-t…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to BEA2024

  6. arXiv:2406.03881  [pdf, other]

    cs.CL

    Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

    Authors: Matthias Sperber, Ondřej Bojar, Barry Haddow, Dávid Javorský, Xutai Ma, Matteo Negri, Jan Niehues, Peter Polák, Elizabeth Salesky, Katsuhito Sudoh, Marco Turchi

    Abstract: Human evaluation is a critical component in machine translation system development and has received much attention in text translation research. However, little prior work exists on the topic of human evaluation for speech translation, which adds additional challenges such as noisy data and segmentation mismatches. We take first steps to fill this gap by conducting a comprehensive human evaluation…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: LREC-COLING2024 publication (with corrections for Table 3)

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  7. arXiv:2402.04636  [pdf, other]

    cs.CL

    TransLLaMa: LLM-based Simultaneous Translation System

    Authors: Roman Koshkin, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Decoder-only large language models (LLMs) have recently demonstrated impressive capabilities in text generation and reasoning. Nonetheless, they have limited applications in simultaneous machine translation (SiMT), currently dominated by encoder-decoder transformers. This study demonstrates that, after fine-tuning on a small dataset comprising causally aligned source and target sentence pairs, a p…

    Submitted 7 February, 2024; originally announced February 2024.

  8. arXiv:2311.14353  [pdf, other]

    cs.CL

    Average Token Delay: A Duration-aware Latency Metric for Simultaneous Translation

    Authors: Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Simultaneous translation is a task in which the translation begins before the end of an input speech segment. Its evaluation should be conducted based on latency in addition to quality, and for users, the smallest possible amount of latency is preferable. Most existing metrics measure latency based on the start timings of partial translations and ignore their duration. This means such metrics do n…

    Submitted 27 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Extended version of the paper (doi: 10.21437/Interspeech.2023-933) which appeared in INTERSPEECH 2023

  9. arXiv:2306.08582  [pdf, other]

    cs.CL cs.SD eess.AS

    Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data

    Authors: Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Simultaneous speech translation (SimulST) translates partial speech inputs incrementally. Although the monotonic correspondence between input and output is preferable for smaller latency, it is not the case for distant language pairs such as English and Japanese. A prospective approach to this problem is to mimic simultaneous interpretation (SI) using SI data to train a SimulST model. However, the…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted to IWSLT2023 scientific paper

  10. arXiv:2304.11766  [pdf, other]

    cs.CL

    NAIST-SIC-Aligned: an Aligned English-Japanese Simultaneous Interpretation Corpus

    Authors: Jinming Zhao, Yuka Ko, Kosuke Doi, Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: It remains an open question how simultaneous interpretation (SI) data affects simultaneous machine translation (SiMT). Research has been limited due to the lack of a large-scale training corpus. In this work, we aim to fill the gap by introducing NAIST-SIC-Aligned, which is an automatically-aligned parallel English-Japanese SI dataset. Starting with a non-aligned corpus NAIST-SIC, we propose a t…

    Submitted 31 March, 2024; v1 submitted 23 April, 2023; originally announced April 2023.

    Comments: LREC-COLING 2024

  11. arXiv:2303.00311  [pdf, other]

    cs.CL cs.AI cs.IR

    Modeling Multiple User Interests using Hierarchical Knowledge for Conversational Recommender System

    Authors: Yuka Okuda, Katsuhito Sudoh, Seitaro Shinagawa, Satoshi Nakamura

    Abstract: A conversational recommender system (CRS) is a practical application for item recommendation through natural language conversation. Such a system estimates user interests for appropriate personalized recommendations. Users sometimes have various interests in different categories or genres, but existing studies assume a unique user interest that can be covered by closely related items. In this work…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted as a conference paper at IWSDS 2023

  12. arXiv:2302.05619  [pdf, other]

    cs.CL cs.AI

    Evaluating the Robustness of Discrete Prompts

    Authors: Yoichi Ishibashi, Danushka Bollegala, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Discrete prompts have been used for fine-tuning Pre-trained Language Models for diverse NLP tasks. In particular, automatic methods that generate discrete prompts from a small set of training instances have reported superior performance. However, a closer look at the learnt prompts reveals that they contain noisy and counter-intuitive lexical constructs that would not be encountered in manually-wr…

    Submitted 11 February, 2023; originally announced February 2023.

    Comments: Accepted at EACL 2023

  13. arXiv:2211.13173  [pdf, other]

    cs.CL cs.SD

    Average Token Delay: A Latency Metric for Simultaneous Translation

    Authors: Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Simultaneous translation is a task in which translation begins before the speaker has finished speaking. In its evaluation, we have to consider the latency of the translation in addition to the quality. The latency is preferably as small as possible for users to comprehend what the speaker says with a small delay. Existing latency metrics focus on when the translation starts but do not consider ad…

    Submitted 8 February, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

  14. arXiv:2211.00513  [pdf, other]

    cs.CL

    E2E Refined Dataset

    Authors: Keisuke Toyama, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Although the well-known MR-to-text E2E dataset has been used by many researchers, its MR-text pairs include many deletion/insertion/substitution errors. Since such errors affect the quality of MR-to-text systems, they must be fixed as much as possible. Therefore, we developed a refined dataset and some Python programs that convert the original E2E dataset into a refined dataset.

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: 4 pages

    ACM Class: I.2.7

  15. arXiv:2210.13034  [pdf, other]

    cs.CL cs.LG

    Subspace Representations for Soft Set Operations and Sentence Similarities

    Authors: Yoichi Ishibashi, Sho Yokoi, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: In the field of natural language processing (NLP), continuous vector representations are crucial for capturing the semantic meanings of individual words. Yet, when it comes to the representations of sets of words, the conventional vector-based approaches often struggle with expressiveness and lack the essential set operations such as union, intersection, and complement. Inspired by quantum logic,…

    Submitted 9 April, 2024; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted at NAACL 2024

  16. arXiv:2203.15479  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

    Authors: Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular VAD tools like WebRTC VAD have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult to detect by VAD. In this study, we propose a speech segm…

    Submitted 13 July, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted to INTERSPEECH 2022

  17. arXiv:2203.14725  [pdf, other]

    cs.SD

    vTTS: visual-text to speech

    Authors: Yoshifumi Nakano, Takaaki Saeki, Shinnosuke Takamichi, Katsuhito Sudoh, Hiroshi Saruwatari

    Abstract: This paper proposes visual-text to speech (vTTS), a method for synthesizing speech from visual text (i.e., text as an image). Conventional TTS converts phonemes or characters into discrete symbols and synthesizes a speech waveform from them, thus losing the visual features that the characters essentially have. Therefore, our method synthesizes speech not from discrete symbols but from visual text.…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Submitted to INTERSPEECH 2022

  18. arXiv:2110.13480  [pdf, other]

    cs.CL

    Simultaneous Neural Machine Translation with Constituent Label Prediction

    Authors: Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Simultaneous translation is a task in which translation begins before the speaker has finished speaking, so it is important to decide when to start the translation process. However, deciding whether to read more input words or start to translate is difficult for language pairs with different word orders such as English and Japanese. Motivated by the concept of pre-reordering, we propose a couple o…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: WMT2021

  19. arXiv:2107.13689  [pdf, other]

    cs.CL

    Using Perturbed Length-aware Positional Encoding for Non-autoregressive Neural Machine Translation

    Authors: Yui Oka, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Non-autoregressive neural machine translation (NAT) usually employs sequence-level knowledge distillation using autoregressive neural machine translation (AT) as its teacher model. However, a NAT model often outputs shorter sentences than an AT model. In this work, we propose sequence-level knowledge distillation (SKD) using perturbed length-aware positional encoding and apply it to a student mode…

    Submitted 28 July, 2021; originally announced July 2021.

    Comments: 5 pages, 1 figure. Will be presented at ACL SRW 2021

  20. arXiv:2106.07999  [pdf, other]

    cs.CL

    ARTA: Collection and Classification of Ambiguous Requests and Thoughtful Actions

    Authors: Shohei Tanaka, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Human-assisting systems such as dialogue systems must take thoughtful, appropriate actions not only for clear and unambiguous user requests, but also for ambiguous user requests, even if the users themselves are not aware of their potential requirements. To construct such a dialogue agent, we collected a corpus and developed a model that classifies ambiguous user requests into corresponding system…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted by The 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL2021)

  21. arXiv:2011.04845  [pdf, other]

    cs.CL

    Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS

    Authors: Katsuhito Sudoh, Takatomo Kano, Sashi Novitasari, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura

    Abstract: This paper presents a newly developed, simultaneous neural speech-to-speech translation system and its evaluation. The system consists of three fully-incremental neural processing modules for automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). We investigated its overall latency in the system's Ear-Voice Span and speaking latency along with module-leve…

    Submitted 11 November, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: 6 pages

  22. arXiv:2010.09413  [pdf, other]

    cs.CV cs.CL

    Image Captioning with Visual Object Representations Grounded in the Textual Modality

    Authors: Dušan Variš, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: We present our work in progress exploring the possibilities of a shared embedding space between textual and visual modality. Leveraging the textual nature of object detection labels and the hypothetical expressiveness of extracted visual object representations, we propose an approach opposite to the current trend, grounding of the representations in the word embedding space of the captioning syste…

    Submitted 20 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

  23. arXiv:2007.02598  [pdf, other]

    cs.CL

    Reflection-based Word Attribute Transfer

    Authors: Yoichi Ishibashi, Katsuhito Sudoh, Koichiro Yoshino, Satoshi Nakamura

    Abstract: Word embeddings, which often represent such analogic relations as king - man + woman = queen, can be used to change a word's attribute, including its gender. For transferring king into queen in this analogy-based manner, we subtract a difference vector man - woman based on the knowledge that king is male. However, developing such knowledge is very costly for words and attributes. In this work, we…

    Submitted 7 July, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: Accepted at ACL 2020 Student Research Workshop (SRW)

  24. arXiv:1911.11933  [pdf, other]

    cs.CL

    Simultaneous Neural Machine Translation using Connectionist Temporal Classification

    Authors: Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Simultaneous machine translation is a variant of machine translation that starts the translation process before the end of an input. This task faces a trade-off between translation accuracy and latency. We have to determine when we start the translation for observed inputs so far, to achieve good practical performance. In this work, we propose a neural machine translation method to determine this…

    Submitted 26 November, 2019; originally announced November 2019.

  25. arXiv:1910.13299  [pdf, other]

    cs.CL

    Findings of the Third Workshop on Neural Generation and Translation

    Authors: Hiroaki Hayashi, Yusuke Oda, Alexandra Birch, Ioannis Konstas, Andrew Finch, Minh-Thang Luong, Graham Neubig, Katsuhito Sudoh

    Abstract: This document describes the findings of the Third Workshop on Neural Generation and Translation, held in concert with the annual conference of the Empirical Methods in Natural Language Processing (EMNLP 2019). First, we summarize the research trends of papers presented in the proceedings. Second, we describe the results of the two shared tasks 1) efficient neural machine translation (NMT) where pa…

    Submitted 29 October, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: Fixed the metadata (author list)

  26. arXiv:1906.09795  [pdf, other]

    cs.CL

    Conversational Response Re-ranking Based on Event Causality and Role Factored Tensor Event Embedding

    Authors: Shohei Tanaka, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: We propose a novel method for selecting coherent and diverse responses for a given dialogue context. The proposed method re-ranks response candidates generated from conversational models by using event causality relations between events in a dialogue history and response candidates (e.g., "be stressed out" precedes "relieve stress"). We use distributed event representation based on the Role Fa…

    Submitted 24 June, 2019; originally announced June 2019.

    Comments: Accepted by 1st Workshop NLP for Conversational AI, ACL 2019 Workshop (ConvAI)

  27. arXiv:1811.08100  [pdf, other]

    cs.CL

    Another Diversity-Promoting Objective Function for Neural Dialogue Generation

    Authors: Ryo Nakamura, Katsuhito Sudoh, Koichiro Yoshino, Satoshi Nakamura

    Abstract: Although generation-based dialogue systems have been widely researched, the response generations by most existing systems have very low diversities. The most likely reason for this problem is Maximum Likelihood Estimation (MLE) with Softmax Cross-Entropy (SCE) loss. MLE trains models to generate the most frequent responses from enormous generation candidates, although in actual dialogues there are…

    Submitted 20 November, 2018; v1 submitted 20 November, 2018; originally announced November 2018.

    Comments: AAAI 2019 Workshop on Reasoning and Learning for Human-Machine Dialogues (DEEP-DIAL 2019)

  28. arXiv:1810.06826  [pdf, other]

    cs.CL

    Multi-Source Neural Machine Translation with Data Augmentation

    Authors: Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura

    Abstract: Multi-source translation systems translate from multiple languages to a single target language. By using information from these multiple sources, these systems achieve large gains in accuracy. To train these systems, it is necessary to have corpora with parallel text in multiple sources and the target language. However, these corpora are rarely complete in practice due to the difficulty of providi…

    Submitted 8 November, 2018; v1 submitted 16 October, 2018; originally announced October 2018.

    Comments: 15th International Workshop on Spoken Language Translation 2018

  29. arXiv:1807.11219  [pdf, ps, other]

    cs.CL

    Training Neural Machine Translation using Word Embedding-based Loss

    Authors: Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: In neural machine translation (NMT), the computational cost at the output layer increases with the size of the target-side vocabulary. Using a limited-size vocabulary instead may cause a significant decrease in translation quality. This trade-off is derived from a softmax-based loss function that handles in-dictionary words independently, in which word similarity is not considered. In this paper,…

    Submitted 30 July, 2018; originally announced July 2018.

  30. arXiv:1806.02525  [pdf, other]

    cs.CL

    Multi-Source Neural Machine Translation with Missing Data

    Authors: Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura

    Abstract: Multi-source translation is an approach to exploit multiple inputs (e.g. in two different languages) to increase translation accuracy. In this paper, we examine approaches for multi-source neural machine translation (NMT) using an incomplete multilingual corpus in which some translations are missing. In practice, many multilingual corpora are not complete due to the difficulty to provide translati…

    Submitted 7 June, 2018; v1 submitted 7 June, 2018; originally announced June 2018.

    Comments: ACL 2018 Workshop on Neural Machine Translation and Generation

  31. arXiv:1706.05765  [pdf, other]

    cs.CL

    An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation

    Authors: Makoto Morishita, Yusuke Oda, Graham Neubig, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the a…

    Submitted 18 June, 2017; originally announced June 2017.

    Comments: 8 pages, accepted to the First Workshop on Neural Machine Translation

  32. arXiv:1704.00380  [pdf, other]

    cs.CL

    Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings

    Authors: Junki Matsuo, Mamoru Komachi, Katsuhito Sudoh

    Abstract: One of the most important problems in machine translation (MT) evaluation is to evaluate the similarity between translation hypotheses with different surface forms from the reference, especially at the segment level. We propose to use word embeddings to perform word alignment for segment-level MT evaluation. We performed experiments with three types of alignment methods using word embeddings. We e…

    Submitted 2 April, 2017; originally announced April 2017.

    Comments: 5 pages

  33. Reading Comprehension using Entity-based Memory Network

    Authors: Xun Wang, Katsuhito Sudoh, Masaaki Nagata, Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

    Abstract: This paper introduces a novel neural network model for question answering, the entity-based memory network. It enhances neural networks' ability to represent and calculate information over a long period by keeping records of entities contained in text. The core component is a memory pool which comprises entities' states. These entities' states are continuously updated according to the…

    Submitted 1 February, 2017; v1 submitted 12 December, 2016; originally announced December 2016.