Showing 1–4 of 4 results for author: Raffel, M

  1. arXiv:2405.10443

    cs.CL cs.LG

    Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous Translation

    Authors: Matthew Raffel, Victor Agostinelli, Lizhong Chen

    Abstract: Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks, motivating their adoption in simultaneous translation. Current fine-tuning methods to adapt LLMs for simultaneous translation focus on prompting optimization strategies using either data augmentation or prompt structure modifications. However, these methods suffer from several issues, such…

    Submitted 26 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  2. arXiv:2312.04691

    cs.CL cs.AI

    Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models

    Authors: Victor Agostinelli, Max Wild, Matthew Raffel, Kazi Ahmed Asif Fuad, Lizhong Chen

    Abstract: Large language models (LLMs) with billions of parameters and pretrained on massive amounts of data are now capable of near or better than state-of-the-art performance in a variety of downstream natural language processing tasks. Neural machine translation (NMT) is one such task that LLMs have been applied to with great success. However, little research has focused on applying LLMs to the more diff…

    Submitted 4 July, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: ACL 2024

  3. arXiv:2307.01381

    cs.CL cs.LG

    Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation

    Authors: Matthew Raffel, Lizhong Chen

    Abstract: Simultaneous speech translation is an essential communication task difficult for humans whereby a translation is generated concurrently with oncoming speech inputs. For such a streaming task, transformers using block processing to break an input sequence into segments have achieved state-of-the-art performance at a reduced cost. Current methods to allow information to propagate across segments, in…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Accepted at Findings of ACL 2023

  4. arXiv:2307.01377

    cs.CL cs.LG

    Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation

    Authors: Matthew Raffel, Drew Penney, Lizhong Chen

    Abstract: Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation. However, such models create a context mismatch between training and inference environments, hindering potential translation accuracy. We solve this issue by proposing Shiftable Context, a simple yet effective scheme to ensure that consistent segment and context sizes are maint…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Accepted at ICML 2023