Showing 1–1 of 1 results for author: Ramos, M M

  1. arXiv:2311.09132 [pdf, other]

    cs.CL

    Aligning Neural Machine Translation Models: Human Feedback in Training and Inference

    Authors: Miguel Moura Ramos, Patrick Fernandes, António Farinhas, André F. T. Martins

    Abstract: Reinforcement learning from human feedback (RLHF) is a recent technique to improve the quality of the text generated by a language model, making it closer to what humans would generate. A core ingredient in RLHF's success in aligning and improving large language models (LLMs) is its reward model, trained using human feedback on model outputs. In machine translation (MT), where metrics trained from…

    Submitted 4 July, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: EAMT 2024