Showing 1–1 of 1 results for author: Solway, A

  1. arXiv:2408.16753  [pdf, ps, other]

    cs.CL cs.LG

    Reinforcement Learning without Human Feedback for Last Mile Fine-Tuning of Large Language Models

    Authors: Alec Solway

    Abstract: Reinforcement learning is used to align language models with human preference signals after first pre-training the model to predict the next token of text within a large corpus using likelihood maximization. Before being deployed in a specific domain, models are often further fine-tuned on task-specific data. Since human preferences are often unavailable for the last step, it is performed using li…

    Submitted 29 August, 2024; originally announced August 2024.