Showing 1–1 of 1 results for author: Grosch, S

Searching in archive cs.
  1. arXiv:2403.00998 [pdf, other]

    cs.CL

    Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods

    Authors: Polina Tsvilodub, Hening Wang, Sharon Grosch, Michael Franke

    Abstract: This paper systematically compares different methods of deriving item-level predictions of language models for multiple-choice tasks. It compares scoring methods for answer options based on free generation of responses, various probability-based scores, a Likert-scale style rating method, and embedding similarity. In a case study on pragmatic language interpretation, we find that LLM predictions a…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures
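    The abstract's central point is that a model's item-level prediction can change with the scoring method. A minimal Python sketch can illustrate this for two common probability-based scores: the summed log-probability of an answer option and its length-normalised mean log-probability per token. The per-token log-probabilities below are hypothetical numbers, not data from the paper. The function names (sum_logprob, mean_logprob, renormalized_probs, predict) are illustrative and do not reproduce the authors' implementation.

```python
import math
from typing import Callable, Dict, List

# Hypothetical per-token log-probabilities a language model might assign to
# each answer option for the same prompt (illustrative only, not paper data).
option_token_logprobs: Dict[str, List[float]] = {
    "A": [-0.9, -0.8, -0.7, -0.6],  # longer option made of fairly likely tokens
    "B": [-2.0],                    # short option with one less likely token
}

def sum_logprob(logprobs: List[float]) -> float:
    """Total log-probability of the option string (tends to favour short options)."""
    return sum(logprobs)

def mean_logprob(logprobs: List[float]) -> float:
    """Length-normalised score: average log-probability per token."""
    return sum(logprobs) / len(logprobs)

def renormalized_probs(scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax over option scores, i.e. probabilities restricted to the options."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def predict(
    scorer: Callable[[List[float]], float],
    options: Dict[str, List[float]] = option_token_logprobs,
) -> str:
    """Return the option with the highest score under the given scoring method."""
    scores = {opt: scorer(lps) for opt, lps in options.items()}
    return max(scores, key=lambda opt: scores[opt])

if __name__ == "__main__":
    print(predict(sum_logprob))   # B
    print(predict(mean_logprob))  # A
```

    With these numbers the summed score selects the short option B, while the per-token mean selects A, although both scores are computed from the same model outputs. Renormalising the summed scores over the options with a softmax yields probabilities that sum to one but does not change which option is selected.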