Showing 1–3 of 3 results for author: Surikuchi, A K

  1. arXiv:2407.04559  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition

    Authors: Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle

    Abstract: Visual storytelling consists in generating a natural language story given a temporally ordered sequence of images. This task is not only challenging for models, but also very difficult to evaluate with automatic metrics since there is no consensus about what makes a story 'good'. In this paper, we introduce a novel method that measures story quality in terms of human likeness regarding three key a…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2406.18403  [pdf, other]

    cs.CL

    LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

    Authors: Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni

    Abstract: There is an increasing trend towards evaluating NLP models with LLM-generated judgments instead of human judgments. In the absence of a comparison against human data, this raises concerns about the validity of these evaluations; in case they are conducted with proprietary models, this also raises concerns over reproducibility. We provide JUDGE-BENCH, a collection of 20 NLP datasets with human anno…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2310.17770  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    GROOViST: A Metric for Grounding Objects in Visual Storytelling

    Authors: Aditya K Surikuchi, Sandro Pezzelle, Raquel Fernández

    Abstract: A proper evaluation of stories generated for a sequence of images -- the task commonly referred to as visual storytelling -- must consider multiple aspects, such as coherence, grammatical correctness, and visual grounding. In this work, we focus on evaluating the degree of grounding, that is, the extent to which a story is about the entities shown in the images. We analyze current metrics, both de…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: In EMNLP 2023 main conference proceedings (to appear)