Showing 1–3 of 3 results for author: Rim, W B

  1. arXiv:2311.14966  [pdf, other]

    cs.CL

    Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains

    Authors: Chia-Chien Hung, Wiem Ben Rim, Lindsay Frost, Lars Bruckner, Carolin Lawrence

    Abstract: High-risk domains pose unique challenges that require language models to provide accurate and safe responses. Despite the great success of large language models (LLMs), such as ChatGPT and its variants, their performance in high-risk domains remains unclear. Our study delves into an in-depth analysis of the performance of instruction-tuned LLMs, focusing on factual accuracy and safety adherence. T…

    Submitted 25 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Workshop on Benchmarking Generalisation in NLP (GenBench)

  2. arXiv:2208.11024  [pdf, other]

    cs.AI

    KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models

    Authors: Haris Widjaja, Kiril Gashteovski, Wiem Ben Rim, Pengfei Liu, Christopher Malon, Daniel Ruffinelli, Carolin Lawrence, Graham Neubig

    Abstract: Knowledge Graphs (KGs) store information in the form of (head, predicate, tail)-triples. To augment KGs with new knowledge, researchers proposed models for KG Completion (KGC) tasks such as link prediction; i.e., answering (h; p; ?) or (?; p; t) queries. Such models are usually evaluated with averaged metrics on a held-out test set. While useful for tracking progress, averaged single-score metrics…

    Submitted 23 August, 2022; originally announced August 2022.

  3. arXiv:2205.12749  [pdf, other]

    cs.AI cs.HC

    A Human-Centric Assessment Framework for AI

    Authors: Sascha Saralajew, Ammar Shaker, Zhao Xu, Kiril Gashteovski, Bhushan Kotnis, Wiem Ben Rim, Jürgen Quittek, Carolin Lawrence

    Abstract: With the rise of AI systems in real-world applications comes the need for reliable and trustworthy AI. An essential aspect of this are explainable AI systems. However, there is no agreed standard on how explainable AI systems should be assessed. Inspired by the Turing test, we introduce a human-centric assessment framework where a leading domain expert accepts or rejects the solutions of an AI sys…

    Submitted 1 July, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted as a submission to the ICML 2022 Workshop on Human-Machine Collaboration and Teaming