Showing 1–5 of 5 results for author: Voudouris, K

Searching in archive cs.
  1. arXiv:2407.12844  [pdf, other]

    cs.CL cs.LG stat.ML

    $\texttt{metabench}$ -- A Sparse Benchmark to Measure General Ability in Large Language Models

    Authors: Alex Kipnis, Konstantinos Voudouris, Luca M. Schulze Buschoff, Eric Schulz

    Abstract: Large Language Models (LLMs) vary in their abilities on a range of tasks. Initiatives such as the $\texttt{Open LLM Leaderboard}$ aim to quantify these differences with several large benchmarks (sets of test items to which an LLM can respond either correctly or incorrectly). However, high correlations within and between benchmark scores suggest that (1) there exists a small set of common underlyin…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: LLMs, benchmarking, IRT, information, compression

  2. arXiv:2312.11414  [pdf, other]

    cs.AI

    Animal-AI 3: What's New & Why You Should Care

    Authors: Konstantinos Voudouris, Ibrahim Alhas, Wout Schellaert, Matthew Crosby, Joel Holmes, John Burden, Niharika Chaubey, Niall Donnelly, Matishalin Patel, Marta Halina, José Hernández-Orallo, Lucy G. Cheke

    Abstract: The Animal-AI Environment is a unique game-based research platform designed to serve both the artificial intelligence and cognitive science research communities. In this paper, we present Animal-AI 3, the latest version of the environment, outlining several major new features that make the game more engaging for humans and more complex for AI systems. New features include interactive buttons, rewa…

    Submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2310.06167  [pdf]

    cs.AI

    Predictable Artificial Intelligence

    Authors: Lexin Zhou, Pablo A. Moreno-Casares, Fernando Martínez-Plumed, John Burden, Ryan Burnell, Lucy Cheke, Cèsar Ferri, Alexandru Marcoci, Behzad Mehrbakhsh, Yael Moros-Daval, Seán Ó hÉigeartaigh, Danaja Rutar, Wout Schellaert, Konstantinos Voudouris, José Hernández-Orallo

    Abstract: We introduce the fundamental ideas and challenges of Predictable AI, a nascent research area that explores the ways in which we can anticipate key indicators of present and future AI ecosystems. We argue that achieving predictability is crucial for fostering trust, liability, control, alignment and safety of AI ecosystems, and thus should be prioritised over performance. While distinctive from oth…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 11 pages excluding references, 4 figures, and 2 tables. Paper Under Review

    ACM Class: I.2

  4. arXiv:2309.11975  [pdf, other]

    cs.AI

    Inferring Capabilities from Task Performance with Bayesian Triangulation

    Authors: John Burden, Konstantinos Voudouris, Ryan Burnell, Danaja Rutar, Lucy Cheke, José Hernández-Orallo

    Abstract: As machine learning models become more general, we need to characterise them in richer, more meaningful ways. We describe a method to infer the cognitive profile of a system from diverse experimental data. To do so, we introduce measurement layouts that model how task-instance features interact with system capabilities to affect performance. These features must be triangulated in complex ways to b…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: 8 Pages + 14 pages of Appendices. 15 Figures. Submitted to AAAI 2024. Preprint

  5. Harms from Increasingly Agentic Algorithmic Systems

    Authors: Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, Alex Mayhew, Katherine Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj

    Abstract: Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems…

    Submitted 11 May, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted at FAccT 2023