Showing 1–2 of 2 results for author: Madge, C

  1. arXiv:2407.12734  [pdf, other]

    cs.CL

    A LLM Benchmark based on the Minecraft Builder Dialog Agent Task

    Authors: Chris Madge, Massimo Poesio

    Abstract: In this work we propose adapting the Minecraft builder task into an LLM benchmark suitable for evaluating LLM ability in spatially orientated tasks, and informing builder agent design. Previous works have proposed corpora with varying complex structures and human-written instructions. We instead attempt to provide a comprehensive synthetic benchmark for testing builder agents over a series of d…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2402.08392  [pdf, other]

    cs.CL

    Large Language Models as Minecraft Agents

    Authors: Chris Madge, Massimo Poesio

    Abstract: In this work we examine the use of Large Language Models (LLMs) in the challenging setting of acting as a Minecraft agent. We apply and evaluate LLMs in the builder and architect settings, introduce clarification questions, and examine the challenges and opportunities for improvement. In addition, we present a platform for online interaction with the agents and an evaluation against previous work…

    Submitted 13 February, 2024; originally announced February 2024.