Zum Hauptinhalt springen

Showing 1–2 of 2 results for author: Parisi, A T

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.13094  [pdf, other

    cs.CL cs.AI cs.LG

    Exploring and Benchmarking the Planning Capabilities of Large Language Models

    Authors: Bernd Bohnet, Azade Nova, Aaron T Parisi, Kevin Swersky, Katayoon Goshvadi, Hanjun Dai, Dale Schuurmans, Noah Fiedel, Hanie Sedghi

    Abstract: We seek to elevate the planning capabilities of Large Language Models (LLMs)investigating four main directions. First, we construct a comprehensive benchmark suite encompassing both classical planning domains and natural language scenarios. This suite includes algorithms to generate instances with varying levels of difficulty, allowing for rigorous and systematic evaluation of LLM performance. Sec… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2406.00179  [pdf, other

    cs.CL cs.AI

    Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation

    Authors: Bernd Bohnet, Kevin Swersky, Rosanne Liu, Pranjal Awasthi, Azade Nova, Javier Snaider, Hanie Sedghi, Aaron T Parisi, Michael Collins, Angeliki Lazaridou, Orhan Firat, Noah Fiedel

    Abstract: We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Previous efforts to construct such datasets relied on crowd-sourcing, but the emergence of transformers with a context size of 1 million or more tokens now enables entirely automatic approaches. Our objective is to test the capabilities of LLMs to analyze, unde… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.