
Showing 1–3 of 3 results for author: McLeish, S

Searching in archive cs.
  1. arXiv:2406.04229

    cs.LG cs.AI cs.CL cs.DS stat.ML

    The CLRS-Text Algorithmic Reasoning Language Benchmark

    Authors: Larisa Markeeva, Sean McLeish, Borja Ibarz, Wilfried Bounsi, Olga Kozlova, Alex Vitvitskyi, Charles Blundell, Tom Goldstein, Avi Schwarzschild, Petar Veličković

    Abstract: Eliciting reasoning capabilities from language models (LMs) is a critical direction on the path towards building intelligent systems. Most recent studies dedicated to reasoning focus on out-of-distribution performance on procedurally-generated synthetic benchmarks, bespoke-built to evaluate specific skills only. This trend makes results hard to transfer across publications, slowing down progress.…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Preprint, under review. Comments welcome

  2. arXiv:2405.17399

    cs.LG cs.AI

    Transformers Can Do Arithmetic with the Right Embeddings

    Authors: Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein

    Abstract: The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix ena…

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2404.03441

    cs.AI cs.CL cs.LG

    Benchmarking ChatGPT on Algorithmic Reasoning

    Authors: Sean McLeish, Avi Schwarzschild, Tom Goldstein

    Abstract: We evaluate ChatGPT's ability to solve algorithm problems from the CLRS benchmark suite that is designed for GNNs. The benchmark requires the use of a specified classical algorithm to solve a given problem. We find that ChatGPT outperforms specialist GNN models, using Python to successfully solve these problems. This raises new points in the discussion about learning algorithms with neural network…

    Submitted 16 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.