Showing 1–5 of 5 results for author: Jones, C R

Searching in archive cs.
  1. arXiv:2407.08853  [pdf, other]

    cs.HC cs.CL

    GPT-4 is judged more human than humans in displaced and inverted Turing tests

    Authors: Ishika Rathi, Sydney Taylor, Benjamin K. Bergen, Cameron R. Jones

    Abstract: Everyday AI detection requires differentiating between people and AI in informal, online conversations. In many cases, people will not interact directly with AI systems but instead read conversations between AI systems and other people. We measured how well people and large language models can discriminate using two modified versions of the Turing test: inverted and displaced. GPT-3.5, GPT-4, and…

    Submitted 11 July, 2024; originally announced July 2024.

  2. arXiv:2406.14737  [pdf, other]

    cs.CL

    Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?

    Authors: Zhiqiang Pi, Annapurna Vadaparty, Benjamin K. Bergen, Cameron R. Jones

    Abstract: Recent empirical results have sparked a debate about whether or not Large Language Models (LLMs) are capable of Theory of Mind (ToM). While some have found LLMs to be successful on ToM evaluations such as the False Belief task (Kosinski, 2023), others have argued that LLMs solve these tasks by exploiting spurious correlations -- not representing beliefs -- since they fail on trivial alterations to…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2405.08007  [pdf, other]

    cs.HC cs.AI

    People cannot distinguish GPT-4 from a human in a Turing test

    Authors: Cameron R. Jones, Benjamin K. Bergen

    Abstract: We evaluated 3 systems (ELIZA, GPT-3.5 and GPT-4) in a randomized, controlled, and preregistered Turing test. Human participants had a 5 minute conversation with either a human or an AI, and judged whether or not they thought their interlocutor was human. GPT-4 was judged to be a human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%). The results provide the first…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 23 pages, 13 figures

  4. arXiv:2310.20216  [pdf, other]

    cs.AI cs.CL

    Does GPT-4 pass the Turing test?

    Authors: Cameron R. Jones, Benjamin K. Bergen

    Abstract: We evaluated GPT-4 in a public online Turing test. The best-performing GPT-4 prompt passed in 49.7% of games, outperforming ELIZA (22%) and GPT-3.5 (20%), but falling short of the baseline set by human participants (66%). Participants' decisions were based mainly on linguistic style (35%) and socioemotional traits (27%), supporting the idea that intelligence, narrowly conceived, is not sufficient…

    Submitted 20 April, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: 28 pages, 21 figures

  5. Provenance tracking in the LHCb software

    Authors: Ana Trisovic, Chris R. Jones, Ben Couturier, Marco Clemencic

    Abstract: Even though computational reproducibility is widely accepted as necessary for research validation and reuse, it is often not considered during the research process. This is because reproducibility tools are typically stand-alone and require additional training to be employed. In this article, we present a solution to foster reproducibility, which is integrated within existing scientific software t…

    Submitted 2 March, 2020; v1 submitted 3 October, 2019; originally announced October 2019.

    Journal ref: Computing in Science & Engineering (Volume: 22, Issue: 2, March-April 2020)