A comparison of human, GPT-3.5, and GPT-4 performance in a university-level coding course

Will Yeadon; Alex Peach; Craig Testrow

doi:10.1038/s41598-024-73634-y

Sci Rep. 2024 Oct 7;14(1):23285. doi: 10.1038/s41598-024-73634-y.

Authors

Will Yeadon¹, Alex Peach², Craig Testrow²

Affiliations

¹ Department of Physics, Durham University, Durham, DH1 3LB, UK.
² Department of Physics, Durham University, Durham, DH1 3LB, UK.

Abstract

This study evaluates the performance of two ChatGPT variants, GPT-3.5 and GPT-4, each with and without prompt engineering, against work produced solely by students and against a mixed category containing both student and GPT-4 contributions, in university-level physics coding assignments written in Python. Comparing 50 student submissions to 50 AI-generated submissions across the different categories, each marked blindly by three independent markers, we amassed $n = 300$ data points. Students averaged 91.9% (SE: 0.4), surpassing the highest-performing AI submission category, GPT-4 with prompt engineering, which scored 81.1% (SE: 0.8); the difference is statistically significant ($p = 2.482 \times 10^{-10}$). Prompt engineering significantly improved scores for both GPT-4 ($p = 1.661 \times 10^{-4}$) and GPT-3.5 ($p = 4.967 \times 10^{-9}$). Additionally, the blinded markers were asked to guess the authorship of each submission on a four-point Likert scale from 'Definitely AI' to 'Definitely Human'. They identified authorship accurately: 92.1% of the work categorized as 'Definitely Human' was human-authored. Simplifying this to a binary 'AI' or 'Human' categorization resulted in an average accuracy rate of 85.3%. These findings suggest that while AI-generated work closely approaches the quality of university students' work, it often remains detectable by human evaluators.
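As a rough illustration of the arithmetic in the abstract, the following Python sketch shows how the data-point count arises and how a four-point Likert rating could be collapsed to a binary 'AI' or 'Human' guess before computing accuracy. It is a minimal sketch, not the authors' analysis code: the two middle labels ('Probably AI', 'Probably Human'), the function name `binary_accuracy`, and the toy ratings are all assumptions made for illustration, and the toy numbers are not data from the study.

```python
# Mapping from a four-point Likert rating to a binary guess.
# ASSUMPTION: only the endpoint labels appear in the abstract; the two
# middle labels here are hypothetical placeholders.
LIKERT_TO_BINARY = {
    "Definitely AI": "AI",
    "Probably AI": "AI",
    "Probably Human": "Human",
    "Definitely Human": "Human",
}


def binary_accuracy(ratings):
    """Fraction of ratings whose collapsed guess matches the true author.

    ratings: list of (likert_label, true_author) pairs,
             where true_author is "AI" or "Human".
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    correct = sum(
        1 for label, truth in ratings if LIKERT_TO_BINARY[label] == truth
    )
    return correct / len(ratings)


# Data points: (50 student + 50 AI submissions) x 3 independent markers.
n_data_points = (50 + 50) * 3

# Toy ratings invented for this example (NOT data from the study).
toy_ratings = [
    ("Definitely Human", "Human"),
    ("Probably Human", "Human"),
    ("Probably AI", "Human"),  # a human submission mistaken for AI
    ("Definitely AI", "AI"),
]
toy_accuracy = binary_accuracy(toy_ratings)
```

With the toy ratings above, three of the four collapsed guesses match the true author. The study's reported 85.3% comes from the markers' real ratings, which are not reproduced here.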

Keywords: Benchmark; ChatGPT; Coding; GPT-4.

Publication types

Comparative Study

MeSH terms

Humans
Students*
Universities