ChatGPT as a prospective undergraduate and medical school student

PLoS One. 2024 Oct 23;19(10):e0308157. doi: 10.1371/journal.pone.0308157. eCollection 2024.

Abstract

This article reports the results of an experiment conducted with ChatGPT to compare its performance with that of humans on tests that require specific knowledge and skills, such as university admission tests. We chose one general undergraduate admission test and two admission tests for biomedical programs: the Scholastic Assessment Test (SAT), the Cambridge BioMedical Admission Test (BMAT), and the Italian Medical School Admission Test (IMSAT). In particular, we looked closely at the difference in performance between ChatGPT-4 and its predecessor, ChatGPT-3.5, to assess how the model has evolved. ChatGPT-4 performed significantly better than ChatGPT-3.5. Compared with real students, it scored on average within the top 10% on the SAT, and its IMSAT score would have granted admission to the two highest-ranked Italian medical schools. In addition to the performance analysis, we provide a qualitative analysis of incorrect answers and a classification of three types of logical and computational errors made by ChatGPT-4, which reveal important weaknesses of the model. This analysis provides insight into the skills needed to use these models effectively despite their weaknesses and suggests possible applications in education.

MeSH terms

  • Education, Medical, Undergraduate
  • Educational Measurement / methods
  • Female
  • Humans
  • Italy
  • Male
  • Prospective Studies
  • School Admission Criteria
  • Schools, Medical*
  • Students, Medical*

Grants and funding

R. Giuntini, G. Sergioli and S. Pinna are partially supported by the project “Ubiquitous Quantum Reality (UQR): understanding the natural processes under the light of quantum-like structures”, funded by Fondazione di Sardegna (https://www.fondazionedisardegna.it/) (code: F73C22001360007). R. Giuntini and G. Sergioli are also partially supported by the project PRIN-PNRR “Quantum Models for Logic, Computation and Natural Processes (Qm4Np)”, funded by Italian Ministry of University and Research (https://prin.mur.gov.it/) (code: F53D23011170001). R. Giuntini, M. Giunti, F. Garavaglia and G. Sergioli are partially supported by the project PRIN2022 “CORTEX The COst of Reasoning: Theory and EXperiments”, funded by Italian Ministry of University and Research (https://prin.mur.gov.it/) (code: F53D23004940006). R. Giuntini is partially funded by the TÜV SÜD Foundation, the Federal Ministry of Education and Research (BMBF) and the Free State of Bavaria under the Excellence Strategy of the Federal Government and the Länder, as well as by the Technical University of Munich-Institute for Advanced Study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.