Comparing new tools of artificial intelligence to the authentic intelligence of our global health students

BioData Min. 2024 Dec 18;17(1):58. doi: 10.1186/s13040-024-00408-7.

Abstract

Introduction: A transformative feature of Artificial Intelligence (AI) is its massive capacity to interpret and transform unstructured data into a coherent and meaningful context. Generative AI (GenAI) in particular appears likely to significantly alter traditional approaches to student research and its evaluation. With regard to research in global health, it is important for students and research experts to assess the strengths and limitations of GenAI within this space. Thus, the goal of our research was to evaluate the information literacy of GenAI relative to the expectations that graduate students meet in writing research papers.

Methods: Graduate students who successfully completed the required research paper for the course Fundamentals of Global Health (INTH 401) at Case Western Reserve University (CWRU) were recruited to compare their original papers with papers they generated with ChatGPT-4o using the original assignment prompt. Students also completed a Google Forms survey evaluating corresponding sections of the AI-generated paper and their original paper (e.g., adherence to Introduction guidelines, presentation of three perspectives, Conclusion) and rating their overall satisfaction with the AI work. The student-to-ChatGPT-4o comparison also enabled evaluation of narrative elements and references.

Results: Of the 54 students who completed the required research paper, 28 (51.8%) agreed to participate in the comparison project. Survey responses indicated that students rated the AI-generated paper as inferior or similar to their own paper (overall satisfaction average = 2.39 (1.61-3.17); Likert scale of 1 to 5, with lower scores indicating inferiority). Averaging each student's responses across the 5 Likert items showed that 17 averages were < 3.0, 7 were between 3.0 and 3.9, and 4 were ≥ 4.0, consistent with inferiority of the AI-generated paper. Evaluation of the references selected by ChatGPT-4o (n = 729 in total) showed that 54% (n = 396) were authentic and 46% (n = 333) did not exist. Of the authentic references, 26.5% (105/396) were relevant to the paper narrative, or 14.4% of the 729 total references.

Discussion: Our findings reveal both strengths and limitations of AI tools for assisting students in understanding the complexities of global health topics. Strengths mentioned by students included the ability of ChatGPT-4o to produce content very quickly and to suggest topics they had not considered for the three-perspective sections of their papers. Current limitations of ChatGPT-4o include consistently presenting up-to-date facts and references, and examining or summarizing the complexities of global health topics in depth. Because ChatGPT-4o generated nonexistent references attributed to highly credible biomedical research journals, we conclude that it failed an important component of using information effectively. Moreover, misrepresenting trusted sources of public health information is highly concerning, particularly given recent experiences during the COVID-19 pandemic and, more recently, in reporting on the impact of, and response to, natural disasters. This is a significant limitation of GenAI's ability to meet the information literacy standards expected of graduate students.