Can we use ChatGPT for Mental Health and Substance Use Education? Examining Its Quality and Potential Harms

Sophia Spallek; Louise Birrell; Stephanie Kershaw; Emma Krogh Devine; Louise Thornton

doi:10.2196/51243

JMIR Med Educ. 2023 Nov 30:9:e51243. doi: 10.2196/51243.

Authors

Sophia Spallek^#¹, Louise Birrell^#¹, Stephanie Kershaw¹, Emma Krogh Devine¹, Louise Thornton¹

Affiliation

¹ The Matilda Centre for Research in Mental Health and Substance Use, The University of Sydney, Sydney, Australia.

^# Contributed equally.

PMID: 38032714
PMCID: PMC10722374
DOI: 10.2196/51243

Abstract

Background: The use of generative artificial intelligence, more specifically large language models (LLMs), is proliferating, and as such, it is vital to consider both the value and potential harms of its use in medical education. Because LLMs such as ChatGPT write efficiently in a variety of styles, they are attractive for tailoring educational materials. However, this technology can produce biased output and misinformation, which can be particularly harmful in medical education settings, such as mental health and substance use education. This viewpoint investigates whether ChatGPT is sufficient for 2 common health education functions in the field of mental health and substance use: (1) answering users' direct queries and (2) aiding in the development of quality consumer educational health materials.

Objective: This viewpoint includes a case study to provide insight into the accessibility, biases, and quality of ChatGPT's query responses and educational health materials. We aim to provide guidance for the general public and health educators wishing to utilize LLMs.

Methods: We collected real-world queries from 2 large-scale mental health and substance use portals and engineered a variety of prompts to use on GPT-4 Pro with the Bing BETA internet browsing plug-in. The outputs were evaluated on 3 dimensions: accessibility, using tools from the Sydney Health Literacy Lab; bias, using adherence to the Mindframe communication guidelines; and quality, using author assessments of tailoring to audiences, duty of care disclaimers, and evidence-based internet references.

Results: GPT-4's outputs had good face validity, but upon detailed analysis were substandard in comparison to expert-developed materials. Without engineered prompting, the reading level, adherence to communication guidelines, and use of evidence-based websites were poor. Therefore, all outputs still required cautious human editing and oversight.

Conclusions: GPT-4 is currently not reliable enough for answering consumers' direct queries, but educators and researchers can use it, with caution, to create educational materials. Materials created with LLMs should disclose the use of generative artificial intelligence and be evaluated for their efficacy with the target audience.

Keywords: ChatGPT; artificial intelligence; educational intervention; generative artificial intelligence; health education; large language models; medical education; mental health; patient education handout; preventive health services; substance use.

©Sophia Spallek, Louise Birrell, Stephanie Kershaw, Emma Krogh Devine, Louise Thornton. Originally published in JMIR Medical Education (https://mededu.jmir.org), 30.11.2023.