Background: The release of ChatGPT for general use in 2023 by OpenAI has significantly expanded the possible applications of generative artificial intelligence in the healthcare sector, particularly in terms of information retrieval by patients, medical and nursing students, and healthcare personnel.
Objective: To compare the performance of ChatGPT-3.5 and ChatGPT-4.0 to clinical nurses on answering questions about tracheostomy care, as well as to determine whether using different prompts to pre-define the scope of the ChatGPT affects the accuracy of their responses.
Design: Cross-sectional study.
Setting: The data collected from the ChatGPT was collected using the ChatGPT-3.5 and 4.0 using access provided by the University of Hong Kong. The data from the clinical nurses working in mainland China was collected using the Qualtrics survey program.
Participants: No participants were needed for collecting the ChatGPT responses. A total of 272 clinical nurses, with 98.5 % of them working in tertiary care hospitals in mainland China, were recruited using a snowball sampling approach.
Method: We used 43 tracheostomy care-related questions in a multiple-choice format to evaluate the performance of ChatGPT-3.5, ChatGPT-4.0, and clinical nurses. ChatGPT-3.5 and GPT-4.0 were both queried three times with the same questions by different prompts: no prompt, patient-friendly prompt, and act-as-nurse prompt. All responses were independently graded by two qualified otorhinolaryngology nurses on a 3-point accuracy scale (correct, partially correct, and incorrect). The Chi-squared test and Fisher exact test with post-hoc Bonferroni adjustment were used to assess the differences in performance between the three groups, as well as the differences in accuracy between different prompts.
Results: ChatGPT-4.0 showed significantly higher accuracy, with 64.3 % of responses rated as 'correct', compared to 60.5 % in ChatGPT-3.5 and 36.7 % in clinical nurses (X 2 = 74.192, p < .001). Except for the 'care for the tracheostomy stoma and surrounding skin' domain (X2 = 6.227, p = .156), scores from ChatGPT-3.5 and -4.0 were significantly better than nurses' on domains related to airway humidification, cuff management, tracheostomy tube care, suction techniques, and management of complications. Overall, ChatGPT-4.0 consistently performed well in all domains, achieving over 50 % accuracy in each domain. Alterations to the prompt had no impact on the performance of ChatGPT-3.5 or -4.0.
Conclusion: ChatGPT may serve as a complementary medical information tool for patients and physicians to improve knowledge in tracheostomy care.
Tweetable abstract: ChatGPT-4.0 can answer tracheostomy care questions better than most clinical nurses. There is no reason nurses should not be using it.
Keywords: ChatGPT; Education; Generative artificial intelligence; Nursing; Tracheostomy.
© 2024 The Authors. Published by Elsevier Ltd.