ChatGPT often "hallucinates" or produces misleading content, underscoring the need for formal validation at the professional level before it can be relied on in nursing education. We evaluated two free chatbots (Google Gemini and GPT-3.5) and a commercial version (GPT-4) on 250 standardized questions from a simulated nursing licensure exam that closely matches the content and complexity of the actual exam. Gemini answered 73.2% (183/250) of the questions correctly, GPT-3.5 answered 72.0% (180/250), and GPT-4 reached a notably higher 92.4% (231/250). GPT-4's highest error rate (13.3%) occurred in the psychosocial integrity category.
Copyright © 2024 National League for Nursing.