Can Large Language Models Aid Caregivers of Pediatric Cancer Patients in Information Seeking? A Cross-Sectional Investigation

Cancer Med. 2025 Jan;14(1):e70554. doi: 10.1002/cam4.70554.

Abstract

Purpose: Caregivers in pediatric oncology need accurate and understandable information about their child's condition, treatment, and side effects. This study assesses the performance of publicly accessible large language model (LLM)-supported tools in providing valuable and reliable information to caregivers of children with cancer.

Methods: In this cross-sectional study, we evaluated the performance of four LLM-supported tools, namely ChatGPT (GPT-4), Google Bard (Gemini Pro), Microsoft Bing Chat, and Google SGE, against a set of frequently asked questions (FAQs) derived from the Children's Oncology Group Family Handbook and expert input (26 FAQs and 104 generated responses in total). Five pediatric oncology experts assessed the generated LLM responses on measures including accuracy, clarity, inclusivity, completeness, clinical utility, and overall rating. Content quality was additionally evaluated for readability, AI disclosure, source credibility, resource matching, and content originality. We used descriptive analysis and statistical tests, including Shapiro-Wilk, Levene's, and Kruskal-Wallis H-tests, with Dunn's post hoc tests for pairwise comparisons.
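
The following is a minimal sketch, not the authors' actual analysis code, of the statistical workflow described above. It assumes expert ratings are held in a hypothetical long-format pandas DataFrame `ratings` with a "tool" column (ChatGPT, Bard, Bing Chat, Google SGE) and a numeric score column such as "accuracy"; the function and column names are illustrative.

```python
import pandas as pd
from scipy import stats
import scikit_posthocs as sp  # third-party package providing Dunn's post hoc test


def compare_tools(ratings: pd.DataFrame, measure: str = "accuracy"):
    """Compare one evaluation measure across the four LLM tools (illustrative)."""
    groups = [g[measure].dropna().values for _, g in ratings.groupby("tool")]

    # Shapiro-Wilk per group (normality) and Levene's test (equal variances);
    # non-normal or heteroscedastic data motivates the non-parametric route.
    shapiro_p = [stats.shapiro(g).pvalue for g in groups]
    levene_p = stats.levene(*groups).pvalue

    # Kruskal-Wallis H-test across the four tools.
    h_stat, kw_p = stats.kruskal(*groups)

    # Dunn's post hoc test for pairwise comparisons (Bonferroni-adjusted p-values).
    dunn = sp.posthoc_dunn(ratings, val_col=measure, group_col="tool",
                           p_adjust="bonferroni")

    return {"shapiro_p": shapiro_p, "levene_p": levene_p,
            "kruskal_H": h_stat, "kruskal_p": kw_p, "dunn": dunn}
```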

Results: ChatGPT showed high overall performance in the expert evaluation. Bard also performed well, especially in accuracy and clarity of responses, whereas Bing Chat and Google SGE had lower overall scores. Disclosure that responses were generated by AI was observed less frequently in ChatGPT responses, which may have affected their clarity, whereas Bard maintained a balance between AI disclosure and response clarity. Google SGE generated the most readable responses, whereas ChatGPT produced the most complex ones. The LLM tools differed significantly (p < 0.001) on all expert-evaluated measures except inclusivity. In our thematic analysis of the experts' free-text comments, emotional tone and empathy emerged as a distinct theme, with mixed feedback on whether AI should be expected to be empathetic.

Conclusion: LLM-supported tools can enhance caregivers' knowledge of pediatric oncology. Each model has unique strengths and areas for improvement, indicating the need for careful selection based on specific clinical contexts. Further research is required to explore their application in other medical specialties and patient demographics and to assess broader applicability and long-term impacts.

Keywords: artificial intelligence; health care communication; health literacy; large language models; patient education; pediatric oncology.

MeSH terms

  • Caregivers* / psychology
  • Child
  • Cross-Sectional Studies
  • Female
  • Humans
  • Information Seeking Behavior
  • Language
  • Male
  • Neoplasms* / psychology
  • Neoplasms* / therapy