The aim of this study was to evaluate whether ChatGPT-3.5 and Bard provide safe and reliable medical answers to common topics related to soft tissue infections and their management according to the guidelines of the Infectious Diseases Society of America (IDSA). IDSA's abridged recommendations for soft tissue infections were identified on the official IDSA website. Twenty-five queries were entered into the LLMs as they appear on the IDSA website. To assess the concordance and precision of the LLMs' responses with the IDSA guidelines, two infectious disease physicians independently compared and evaluated each response using a 5-point Likert scale adapted from the validated Global Quality Scale, with 1 representing poor concordance and 5 representing excellent concordance. The mean ± SD score for ChatGPT-generated responses was 4.34 ± 0.74 (n = 25), indicating that raters found the answers to be of good to excellent quality, with the most important topics covered. Although some topics were not covered, the answers were in good concordance with the IDSA guidelines. The mean ± SD score for Bard-generated responses was 3.5 ± 1.2 (n = 25), indicating moderate quality. Although the LLMs did not appear to provide incorrect recommendations and covered most of the topics, their responses were often generic, rambling, missing some details, and lacking actionability. As AI continues to evolve and researchers feed it more extensive and diverse medical knowledge, it may be inching closer to becoming a reliable aid for clinicians, ultimately enhancing the accuracy of infectious disease diagnosis and management in the future.
Keywords: AI; Concordance; Guidelines; Infectious disease diagnosis; Soft tissue infections.
© 2023. The Author(s) under exclusive licence to Biomedical Engineering Society.