Background: The increasing use of social media to share lived and living experiences of substance use presents a unique opportunity to obtain information on side effects, use patterns, and opinions on novel psychoactive substances. However, the large volume of such data makes it challenging to derive useful insights through natural language processing technologies such as large language models.
Objective: This paper aims to develop a retrieval-augmented generation (RAG) architecture for medical question answering that addresses clinicians' queries on emerging health-related issues using user-generated medical information from social media.
Methods: We proposed a two-layer RAG framework for query-focused answer generation and evaluated a proof of concept of the framework for query-focused summary generation from social media forums, focusing on emerging drug-related information. Our modular framework generates individual summaries followed by an aggregated summary, enabling it to answer medical queries efficiently from large volumes of user-generated social media data. We compared the performance of a quantized large language model (Nous-Hermes-2-7B-DPO), which is deployable in low-resource settings, with that of GPT-4. For this proof-of-concept study, we used user-generated data from Reddit to answer clinicians' questions on the use of xylazine and ketamine.
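To make the two-layer design concrete, the sketch below illustrates the general pattern of batch-wise individual summaries followed by one aggregated answer. All names (two_layer_rag, retrieve_posts, llm) and the toy data are illustrative placeholders under assumed interfaces, not the authors' implementation, which is detailed in the full paper.

```python
# Minimal sketch of a two-layer RAG pattern: summarize retrieved posts in small
# batches (layer 1), then aggregate the partial summaries into one answer (layer 2).
# All helper names here are hypothetical stand-ins, not the authors' code.

from typing import Callable, List


def two_layer_rag(
    query: str,
    retrieve_posts: Callable[[str], List[str]],  # e.g., semantic search over Reddit posts
    llm: Callable[[str], str],                   # any generative model (GPT-4, a quantized 7B model, ...)
    batch_size: int = 5,
) -> str:
    """Answer `query` by summarizing retrieved posts in batches, then aggregating."""
    posts = retrieve_posts(query)

    # Layer 1: query-focused summary for each small batch of retrieved posts.
    partial_summaries = []
    for i in range(0, len(posts), batch_size):
        batch = "\n---\n".join(posts[i : i + batch_size])
        prompt = f"Summarize the following posts with respect to the question '{query}':\n{batch}"
        partial_summaries.append(llm(prompt))

    # Layer 2: aggregate the partial summaries into a single answer.
    joined = "\n".join(partial_summaries)
    final_prompt = f"Combine these summaries into one answer to '{query}':\n{joined}"
    return llm(final_prompt)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end without any external service.
    dummy_posts = ["post about xylazine wounds", "post about ketamine dosing", "post about withdrawal"]
    answer = two_layer_rag(
        "What side effects of xylazine are reported?",
        retrieve_posts=lambda q: dummy_posts,
        llm=lambda prompt: f"[summary of: {prompt[:40]}...]",
    )
    print(answer)
```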
Results: Our framework achieves comparable median scores in terms of relevance, length, hallucination, coverage, and coherence with both GPT-4 and Nous-Hermes-2-7B-DPO, evaluated over 20 queries with 76 samples. There was no statistically significant difference between GPT-4 and Nous-Hermes-2-7B-DPO for coverage (Mann-Whitney U=733.0; n1=37; n2=39; P=.89, two-tailed), coherence (U=670.0; n1=37; n2=39; P=.49, two-tailed), relevance (U=662.0; n1=37; n2=39; P=.15, two-tailed), length (U=672.0; n1=37; n2=39; P=.55, two-tailed), and hallucination (U=859.0; n1=37; n2=39; P=.01, two-tailed). A statistically significant difference was noted for the Coleman-Liau Index (U=307.5; n1=20; n2=16; P<.001, two-tailed).
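As an illustration of how the reported statistics can be computed (not the authors' analysis code), the sketch below runs a two-tailed Mann-Whitney U test on hypothetical per-summary ratings using SciPy and computes the Coleman-Liau Index of a sample summary using the textstat package; the ratings and summary text are invented for demonstration only.

```python
# Illustrative sketch of the two statistics reported above, under assumed inputs.

from scipy.stats import mannwhitneyu   # pip install scipy
import textstat                        # pip install textstat

# Hypothetical 1-5 ratings of one quality dimension (e.g., coverage) per model.
gpt4_scores = [4, 5, 3, 4, 4, 5, 3]
hermes_scores = [4, 4, 3, 5, 4, 4, 3]

# Two-tailed Mann-Whitney U test comparing the two sets of ratings.
u_stat, p_value = mannwhitneyu(gpt4_scores, hermes_scores, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, two-tailed P = {p_value:.3f}")

# Readability of a generated summary via the Coleman-Liau Index.
summary_text = "Users report sedation and skin wounds associated with xylazine exposure."
print(f"Coleman-Liau Index = {textstat.coleman_liau_index(summary_text):.2f}")
```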
Conclusions: Our RAG framework can effectively answer medical questions about targeted topics and can be deployed in resource-constrained settings.
Keywords: GPT; artificial intelligence; large language models; natural language processing; psychoactive substance; retrieval-augmented generation; social media; substance use.
©Sudeshna Das, Yao Ge, Yuting Guo, Swati Rajwal, JaMor Hairston, Jeanne Powell, Drew Walker, Snigdha Peddireddy, Sahithi Lakamana, Selen Bozkurt, Matthew Reyna, Reza Sameni, Yunyu Xiao, Sangmi Kim, Rasheeta Chandler, Natalie Hernandez, Danielle Mowery, Rachel Wightman, Jennifer Love, Anthony Spadaro, Jeanmarie Perrone, Abeed Sarker. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 06.01.2025.