Applications of Natural Language Processing and Large Language Models for Social Determinants of Health: Protocol for a Systematic Review

JMIR Res Protoc. 2025 Jan 21;14:e66094. doi: 10.2196/66094.

Abstract

Background: In recent years, the intersection of natural language processing (NLP) and public health has opened innovative pathways for investigating social determinants of health (SDOH) in textual datasets. Despite the promise of NLP in the SDOH domain, the literature is dispersed across various disciplines, and there is a need to consolidate existing knowledge, identify gaps in it, and inform future research directions in this emerging field.

Objective: This research protocol describes a systematic review to identify and highlight NLP techniques, including large language models, used for SDOH-related studies.

Methods: A search strategy will be executed across PubMed, Web of Science, IEEE Xplore, Scopus, PsycINFO, HealthSource: Academic Nursing, and ACL Anthology to find studies published in English between 2014 and 2024. Three reviewers (SR, ZZ, and YC) will independently screen the studies to avoid voting bias, and two additional reviewers (AS and YX) will resolve any conflicts that arise during screening. We will also screen studies that cite the included studies (forward search). Following title and abstract screening and full-text screening, the characteristics and main findings of the included studies and resources will be tabulated, visualized, and summarized.
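The independent screening and conflict-resolution step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the `ScreeningRecord` class, the `reconcile` function, the study identifiers, and the votes are all hypothetical, and the rule that any disagreement among the three reviewers counts as a conflict is an assumption, since the protocol abstract does not define one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningRecord:
    """One study and the independent votes cast on it (hypothetical structure)."""
    study_id: str
    votes: dict  # reviewer initials -> "include" or "exclude"


def reconcile(records):
    """Split records into settled decisions and conflicts.

    Assumption: a decision is settled only when every reviewer agrees.
    Anything else (including a record with no votes) is flagged so the
    additional reviewers can adjudicate it.
    """
    settled = {}
    conflicts = []
    for record in records:
        distinct_votes = set(record.votes.values())
        if len(distinct_votes) == 1:
            settled[record.study_id] = distinct_votes.pop()
        else:
            conflicts.append(record.study_id)
    return settled, conflicts


# Illustrative data only; these are not results from the review.
records = [
    ScreeningRecord("study-001", {"SR": "include", "ZZ": "include", "YC": "include"}),
    ScreeningRecord("study-002", {"SR": "include", "ZZ": "exclude", "YC": "include"}),
    ScreeningRecord("study-003", {"SR": "exclude", "ZZ": "exclude", "YC": "exclude"}),
]

settled, conflicts = reconcile(records)
print("Settled:", settled)
print("Needs adjudication by AS/YX:", conflicts)
```

Routing every non-unanimous record to adjudication, rather than taking a majority vote, is one way to keep reviewers from deferring to each other, but a review team could reasonably choose a different rule.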

Results: The search strategy was formulated and run across the 7 databases in August 2024. As of December 2024, title and abstract screening was underway. We expect the results to be submitted for peer-reviewed publication in early 2025.

Conclusions: This systematic review aims to provide a comprehensive synthesis of existing research on the application of NLP to various SDOH tasks across multiple textual datasets. By rigorously evaluating the methodologies, tools, and outcomes of eligible studies, the review will identify gaps in current knowledge and suggest directions for future research in the form of specific research questions. The findings will be instrumental in developing more effective NLP models for SDOH, ultimately contributing to improved health outcomes and a better understanding of social determinants in diverse populations.

International Registered Report Identifier (IRRID): DERR1-10.2196/66094.

Keywords: LLM; NLP; SDOH; large language models; natural language processing; social determinants of health; systematic review protocol.

MeSH terms

  • Humans
  • Natural Language Processing*
  • Social Determinants of Health*
  • Systematic Reviews as Topic*