Discovering new pathways toward integration between health and sustainable development goals with natural language processing and network science

Global Health. 2023 Jun 29;19(1):44. doi: 10.1186/s12992-023-00943-8.

Abstract

Background: Research on health and sustainable development is growing at a pace such that conventional literature review methods appear increasingly unable to synthesize all relevant evidence. This paper employs a novel combination of natural language processing (NLP) and network science techniques to address this problem and to answer two questions: (1) How is health thematically interconnected with the Sustainable Development Goals (SDGs) in global science? (2) What specific themes have emerged in research at the intersection between SDG 3 ("Good health and well-being") and other sustainability goals?

Methods: After a descriptive analysis of the integration between SDGs in twenty years of global science (2001-2020) as indexed by dimensions.ai, we analyze abstracts of articles that are simultaneously relevant to SDG 3 and at least one other SDG (N = 27,928). We use the top2vec algorithm to discover topics in this corpus and measure semantic closeness between these topics. We then use network science methods to describe the network of substantive relationships between the topics and identify 'zipper themes', actionable domains of research and policy to co-advance health and other sustainability goals simultaneously.
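
As an illustration of the step from topic vectors to a topic network, the following is a minimal sketch and not the authors' pipeline. It assumes each topic is already represented by an embedding vector (as a topic model such as top2vec would provide), uses cosine similarity as the measure of semantic closeness, and keeps an edge only above a cutoff. The topic names, vectors, the `topic_network` helper, and the 0.5 threshold are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_network(topic_vectors, threshold=0.5):
    """Return {(topic_a, topic_b): similarity} for pairs at or above threshold.

    The threshold is an assumed cutoff for illustration; the abstract does
    not state how edges between topics were defined.
    """
    edges = {}
    for a, b in combinations(sorted(topic_vectors), 2):
        sim = cosine(topic_vectors[a], topic_vectors[b])
        if sim >= threshold:
            edges[(a, b)] = sim
    return edges


# Hypothetical 3-dimensional topic vectors (real embeddings have far more dimensions).
toy_topics = {
    "nutrition": [1.0, 0.0, 0.0],
    "food_security": [0.9, 0.1, 0.0],
    "air_quality": [0.0, 1.0, 0.0],
    "urban_health": [0.0, 0.8, 0.6],
}

edges = topic_network(toy_topics, threshold=0.5)
for pair, sim in edges.items():
    print(pair, round(sim, 3))
```

In this toy input only two pairs survive the cutoff: the two food-related topics and the two urban environment topics. Lowering the threshold would yield a denser network, so the cutoff is a modelling choice that shapes every downstream network statistic.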

Results: We observe a clear increase in scientific research integrating SDG 3 and other SDGs since 2001, both in absolute and relative terms, especially on topics relevant to interconnections between health and SDGs 2 ("Zero hunger"), 4 ("Quality education"), and 11 ("Sustainable cities and communities"). We distill a network of 197 topics from literature on health and sustainable development, with 19 distinct network communities: areas of growing integration with potential to further bridge health and sustainability science and policy. Literature focused explicitly on the SDGs is highly central in this network, while topical overlaps between SDG 3 and the environmental SDGs (12-15) are under-developed.
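
To make the notions of a "highly central" topic and of network communities concrete, here is a small sketch on a toy graph. It is an assumption-laden illustration: the abstract does not name the centrality measure or the community detection algorithm, so degree centrality and connected components are used here as simple stand-ins. The node names and edges are hypothetical and are not results from the study.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (node_a, node_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {node: (len(nb) / (n - 1) if n > 1 else 0.0) for node, nb in adj.items()}


def connected_components(adj):
    """Group nodes reachable from one another (a crude stand-in for communities)."""
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical topic links: one hub topic and one isolated pair.
toy_edges = [
    ("sdg_framework", "nutrition"),
    ("sdg_framework", "education"),
    ("sdg_framework", "urban_health"),
    ("biodiversity", "zoonoses"),
]

adj = build_adjacency(toy_edges)
centrality = degree_centrality(adj)
groups = connected_components(adj)

print(max(centrality, key=centrality.get))  # the hub topic
print(len(groups))                          # number of disconnected groups
```

In the toy graph the hub topic has the highest centrality, while the isolated pair forms its own group with no link to the rest, which is the kind of structural gap the abstract describes for the environmental SDGs. A real analysis would more likely use a modularity-based community detection method on a weighted graph.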

Conclusion: Our analysis demonstrates the feasibility and promise of NLP and network science for synthesizing large amounts of health-related scientific literature and for suggesting novel research and policy domains to co-advance multiple SDGs. Many of the 'zipper themes' identified by our method resonate with the One Health perspective that human, animal, and plant health are closely interdependent. This and similar perspectives will help meet the challenge of 'rewiring' sustainability research to co-advance goals in health and sustainability.

Keywords: Natural language processing; Network science; One Health; Sustainable development goals; Topic modeling.

MeSH terms

  • Animals
  • Cities
  • Educational Status
  • Humans
  • Natural Language Processing*
  • One Health*
  • Sustainable Development