The FAIR database: facilitating access to public health research literature

Zhixue Zhao; James Thomas; Gregory Kell; Claire Stansfield; Mark Clowes; Sergio Graziosi; Jeff Brunton; Iain James Marshall; Mark Stevenson

doi:10.1093/jamiaopen/ooae139

JAMIA Open. 2024 Dec 13;7(4):ooae139. doi: 10.1093/jamiaopen/ooae139. eCollection 2024 Dec.

Authors

Zhixue Zhao¹, James Thomas², Gregory Kell³, Claire Stansfield², Mark Clowes⁴, Sergio Graziosi², Jeff Brunton², Iain James Marshall³, Mark Stevenson¹

Affiliations

¹ Department of Computer Science, University of Sheffield, Sheffield S10 2TN, United Kingdom.
² EPPI Centre, UCL Social Research Institute, Institute of Education, University College London, London WC1E 6BT, United Kingdom.
³ Department of Population Health Sciences, School of Life Course & Population Sciences, Faculty of Life Sciences & Medicine, King's College London, London WC2R 2LS, United Kingdom.
⁴ School of Medicine and Population Health, University of Sheffield, Sheffield S10 2TN, United Kingdom.

Abstract

Objectives: In public health, access to research literature is critical to informing decision-making and identifying knowledge gaps. However, identifying relevant research is not straightforward, since public health interventions are often complex, can have positive and negative impacts on health inequalities, and are applied in diverse and rapidly evolving settings. We developed a "living" database of public health research literature to facilitate access to this information using Natural Language Processing tools.

Materials and methods: Classifiers were trained to identify the study design (eg, cohort study or clinical trial) and the relationship to factors that may be relevant to inequalities, using the PROGRESS-Plus classification scheme. Training data were obtained from existing MEDLINE labels and from a set of systematic reviews in which studies were annotated with PROGRESS-Plus categories.

Results: Evaluation of the classifiers showed that the study type classifier achieved average precision and recall of 0.803 and 0.930, respectively. The PROGRESS-Plus classification proved more challenging with average precision and recall of 0.608 and 0.534. The FAIR database uses information provided by these classifiers to facilitate access to inequality-related public health literature.
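
As an illustration of how "average" precision and recall figures of this kind can be computed, the following minimal Python sketch derives per-class precision and recall and then takes their unweighted (macro) mean. The abstract does not state which averaging scheme was used, so macro-averaging is an assumption here; the function names, labels, and toy data are hypothetical and are not taken from the FAIR database.

```python
from collections import Counter


def per_class_precision_recall(gold, predicted):
    """Return {label: (precision, recall)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class precision and recall (assumed scheme)."""
    n = len(scores)
    mean_precision = sum(p for p, _ in scores.values()) / n
    mean_recall = sum(r for _, r in scores.values()) / n
    return mean_precision, mean_recall


# Toy study-design labels, invented purely for illustration.
gold = ["cohort", "cohort", "trial", "trial", "trial", "other"]
predicted = ["cohort", "trial", "trial", "trial", "other", "other"]

scores = per_class_precision_recall(gold, predicted)
avg_precision, avg_recall = macro_average(scores)
print(f"average precision={avg_precision:.3f}, average recall={avg_recall:.3f}")
```

Macro-averaging weights every class equally, so rare categories influence the average as much as common ones; a micro-averaged or frequency-weighted figure could differ on the same predictions.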

Discussion: Previous work on automation of evidence synthesis has focused on clinical areas rather than public health, despite the need being arguably greater.

Conclusion: The development of the FAIR database demonstrates that it is possible to create a publicly accessible and regularly updated database of public health research literature focused on inequalities. The database is freely available from https://eppi.ioe.ac.uk/eppi-vis/Fair.

Netscc id number: NIHR133603.

Keywords: automatic database curation; evidence synthesis; inequalities; machine learning; public health; research synthesis.