Background: Environmental health and other researchers can benefit from automated or semi-automated summaries of data within published studies as summarizing study methods and results is time and resource intensive. Automated summaries can be designed to identify and extract details of interest pertaining to the study design, population, testing agent/intervention, or outcome (etc.). Much of the data reported across existing publications lack unified structure, standardization and machine-readable formats or may be presented in complex tables which serve as barriers that impede the development of automated data extraction methodologies.As full automation of data extraction seems unlikely soon, encouraging investigators to submit structured summaries of methods and results in standardized formats with meta-data tagging of content may be of value during the publication process. This would produce machine-readable content to facilitate automated data extraction, establish sharable data repositories, help make research data FAIR, and could improve reporting quality.
Objectives: A pilot study was conducted to assess the feasibility of asking participants to summarize study methods and results using a structured, web-based data extraction model as a potential workflow that could be implemented during the manuscript submission process.
Methods: Eight participants entered study details and data into the Health Assessment Workplace Collaborative (HAWC). Participants were surveyed after the extraction exercise to ascertain 1) whether this extraction exercise will impact their conducting and reporting of future research, 2) the ease of data extraction, including which fields were easiest and relatively more problematic to extract and 3) the amount of time taken to perform data extractions and other related tasks. Investigators then presented participants the potential benefits of providing structured data in the format they were extracting. After this, participants were surveyed about 1) their willingness to provide structured data during the publication process and 2) whether they felt the potential application of structured data entry approaches and their implementation during the journal submission process should continue to be further explored.
Conclusions: Routine provision of structured data that summarizes key information from research studies could reduce the amount of effort required for reusing that data in the future, such as in systematic reviews or agency scientific assessments. Our pilot study suggests that directly asking authors to provide that data, via structured templates, may be a viable approach to achieving this: participants were willing to do so, and the overall process was not prohibitively arduous. We also found some support for the hypothesis that use of study templates may have halo benefits in improving the conduct and completeness of reporting of future research. While limitations in the generalizability of our findings mean that the conditions of success of templates cannot be assumed, further research into how such templates might be designed and implemented does seem to have enough chance of success that it ought to be undertaken.
Keywords: Author feedback; Author journal submission requirements; Author opinion; Author willingness; Automated data extraction; Data extraction; Data sharing; Data summary; Data templates; Manuscript submission; Natural language Processing (NLP); Science translation; Standardized data; Structured data; Study evaluation template; Systematic review.
© 2022 The Authors.