Data Science in Environmental Health Research

Curr Epidemiol Rep. 2019 Sep;6(3):291-299. doi: 10.1007/s40471-019-00205-5. Epub 2019 Jul 15.

Abstract

Purpose of review: Data science is a rapidly growing trans-disciplinary field that aims to harness the power of data to gain information or insights on researcher-defined topics of interest. In this paper, we review how data science can help advance environmental health research.

Recent findings: We discuss the concepts of computationally scalable handling of Big Data and the design of efficient research data platforms. We also discuss how data science can provide solutions for methodological challenges in environmental health research, such as high-dimensional outcomes and exposures, and prediction models. Finally, we discuss tools for reproducible research.

Summary: In this paper, we present opportunities to improve environmental health research capabilities by embracing data science, and the pitfalls that environmental health researchers should avoid when employing data science approaches. Throughout the paper, we emphasize the need for environmental health researchers to collaborate more closely with biostatisticians and data scientists to ensure robust and interpretable results.

Keywords: Big Data; Data Science; Environmental Health Research; Environmental Mixtures; High-Dimensional; Reproducibility; Research Data Platforms.