Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators

PLoS Comput Biol. 2017 Oct 19;13(10):e1005755. doi: 10.1371/journal.pcbi.1005755. eCollection 2017 Oct.

Abstract

In a 2016 survey of 704 National Science Foundation (NSF) Biological Sciences Directorate principal investigators (BIO PIs), nearly 90% indicated they are currently or will soon be analyzing large data sets. BIO PIs considered a range of computational needs important to their work, including high performance computing (HPC), bioinformatics support, multistep workflows, updated analysis software, and the ability to store, share, and publish data. Previous studies in the United States and Canada emphasized infrastructure needs. However, BIO PIs said the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC, acknowledging that data science skills will be required to build a deeper understanding of life. This portends a growing data knowledge gap in biology and challenges institutions and funding agencies to redouble their support for computational training in biology.

MeSH terms

  • Computational Biology / statistics & numerical data*
  • Databases, Genetic*
  • Humans
  • Research Personnel / statistics & numerical data*
  • United States

Grants and funding

This study is an Education, Outreach and Training (EOT) activity of CyVerse, an NSF-funded project to develop a “cyber universe” to support life sciences research (DBI-0735191 and DBI-1265383). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.