Multi-study inference of regulatory networks for more accurate models of gene regulation

Dayanne M Castro; Nicholas R de Veaux; Emily R Miraldi; Richard Bonneau

doi:10.1371/journal.pcbi.1006591

PLoS Comput Biol. 2019 Jan 24;15(1):e1006591. doi: 10.1371/journal.pcbi.1006591. eCollection 2019 Jan.

Authors

Dayanne M Castro¹, Nicholas R de Veaux², Emily R Miraldi³ ⁴, Richard Bonneau¹ ²

Affiliations

¹ New York University, New York, NY 10003, USA.
² Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA.
³ Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA.
⁴ Divisions of Immunobiology & Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA.

Abstract

Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization and limited network recovery. In this study, we explore previous integration strategies, such as batch-correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method first estimates the activities of transcription factors and then infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge, including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets.
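
The two-step idea in the abstract (estimate transcription factor activities, then fit coefficients that split into a conserved and a dataset-specific part) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that activities are obtained by a least-squares solve against a prior connectivity matrix, and that the decomposition is fit by proximal gradient descent with a row-wise group penalty on the conserved part and a plain L1 penalty on the dataset-specific part. All function names, parameter names, and default penalty values here are hypothetical, and adaptive (prior-weighted) penalties are omitted.

```python
import numpy as np

def estimate_tf_activity(expression, prior):
    """Least-squares TF activities (assumed approach, not the paper's code).

    expression: genes x samples; prior: genes x TFs connectivity matrix.
    Solves expression ~ prior @ activity and returns TFs x samples.
    """
    return np.linalg.pinv(prior) @ expression

def soft_threshold(x, t):
    # Element-wise shrinkage: proximal operator of the L1 penalty.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def group_soft_threshold(rows, t):
    # Shrinks each row's L2 norm, so a TF is kept or dropped across all datasets.
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return rows * scale

def fit_shared_plus_specific(designs, targets, lam_shared=0.1,
                             lam_specific=0.1, n_iter=500):
    """Fit one target gene across several datasets (hypothetical helper).

    designs: list of (samples_k x TFs) activity matrices, one per dataset.
    targets: list of (samples_k,) expression vectors for the same gene.
    Returns (shared, specific), each TFs x datasets; the model coefficient
    for dataset k is shared[:, k] + specific[:, k].
    """
    n_tasks = len(designs)
    n_feat = designs[0].shape[1]
    shared = np.zeros((n_feat, n_tasks))
    specific = np.zeros((n_feat, n_tasks))
    # Both components see the same gradient, so the joint Lipschitz
    # constant is twice that of a single least-squares problem.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in designs)
    step = 1.0 / (2.0 * lipschitz)
    for _ in range(n_iter):
        grad = np.zeros((n_feat, n_tasks))
        for k, (X, y) in enumerate(zip(designs, targets)):
            resid = X @ (shared[:, k] + specific[:, k]) - y
            grad[:, k] = X.T @ resid / X.shape[0]
        shared = group_soft_threshold(shared - step * grad, step * lam_shared)
        specific = soft_threshold(specific - step * grad, step * lam_specific)
    return shared, specific
```

In this sketch, a regulator whose row in `shared` is nonzero is supported by all datasets jointly, while nonzero entries in `specific` mark interactions that only one dataset supports. Raising `lam_specific` relative to `lam_shared` pushes the fit toward conserved interactions.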

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Bacillus subtilis / genetics
Computational Biology / methods*
Databases, Genetic
Gene Expression Regulation / genetics*
Gene Regulatory Networks / genetics*
Models, Genetic*
Saccharomyces cerevisiae / genetics

Grants and funding

This work was funded by the Simons Foundation and the U.S. National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.