Domain-adaptive neural networks improve cross-species prediction of transcription factor binding

Kelly Cochran; Divyanshi Srivastava; Avanti Shrikumar; Akshay Balsubramani; Ross C Hardison; Anshul Kundaje; Shaun Mahony

doi:10.1101/gr.275394.121

Genome Res. 2022 Mar;32(3):512-523. doi: 10.1101/gr.275394.121. Epub 2022 Jan 18.

Authors

Kelly Cochran¹,², Divyanshi Srivastava¹,³, Avanti Shrikumar², Akshay Balsubramani⁴, Ross C Hardison¹,³, Anshul Kundaje²,⁴, Shaun Mahony¹,³

Affiliations

¹ Center for Eukaryotic Gene Regulation, Pennsylvania State University, University Park, Pennsylvania 16802, USA.
² Department of Computer Science, Stanford University, Stanford, California 94305, USA.
³ Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA.
⁴ Department of Genetics, Stanford University, Stanford, California 94305, USA.

Abstract

The intrinsic DNA sequence preferences and cell type-specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive sequence models of cell type-specific genomic occupancy of a TF in one species should generalize to closely matched cell types in a related species. To assess the viability of cross-species TF binding prediction, we train neural networks to discriminate ChIP-seq peak locations from genomic background and evaluate their performance within and across species. Cross-species predictive performance is consistently worse than within-species performance, which we show is caused in part by species-specific repeats. To account for this domain shift, we use an augmented network architecture to automatically discourage learning of training species-specific sequence features. This domain adaptation approach corrects for prediction errors on species-specific repeats and improves overall cross-species model performance. Our results show that cross-species TF binding prediction is feasible when models account for domain shifts driven by species-specific repeats.
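The abstract does not name the mechanism by which the augmented architecture discourages species-specific features. One widely used option for this kind of domain adaptation is a gradient reversal layer placed in front of an auxiliary species classifier; the sketch below assumes that mechanism purely for illustration. All function names, the weight `lam`, and the toy gradient values are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only. Gradient reversal is an ASSUMED mechanism here;
# the abstract above does not state which technique the authors use.
# All names and numbers are hypothetical.

def reverse_gradient(grad, lam=1.0):
    """Backward pass of a gradient reversal layer.

    The forward pass of such a layer is the identity, so only the backward
    pass is shown: the incoming gradient is negated and scaled by lam.
    """
    return [-lam * g for g in grad]


def shared_feature_gradient(grad_binding, grad_species, lam=1.0):
    """Combine the two gradients that reach the shared sequence features.

    grad_binding: gradient of the TF-binding loss w.r.t. the shared features
    grad_species: gradient of the species-classifier loss w.r.t. the same features
    The species gradient is reversed, so a descent step on the result lowers
    the binding loss while pushing the species loss up, which discourages
    features that distinguish the training species from the target species.
    """
    reversed_species = reverse_gradient(grad_species, lam)
    return [gb + gs for gb, gs in zip(grad_binding, reversed_species)]


def sgd_step(values, grad, lr=0.1):
    """Plain gradient-descent update."""
    return [v - lr * g for v, g in zip(values, grad)]


# Toy values standing in for three shared feature activations.
features = [0.5, -0.2, 0.1]
grad_binding = [0.2, 0.0, -0.1]
grad_species = [0.0, 0.3, 0.1]

total_grad = shared_feature_gradient(grad_binding, grad_species, lam=1.0)
updated = sgd_step(features, total_grad, lr=0.1)
```

In this toy case the second feature receives gradient only from the species classifier, so after reversal it moves in the direction that makes species harder to tell apart, while the first feature is updated by the binding loss alone. Setting `lam=0` recovers ordinary single-task training.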

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Binding Sites
Chromatin Immunoprecipitation Sequencing
Computational Biology / methods
Neural Networks, Computer*
Protein Binding
Transcription Factors* / metabolism

Substances

Transcription Factors
