GERV: a statistical method for generative evaluation of regulatory variants for transcription factor binding

Bioinformatics. 2016 Feb 15;32(4):490-6. doi: 10.1093/bioinformatics/btv565. Epub 2015 Oct 17.

Abstract

Motivation: The majority of disease-associated variants identified in genome-wide association studies reside in noncoding regions of the genome with regulatory roles. Thus, being able to interpret the functional consequences of a variant is essential for identifying causal variants in the analysis of genome-wide association studies.

Results: We present GERV (generative evaluation of regulatory variants), a novel computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer-based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change in predicted ChIP-seq reads between the reference and alternate alleles. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor's canonical motif and associated co-factor motifs. We show that GERV outperforms existing methods in predicting single-nucleotide polymorphisms associated with allele-specific binding. GERV correctly predicts a validated causal variant among linked single-nucleotide polymorphisms and prioritizes the variants previously reported to modulate the binding of FOXA1 in breast cancer cell lines. Thus, GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis.
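
The scoring idea in the Results (predict a read signal from k-mers, then compare the reference and alternate alleles) can be illustrated with a toy Python sketch. This is not the GERV implementation. It assumes a deliberately simplified model in which each k-mer carries one scalar weight and the predicted signal for a sequence is the sum of the weights of its k-mers. The function names, the `kmer_weights` table, and the example sequence are all hypothetical, and the sketch does not use DNase-seq data or any trained parameters.

```python
from typing import Dict


def predicted_signal(seq: str, kmer_weights: Dict[str, float], k: int) -> float:
    """Toy read-signal prediction: sum the weight of every k-mer in seq.

    Assumption: one scalar weight per k-mer; k-mers missing from the
    table contribute nothing.
    """
    return sum(
        kmer_weights.get(seq[i:i + k], 0.0)
        for i in range(len(seq) - k + 1)
    )


def variant_score(
    ref_seq: str,
    pos: int,
    alt_base: str,
    kmer_weights: Dict[str, float],
    k: int,
) -> float:
    """Score a single-nucleotide variant as predicted(alt) - predicted(ref).

    pos is the 0-based index of the variant within ref_seq.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return (
        predicted_signal(alt_seq, kmer_weights, k)
        - predicted_signal(ref_seq, kmer_weights, k)
    )


# Hypothetical weights for illustration only (not learned from data).
toy_weights = {"TGT": 2.0, "GTT": 1.5, "TTT": 0.5}
toy_score = variant_score("ATGTTTAC", pos=3, alt_base="C",
                          kmer_weights=toy_weights, k=3)
print(toy_score)
```

In this toy setup a negative score means the alternate allele removes k-mers that contribute to the predicted signal, and a positive score means it creates them. Only the k-mers overlapping the variant position can change, so the difference is local even though the sums run over the whole window.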

Availability and implementation: The implementation of GERV and related data are available at http://gerv.csail.mit.edu/.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Binding Sites
  • Chromatin Immunoprecipitation
  • Computational Biology / methods*
  • Genome, Human
  • Genome-Wide Association Study
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Models, Statistical*
  • Molecular Sequence Annotation
  • Polymorphism, Single Nucleotide / genetics*
  • Protein Binding
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors