A semi-supervised approach for predicting cell-type specific functional consequences of non-coding variation using MPRAs

Nat Commun. 2018 Dec 5;9(1):5199. doi: 10.1038/s41467-018-07349-w.

Abstract

Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. We propose GenoNet, a semi-supervised approach that jointly uses experimentally confirmed regulatory variants (labeled variants), millions of unlabeled variants genome-wide, and more than a thousand cell/tissue type-specific epigenetic annotations to predict the functional consequences of non-coding variants. Applying it to several experimental datasets, we demonstrate that the proposed method significantly improves prediction accuracy over existing functional prediction methods at the tissue/cell type level, and especially at the organism level. Importantly, we illustrate how GenoNet scores can help in fine-mapping at GWAS loci and in the discovery of disease-associated genes in sequencing studies. As more comprehensive lists of experimentally validated variants become available over the next few years, semi-supervised methods like GenoNet can be used to provide increasingly accurate functional predictions for variants genome-wide and across a variety of cell/tissue types.
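
The semi-supervised idea described in the abstract, learning from a small set of labeled variants together with a large unlabeled pool, can be illustrated with a generic self-training sketch. This is not the GenoNet algorithm, whose details the abstract does not give. The nearest-centroid scorer, the two synthetic "annotation" features, the function names, and the thresholds are all assumptions chosen for illustration.

```python
import random

def centroid(rows):
    """Mean feature vector of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def score(x, c_pos, c_neg):
    """Score in [0, 1]; higher means closer to the 'functional' centroid."""
    d_pos, d_neg = sq_dist(x, c_pos), sq_dist(x, c_neg)
    total = d_pos + d_neg
    return d_neg / total if total > 0 else 0.5

def self_train(labeled, unlabeled, rounds=5, threshold=0.8):
    """Generic self-training (hypothetical stand-in, not GenoNet itself).

    Each round refits the two class centroids, then moves unlabeled
    variants that are scored confidently into the matching class.
    """
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    pool = list(unlabeled)
    for _ in range(rounds):
        c_pos, c_neg = centroid(pos), centroid(neg)
        remaining = []
        for x in pool:
            s = score(x, c_pos, c_neg)
            if s >= threshold:
                pos.append(x)          # confident pseudo-label: functional
            elif s <= 1 - threshold:
                neg.append(x)          # confident pseudo-label: non-functional
            else:
                remaining.append(x)    # still ambiguous, keep unlabeled
        if len(remaining) == len(pool):
            break                      # nothing new was labeled; stop early
        pool = remaining
    return centroid(pos), centroid(neg)

def make_variants(mean, n):
    """Synthetic variants with two made-up annotation features."""
    return [[random.gauss(mean, 0.5) for _ in range(2)] for _ in range(n)]

random.seed(0)
# A handful of "experimentally confirmed" labels plus a larger unlabeled pool.
labeled = [(x, 1) for x in make_variants(3.0, 5)] + \
          [(x, 0) for x in make_variants(0.0, 5)]
unlabeled = make_variants(3.0, 100) + make_variants(0.0, 100)

c_pos, c_neg = self_train(labeled, unlabeled)
new_variant_score = score([2.8, 3.1], c_pos, c_neg)
```

Here `new_variant_score` lies between 0 and 1, and values above 0.5 indicate that the variant sits closer to the functional cluster than to the non-functional one. In a real setting the synthetic features would be replaced by cell/tissue type-specific annotations and the centroid scorer by a proper classifier.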

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Databases, Genetic*
  • Genetic Variation*
  • Genome-Wide Association Study
  • Humans
  • Organ Specificity
  • Polymorphism, Single Nucleotide
  • Quantitative Trait Loci
  • RNA, Untranslated / genetics*

Substances

  • RNA, Untranslated