STARRPeaker: uniform processing and accurate identification of STARR-seq active regions

Genome Biol. 2020 Dec 8;21(1):298. doi: 10.1186/s13059-020-02194-x.

Abstract

STARR-seq technology has employed progressively more complex genomic libraries and increased sequencing depths. With this added complexity and depth, coverage in STARR-seq experiments is non-uniform, overdispersed, and often confounded by sequencing biases such as GC content. Furthermore, the STARR-seq readout is confounded by RNA secondary structure and thermodynamic stability. To address these potential confounders, we developed STARRPeaker, a negative binomial regression framework for uniformly processing STARR-seq data. To support this effort, we generated whole-genome STARR-seq data from the HepG2 and K562 human cell lines and applied STARRPeaker to call enhancers in both cell lines comprehensively and without bias.
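
As a rough illustration of the idea behind a negative binomial regression framework, the sketch below scores one genomic bin: a log-link model predicts the expected fragment count from covariates, and the observed count is tested against a negative binomial null with that mean. This is not STARRPeaker's code or API. The function names, the choice of covariates (input coverage, GC content, RNA folding energy), the coefficients, and the dispersion value are all assumptions made for illustration; in practice the coefficients and dispersion would be estimated from the data.

```python
import math


def nb_log_pmf(k, mu, size):
    """Log P(X = k) for a negative binomial with mean `mu` and size `size`.

    Variance is mu + mu**2 / size, so a smaller `size` means more overdispersion.
    """
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )


def nb_upper_tail(k, mu, size):
    """P(X >= k): chance of seeing at least k fragments under the null."""
    below = sum(math.exp(nb_log_pmf(i, mu, size)) for i in range(k))
    return max(0.0, 1.0 - below)


def expected_count(input_cov, gc, fold_energy, coef):
    """Log-link mean from covariates (all coefficients are hypothetical)."""
    eta = (
        coef["intercept"]
        + coef["log_input"] * math.log(input_cov)
        + coef["gc"] * gc
        + coef["fold_energy"] * fold_energy
    )
    return math.exp(eta)


# Hypothetical coefficients for illustration, NOT fitted values from the paper.
COEF = {"intercept": 0.1, "log_input": 1.0, "gc": 0.5, "fold_energy": -0.01}

# One bin: 40 input fragments, 45% GC, folding energy of -30 (assumed units).
mu = expected_count(input_cov=40, gc=0.45, fold_energy=-30.0, coef=COEF)

# Upper-tail p-value for observing 120 output fragments in that bin.
p_value = nb_upper_tail(k=120, mu=mu, size=5.0)
```

A small p-value marks a bin whose output coverage exceeds what the covariates alone would predict. A genome-wide analysis would also need multiple-testing correction, which this sketch omits.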

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Enhancer Elements, Genetic
  • Genomic Library
  • Hep G2 Cells
  • High-Throughput Nucleotide Sequencing
  • Humans
  • K562 Cells
  • Male
  • Promoter Regions, Genetic
  • Sequence Analysis, DNA / methods*
  • Software*