A genome-wide scan statistic framework for whole-genome sequence data analysis

Nat Commun. 2019 Jul 9;10(1):3018. doi: 10.1038/s41467-019-11023-0.

Abstract

The analysis of whole-genome sequencing studies is challenging due to the large number of noncoding rare variants, our limited understanding of their functional effects, and the lack of natural units for testing. Here we propose a scan statistic framework, WGScan, to simultaneously detect the existence and estimate the locations of association signals at genome-wide scale. WGScan can analytically estimate the significance threshold for a whole-genome scan; utilize summary statistics for a meta-analysis; incorporate functional annotations for enhanced discoveries in noncoding regions; and enable enrichment analyses using genome-wide summary statistics. Based on the analysis of whole genomes of 1,786 phenotypically discordant sibling pairs from the Simons Simplex Collection study for autism spectrum disorders, we derive genome-wide significance thresholds for whole-genome sequencing studies and detect significant enrichments of regions showing associations with autism in promoter regions, functional categories related to autism, and enhancers predicted to regulate expression of autism-associated genes.
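
The general idea of scanning a genome with windows of several sizes, rather than testing predefined units, can be illustrated with a minimal sketch. This is not WGScan's actual test statistic, and it does not reproduce the analytic significance threshold described above; it assumes a simple burden-type score statistic for a binary phenotype, and the function names `burden_score_stat` and `scan` are hypothetical.

```python
def burden_score_stat(genotypes, phenotype, start, end):
    """Score-type burden statistic for the variants in [start, end).

    genotypes: one row per individual, each a list of minor-allele counts.
    phenotype: 0/1 case status per individual.
    Assumption: intercept-only null model, no covariates.
    """
    n = len(phenotype)
    # Collapse the window into one burden score per individual.
    burden = [sum(row[start:end]) for row in genotypes]
    y_bar = sum(phenotype) / n
    b_bar = sum(burden) / n
    # Score and its null variance under the intercept-only model.
    u = sum((y - y_bar) * b for y, b in zip(phenotype, burden))
    var = y_bar * (1 - y_bar) * sum((b - b_bar) ** 2 for b in burden)
    return u * u / var if var > 0 else 0.0


def scan(genotypes, phenotype, window_sizes, step=1):
    """Slide windows of each size across the variants; return all results
    as (start, end, statistic) tuples plus the window with the largest one."""
    n_variants = len(genotypes[0])
    results = []
    for size in window_sizes:
        for start in range(0, n_variants - size + 1, step):
            end = start + size
            stat = burden_score_stat(genotypes, phenotype, start, end)
            results.append((start, end, stat))
    best = max(results, key=lambda r: r[2])
    return results, best


# Toy data: three cases followed by three controls, six variants each.
phenotype = [1, 1, 1, 0, 0, 0]
genotypes = [
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
results, best = scan(genotypes, phenotype, window_sizes=[2])
print(best)
```

In this toy data the case-only variants sit at positions 2 and 3, so the window covering them yields the largest statistic. Judging whether such a maximum is genome-wide significant requires accounting for the many overlapping, correlated windows, which is the problem the analytic threshold in the abstract addresses.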

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Autism Spectrum Disorder / genetics*
  • Data Analysis*
  • Datasets as Topic
  • Female
  • Gene Expression Regulation
  • Genome, Human / genetics*
  • Genome-Wide Association Study / methods
  • Humans
  • Male
  • Models, Genetic*
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Siblings
  • Whole Genome Sequencing / methods