cgaTOH: extended approach for identifying tracts of homozygosity

PLoS One. 2013;8(3):e57772. doi: 10.1371/journal.pone.0057772. Epub 2013 Mar 1.

Abstract

Homozygosity mapping to identify disease variants, and the study of how genome-wide homozygous regions affect traits of biomedical importance, have both been widely applied in recent years. Nonetheless, existing methods and algorithms for identifying long tracts of homozygosity (TOH) do not efficiently yield rigorously defined regions for downstream association analysis. We extended current TOH identification methods by defining the "surrogate-TOH", a region covering a cluster of TOHs with specific characteristics. Surrogate-TOHs comprise three types: cTOH, a common TOH region where at least ten TOHs are present; gTOH, a group of highly overlapping TOHs that share proximal boundaries; and aTOH, a set of allelically matched TOHs. The search for gTOH and aTOH is based on a repeated binary spectral clustering algorithm, which creates a hierarchy of clusters represented as a TOH cluster tree. We developed the cgaTOH software on the basis of this method. The software provides an intuitive, interactive visualization tool with special interactive navigation rings for examining the high-throughput output, and it incorporates the NCBI genome map viewer. It is applicable to both conventional association studies and more sophisticated downstream analyses. We also discuss how to choose appropriate empirical ranges for critical parameters by applying the method to disease models. The method identifies variously patterned clusters of SNPs showing extended homozygosity, so that different aspects of the multi-faceted characteristics of TOHs can be observed.
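The cTOH definition above (a region where at least ten TOHs are present) can be illustrated with a small coverage sweep. This is a minimal sketch and not the cgaTOH implementation: it assumes each TOH is given as an inclusive (start, end) base-pair interval on a single chromosome, and it reads "present" as "overlapping at that position". The function name `common_toh_regions` and the parameter `min_count` are hypothetical; only the default of ten comes from the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one TOH = inclusive (start, end) base-pair
# coordinates on a single chromosome.
Interval = Tuple[int, int]


def common_toh_regions(tohs: List[Interval], min_count: int = 10) -> List[Interval]:
    """Return maximal regions covered by at least `min_count` TOHs."""
    # Net change in coverage depth at each coordinate.
    deltas: Dict[int, int] = {}
    for start, end in tohs:
        deltas[start] = deltas.get(start, 0) + 1
        # The interval is inclusive, so depth drops one base past `end`.
        deltas[end + 1] = deltas.get(end + 1, 0) - 1

    regions: List[Interval] = []
    depth = 0
    region_start = None
    for pos in sorted(deltas):
        depth += deltas[pos]
        if depth >= min_count and region_start is None:
            region_start = pos                        # depth reached the threshold
        elif depth < min_count and region_start is not None:
            regions.append((region_start, pos - 1))   # depth fell below it
            region_start = None
    return regions


# Toy data with a lowered threshold so the example stays small.
example = [(100, 500), (200, 600), (300, 700), (650, 900)]
print(common_toh_regions(example, min_count=3))  # → [(300, 500)]
```

Summing the net depth change per coordinate before sweeping avoids spurious region breaks when one TOH ends exactly where another begins. The gTOH and aTOH searches rely on spectral clustering and are not covered by this sketch.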

MeSH terms

  • Algorithms
  • Chromosome Mapping
  • Cluster Analysis
  • Databases, Genetic
  • Genetic Predisposition to Disease*
  • Genome, Human
  • Homozygote*
  • Humans
  • Lung Neoplasms / genetics*
  • Models, Genetic
  • Polymorphism, Single Nucleotide*
  • Software*

Grants and funding

The authors have no support or funding to report.