A nearest-neighboring-end algorithm for genetic mapping

Bioinformatics. 2005 Apr 15;21(8):1579-91. doi: 10.1093/bioinformatics/bti164. Epub 2004 Nov 25.

Abstract

Motivation: High-throughput methods are beginning to make possible the genotyping of thousands of loci in thousands of individuals, which could be useful for tightly associating phenotypes with candidate loci. Current mapping algorithms cannot handle data sets of this size without building hierarchies of framework maps.

Results: A version of Kruskal's minimum spanning tree algorithm can solve any genetic mapping problem that can be stated as marker deletion from a set of linkage groups. These include backcross, recombinant inbred, haploid and double-cross recombinational populations, in addition to conventional deletion and radiation hybrid populations. The algorithm progressively joins linkage groups at increasing recombination fractions between terminal markers, and attempts to recognize and correct erroneous joins at peaks in recombination fraction. The algorithm is O(mn^3) for m individuals and n markers, but the mean run time scales close to mn^2. It is amenable to parallel processing and has recovered true map order in simulations of large backcross, recombinant inbred and deletion populations with up to 37,005 markers. Simulations were used to investigate map accuracy in response to population size, allelic dominance, segregation distortion, missing data and random typing errors. The algorithm produced accurate maps when marker distribution was sufficiently uniform, although segregation distortion could induce translocated marker orders. The algorithm was also used to map 1003 loci in the F7 ITMI population of bread wheat, Triticum aestivum L. emend Thell., where it shortened an existing standard map by 16%, but it failed to associate blocks of markers properly across gaps within linkage groups. This failure arose because the algorithm depends upon the rankings of recombination fractions at individual markers, and is susceptible to sampling error, typing error and joint selection involving the terminal markers of nearly finished linkage groups. Therefore, the current form of the algorithm is useful mainly to improve local marker ordering in linkage groups obtained in other ways.
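The joining step described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes backcross-style genotype scores coded 0/1 (with None for missing data), estimates the recombination fraction as the simple proportion of individuals whose scores differ, and omits the detection and correction of erroneous joins entirely. The function names, the `max_rf` cutoff and the example data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def recombination_fraction(a, b):
    """Proportion of individuals whose scores differ at two markers.

    Assumes backcross-style 0/1 scores; individuals with a missing
    score (None) at either marker are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.5  # no shared data: treat the markers as unlinked
    return sum(x != y for x, y in pairs) / len(pairs)


def join_by_nearest_ends(genotypes, max_rf=0.5):
    """Greedy, Kruskal-style joining of linkage groups at their ends.

    genotypes: dict mapping marker name -> list of scores per individual.
    Returns a list of linkage groups, each an ordered list of marker names.
    Simplification: no attempt is made to detect or undo erroneous joins.
    """
    groups = {m: [m] for m in genotypes}   # group id -> ordered markers
    owner = {m: m for m in genotypes}      # marker -> id of its group

    # Candidate joins, visited in order of increasing recombination fraction.
    edges = sorted(
        (recombination_fraction(genotypes[p], genotypes[q]), p, q)
        for p, q in combinations(sorted(genotypes), 2)
    )

    for rf, p, q in edges:
        if rf >= max_rf:
            break  # remaining pairs look unlinked
        gp, gq = owner[p], owner[q]
        if gp == gq:
            continue  # already in the same linkage group
        left, right = groups[gp], groups[gq]
        # Only terminal markers of a linkage group may take part in a join.
        if p not in (left[0], left[-1]) or q not in (right[0], right[-1]):
            continue
        if left[-1] != p:
            left.reverse()    # put p at the tail of the left group
        if right[0] != q:
            right.reverse()   # put q at the head of the right group
        groups[gp] = left + right
        del groups[gq]
        for m in right:
            owner[m] = gp
    return list(groups.values())


# Hypothetical data: five markers scored in eight individuals, with one
# recombinant individual between each adjacent pair of markers.
example = {
    "M3": [1, 1, 0, 0, 0, 1, 0, 1],
    "M1": [0, 0, 0, 0, 0, 1, 0, 1],
    "M5": [1, 1, 1, 1, 0, 1, 0, 1],
    "M2": [1, 0, 0, 0, 0, 1, 0, 1],
    "M4": [1, 1, 1, 0, 0, 1, 0, 1],
}
linkage_groups = join_by_nearest_ends(example)
print(linkage_groups)
```

Sorting all marker pairs once and accepting only joins between group ends keeps every linkage group a simple chain, which is the sense in which the procedure resembles Kruskal's algorithm restricted to paths. Because the sketch relies only on the ranking of pairwise recombination fractions, it shares the sensitivity to sampling and typing error that the abstract reports.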

Availability: The source code and supplemental data are available at http://www.iubio.bio.indiana.edu/soft/molbio/qtl/flipper/

Contact: [email protected].

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Chromosome Mapping / methods*
  • Computer Simulation
  • DNA Mutational Analysis / methods*
  • Gene Deletion
  • Genetic Linkage / genetics
  • Genetic Markers / genetics*
  • Genetics, Population*
  • Humans
  • Models, Genetic*
  • Models, Statistical
  • Software*

Substances

  • Genetic Markers