It is possible to estimate the proportionate contributions of ancestral populations to admixed individuals or populations using genetic markers, but different loci and alleles vary considerably in the amount of information that they provide. Conventionally, the allele frequency difference between parental populations (delta) has been used as the criterion to select informative markers. However, it is unclear how to use delta for multiallelic loci, or populations formed by the mixture of more than two groups. Moreover, several other factors, including the actual ancestral proportions and the relative genetic diversities of the parental populations, affect the information provided by genetic markers. We demonstrate here that using delta as the sole criterion for marker selection is inadequate, and we propose, instead, to use Fisher's information, which is the inverse of the variance of the estimated ancestral contributions. This measure is superior because it is directly related to the precision of ancestry estimates. Although delta is related to Fisher's information, the relationship is neither linear nor simple, and the information can vary widely for markers with identical deltas. Fortunately, Fisher's information is easily computed and formally extends to the situation of multiple alleles and/or parental populations. We examined the distribution of information for SNP and microsatellite loci available in the public domain for a variety of model admixed populations. The information, on average, is higher for microsatellite loci, but exceptional SNPs exceed the best microsatellites. Despite the large number of genetic markers that have been identified for admixture analysis, it appears that information for estimating admixture proportions is limited, and estimates will typically have wide confidence intervals.
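The point that markers with identical deltas can carry different amounts of information can be illustrated with a minimal sketch. It assumes a simple model that may differ from the exact formulation used in the paper: each sampled allele copy is an independent multinomial draw from an admixed population whose allele frequencies are m * p1 + (1 - m) * p2, where m is the ancestral proportion from the first parental population. Under that assumption, the per-allele-copy Fisher information about m is the sum over alleles of (p1 - p2)^2 divided by the admixed frequency. The function name `fisher_information` and the example frequencies are hypothetical and chosen only for illustration.

```python
def fisher_information(p1, p2, m):
    """Per-allele-copy Fisher information about the admixture proportion m.

    Assumed model (illustrative, not necessarily the paper's): one allele
    copy is a multinomial draw with frequencies m * p1 + (1 - m) * p2.
    p1, p2: allele frequencies in the two parental populations (same order).
    m: proportion of ancestry from the first parental population.
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must list the same alleles")
    info = 0.0
    for a, b in zip(p1, p2):
        mixed = m * a + (1 - m) * b  # allele frequency in the admixed population
        if mixed > 0:  # alleles absent from the admixed population add nothing
            info += (a - b) ** 2 / mixed
    return info


# Two hypothetical biallelic markers, both with delta = 0.4.
marker_a = fisher_information([0.7, 0.3], [0.3, 0.7], m=0.5)
marker_b = fisher_information([0.5, 0.5], [0.1, 0.9], m=0.5)

# A hypothetical three-allele marker, where a single delta is not defined.
marker_c = fisher_information([0.6, 0.3, 0.1], [0.1, 0.3, 0.6], m=0.5)

print(marker_a, marker_b, marker_c)
```

In this sketch the second marker yields more information than the first despite the same delta, because the information also depends on the allele frequencies in the admixed population and therefore on m. The same function handles any number of alleles, since the sum simply runs over all of them.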
Copyright 2004 Wiley-Liss, Inc.