Using parametric multipoint lods and mods for linkage analysis requires a shift in statistical thinking

Hum Hered. 2011;72(4):264-75. doi: 10.1159/000331463. Epub 2011 Dec 23.

Abstract

Multipoint (MP) linkage analysis represents a valuable tool for whole-genome studies but suffers from the disadvantage that the probability distribution of its test statistic is unknown and varies as a function of marker information and density, genetic model, number and structure of pedigrees, and the affection status distribution [Xing and Elston: Genet Epidemiol 2006;30:447-458; Hodge et al.: Genet Epidemiol 2008;32:800-815]. This implies that the MP significance criterion can differ for each marker and each dataset, which makes planning and evaluation of MP linkage studies difficult. One way to circumvent this difficulty is to use simulations or permutation testing. Another approach is to use an alternative statistical paradigm to assess the statistical evidence for linkage, one that does not require computation of a p value. Here we show how to use the evidential statistical paradigm for planning, conducting, and interpreting MP linkage studies when the disease model is known (lod analysis) or unknown (mod analysis). As a key feature, the evidential paradigm decouples uncertainty (i.e. error probabilities) from statistical evidence. In the planning stage, one calculates error probabilities as functions of the design choices (sample size, choice of alternative hypothesis, choice of likelihood ratio (LR) criterion k) in order to ensure a reliable study design. In the data analysis stage, one no longer pays attention to those error probabilities. In this stage, one calculates the LR for two simple hypotheses (i.e. trait locus is unlinked vs. trait locus is located at a particular position) as a function of the parameter of interest (position). The LR directly measures the strength of evidence for linkage in a given data set and remains completely divorced from the error probabilities calculated in the planning stage. An important consequence of this procedure is that one can use the same criterion k for all analyses.
This contrasts with the situation described above, in which the value one uses to conclude significance may differ for each marker and each dataset in order to accommodate a fixed test size, α. In this study we accomplish two goals that lead to a general algorithm for conducting evidential MP linkage studies. (1) We provide two theoretical results that translate into guidelines for investigators conducting evidential MP linkage: (a) comparing mods to lods, error rates (including probabilities of weak evidence) are generally higher for mods when the null hypothesis is true, but lower for mods in the presence of true linkage. Royall [J Am Stat Assoc 2000;95:760-780] has shown that errors based on lods are bounded and generally small. Therefore, when the true disease model is unknown and one chooses to use mods, one needs to control misleading evidence rates only under the null hypothesis; (b) for any given pair of contiguous marker loci, error rates under the null are greatest at the midpoint between the markers spaced furthest apart, which provides an obvious simple alternative hypothesis to specify for planning MP linkage studies. (2) We demonstrate through extensive simulation that this evidential approach can yield low error rates under the null and alternative hypotheses for both lods and mods, despite the fact that mod scores are not true LRs. Using these results, we provide a coherent approach to implement an MP linkage study using the evidential paradigm.
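The planning-stage calculation described in the abstract can be illustrated with a deliberately simplified sketch. It is not the multipoint pedigree simulation used in the study: it assumes a two-point analysis of n fully informative, phase-known meioses, so that the number of recombinants is binomial and all probabilities can be computed exactly. The function names `lod_score` and `evidence_probabilities`, and the example values of n, the alternative recombination fraction, and k, are hypothetical choices made only for illustration.

```python
from math import comb, log10


def lod_score(r, n, theta):
    """log10 LR of 'linked at theta' vs. 'unlinked' (theta = 0.5) for
    r recombinants among n phase-known meioses (assumes 0 < theta < 0.5)."""
    return r * log10(theta) + (n - r) * log10(1 - theta) - n * log10(0.5)


def evidence_probabilities(n, theta_alt, k, theta_true):
    """Exact probabilities of strong evidence for linkage (LR >= k),
    strong evidence against linkage (LR <= 1/k), and weak evidence
    (1/k < LR < k) when the true recombination fraction is theta_true."""
    threshold = log10(k)  # compare on the lod scale to avoid overflow
    probs = {"for_linkage": 0.0, "against_linkage": 0.0, "weak": 0.0}
    for r in range(n + 1):
        # Binomial probability of observing r recombinants under theta_true
        p = comb(n, r) * theta_true**r * (1 - theta_true) ** (n - r)
        lod = lod_score(r, n, theta_alt)
        if lod >= threshold:
            probs["for_linkage"] += p
        elif lod <= -threshold:
            probs["against_linkage"] += p
        else:
            probs["weak"] += p
    return probs


# Illustrative design choices (assumptions, not values from the study)
n, theta_alt, k = 40, 0.1, 1000

# Under the null, "for_linkage" is the probability of misleading evidence
under_null = evidence_probabilities(n, theta_alt, k, theta_true=0.5)
# Under the alternative, "against_linkage" is the misleading outcome
under_alt = evidence_probabilities(n, theta_alt, k, theta_true=theta_alt)

print("null: misleading =", under_null["for_linkage"], "weak =", under_null["weak"])
print("alt:  misleading =", under_alt["against_linkage"], "weak =", under_alt["weak"])
```

Because the lod here is a true likelihood ratio between two simple hypotheses, the probability of misleading evidence under the null cannot exceed 1/k. In this toy setting one would vary n, the alternative, and k until the misleading and weak evidence probabilities are acceptably small; in the analysis stage the LR is then read directly against the same k.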

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Genetic Linkage*
  • Humans
  • Lod Score*
  • Models, Genetic*