Spinning convincing stories for both true and false association signals

Genet Epidemiol. 2019 Jun;43(4):356-364. doi: 10.1002/gepi.22189. Epub 2019 Jan 18.

Abstract

When interpreting genome-wide association peaks, it is common to annotate each peak by searching for genes with plausible relationships to the trait. However, "all that glitters is not gold": one might interpret apparent patterns in the data as plausible even when the peak is a false positive. Accordingly, we sought to see how human annotators interpreted association results containing a mixture of peaks from both the original trait and a genetically uncorrelated "synthetic" trait. Two of us prepared a mix of original and synthetic peaks of three significance categories from five different scans, along with relevant literature search results, and then we all annotated these regions. Three annotators also scored the strength of evidence connecting each peak to the scanned trait and the likelihood of further studying that region. While annotators found original peaks to have stronger evidence (Bonferroni-corrected p = 0.017) and higher likelihood of further study (Bonferroni-corrected p = 0.006) than synthetic peaks, annotators often made convincing connections between the synthetic peaks and the original trait, finding these connections 55% of the time. These results show that it is not difficult for annotators to make convincing connections between synthetic association signals and genes found in those regions.

Keywords: association peaks; false positives; genome-wide association studies; literature review.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Data Curation* / methods
  • Data Curation* / standards
  • Data Curation* / statistics & numerical data
  • Data Interpretation, Statistical*
  • Deception
  • False Positive Reactions*
  • Genome-Wide Association Study / standards
  • Genome-Wide Association Study / statistics & numerical data*
  • Humans
  • Phenotype
  • Polymorphism, Single Nucleotide