Commonly used Hardy-Weinberg equilibrium filtering schemes impact population structure inferences using RADseq data

Mol Ecol Resour. 2022 Oct;22(7):2599-2613. doi: 10.1111/1755-0998.13646. Epub 2022 Jun 5.

Abstract

Reduced representation sequencing (RRS) is a widely used method to assay the diversity of genetic loci across the genome of an organism. The dominant class of RRS approaches assays loci associated with restriction sites within the genome (restriction site associated DNA sequencing, or RADseq). RADseq is frequently applied to non-model organisms since it enables population genetic studies without relying on well-characterized reference genomes. However, RADseq requires the use of many bioinformatic filters to ensure the quality of genotyping calls. These filters can have direct impacts on population genetic inference, and therefore require careful consideration. One widely used filtering approach is the removal of loci that do not conform to expectations of Hardy-Weinberg equilibrium (HWE). Although this filtering approach is widely used, we show that it is rarely described in sufficient detail to enable replication. Furthermore, through analyses of in silico and empirical data sets, we show that some of the most widely used HWE filtering approaches dramatically impact inference of population structure. In particular, the removal of loci exhibiting departures from HWE after pooling across samples significantly reduces the degree of inferred population structure within a data set (despite this approach being widely used). Based on these results, we provide recommendations for best practice regarding the implementation of HWE filtering for RADseq data sets.
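
Why pooling matters can be illustrated with a small sketch. The Python below is not the authors' pipeline; it is a minimal illustration that assumes a chi-square goodness-of-fit test with 1 degree of freedom at alpha = 0.05, and a hypothetical per-population rule that flags a locus only when it departs from HWE in every population. The function names and the genotype counts are invented for illustration. The example locus is in HWE within each of two populations but differs strongly in allele frequency between them, so pooling produces a heterozygote deficit and the pooled filter would discard exactly the kind of locus that carries population structure signal.

```python
from typing import Dict, Tuple

# Genotype counts at one biallelic locus: (n_AA, n_Aa, n_aa)
Counts = Tuple[int, int, int]

# Chi-square critical value for 1 degree of freedom at alpha = 0.05
# (the test and threshold are assumptions of this sketch).
CHI2_CRIT_1DF_05 = 3.841


def hwe_chi2(counts: Counts) -> float:
    """Chi-square statistic comparing observed genotype counts to HWE expectations."""
    n_AA, n_Aa, n_aa = counts
    n = n_AA + n_Aa + n_aa
    if n == 0:
        return 0.0
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = 0.0
    for obs, exp in zip(counts, expected):
        if exp > 0:
            chi2 += (obs - exp) ** 2 / exp
    return chi2


def departs_from_hwe(counts: Counts, crit: float = CHI2_CRIT_1DF_05) -> bool:
    """True if the locus would be flagged as out of HWE under this sketch's test."""
    return hwe_chi2(counts) > crit


def flagged_when_pooled(pops: Dict[str, Counts]) -> bool:
    """Pool genotype counts across all populations, then test once."""
    pooled = tuple(sum(c[i] for c in pops.values()) for i in range(3))
    return departs_from_hwe(pooled)


def flagged_per_population(pops: Dict[str, Counts]) -> bool:
    """Hypothetical rule: flag only if the locus departs from HWE in every population."""
    return all(departs_from_hwe(c) for c in pops.values())


# Invented example: each population is in HWE, but allele frequencies
# differ (p = 0.9 versus p = 0.1), as expected at a structure-informative locus.
example_locus: Dict[str, Counts] = {
    "pop1": (81, 18, 1),
    "pop2": (1, 18, 81),
}

if __name__ == "__main__":
    print("flagged when pooled:    ", flagged_when_pooled(example_locus))
    print("flagged per population: ", flagged_per_population(example_locus))
```

In this invented example the pooled test flags the locus while the per-population rule retains it. Real data sets would also need to handle small sample sizes (where an exact test is usually preferred), missing genotypes, and multiple-testing correction, none of which this sketch attempts.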

Keywords: Hardy-Weinberg; RADseq; population genetics; population genomics; reduced representation sequencing.

MeSH terms

  • Computational Biology* / methods
  • Genetics, Population*
  • Genome
  • Sequence Analysis, DNA / methods