gmRAD: an integrated SNP calling pipeline for genetic mapping with RADseq across a hybrid population

Brief Bioinform. 2020 Jan 17;21(1):329-337. doi: 10.1093/bib/bby114.

Abstract

Restriction site-associated DNA sequencing (RADseq) is a powerful technology that has been extensively applied in population genetics, phylogenetics and genetic mapping. Although many software packages are available for ecological and evolutionary studies, few effective tools exist for extracting genotype data from RADseq reads for genetic mapping, a prerequisite for quantitative trait locus mapping, comparative genomics and genome scaffold assembly. Here, we present an integrated pipeline called gmRAD for generating single nucleotide polymorphism (SNP) genotypes de novo from RADseq data across a genetic mapping population derived by crossing two parents. The software proceeds in five steps: clustering the first (forward) reads of each parent, building two parental references, generating parental SNP catalogs, calling SNP genotypes across all individuals and filtering the genotype data for genetic linkage mapping. All five steps can be run with a single command line, and each step can also be run on its own when its prerequisite files are available. To validate the pipeline, we analyzed real RADseq data from an F1 hybrid population derived by crossing Populus deltoides and Populus simonii. The software gmRAD is freely available at https://github.com/tongchf/gmRAD.

Keywords: genetic mapping; genotype; restriction site-associated DNA sequencing; single nucleotide polymorphism.
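
To make the final step (filtering genotype data for linkage mapping) concrete, the following is a minimal sketch of one common filter for an F1 population from two outbred parents: discard SNPs with too many missing calls or with genotype counts that depart from the expected Mendelian ratio. It is not taken from gmRAD. The segregation-type labels, the genotype codes ("aa", "ab", "bb", "--"), the function names and the thresholds `max_missing` and `alpha` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Expected Mendelian ratios in an F1 from two outbred parents.
# These labels and genotype codes are illustrative assumptions,
# not gmRAD's own file format.
EXPECTED = {
    "ab x aa": {"aa": 0.5, "ab": 0.5},
    "aa x ab": {"aa": 0.5, "ab": 0.5},
    "ab x ab": {"aa": 0.25, "ab": 0.5, "bb": 0.25},
}


def chi_square_p(genotypes, seg_type):
    """Goodness-of-fit test of genotype counts against the expected ratio.

    Calls that are not valid for the segregation type (including the
    missing code "--") are ignored. Returns (chi2, p_value).
    """
    expected = EXPECTED[seg_type]
    counts = Counter(g for g in genotypes if g in expected)
    n = sum(counts.values())
    if n == 0:
        return float("nan"), float("nan")
    chi2 = sum((counts[g] - n * f) ** 2 / (n * f) for g, f in expected.items())
    df = len(expected) - 1
    # Closed-form chi-square survival functions for df = 1 and df = 2.
    if df == 1:
        p_value = math.erfc(math.sqrt(chi2 / 2))
    else:
        p_value = math.exp(-chi2 / 2)
    return chi2, p_value


def keep_snp(genotypes, seg_type, max_missing=0.2, alpha=0.01):
    """Return True if the SNP passes the missing-rate and segregation filters."""
    if not genotypes:
        return False
    expected = EXPECTED[seg_type]
    called = [g for g in genotypes if g in expected]
    # Unexpected genotype classes are counted as missing here (an assumption).
    missing_rate = 1 - len(called) / len(genotypes)
    if missing_rate > max_missing:
        return False
    _, p_value = chi_square_p(called, seg_type)
    return p_value >= alpha
```

For example, `keep_snp(["aa"] * 50 + ["ab"] * 50, "ab x aa")` returns `True`, because the counts match the 1:1 expectation exactly, whereas a 90:10 split at the same locus is rejected as distorted. Treating unexpected genotype classes as missing rather than as errors is a design choice of this sketch; a stricter filter could reject such loci outright.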