Fluidigm2PURC: Automated processing and haplotype inference for double-barcoded PCR amplicons

Paul D Blischak; Maribeth Latvis; Diego F Morales-Briones; Jens C Johnson; Verónica S Di Stilio; Andrea D Wolfe; David C Tank

doi:10.1002/aps3.1156

Appl Plant Sci. 2018 Jun 28;6(6):e01156. doi: 10.1002/aps3.1156. eCollection 2018 Jun.

Authors

Paul D Blischak¹, Maribeth Latvis², Diego F Morales-Briones³, Jens C Johnson⁴, Verónica S Di Stilio⁴, Andrea D Wolfe¹, David C Tank⁵ ⁶ ⁷

Affiliations

¹ Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, 318 W. 12th Avenue, Columbus, Ohio 43210-1242, USA.
² Department of Natural Resource Management, South Dakota State University, 1390 College Avenue, Brookings, South Dakota 57007-1696, USA.
³ Department of Plant and Microbial Biology, University of Minnesota, 1479 Gortner Avenue, Saint Paul, Minnesota 55108-1095, USA.
⁴ Department of Biology, University of Washington, Seattle, Washington 98195-1800, USA.
⁵ Department of Biological Sciences, University of Idaho, 875 Perimeter Drive, MS 3051, Moscow, Idaho 83844-3051, USA.
⁶ Stillinger Herbarium, University of Idaho, 875 Perimeter Drive, MS 1133, Moscow, Idaho 83844-1133, USA.
⁷ Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, 875 Perimeter Drive, MS 3051, Moscow, Idaho 83844-3051, USA.

Abstract

Premise of the study: Targeted enrichment strategies for phylogenomic inference are a time- and cost-efficient way to collect DNA sequence data for large numbers of individuals at multiple, independent loci. Automated and reproducible processing of these data is a crucial step for researchers conducting phylogenetic studies.

Methods and results: We present Fluidigm2PURC, an open-source Python utility for processing paired-end Illumina data from double-barcoded PCR amplicons. In combination with the program PURC (Pipeline for Untangling Reticulate Complexes), our scripts process raw FASTQ files for analysis with PURC and use its output to infer haplotypes for diploids, polyploids, and samples with unknown ploidy. We demonstrate the use of the pipeline with an example data set from the genus Thalictrum (Ranunculaceae).
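
To illustrate the general idea of sorting paired-end reads by a pair of barcodes, the following is a minimal Python sketch. It is not the Fluidigm2PURC implementation: the function names (`read_fastq`, `bin_read_pairs`), the barcode table, the toy reads, and the assumption that each barcode sits inline at the start of its read are all hypothetical and chosen only for illustration. The actual pipeline and sequencing platform may handle barcodes differently.

```python
from collections import defaultdict
from io import StringIO
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def bin_read_pairs(r1_handle, r2_handle, barcode_pairs, barcode_len):
    """Group read pairs by the barcodes found at the start of read 1 and read 2.

    barcode_pairs maps (read-1 barcode, read-2 barcode) to a label.
    Pairs whose barcodes are not in the table are stored under "unassigned".
    Assumes both files list mates in the same order (hypothetical layout).
    """
    bins = defaultdict(list)
    for (_h1, s1, _q1), (_h2, s2, _q2) in zip(read_fastq(r1_handle),
                                              read_fastq(r2_handle)):
        key = (s1[:barcode_len], s2[:barcode_len])
        label = barcode_pairs.get(key, "unassigned")
        # Store the reads with their barcodes trimmed off.
        bins[label].append((s1[barcode_len:], s2[barcode_len:]))
    return dict(bins)


# Toy input: two read pairs, one hypothetical barcode combination.
R1 = "@r1/1\nACGTTTTTGGGG\n+\nIIIIIIIIIIII\n@r2/1\nTTTTCCCCAAAA\n+\nIIIIIIIIIIII\n"
R2 = "@r1/2\nGGCCAAAATTTT\n+\nIIIIIIIIIIII\n@r2/2\nGGCCAAAATTTT\n+\nIIIIIIIIIIII\n"
barcode_pairs = {("ACGT", "GGCC"): "sample_A"}

result = bin_read_pairs(StringIO(R1), StringIO(R2), barcode_pairs, barcode_len=4)
for label, pairs in result.items():
    print(label, pairs)
```

In this toy example, the first read pair carries the known barcode combination and is assigned to `sample_A`, while the second pair has an unrecognized read-1 barcode and is placed under `unassigned`. Exact matching is used for simplicity; a real workflow would likely need to tolerate sequencing errors in the barcodes.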

Conclusions: Fluidigm2PURC is freely available for Unix-like operating systems on GitHub (https://github.com/pblischak/fluidigm2purc) and for all operating systems through Docker (https://hub.docker.com/r/pblischak/fluidigm2purc).

Keywords: bioinformatics; haplotype inference; high‐throughput sequencing; microfluidic PCR; phylogenomics; polyploidy.

Associated data

Dryad/10.5061/dryad.89k5k30