vcferr: Development, validation, and application of a single nucleotide polymorphism genotyping error simulation framework

F1000Res. 2022 Jul 11:11:775. doi: 10.12688/f1000research.122840.1. eCollection 2022.

Abstract

Motivation: Genotyping error can impact downstream single nucleotide polymorphism (SNP)-based analyses. Simulating various modes and levels of error can help investigators better understand potential biases caused by miscalled genotypes. Methods: We have developed and validated vcferr, a tool to probabilistically simulate genotyping error and missingness in variant call format (VCF) files. We demonstrate how vcferr could be used to address a research question by introducing varying levels of error of different type into a sample in a simulated pedigree, and assessed how kinship analysis degrades as a function of the kind and type of error. Software availability: vcferr is available for installation via PyPi (https://pypi.org/project/vcferr/) or conda (https://anaconda.org/bioconda/vcferr). The software is released under the MIT license with source code available on GitHub (https://github.com/signaturescience/vcferr).

Keywords: GWAS; benchmarking; bioinformatics; genealogy; kinship; python; simulation.

Grants and funding

This work was supported in part by award 2019-DU-BX-0046 (Dense DNA Data for Enhanced Missing Persons Identification) to B.B., awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice and by internal funds from the Center for Human Identification. The opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect those of the U.S. Department of Justice.