The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes

Danny Challis; Lilian Antunes; Erik Garrison; Eric Banks; Uday S Evani; Donna Muzny; Ryan Poplin; Richard A Gibbs; Gabor Marth; Fuli Yu

doi:10.1186/s12864-015-1333-7

BMC Genomics. 2015 Feb 28;16(1):143. doi: 10.1186/s12864-015-1333-7.

Authors

Danny Challis^{1,2,3}, Lilian Antunes^{4,5,6}, Erik Garrison⁷, Eric Banks⁸, Uday S Evani^{9,10,11}, Donna Muzny^{12,13}, Ryan Poplin¹⁴, Richard A Gibbs^{15,16}, Gabor Marth^{17,18}, Fuli Yu^{19,20,21}

Affiliations

¹ Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
² Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
³ Present address: Monsanto Company, Ankeny, IA, 50021, USA.
⁴ Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
⁵ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
⁶ Present address: Washington University School of Medicine, Saint Louis, MO, 63110, USA.
⁷ Department of Biology, Boston College, Wellcome Trust Sanger Institute, Chestnut Hill, MA, 02467, USA.
⁸ Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA.
⁹ Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
¹⁰ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
¹¹ Present address: New York Genome Center, New York, NY, 10013, USA.
¹² Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
¹³ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
¹⁴ Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA.
¹⁵ Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
¹⁶ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
¹⁷ Department of Biology, Boston College, Wellcome Trust Sanger Institute, Chestnut Hill, MA, 02467, USA.
¹⁸ Present address: Department of Human Genetics and Utah Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, 84112, USA.
¹⁹ Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
²⁰ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
²¹ Institute of Neurology, Tianjin Medical University General Hospital, Tianjin, 300052, China.

Abstract

Background: Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report an approach that improves INDEL calling accuracy by using a machine learning algorithm to combine call sets generated by three independent methods, leveraging the strengths of each individual pipeline. Using this approach, we generated a consensus exome INDEL call set from a large dataset produced by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls.
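The abstract does not name the classifier, its features, or the three callers. Purely as an illustration of the general idea of learning how much to trust each caller, here is a minimal sketch that assumes a logistic-regression model (a swapped-in technique, not necessarily the authors' method) trained on labeled sites, where each candidate INDEL is described only by which of three hypothetical callers (`caller_a`, `caller_b`, `caller_c`) reported it. The function names and the toy training labels are invented for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical caller names: the abstract only says "three independent methods".
CALLERS = ("caller_a", "caller_b", "caller_c")


def features(calls: Dict[str, bool]) -> List[float]:
    """Bias term followed by a 0/1 indicator for each caller that reported the site."""
    return [1.0] + [1.0 if calls.get(name, False) else 0.0 for name in CALLERS]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples: Sequence[Tuple[Dict[str, bool], int]],
          lr: float = 1.0, epochs: int = 5000) -> List[float]:
    """Fit logistic-regression weights by full-batch gradient descent.

    Labels: 1 = site validated as a true INDEL, 0 = validated false positive.
    """
    w = [0.0] * (len(CALLERS) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for calls, label in examples:
            x = features(calls)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - label
            for i, xi in enumerate(x):
                grad[i] += err * xi
        # Average the gradient over examples and take one descent step.
        w = [wi - lr * g / len(examples) for wi, g in zip(w, grad)]
    return w


def predict(w: List[float], calls: Dict[str, bool]) -> float:
    """Estimated probability that a candidate site is a true INDEL."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(calls))))


# Toy, made-up labels for illustration only (not data from the study).
TOY_TRAINING = [
    ({"caller_a": True, "caller_b": True, "caller_c": True}, 1),
    ({"caller_a": True, "caller_b": True}, 1),
    ({"caller_a": True, "caller_c": True}, 1),
    ({"caller_b": True, "caller_c": True}, 1),
    ({"caller_a": True}, 0),
    ({"caller_b": True}, 0),
    ({"caller_c": True}, 0),
]

if __name__ == "__main__":
    weights = train(TOY_TRAINING)
    print(predict(weights, {"caller_a": True, "caller_b": True}))
```

With only caller-membership indicators, the model effectively learns a weighted vote; a real pipeline would presumably add per-site quality or context features so that it can prefer whichever caller is more reliable in a given situation.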

Results: The consensus exome INDEL call set contains 7,210 INDELs from 1,128 individuals across 13 populations in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%.
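The FDR is the fraction of reported calls that are false, FP / (TP + FP). The abstract does not describe how the 7.0% figure was estimated, so the sketch below only shows the definition and the arithmetic it implies: at that rate, roughly 505 of the 7,210 calls would be expected to be false. The helper names are hypothetical, and the 93/7 validation counts are made up for illustration.

```python
def false_discovery_rate(true_positives: int, false_positives: int) -> float:
    """FDR = FP / (TP + FP): the fraction of reported calls that are false."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("FDR is undefined when no calls were validated")
    return false_positives / total


def expected_false_calls(n_calls: int, fdr: float) -> float:
    """Expected number of false calls in a call set of size n_calls at a given FDR."""
    return n_calls * fdr


if __name__ == "__main__":
    # Hypothetical validation counts, for illustration only.
    print(false_discovery_rate(true_positives=93, false_positives=7))
    # Figures from the abstract: 7,210 INDELs at an FDR of about 7.0%.
    print(expected_false_calls(7210, 0.070))
```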

Conclusions: We further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.
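The three summaries named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: it assumes VCF-style REF/ALT allele strings and per-site alternate-allele counts, treats the alternate allele as the one being counted (the study may have polarized alleles differently), and all function names are hypothetical. For 1,128 diploid individuals, autosomal sites have 2 × 1,128 = 2,256 chromosomes. A coding INDEL whose length is not a multiple of 3 shifts the reading frame, which is why the length distribution matters for coding sequence.

```python
from collections import Counter
from typing import Iterable


def indel_length(ref: str, alt: str) -> int:
    """Signed length of a VCF-style REF/ALT INDEL (> 0 insertion, < 0 deletion)."""
    return len(alt) - len(ref)


def is_frameshift(ref: str, alt: str) -> bool:
    """A coding INDEL whose length is not a multiple of 3 shifts the reading frame."""
    return indel_length(ref, alt) % 3 != 0


def site_frequency_spectrum(alt_allele_counts: Iterable[int], n_chromosomes: int) -> Counter:
    """Count how many sites carry each alternate-allele count (1 .. n_chromosomes - 1)."""
    sfs: Counter = Counter()
    for ac in alt_allele_counts:
        if not 0 < ac < n_chromosomes:
            raise ValueError(f"allele count {ac} outside 1..{n_chromosomes - 1}")
        sfs[ac] += 1
    return sfs


def density_per_mb(n_indels: int, target_bp: int) -> float:
    """INDELs per megabase of targeted sequence."""
    return n_indels / (target_bp / 1_000_000)


if __name__ == "__main__":
    # Hypothetical (REF, ALT, alternate-allele count) records, for illustration only.
    records = [("A", "AT", 1), ("ACTG", "A", 3), ("G", "GC", 1)]
    print(Counter(indel_length(r, a) for r, a, _ in records))
    print(sum(is_frameshift(r, a) for r, a, _ in records))
    print(site_frequency_spectrum((ac for _, _, ac in records), n_chromosomes=2 * 1128))
```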

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Computational Biology
Exome / genetics*
Genome, Human
High-Throughput Nucleotide Sequencing
Human Genome Project
Humans
INDEL Mutation / genetics*
Machine Learning
Mutagenesis*
