A Novel Framework for Characterizing Genomic Haplotype Diversity in the Human Immunoglobulin Heavy Chain Locus

Oscar L Rodriguez; William S Gibson; Tom Parks; Matthew Emery; James Powell; Maya Strahl; Gintaras Deikus; Kathryn Auckland; Evan E Eichler; Wayne A Marasco; Robert Sebra; Andrew J Sharp; Melissa L Smith; Ali Bashir; Corey T Watson

doi:10.3389/fimmu.2020.02136

Front Immunol. 2020 Sep 23;11:2136. doi: 10.3389/fimmu.2020.02136. eCollection 2020.

Authors

Oscar L Rodriguez¹, William S Gibson², Tom Parks³, Matthew Emery¹, James Powell¹, Maya Strahl¹, Gintaras Deikus¹, Kathryn Auckland³, Evan E Eichler⁴,⁵, Wayne A Marasco⁶, Robert Sebra¹,⁷, Andrew J Sharp¹, Melissa L Smith¹,²,⁷, Ali Bashir¹, Corey T Watson²

Affiliations

¹ Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
² Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States.
³ Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
⁴ Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States.
⁵ Howard Hughes Medical Institute, University of Washington, Seattle, WA, United States.
⁶ Department of Cancer Immunology and AIDS, Dana-Farber Cancer Institute, Department of Medicine, Harvard Medical School, Boston, MA, United States.
⁷ Icahn Institute of Data Science and Genomic Technology, New York, NY, United States.

Abstract

An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody-mediated processes. Due to locus complexity, standard high-throughput approaches have failed to accurately and comprehensively capture IGH polymorphism. As a result, the locus has only been fully characterized two times, severely limiting our knowledge of human IGH diversity. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize IGH variation in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, identifying 2 novel structural variants and 15 novel IGH alleles. We show that multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and >97% decrease in false positives) and array/imputation-based datasets. This framework establishes a desperately needed foundation for leveraging IG genomic data to study population-level variation in antibody-mediated immunity, critical for bettering our understanding of disease risk and of responses to vaccines and therapeutics.

Keywords: B cell receptor; antibody; immunoglobulin heavy chain locus; long-read sequencing; single nucleotide variation; structural variation.

Publication types

Comparative Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Cell Line
Computational Biology / methods*
Data Display
Datasets as Topic
Family
Gene Library
Genes, Immunoglobulin*
Genetic Variation*
Genomic Structural Variation
Genotyping Techniques*
Haplotypes / genetics*
Humans
Immunoglobulin Heavy Chains / genetics*
Molecular Sequence Annotation
Polymorphism, Genetic*
Sequence Alignment
Sequence Analysis, DNA
Sequence Homology, Nucleic Acid
User-Computer Interface
Workflow

Substances

Immunoglobulin Heavy Chains
