Benchmarking the performance of human antibody gene alignment utilities using a 454 sequence dataset

Katherine J L Jackson; Scott Boyd; Bruno A Gaëta; Andrew M Collins

doi:10.1093/bioinformatics/btq604

Bioinformatics. 2010 Dec 15;26(24):3129-30. doi: 10.1093/bioinformatics/btq604. Epub 2010 Oct 29.

Authors

Katherine J L Jackson¹, Scott Boyd, Bruno A Gaëta, Andrew M Collins

Affiliation

¹ School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia.

PMID: 21036814

Abstract

Motivation: Immunoglobulin heavy chain genes are formed by recombination of genes randomly selected from sets of IGHV, IGHD and IGHJ genes. Utilities have been developed to identify genes that contribute to observed VDJ rearrangements, but in the absence of datasets of known rearrangements, the evaluation of these utilities is problematic. We have analyzed thousands of VDJ rearrangements from an individual (S22) whose IGHV, IGHD and IGHJ genotype can be inferred from the dataset. Knowledge of this genotype means that the Stanford_S22 dataset can serve to benchmark the performance of IGH alignment utilities.

Results: We evaluated the performance of seven utilities. Failure to partition a sequence into genes present in the S22 genome was considered an error, and error rates for different utilities ranged from 7.1% to 13.7%.
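The error criterion described above can be illustrated with a minimal sketch. It is not the evaluation tool distributed with the dataset. The function names, the dictionary layout and the allele labels (`IGHV_A*01` and so on) are hypothetical placeholders rather than the real S22 genotype, and the sketch assumes that a missing gene call is counted as an error in the same way as a call to a gene absent from the genotype.

```python
from typing import Dict, Iterable, Optional, Set

# One utility's output for one sequence: segment name -> called allele (or None).
Call = Dict[str, Optional[str]]
# The inferred genotype: segment name -> set of alleles present in the individual.
Genotype = Dict[str, Set[str]]

SEGMENTS = ("v", "d", "j")


def is_error(call: Call, genotype: Genotype) -> bool:
    """Return True if any segment is uncalled or called to an allele
    that is not part of the genotype (assumption: uncalled == error)."""
    for segment in SEGMENTS:
        allele = call.get(segment)
        if allele is None or allele not in genotype[segment]:
            return True
    return False


def error_rate(calls: Iterable[Call], genotype: Genotype) -> float:
    """Percentage of sequences whose partitioning counts as an error."""
    calls = list(calls)
    if not calls:
        raise ValueError("no calls to evaluate")
    errors = sum(is_error(call, genotype) for call in calls)
    return 100.0 * errors / len(calls)


# Toy data with placeholder allele names; NOT the S22 genotype.
toy_genotype: Genotype = {
    "v": {"IGHV_A*01", "IGHV_B*01"},
    "d": {"IGHD_A*01"},
    "j": {"IGHJ_A*01", "IGHJ_B*01"},
}

toy_calls = [
    {"v": "IGHV_A*01", "d": "IGHD_A*01", "j": "IGHJ_A*01"},  # consistent
    {"v": "IGHV_B*01", "d": "IGHD_A*01", "j": "IGHJ_B*01"},  # consistent
    {"v": "IGHV_C*01", "d": "IGHD_A*01", "j": "IGHJ_A*01"},  # V not in genotype
    {"v": "IGHV_A*01", "d": "IGHD_A*01", "j": "IGHJ_B*01"},  # consistent
]

print(error_rate(toy_calls, toy_genotype))  # prints 25.0
```

Scoring against the genotype rather than against a known true rearrangement means that a call can be wrong yet still pass if the wrongly chosen gene happens to be in the genotype, so a rate computed this way is best read as a lower bound on misalignment.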

Availability: Supplementary data include the S22 genotypes and alignments. The Stanford_S22 dataset and an evaluation tool are available at http://www.emi.unsw.edu.au/~ihmmune/IGHUtilityEval/.

Publication types

Evaluation Study

MeSH terms

Benchmarking
Gene Rearrangement, B-Lymphocyte, Heavy Chain*
Genes, Immunoglobulin Heavy Chain*
Genotype
Humans
Sequence Alignment / methods*
Sequence Analysis, DNA