Benchmarking the performance of human antibody gene alignment utilities using a 454 sequence dataset

Bioinformatics. 2010 Dec 15;26(24):3129-30. doi: 10.1093/bioinformatics/btq604. Epub 2010 Oct 29.

Abstract

Motivation: Immunoglobulin heavy chain genes are formed by recombination of genes randomly selected from sets of IGHV, IGHD and IGHJ genes. Utilities have been developed to identify genes that contribute to observed VDJ rearrangements, but in the absence of datasets of known rearrangements, the evaluation of these utilities is problematic. We have analyzed thousands of VDJ rearrangements from an individual (S22) whose IGHV, IGHD and IGHJ genotype can be inferred from the dataset. Knowledge of this genotype means that the Stanford_S22 dataset can serve to benchmark the performance of IGH alignment utilities.

Results: We evaluated the performance of seven utilities. Failure to partition a sequence into genes present in the S22 genome was considered an error, and error rates for different utilities ranged from 7.1% to 13.7%.
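
The error criterion above can be illustrated with a minimal sketch. It assumes each utility's output has already been reduced to one V, D and J gene call per sequence, and that the inferred genotype is available as a set of allele names. The names `Call`, `is_error`, `error_rate`, `toy_genotype` and `toy_calls` are hypothetical, and the toy data are placeholders rather than the S22 genotype or any utility's real output. This is not the evaluation tool distributed with the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One utility's gene assignment for a single VDJ rearrangement (hypothetical structure)."""
    v: str
    d: str
    j: str


def is_error(call, genotype):
    """Return True if any assigned gene is absent from the individual's genotype."""
    return any(gene not in genotype for gene in (call.v, call.d, call.j))


def error_rate(calls, genotype):
    """Fraction of sequences with at least one gene call outside the genotype."""
    if not calls:
        raise ValueError("no alignments to evaluate")
    errors = sum(is_error(call, genotype) for call in calls)
    return errors / len(calls)


# Toy genotype and calls, for illustration only (not the S22 data).
toy_genotype = {"IGHV1-69*01", "IGHV3-23*01", "IGHD3-10*01", "IGHJ4*02"}
toy_calls = [
    Call("IGHV1-69*01", "IGHD3-10*01", "IGHJ4*02"),
    Call("IGHV3-23*01", "IGHD3-10*01", "IGHJ4*02"),
    Call("IGHV4-34*01", "IGHD3-10*01", "IGHJ4*02"),  # V call not in the toy genotype
    Call("IGHV1-69*01", "IGHD3-10*01", "IGHJ4*02"),
]

rate = error_rate(toy_calls, toy_genotype)
print(f"error rate: {rate:.1%}")
```

Under this criterion a call that lies within the genotype is not necessarily correct, so the resulting rate is best read as a lower bound on a utility's true misassignment rate.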

Availability: Supplementary data include the S22 genotypes and alignments. The Stanford_S22 dataset and an evaluation tool are available at http://www.emi.unsw.edu.au/~ihmmune/IGHUtilityEval/.

Publication types

  • Evaluation Study

MeSH terms

  • Benchmarking
  • Gene Rearrangement, B-Lymphocyte, Heavy Chain*
  • Genes, Immunoglobulin Heavy Chain*
  • Genotype
  • Humans
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA