Motivation: The antigen receptors of adaptive immunity (T-cell receptors and immunoglobulins) are encoded by genes assembled stochastically from combinatorial libraries of gene segments. Immunoglobulin genes then undergo further diversification through hypermutation. Analysis of the somatic genetics of the immune response depends explicitly on inferring the details of the recombination process that gave rise to each of the participating antigen receptor genes. We have developed a dynamic programming algorithm to perform this reconstruction and have implemented it as web-accessible software called SoDA (Somatic Diversification Analysis).
Results: We tested SoDA against a set of 120 artificial immunoglobulin sequences generated by simulation of recombination and compared the results with those of two other widely used programs. SoDA inferred the correct gene segments more frequently than either of the other two programs. We further tested all three programs on 30 human immunoglobulin genes from GenBank, and here we highlight instances where the recombinations they inferred differ. In these cases, SoDA generally appears to find the more likely recombinations.