Every human possesses millions of distinct antibodies. It is now possible to analyze this diversity via next-generation sequencing of immunoglobulin genes (Ig-seq). This technique produces large volume sequence snapshots of B-cell receptors that are indicative of the antibody repertoire. In this paper, we enrich these large-scale sequence datasets with structural information. Enriching a sequence with its structural data allows better approximation of many vital features, such as its binding site and specificity. Here, we describe the structural annotation of antibodies pipeline that maps the outputs of large Ig-seq experiments to known antibody structures. We demonstrate the viability of our protocol on five separate Ig-seq datasets covering ca. 35 m unique amino acid sequences from ca. 600 individuals. Despite the great theoretical diversity of antibodies, we find that the majority of sequences coming from such studies can be reliably mapped to an existing structure.
Keywords: B-cell receptor; antibody specificity; bioinformatics tools; next-generation sequencing; protein; structural homology.