While widespread genome sequencing ushers in a new era of preventive medicine, the tools for predictive genomics are still lacking. Time and resource limitations mean that human diseases remain uncharacterized because of an inability to predict clinically relevant genetic variants. A strategy of targeting highly conserved protein regions is commonly used in functional studies. However, this benefit is lost for rare diseases where the attributable genes are mostly conserved. An immunological disorder exemplifying this challenge arises from damaging mutations in RAG1 and RAG2 and presents at an early age with a distinct phenotype of life-threatening immunodeficiency or autoimmunity. Many tools exist for variant pathogenicity prediction, but these cannot account for the probability of variant occurrence. Here, we present a method that predicts the likelihood of mutation for every amino acid residue in the RAG1 and RAG2 proteins. Population genetics data from approximately 146,000 individuals were used for rare variant analysis. Forty-four known pathogenic variants reported in patients and recombination activity measurements from 110 RAG1/2 mutants were used to validate the calculated scores. Probabilities were compared with 98 currently known human cases of disease. A genome sequence dataset of 558 patients who have primary immunodeficiency but are negative for RAG deficiency was also used as a validation control. We compared the difference between mutation likelihood and pathogenicity prediction. Our method builds a map of the most probable mutations, allowing pre-emptive functional analysis. This method may be applied to other diseases with the aim of improving preparedness for clinical diagnosis.
Keywords: Recombination activating genes 1 and 2 (RAG1, RAG2); genomics; pathogenic variant; predictive.