Background: CRISPR is a versatile gene-editing tool that has revolutionized genetic research in the past few years. Optimizing sgRNA design to improve the efficiency of target DNA cleavage is critical to the success of CRISPR screens.
Results: Borrowing knowledge from oligonucleotide design and nucleosome occupancy models, we systematically evaluated candidate features computed from a number of nucleic acid, thermodynamic and secondary structure models on real CRISPR datasets. Our results showed that taking into account position-dependent dinucleotide features improved the design of effective sgRNAs, with an area under the receiver operating characteristic curve (AUC) >0.8, and that including additional features offered only marginal improvement (∼2% increase in AUC).
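To make "position-dependent dinucleotide features" and the AUC criterion concrete, the following is a minimal illustrative sketch in Python. It is not the implementation in the predictSGRNA package: the function names, the one-hot encoding, and the feature naming scheme (`AC_1` for dinucleotide AC starting at position 1) are assumptions chosen for illustration, and the exact feature set used in the study may differ.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 possible dinucleotides: AA, AC, ..., TT
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def position_dinucleotide_features(seq):
    """One-hot encode each dinucleotide at each position (hypothetical encoding).

    A sequence of length L yields (L - 1) * 16 binary features.
    """
    seq = seq.upper()
    if any(base not in NUCLEOTIDES for base in seq):
        raise ValueError("sequence must contain only A, C, G, T")
    features = {}
    for pos in range(len(seq) - 1):
        observed = seq[pos:pos + 2]
        for dinuc in DINUCLEOTIDES:
            # Feature name combines the dinucleotide and its 1-based position.
            features[f"{dinuc}_{pos + 1}"] = 1 if observed == dinuc else 0
    return features


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    feats = position_dinucleotide_features("ACGT")
    print(len(feats), sum(feats.values()))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

In a workflow of this kind, the feature dictionaries would be stacked into a design matrix and passed to a classifier, with the AUC computed on held-out sgRNAs; the choice of classifier is left open here.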
Conclusion: Using a machine-learning approach, we propose an accurate prediction model for sgRNA design efficiency. An R package, predictSGRNA, implementing the predictive model is available at http://www.ams.sunysb.edu/~pfkuan/softwares.html#predictsgrna .
Keywords: CRISPR; Machine learning; Predictive modeling; Thermodynamics.