Automated identification of sequence-tailored Cas9 proteins using massive metagenomic data

Nat Commun. 2022 Oct 29;13(1):6474. doi: 10.1038/s41467-022-34213-9.

Abstract

The identification of the protospacer adjacent motif (PAM) sequences of Cas9 nucleases is crucial for their exploitation in genome editing. Here we develop a computational pipeline and use it to interrogate a massively expanded dataset of metagenome and virome assemblies for accurate and comprehensive PAM predictions. This procedure allows the identification and isolation of sequence-tailored Cas9 nucleases by using the target sequence as bait. As proof of concept, starting from the disease-causing mutation P23H in the RHO gene, we find, isolate and experimentally validate a Cas9 that uses the mutated sequence as its PAM. Our PAM prediction pipeline will be instrumental in generating a Cas9 nuclease repertoire that can meet any PAM requirement.
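
As a rough illustration of the "target sequence as bait" idea, the sketch below checks where a candidate PAM, written in IUPAC notation, occurs in a target sequence with enough room for a protospacer immediately upstream (5') of it. It is not the authors' pipeline: the function names, the 20-nt spacer length, and the toy sequence are assumptions made for this example, and only the given strand is scanned.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam):
    """Translate an IUPAC PAM string (e.g. 'NGG') into a regex fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_pam_sites(target, pam, spacer_len=20):
    """Return (protospacer, matched_pam, pam_start) for every PAM occurrence
    on the given strand that has a full protospacer upstream of it.

    spacer_len=20 is an assumed default, not a value taken from the paper.
    """
    target = target.upper()
    # A lookahead with a capture group reports overlapping matches too.
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    sites = []
    for match in pattern.finditer(target):
        pam_start = match.start(1)
        if pam_start >= spacer_len:
            protospacer = target[pam_start - spacer_len:pam_start]
            sites.append((protospacer, match.group(1), pam_start))
    return sites


# Toy sequence (hypothetical, not the RHO locus): 20 nt followed by 'TGG'.
toy_target = "A" * 20 + "TGG" + "C" * 5
sites = find_pam_sites(toy_target, "NGG")
```

In a real search the reverse complement would also need to be scanned, and the PAM would come from a prediction for a specific Cas9 ortholog rather than being supplied by hand.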

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • CRISPR-Associated Protein 9* / genetics
  • CRISPR-Associated Protein 9* / metabolism
  • CRISPR-Cas Systems* / genetics
  • Endonucleases / metabolism
  • Gene Editing / methods
  • Metagenome
  • RNA, Guide, CRISPR-Cas Systems

Substances

  • CRISPR-Associated Protein 9
  • Endonucleases