Genome mining for anti-CRISPR operons using machine learning

Bioinformatics. 2023 May 4;39(5):btad309. doi: 10.1093/bioinformatics/btad309.

Abstract

Motivation: Encoded by (pro-)viruses, anti-CRISPR (Acr) proteins inhibit the CRISPR-Cas immune system of their prokaryotic hosts. As a result, Acr proteins can be employed to develop more controllable CRISPR-Cas genome editing tools. Recent studies revealed that known acr genes often coexist with other acr genes and with phage structural genes within the same operon. For example, we found that 47 of 98 known acr genes (or their homologs) co-exist in the same operons. None of the current Acr prediction tools have considered this important genomic context feature. We have developed a new software tool AOminer to facilitate the improved discovery of new Acrs by fully exploiting the genomic context of known acr genes and their homologs.

Results: AOminer is the first machine learning based tool focused on the discovery of Acr operons (AOs). A two-state HMM (hidden Markov model) was trained to learn the conserved genomic context of operons that contain known acr genes or their homologs, and the learnt features could distinguish AOs and non-AOs. AOminer allows automated mining for potential AOs from query genomes or operons. AOminer outperformed all existing Acr prediction tools with an accuracy = 0.85. AOminer will facilitate the discovery of novel anti-CRISPR operons.

Availability and implementation: The webserver is available at: http://aca.unl.edu/AOminer/AOminer_APP/. The python program is at: https://github.com/boweny920/AOminer.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteriophages* / genetics
  • CRISPR-Cas Systems / genetics
  • Gene Editing
  • Machine Learning
  • Operon
  • Viral Proteins* / genetics

Substances

  • Viral Proteins