Accurate microRNA annotation of animal genomes using trained covariance models of curated microRNA complements in MirMachine

Cell Genom. 2023 Jun 23;3(8):100348. doi: 10.1016/j.xgen.2023.100348. eCollection 2023 Aug 9.

Abstract

The annotation of microRNAs depends on the availability of transcriptomics data and expert knowledge. This has led to a gap between the growing number of newly sequenced genomes and the availability of high-quality microRNA complements for them. Using >16,000 microRNAs from the manually curated microRNA gene database MirGeneDB, we generated trained covariance models for all conserved microRNA families. These models are available in our tool MirMachine, which annotates conserved microRNAs within genomes. We successfully applied MirMachine to a range of animal species, including species with large genomes or genome duplications, as well as extinct species, for which small RNA sequencing is hard to achieve. We further describe a microRNA score, based on the microRNAs expected for a given lineage, that can be used to assess the completeness of genome assemblies. MirMachine closes a long-standing gap in the microRNA field by facilitating automated genome annotation pipelines and deeper studies into the evolution of genome regulation, even in extinct organisms.
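The abstract does not state how the microRNA score is computed. For illustration only, the sketch below assumes the simplest form such a completeness measure could take: the fraction of a lineage's expected microRNA families that are detected in an assembly. The function name `mirna_completeness` and the example family sets are hypothetical. They are not part of MirMachine's interface, and the paper's actual score may be defined differently.

```python
def mirna_completeness(expected_families: set[str], detected_families: set[str]) -> float:
    """Hypothetical completeness score (an assumption, not MirMachine's definition):
    the share of expected microRNA families that were detected in an assembly.
    Detected families outside the expected set do not change the score."""
    if not expected_families:
        raise ValueError("expected_families must not be empty")
    found = expected_families & detected_families  # expected families actually detected
    return len(found) / len(expected_families)


# Illustrative family sets (hypothetical, not real annotation results)
expected = {"LET-7", "MIR-1", "MIR-10", "MIR-133"}
detected = {"LET-7", "MIR-1", "MIR-10"}
print(f"{mirna_completeness(expected, detected):.2f}")  # → 0.75
```

Under this assumption, a low score for an assembly whose lineage normally retains a family set would flag possibly missing genomic sequence. Real family losses in a lineage would also lower it, so such a score should be read against the expected complement for that clade.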

Keywords: evolution; genome annotation; genomics; machine learning; microRNAs.