Precursors of microRNAs (pre-miRNAs) are seldom used in silico to mine miRNAs. This study developed PmiR-Select®, a pipeline based on covariance models (CMs), to identify new pre-miRNAs by detecting secondary structural features conserved across RNA sequences while eliminating redundancy. The pipeline preceding PmiR-Select® filtered the plant pre-miRNAs from miRBase down to about 20% of the initial set (from 38589 to 8677). The second filter reduced the pre-miRNAs by a further 7% (from 8677 to 8045) by limiting the length of pre-miRNAs (70-300 nt) and miRNAs (20-24 nt). The 80% redundancy threshold was statistically the best, eliminating 55% of the pre-miRNAs (from 8045 to 3608). Angiosperms retained the highest numbers of pre-miRNAs and pre-miRNA families (2981 and 2202), followed by gymnosperms (362 and 271), bryophytes (183 and 119), and algae (82 and 78). Thirty-seven pre-miRNA families were conserved among land plant clades, but none was shared with algae. PmiR-Select® was then applied to the rice genome, producing 8536 pre-miRNAs from 36 families. The 80% redundancy threshold retained 3% of these pre-miRNAs (n = 264) from the 36 families, a valuable resource for experimental and computational research. Of the 8536 pre-miRNAs, 14% (n = 1216) were new and belonged to 19 new families in rice. Only 16 new sequences from six families overlapped (39 to 54% identity) with pre-miRNAs from rice and five species in miRBase. Validation against mature miRNAs identified 8086 pre-miRNAs from 13 families. Eleven of these families had already been recorded, whereas two new and abundant pre-miRNAs [miR437 (n = 296) and miR1435 (n = 725)] were scattered across all 12 rice chromosomes. PmiR-Select® identified pre-miRNAs, decreased redundancy, and discovered new miRNAs. These findings pave the way for designing benchtop and computational experiments.
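The length filter and the reported reductions can be illustrated with a minimal Python sketch. It is not the PmiR-Select® implementation. The function names, the record layout (a dict with `pre` and `mature` sequences), and the choice of inclusive bounds are assumptions made for illustration. Only the length ranges (70-300 nt and 20-24 nt) and the sequence counts come from the abstract.

```python
# Illustrative sketch only; names and record layout are hypothetical.
PRE_RANGE = (70, 300)     # pre-miRNA length limits in nt (from the abstract)
MATURE_RANGE = (20, 24)   # mature miRNA length limits in nt (from the abstract)


def passes_length_filter(pre_seq, mature_seq):
    """Return True if both sequences fall within the length limits.

    Bounds are treated as inclusive, which is an assumption.
    """
    pre_ok = PRE_RANGE[0] <= len(pre_seq) <= PRE_RANGE[1]
    mature_ok = MATURE_RANGE[0] <= len(mature_seq) <= MATURE_RANGE[1]
    return pre_ok and mature_ok


def length_filter(records):
    """Keep only records whose 'pre' and 'mature' sequences pass the filter."""
    return [r for r in records if passes_length_filter(r["pre"], r["mature"])]


def percent_reduction(before, after):
    """Percentage of sequences removed when going from `before` to `after`."""
    return 100.0 * (before - after) / before


def percent_of(part, total):
    """Percentage that `part` represents of `total`."""
    return 100.0 * part / total


# Toy records (synthetic placeholder sequences, not real miRBase entries).
toy_records = [
    {"id": "toy1", "pre": "A" * 85, "mature": "A" * 21},   # kept
    {"id": "toy2", "pre": "A" * 60, "mature": "A" * 21},   # pre-miRNA too short
    {"id": "toy3", "pre": "A" * 120, "mature": "A" * 26},  # mature miRNA too long
]
kept_ids = [r["id"] for r in length_filter(toy_records)]

# Counts reported in the abstract.
length_step = percent_reduction(8677, 8045)       # second (length) filter
redundancy_step = percent_reduction(8045, 3608)   # 80% redundancy threshold
rice_retained = percent_of(264, 8536)             # rice pre-miRNAs retained
rice_new = percent_of(1216, 8536)                 # new rice pre-miRNAs
```

Rounded to whole numbers, these helpers give 7%, 55%, 3%, and 14% for the counts reported above, in line with the percentages stated in the abstract.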
Keywords: Conserved pre-miRNAs; Covariance models; New pre-miRNAs; Plant pre-miRNAs; Sequence redundancy.
© 2024. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.