While the production of a draft genome has become more accessible due to long-read sequencing, the annotation of these new genomes has not been developed at the same pace. Long-read RNA sequencing (lrRNA-seq) offers a promising solution for enhancing gene annotation. In this study, we explore how sequencing platforms, Oxford Nanopore R9.4.1 chemistry or PacBio Sequel II CCS, and data processing methods influence evidence-driven genome annotation using long reads. Incorporating PacBio transcripts into our annotation pipeline significantly outperformed traditional methods, such as ab initio predictions and short-read-based annotations. We applied this strategy to a nonmodel species, the Florida manatee, and compared our results to existing short-read-based annotation. At the loci level, both annotations were highly concordant, with 90% agreement. However, at the transcript level, the agreement was only 35%. We identified 4,906 novel loci, represented by 5,707 isoforms, with 64% of these isoforms matching known sequences in other mammalian species. Overall, our findings underscore the importance of using high-quality curated transcript models in combination with ab initio methods for effective genome annotation.
Published by Cold Spring Harbor Laboratory Press.