IC4R-2.0: Rice Genome Reannotation Using Massive RNA-seq Data

Genomics Proteomics Bioinformatics. 2020 Apr;18(2):161-172. doi: 10.1016/j.gpb.2018.12.011. Epub 2020 Jul 16.

Abstract

Genome reannotation aims at complete and accurate characterization of gene models and is thus of critical significance for in-depth exploration of gene function. Although the availability of massive RNA-seq data provides great opportunities for gene model refinement, few efforts have been made to use these valuable data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome by integrating large-scale RNA-seq data and release a new annotation system, IC4R-2.0. Overall, IC4R-2.0 significantly improves the completeness of gene structures, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that, compared with previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily owing to the massive RNA-seq data applied in genome annotation. We then incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and update IC4R with more user-friendly web interfaces and a series of practical online tools. Together, the updated IC4R, equipped with the improved annotations, holds great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.

Keywords: Gene model; Genome reannotation; IC4R; RNA-seq; Rice.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Gene Expression Regulation, Plant
  • Genes, Plant
  • Genome, Plant*
  • Molecular Sequence Annotation*
  • Organ Specificity / genetics
  • Oryza / genetics*
  • Phylogeny
  • Plant Proteins / chemistry
  • Plant Proteins / genetics
  • RNA, Long Noncoding / genetics
  • RNA, Long Noncoding / metabolism
  • RNA-Seq*
  • Statistics as Topic

Substances

  • Plant Proteins
  • RNA, Long Noncoding