Progress, challenge and prospect of plant plastome annotation

Front Plant Sci. 2023 May 30:14:1166140. doi: 10.3389/fpls.2023.1166140. eCollection 2023.

Abstract

The plastome (plastid genome) represents an indispensable molecular data source for studying phylogeny and evolution in plants. Although the plastome is much smaller than the nuclear genome, and multiple annotation tools have been developed specifically for it, accurate plastome annotation remains a challenging task. Different plastome annotation tools apply different principles and workflows, and annotation errors frequently occur in published plastomes and in those deposited in GenBank. It is therefore timely to compare available annotation tools and establish standards for plastome annotation. In this review, we summarize the basic characteristics of plastomes, trends in the publication of new plastomes, the annotation principles and applications of major plastome annotation tools, and common errors in plastome annotation. We propose possible methods for identifying pseudogenes and RNA-editing genes that jointly consider sequence similarity, customized algorithms, and conserved domains or protein structure. We also argue for establishing a database of reference plastomes with standardized annotations, and put forward a set of quantitative standards for evaluating plastome annotation quality for the scientific community. In addition, we discuss how to generate standardized GenBank annotation flatfiles for submission and downstream analysis. Finally, we consider future prospects for plastome annotation technologies that integrate plastome annotation approaches with the diverse evidence and algorithms used by nuclear genome annotation tools. This review will help researchers use available tools more efficiently to achieve high-quality plastome annotation, and will promote the standardization of plastome annotation.

Keywords: RNA-editing genes; annotation standards; chloroplast genome; plastome; protein structure; pseudogenes.

Publication types

  • Review

Grants and funding

This work was funded by the Natural Science Foundation of Shandong Province (no. ZR2020QC022 to X-JQ), the National Natural Science Foundation of China (key international (regional) cooperative research project no. 31720103903 to T-SY and DES), the CAS President’s International Fellowship Initiative (no. 2020PB0009 to GS) and the China Postdoctoral Science Foundation (CPSF) International Postdoctoral Exchange Program (to GS).