Peanut (Arachis hypogaea L.) is a staple crop in semiarid tropical and subtropical regions. Although the peanut genome has been fully sequenced, its current gene annotation remains incomplete. Advances in genomics and proteomics have given rise to proteogenomics, which integrates genomic, transcriptomic, and proteomic data to improve gene annotation. In the present study, we collected RNA-seq and proteomic data from multiple peanut tissues, including seed, shell, and gynophore, and used a proteogenomic approach to improve the peanut gene annotation. A total of 1 935 655 904 RNA-seq reads and 7 490 280 MS/MS spectra were collected. Ultimately, 13 767 annotated genes were supported by protein-level evidence, and seven novel protein-coding genes were identified with both RNA-seq and proteomic evidence. In addition, 35 gene models were updated based on proteomic data. By integrating RNA-seq and proteomic data, the proteogenomic approach improved the annotation in three ways: it confirmed annotated genes at the protein level, identified novel protein-coding genes, and revised existing gene models. We expect that this approach can also help improve existing genome annotations of other species.
Keywords: RNA-seq; bioinformatics; gene annotation; proteogenomics; proteomics.