The GFF3toolkit: QC and Merge Pipeline for Genome Annotation

Methods Mol Biol. 2019;1858:75-87. doi: 10.1007/978-1-4939-8775-7_7.

Abstract

The GFF3toolkit (https://github.com/NAL-i5K/GFF3toolkit), supported by the i5k Workspace@NAL, provides a suite of tools for handling gene annotations in GFF3 format from arthropod genome projects and their research communities. To improve the GFF3 formatting of gene annotations, a quality control and merge procedure built around the GFF3toolkit is proposed. In particular, the toolkit provides functions to sort a GFF3 file, detect GFF3 format errors, merge two GFF3 files, and generate biological sequences from a GFF3 file. This chapter explains when and how to use these tools to obtain high-quality, nonredundant arthropod gene sets.
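The sorting and format-checking steps named above can be illustrated with a small, self-contained sketch. It is a hypothetical illustration, not the GFF3toolkit's own code or command-line interface: the functions `parse_feature`, `check_feature`, and `sort_gff3` are names introduced here. The checks cover only a few rules from the GFF3 specification (nine tab-delimited columns, 1-based coordinates with start <= end, a valid strand, and a phase of 0, 1, or 2 on CDS features). Sorting is assumed to be a plain (seqid, start, end) ordering of a file with no embedded ##FASTA section.

```python
"""Hypothetical sketch of GFF3 sorting and basic QC (not the GFF3toolkit code)."""
from typing import List, Tuple

VALID_STRANDS = {"+", "-", ".", "?"}   # strand values allowed by the GFF3 spec
VALID_PHASES = {".", "0", "1", "2"}    # phase values allowed by the GFF3 spec


def parse_feature(line: str) -> List[str]:
    """Split one GFF3 feature line into its tab-delimited columns."""
    return line.rstrip("\n").split("\t")


def check_feature(cols: List[str]) -> List[str]:
    """Return basic format problems for one feature (empty list if none)."""
    if len(cols) != 9:
        return [f"expected 9 columns, found {len(cols)}"]
    _seqid, _source, ftype, start, end, _score, strand, phase, _attrs = cols
    errors = []
    if not (start.isdigit() and end.isdigit()):
        errors.append("start and end must be integers")
    elif not 1 <= int(start) <= int(end):
        errors.append("coordinates must satisfy 1 <= start <= end")
    if strand not in VALID_STRANDS:
        errors.append(f"invalid strand {strand!r}")
    if phase not in VALID_PHASES:
        errors.append(f"invalid phase {phase!r}")
    elif ftype == "CDS" and phase == ".":
        errors.append("CDS features require a phase of 0, 1, or 2")
    return errors


def sort_gff3(lines: List[str]) -> List[str]:
    """Move comment/directive lines to the front, then sort features by
    (seqid, start, end). Python's sort is stable, so features with identical
    coordinates (e.g. a gene and its mRNA) keep their input order.
    Assumes every feature line has integer start/end columns."""
    comments = [ln for ln in lines if ln.startswith("#")]
    features = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def coord_key(line: str) -> Tuple[str, int, int]:
        cols = parse_feature(line)
        return (cols[0], int(cols[3]), int(cols[4]))

    return comments + sorted(features, key=coord_key)


if __name__ == "__main__":
    example = [
        "##gff-version 3\n",
        "scaffold1\tmaker\tgene\t500\t900\t.\t+\t.\tID=gene2\n",
        "scaffold1\tmaker\tgene\t100\t400\t.\t-\t.\tID=gene1\n",
        "scaffold1\tmaker\tmRNA\t100\t400\t.\t-\t.\tID=mRNA1;Parent=gene1\n",
        "scaffold1\tmaker\tCDS\t100\t400\t.\t-\t.\tID=cds1;Parent=mRNA1\n",
    ]
    # Report QC problems per feature, then print the sorted file.
    for ln in example:
        if not ln.startswith("#"):
            cols = parse_feature(ln)
            print(cols[8], check_feature(cols))
    print("".join(sort_gff3(example)), end="")
```

In this example the CDS line is flagged because its phase is ".", and the sorted output places gene1, mRNA1, and cds1 before gene2. Because seqids are compared as strings, `scaffold10` sorts before `scaffold2`. A dedicated tool may also keep each gene's child features grouped together, which a coordinate-only sort does not guarantee.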

Keywords: Arthropods; Community annotation; GFF3; Gene annotations; Genomics; i5k; Insects.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Computational Biology / methods*
  • Genome, Insect*
  • High-Throughput Nucleotide Sequencing / methods
  • Insecta / genetics*
  • Molecular Sequence Annotation / methods*
  • Quality Control*
  • Sequence Analysis, DNA / methods*
  • Software*