Curation Principles Derived from the Analysis of the SBOL iGEM Data Set

ACS Synth Biol. 2021 Oct 15;10(10):2592-2606. doi: 10.1021/acssynbio.1c00225. Epub 2021 Sep 21.

Abstract

As an engineering endeavor, synthetic biology requires effective sharing of genetic design information that can be reused in the construction of new designs. While there are a number of large community repositories of design information, curation of this information has been limited, which in turn limits the ways in which it can be put to use. The aim of this work was to improve this situation by creating a curated library of parts from the International Genetically Engineered Machines (iGEM) registry data set. To this end, the Synthetic Biology Open Language (SBOL) version of the iGEM registry was analyzed using four different approaches (simple statistics, SnapGene autoannotation, SYNBICT autoannotation, and expert analysis), the results of which are presented herein. Key challenges encountered include the use of free text, insufficient part provenance, part duplication, lack of part removal, and insufficient continuous curation. On the basis of these analyses, the focus shifted from creating a curated iGEM part library to extracting a set of lessons, which are presented here. These lessons can be used to make the creation and curation of other part libraries simpler and less labor-intensive.
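The abstract names part duplication as a key challenge but does not say how duplicates were identified. As a rough illustration only, and not the authors' method, the Python sketch below flags parts whose DNA sequences are identical after simple normalization. It assumes (hypothetically) that parts have already been extracted from SBOL into a plain dictionary mapping part IDs to sequence strings. It treats sequences as duplicates when they match exactly, ignoring case, whitespace, and reverse complementation. The function names and part IDs are illustrative and not part of any iGEM, SBOL, or SynBioHub API.

```python
from collections import defaultdict

# Complement map for uppercase DNA; other IUPAC codes pass through unchanged
# (a simplifying assumption for this sketch).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def normalize(seq: str) -> str:
    """Hypothetical normalization: drop all whitespace and uppercase."""
    return "".join(seq.split()).upper()


def canonical(seq: str) -> str:
    """Return the lexicographically smaller of a sequence and its reverse
    complement, so a part and its reverse-complemented copy share one key."""
    s = normalize(seq)
    rc = s.translate(_COMPLEMENT)[::-1]
    return min(s, rc)


def find_duplicates(parts: dict[str, str]) -> list[list[str]]:
    """Group part IDs whose sequences are identical up to case, whitespace,
    and reverse complementation. Parts with empty sequences are skipped."""
    groups: dict[str, list[str]] = defaultdict(list)
    for part_id, seq in parts.items():
        key = canonical(seq)
        if key:
            groups[key].append(part_id)
    # Only keys shared by more than one part indicate duplication.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Hypothetical part IDs and sequences, for illustration only.
    parts = {
        "part_a": "ATGCCG",
        "part_b": "atg ccg",  # same as part_a after normalization
        "part_c": "CGGCAT",   # reverse complement of part_a
        "part_d": "TTTT",
        "part_e": "",         # no sequence recorded
    }
    print(find_duplicates(parts))
```

Run on the toy data, this groups `part_a`, `part_b`, and `part_c` together. Exact matching is a deliberately conservative choice. Near-duplicates that differ by a few bases, or composite parts that embed other parts, would need alignment-based comparison or an annotation tool such as SYNBICT or SnapGene. The study reports using both of those tools for autoannotation.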

Keywords: SBOL; SYNBICT; SynBioHub; analysis; annotation; automation; curation; iGEM.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Automation
  • Datasets as Topic*
  • Programming Languages
  • Synthetic Biology / methods*