The progress and utility of synthetic biology is currently hindered by the lengthy process of studying literature and replicating poorly documented work. Reconstruction of crucial design information through post hoc curation is highly noisy and error-prone. To combat this, author participation during the curation process is crucial. To encourage author participation without overburdening them, an ML-assisted curation tool called SeqImprove has been developed. Using named entity recognition, called entity normalization, and sequence matching, SeqImprove creates machine-accessible sequence data and metadata annotations, which authors can then review and edit before submitting a final sequence file. SeqImprove makes it easier for authors to submit sequence data that is FAIR (findable, accessible, interoperable, and reusable).
Keywords: SBOL; machine learning; named entity normalization; named entity recognition; ontologies.