SpecDB: A relational database for archiving biomolecular NMR spectral data

Keith J Fraga; Yuanpeng J Huang; Theresa A Ramelot; G V T Swapna; Arwin Lashawn Anak Kendary; Ethan Li; Ian Korf; Gaetano T Montelione

doi:10.1016/j.jmr.2022.107268

J Magn Reson. 2022 Sep:342:107268. doi: 10.1016/j.jmr.2022.107268. Epub 2022 Jul 16.

Authors

Keith J Fraga¹, Yuanpeng J Huang², Theresa A Ramelot³, G V T Swapna⁴, Arwin Lashawn Anak Kendary⁵, Ethan Li⁶, Ian Korf⁷, Gaetano T Montelione⁸

Affiliations

¹ Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
² Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
³ Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
⁴ Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA; Department of Pharmacology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.
⁵ Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
⁶ Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
⁷ Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
⁸ Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.

Abstract

NMR is a valuable experimental tool for elucidating the structures, functions, and motions of biomolecules. Progress in machine learning, particularly in structural biology, highlights the critical importance of large, diverse, and reliable datasets for developing new methods and understanding, both in structural biology and in science more broadly. Biomolecular NMR research groups produce large amounts of data, and there is renewed interest in organizing these data to train new, sophisticated machine learning architectures and to improve biomolecular NMR analysis pipelines. The foundational data type in NMR is the free-induction decay (FID). There are opportunities to build sophisticated machine learning methods that use NMR FIDs to tackle long-standing problems in NMR data processing, resonance assignment, dynamics analysis, and structure determination. Our goal in this study is to provide a lightweight, broadly available tool for archiving FID data as it is generated at the spectrometer, and to grow a new resource of FID data and associated metadata. This study presents Spectral Database (SpecDB), a relational schema for storing and organizing the metadata items that describe an NMR sample and its FID data. SpecDB is implemented in SQLite and includes a Python software library providing a command-line application to create, organize, query, back up, share, and maintain the database. Together, the software tools and database schema allow users to store, organize, share, and learn from NMR time-domain data. SpecDB is freely available under an open-source license at https://github.rpi.edu/RPIBioinformatics/SpecDB.
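
As a rough illustration of what a relational layout for sample metadata and FID data can look like in SQLite, the following minimal sketch uses Python's standard sqlite3 module. The table names (sample, fid), the column names, and the helper functions build_demo_db and fids_for_sample are hypothetical and chosen only for this example; they are not SpecDB's actual schema or command-line interface, which the abstract does not detail.

```python
import sqlite3

# Hypothetical two-table layout: one row per NMR sample, and one row per
# FID recorded on that sample. All names here are illustrative
# assumptions, not SpecDB's actual schema.
SCHEMA = """
CREATE TABLE sample (
    sample_id  INTEGER PRIMARY KEY,
    label      TEXT NOT NULL UNIQUE,
    buffer     TEXT,
    ph         REAL
);
CREATE TABLE fid (
    fid_id     INTEGER PRIMARY KEY,
    sample_id  INTEGER NOT NULL REFERENCES sample(sample_id),
    experiment TEXT NOT NULL,
    field_mhz  REAL,
    raw_data   BLOB NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding one sample and two FIDs."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    cur = conn.execute(
        "INSERT INTO sample (label, buffer, ph) VALUES (?, ?, ?)",
        ("demo_protein_1", "phosphate", 6.5),
    )
    sample_id = cur.lastrowid
    # The byte strings stand in for real time-domain data.
    conn.executemany(
        "INSERT INTO fid (sample_id, experiment, field_mhz, raw_data) "
        "VALUES (?, ?, ?, ?)",
        [
            (sample_id, "1H-15N HSQC", 600.0, b"\x00\x01\x02\x03"),
            (sample_id, "HNCA", 800.0, b"\x04\x05\x06\x07"),
        ],
    )
    conn.commit()
    return conn


def fids_for_sample(conn, label):
    """List (experiment, field in MHz, FID size in bytes) for one sample."""
    rows = conn.execute(
        "SELECT f.experiment, f.field_mhz, length(f.raw_data) "
        "FROM fid AS f JOIN sample AS s ON s.sample_id = f.sample_id "
        "WHERE s.label = ? ORDER BY f.fid_id",
        (label,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in fids_for_sample(conn, "demo_protein_1"):
        print(row)
```

Keeping the sample description in its own table, linked to each FID by a foreign key, means the metadata is written once and every FID recorded on that sample can be retrieved with a single join.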

Keywords: Biomolecular NMR; Machine learning; SQL; Spectrum database.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Magnetic Resonance Spectroscopy / methods
Nuclear Magnetic Resonance, Biomolecular / methods
Software*
