MetReS, an Efficient Database for Genomic Applications

J Comput Biol. 2018 Feb;25(2):200-213. doi: 10.1089/cmb.2017.0103. Epub 2017 Nov 29.

Abstract

MetReS (Metabolic Reconstruction Server) is a genomic database shared between two software applications that address important biological problems. Biblio-MetReS is a data-mining tool that enables the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the processes of interest and their function. The main goal of this work was to identify the areas where the performance of the MetReS database could be improved and to test whether this improvement would scale to larger datasets and more complex types of analysis. The study started with a relational database, MySQL, which is the database server currently used by the applications. We also tested the performance of an alternative data-handling framework, Apache Hadoop, which is currently used for large-scale data processing. We found that this data-handling framework is likely to greatly improve the efficiency of the MetReS applications as the dataset and the processing needs increase by several orders of magnitude, as is expected to happen in the near future.

Keywords: Big data; Hadoop; MySQL; genomic database.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Data Mining / methods*
  • Databases, Genetic*
  • Genomics / methods*
  • Humans
  • Software*
  • Whole Genome Sequencing / methods*