A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL

F1000Res. 2019 Oct 29;8:1822. doi: 10.12688/f1000research.21027.2. eCollection 2019.

Abstract

The increasing use of Semantic Web technologies in the life sciences, in particular the Resource Description Framework (RDF) and the RDF query language SPARQL, opens the path for novel integrative analyses that combine information from multiple data sources. However, analyzing evolutionary data in RDF is not trivial: there is a steep learning curve to understanding both the data models adopted by different RDF data sources and the corresponding SPARQL constructs needed to benefit from this data, in particular recursive property paths. In this article, we provide a hands-on introduction to querying evolutionary data across several data sources that publish orthology information in RDF, namely the Orthologous MAtrix (OMA), the European Bioinformatics Institute (EBI) RDF platform, the Database of Orthologous Groups (OrthoDB) and the Microbial Genome Database (MBGD). We present four protocols in increasing order of complexity. In these protocols, we demonstrate through SPARQL queries how to retrieve pairwise orthologs, homologous groups, and hierarchical orthologous groups. Finally, we show how orthology information in different data sources can be compared through the use of federated SPARQL queries.

Keywords: Comparative Genomics; Orthology; Resource Description Framework (RDF); SPARQL; Sequence Homology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Evolution*
  • Computational Biology*
  • Databases, Factual
  • Genome, Microbial
  • Information Storage and Retrieval*
  • Programming Languages*

Grants and funding

This work was funded by the Swiss National Research Programme 75 “Big Data” (Grant 167149) and a Swiss National Science Foundation Professorship grant to CD (Grant 150654).