Generic and queryable data integration schema for transcriptomics and epigenomics studies

Comput Struct Biotechnol J. 2024 Nov 19:23:4232-4241. doi: 10.1016/j.csbj.2024.11.022. eCollection 2024 Dec.

Abstract

The expansion of multi-omics datasets raises significant challenges for data integration and querying. To address these challenges, we developed a generic RDF-based integration schema that connects various types of differential omics data, epigenomics data, and regulatory information. The schema uses the FALDO ontology to enable queries based on genomic locations. It can be fully or partially populated, which provides flexibility and extensibility while still supporting complex queries. We validated the schema by reproducing two recently published studies, one in biomedicine and the other in environmental science, demonstrating its genericity and its ability to integrate data efficiently. The schema thus serves as an effective tool for managing and querying a wide range of multi-omics datasets.
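
To illustrate what location-based querying over FALDO-annotated data can look like, the following is a minimal sketch and not the schema described in the article. It stores RDF-like triples as plain Python tuples and uses only the FALDO terms `location`, `begin`, `end`, `position`, and `reference`. The `http://example.org/` namespace, the feature names, the coordinates, and the helper functions `add_region`, `objects`, and `features_overlapping` are all hypothetical and chosen for illustration; a real deployment would more likely use a triple store and SPARQL.

```python
# Minimal, stdlib-only sketch of FALDO-style location triples and an
# overlap query. All data and helper names below are hypothetical.

FALDO = "http://biohackathon.org/resource/faldo#"
EX = "http://example.org/"  # assumed namespace, for illustration only


def add_region(triples, feature, chrom, begin, end):
    """Attach a FALDO-style region (begin/end positions) to a feature."""
    region = feature + "/region"
    b_node = region + "/begin"
    e_node = region + "/end"
    triples.update({
        (feature, FALDO + "location", region),
        (region, FALDO + "begin", b_node),
        (region, FALDO + "end", e_node),
        (b_node, FALDO + "position", begin),
        (b_node, FALDO + "reference", chrom),
        (e_node, FALDO + "position", end),
        (e_node, FALDO + "reference", chrom),
    })


def objects(triples, subject, predicate):
    """Return every object matching a (subject, predicate, ?o) pattern."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def features_overlapping(triples, chrom, start, stop):
    """Return features whose region overlaps [start, stop] on chrom."""
    hits = []
    for (feature, predicate, region) in triples:
        if predicate != FALDO + "location":
            continue
        b_node = objects(triples, region, FALDO + "begin")[0]
        e_node = objects(triples, region, FALDO + "end")[0]
        if objects(triples, b_node, FALDO + "reference")[0] != chrom:
            continue
        b_pos = objects(triples, b_node, FALDO + "position")[0]
        e_pos = objects(triples, e_node, FALDO + "position")[0]
        # Two closed intervals overlap when each starts before the other ends.
        if b_pos <= stop and e_pos >= start:
            hits.append(feature)
    return sorted(hits)


# Hypothetical features: a gene and a peak on chr1, and a gene on chr2.
graph = set()
add_region(graph, EX + "geneA", EX + "chr1", 1000, 2000)
add_region(graph, EX + "peakB", EX + "chr1", 1800, 2500)
add_region(graph, EX + "geneC", EX + "chr2", 1500, 1900)

overlap_both = features_overlapping(graph, EX + "chr1", 1900, 2100)
overlap_peak = features_overlapping(graph, EX + "chr1", 2100, 2200)
```

In this sketch, a partially populated graph (for example, features without any `location` triple) is simply skipped by the query, which loosely mirrors the idea that a schema can be queried even when only part of it is filled in.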

Keywords: Data integration; Integration schema; Multi-omics analysis; Semantic web.