Purpose: Osteosarcoma research advancement requires enhanced data integration across different modalities and sources. Current osteosarcoma research, encompassing clinical, genomic, protein, and tissue imaging data, is hindered by the siloed landscape of data generation and storage.
Materials and methods: Clinical, molecular profiling, and tissue imaging data for 573 patients with pediatric osteosarcoma were collected from four public and institutional sources. A common data model incorporating standardized terminology was created to facilitate the transformation, integration, and load of source data into a relational database. On the basis of this database, a data commons accompanied by a user-friendly web portal was developed, enabling various data exploration and analytics functions.
Results: The Osteosarcoma Explorer (OSE) was released to the public in 2021. Leveraging a comprehensive and harmonized data set on the backend, the OSE offers a wide range of functions, including Cohort Discovery, Patient Dashboard, Image Visualization, and Online Analysis. Since its initial release, the OSE has experienced an increasing utilization by the osteosarcoma research community and provided solid, continuous user support. To our knowledge, the OSE is the largest (N = 573) and most comprehensive research data commons for pediatric osteosarcoma, a rare disease. This project demonstrates an effective framework for data integration and data commons development that can be readily applied to other projects sharing similar goals.
Conclusion: The OSE offers an online exploration and analysis platform for integrated clinical, molecular profiling, and tissue imaging data of osteosarcoma. Its underlying data model, database, and web framework support continuous expansion onto new data modalities and sources.