Background: Genotype-phenotype analyses of rare diseases are often underpowered because of small sample sizes, which makes it difficult to identify significant associations. Sinusoidal obstruction syndrome (SOS) of the liver is a rare but life-threatening complication of hematopoietic stem cell transplantation (HSCT). The alkylating agent busulfan is commonly used in HSCT and is known to trigger SOS. We developed a novel pipeline to identify genetic determinants in rare diseases by combining in vitro information with clinical whole-exome sequencing (WES) data, and applied it to patients with SOS and controls.
Methods: First, we analysed differential gene expression in six lymphoblastoid cell lines (LCLs) before and after incubation with busulfan. Second, we used WES data from 87 HSCT patients and estimated the association with SOS at the SNP and gene levels. We then combined the results of the expression and association analyses into a single association statistic at the gene level. Finally, we used an over-representation analysis to functionally characterize the genes with a significant combined test statistic.
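The abstract does not state how the two gene-level results were combined or which test underlies the over-representation analysis. As a minimal sketch only, the example below assumes Fisher's method for combining one expression p-value and one association p-value per gene, and a one-sided hypergeometric test for over-representation. The function names `fisher_combine` and `over_representation_p` and all numbers in the usage lines are hypothetical and are not taken from the study.

```python
import math


def fisher_combine(p_expr: float, p_assoc: float) -> float:
    """Combine two independent p-values with Fisher's method.

    ASSUMPTION: Fisher's method is used here purely for illustration;
    the pipeline's actual combination rule may differ.
    """
    for p in (p_expr, p_assoc):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    # Fisher statistic X = -2 * (ln p1 + ln p2) is chi-square distributed
    # with 4 degrees of freedom under the null hypothesis. For 4 df the
    # survival function has the closed form exp(-X/2) * (1 + X/2).
    half_x = -(math.log(p_expr) + math.log(p_assoc))
    return math.exp(-half_x) * (1.0 + half_x)


def over_representation_p(n_background: int, n_in_set: int,
                          n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric P(overlap >= n_overlap).

    n_background: all genes tested
    n_in_set:     background genes annotated to the functional category
    n_selected:   genes with a significant combined statistic
    n_overlap:    selected genes that belong to the category
    """
    total = math.comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        math.comb(n_in_set, k)
        * math.comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical usage with made-up inputs.
combined_p = fisher_combine(0.05, 0.05)
enrichment_p = over_representation_p(10, 5, 2, 2)
```

Fisher's method assumes the two p-values are independent, which is plausible here only because the expression and sequencing datasets come from separate experiments. In a genome-wide setting the combined p-values would still need correction for multiple testing.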
Results: After treatment of LCLs with busulfan, 1708 genes were significantly up-regulated and 1385 were significantly down-regulated. Combining the expression experiment and the association analysis of WES data into a single test statistic revealed 35 genes associated with the outcome. These genes are involved in various biological functions and processes, such as "Cell growth and death", "Signalling molecules and interaction", "Cancer", and "Infectious disease".
Conclusions: This novel data analysis pipeline integrates two independent omics datasets and increases statistical power for identifying genotype-phenotype associations. The analysis of the transcriptomics profile of cell lines treated with busulfan and WES data from HSCT patients allowed us to identify potential genetic contributors to SOS. Our pipeline could be useful for identifying genetic contributors to other rare diseases where limited power renders genome-wide analyses unpromising.
Trial registration: For the clinical dataset: ClinicalTrials.gov: NCT01257854. https://clinicaltrials.gov/ct2/history/NCT01257854.
Copyright: © 2023 Waespe et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.