Long non-coding RNAs (lncRNAs) are emerging as key gene regulators in the pathogenesis and development of various cancers, including B-cell acute lymphoblastic leukaemia (B-ALL). In this pilot study, we used RNA-Seq transcriptomic data to identify novel lncRNA-mRNA cooperative pairs involved in the pathogenesis of childhood B-ALL. We designed a bioinformatic pipeline that combines unsupervised feature extraction by principal component analysis (PCA) with stringent statistical criteria to extract candidate lncRNA signatures of childhood B-ALL. We then constructed a co-expression network of the 30 aberrantly expressed lncRNAs and 754 protein-coding genes. We cross-validated our in silico findings on an independent dataset and assessed the expression levels of the most differentially expressed lncRNAs and their co-expressed mRNAs through ex vivo experiments. Using the guilt-by-association approach, we predicted lncRNA functions from their perfectly co-expressed mRNAs (Spearman's correlation), which proved to be closely associated with the disease. We shed light on 24 key lncRNAs and their co-expressed mRNAs, which may play an important role in B-ALL pathogenesis. Our results may be of clinical utility for diagnostic and/or prognostic purposes in the management of paediatric B-ALL.
Keywords: NGS; RNA sequencing; bioinformatics; biomarker; co-expression; diagnostic; feature extraction; leukaemia; long non-coding RNA; network.
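The pipeline summarised above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a genes × samples expression matrix and follows the general idea of PCA-based unsupervised feature extraction (gene scores on one principal component are tested against a normal null, with Benjamini-Hochberg correction), then builds a Spearman co-expression network between lncRNAs and mRNAs and assigns each lncRNA the annotation terms most frequent among its partners (a simple guilt-by-association rule). All function names, the component index, the 0.01 cut-off, the 0.9 correlation threshold, the annotation dictionary and the toy data are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def rank_average(x):
    """1-based ranks of a 1-D array; tied values receive their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    return (np.bincount(inverse, weights=ranks) / counts)[inverse]


def spearman_matrix(a, b):
    """Spearman correlation between every row of `a` and every row of `b`."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    ra = np.apply_along_axis(rank_average, 1, a)
    rb = np.apply_along_axis(rank_average, 1, b)
    # Pearson correlation of the ranks = mean product of standardised ranks
    ra = (ra - ra.mean(axis=1, keepdims=True)) / ra.std(axis=1, keepdims=True)
    rb = (rb - rb.mean(axis=1, keepdims=True)) / rb.std(axis=1, keepdims=True)
    return ra @ rb.T / a.shape[1]


def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def pca_feature_extraction(expr, pc=0, alpha=0.01):
    """Indices of genes (rows) whose score on component `pc` is an outlier."""
    expr = np.asarray(expr, dtype=float)
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)  # standardise each sample
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, pc] * s[pc]  # gene scores; their mean is 0 after centring
    t = scores / scores.std()
    # two-sided normal tail, identical to the chi-squared (1 df) tail of t**2
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return np.flatnonzero(bh_adjust(p) < alpha)


def coexpression_edges(lnc, mrna, threshold=0.9):
    """(lncRNA index, mRNA index, rho) for every pair with |rho| >= threshold."""
    rho = spearman_matrix(lnc, mrna)
    rows, cols = np.nonzero(np.abs(rho) >= threshold)
    return [(int(i), int(j), float(rho[i, j])) for i, j in zip(rows, cols)]


def guilt_by_association(edges, annotations, top=3):
    """Label each lncRNA with the terms most frequent among its partner mRNAs."""
    counts = {}
    for lnc, m, _ in edges:
        counts.setdefault(lnc, Counter()).update(annotations.get(m, ()))
    return {lnc: [t for t, _ in c.most_common(top)] for lnc, c in counts.items()}


# Toy data (hypothetical): genes 0-9 differ strongly between cases and controls.
rng = np.random.default_rng(0)
expr = rng.normal(size=(300, 20))
labels = np.array([1] * 10 + [-1] * 10)  # 10 cases, 10 controls
expr[:10] += 6 * labels
selected = pca_feature_extraction(expr)
```

In this toy example the planted genes dominate the first component, so they are the only ones whose scores survive the correction. On real data the informative component is usually not the first one (which often tracks overall expression level), so the choice of `pc` would need to be guided by how the sample loadings separate patients from controls.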