High-throughput next-generation sequencing technologies have led to a rapid increase in the number of sequence variants identified in clinical practice via diagnostic genetic tests. Current bioinformatic analysis pipelines fail to take adequate account of the possible splicing effects of such variants, particularly where variants fall outwith canonical splice site sequences, and consequently the pathogenicity of such variants may often be missed. The regulation of splicing is highly complex and as a result, in silico prediction tools lack sufficient sensitivity and specificity for reliable use. Variants of all kinds can be linked to aberrant splicing in disease and the need for correct identification and diagnosis grows ever more crucial as novel splice-switching antisense oligonucleotide therapies start to enter clinical usage. RT-PCR provides a useful targeted assay of the splicing effects of identified variants, while minigene assays, massive parallel reporter assays and animal models can also be used for more detailed study of a particular splicing system, given enough time and resources. However, RNA-sequencing (RNA-seq) has the potential to be used as a rapid diagnostic tool in genomic medicine. By utilising data science approaches and machine learning, it may prove possible to finally understand and interpret the 'splicing code' and apply this knowledge in human disease diagnostics.
Keywords: Clinical diagnosis; Machine learning; RNA-sequencing; Sequence variants; Splicing.
Copyright © 2018 Elsevier Ltd. All rights reserved.