Search | arXiv e-print repository

doi 10.1016/j.nicl.2024.103668

VASARI-auto: equitable, efficient, and economical featurisation of glioma MRI

Authors: James K Ruffle, Samia Mohinta, Kelly Pegoretti Baruteau, Rebekah Rajiah, Faith Lee, Sebastian Brandner, Parashkev Nachev, Harpreet Hyare

Abstract: The VASARI MRI feature set is a quantitative system designed to standardise glioma imaging descriptions. Though effective, deriving VASARI is time-consuming and seldom used in clinical practice. This is a problem that machine learning could plausibly automate. Using glioma data from 1172 patients, we developed VASARI-auto, an automated labelling software applied to both open-source lesion masks an… ▽ More The VASARI MRI feature set is a quantitative system designed to standardise glioma imaging descriptions. Though effective, deriving VASARI is time-consuming and seldom used in clinical practice. This is a problem that machine learning could plausibly automate. Using glioma data from 1172 patients, we developed VASARI-auto, an automated labelling software applied to both open-source lesion masks and our openly available tumour segmentation model. In parallel, two consultant neuroradiologists independently quantified VASARI features in a subsample of 100 glioblastoma cases. We quantified: 1) agreement across neuroradiologists and VASARI-auto; 2) calibration of performance equity; 3) an economic workforce analysis; and 4) fidelity in predicting patient survival. Tumour segmentation was compatible with the current state of the art and equally performant regardless of age or sex. A modest inter-rater variability between in-house neuroradiologists was comparable to between neuroradiologists and VASARI-auto, with far higher agreement between VASARI-auto methods. The time taken for neuroradiologists to derive VASARI was substantially higher than VASARI-auto (mean time per case 317 vs. 3 seconds). A UK hospital workforce analysis forecast that three years of VASARI featurisation would demand 29,777 consultant neuroradiologist workforce hours (£1,574,935), reducible to 332 hours of computing time (and £146 of power) with VASARI-auto. The best-performing survival model utilised VASARI-auto features as opposed to those derived by neuroradiologists. VASARI-auto is a highly efficient automated labelling system with equitable performance across patient age or sex, a favourable economic profile if used as a decision support tool, and with non-inferior fidelity in downstream patient survival prediction. Future work should iterate upon and integrate such tools to enhance patient care. △ Less

Submitted 26 August, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: 36 pages, 8 figures, 2 tables

arXiv:2310.16113 [pdf]

doi 10.1002/hbm.26795

Compressed representation of brain genetic transcription

Authors: James K Ruffle, Henry Watkins, Robert J Gray, Harpreet Hyare, Michel Thiebaut de Schotten, Parashkev Nachev

Abstract: The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use st… ▽ More The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at great compression ratios. Employing whole-brain, voxel-wise Allen Brain Atlas transcription data, here we systematically compare compressed representations based on the most widely supported linear and non-linear methods-PCA, kernel PCA, non-negative matrix factorization (NMF), t-stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding-quantifying reconstruction fidelity, anatomical coherence, and predictive utility with respect to signalling, microstructural, and metabolic targets. We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain. △ Less

Submitted 20 June, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 22 pages, 5 main figures, 1 supplementary figure

arXiv:2309.07096 [pdf]

doi 10.1016/j.neuroimage.2024.120600

Computational limits to the legibility of the imaged human brain

Authors: James K Ruffle, Robert J Gray, Samia Mohinta, Guilherme Pombo, Chaitanya Kaul, Harpreet Hyare, Geraint Rees, Parashkev Nachev

Abstract: Our knowledge of the organisation of the human brain at the population-level is yet to translate into power to predict functional differences at the individual-level, limiting clinical applications, and casting doubt on the generalisability of inferred mechanisms. It remains unknown whether the difficulty arises from the absence of individuating biological patterns within the brain, or from limite… ▽ More Our knowledge of the organisation of the human brain at the population-level is yet to translate into power to predict functional differences at the individual-level, limiting clinical applications, and casting doubt on the generalisability of inferred mechanisms. It remains unknown whether the difficulty arises from the absence of individuating biological patterns within the brain, or from limited power to access them with the models and compute at our disposal. Here we comprehensively investigate the resolvability of such patterns with data and compute at unprecedented scale. Across 23 810 unique participants from UK Biobank, we systematically evaluate the predictability of 25 individual biological characteristics, from all available combinations of structural and functional neuroimaging data. Over 4526 GPU hours of computation, we train, optimize, and evaluate out-of-sample 700 individual predictive models, including fully-connected feed-forward neural networks of demographic, psychological, serological, chronic disease, and functional connectivity characteristics, and both uni- and multi-modal 3D convolutional neural network models of macro- and micro-structural brain imaging. We find a marked discrepancy between the high predictability of sex (balanced accuracy 99.7%), age (mean absolute error 2.048 years, R2 0.859), and weight (mean absolute error 2.609Kg, R2 0.625), for which we set new state-of-the-art performance, and the surprisingly low predictability of other characteristics. Neither structural nor functional imaging predicted psychology better than the coincidence of chronic disease (p<0.05). Serology predicted chronic disease (p<0.05) and was best predicted by it (p<0.001), followed by structural neuroimaging (p<0.05). Our findings suggest either more informative imaging or more powerful models are needed to decipher individual level characteristics from the human brain. △ Less

Submitted 2 April, 2024; v1 submitted 23 August, 2023; originally announced September 2023.

Comments: 38 pages, 6 figures, 1 table, 2 supplementary figures, 1 supplementary table

arXiv:2308.07039 [pdf]

The minimal computational substrate of fluid intelligence

Authors: Amy PK Nelson, Joe Mole, Guilherme Pombo, Robert J Gray, James K Ruffle, Edgar Chan, Geraint E Rees, Lisa Cipolotti, Parashkev Nachev

Abstract: The quantification of cognitive powers rests on identifying a behavioural task that depends on them. Such dependence cannot be assured, for the powers a task invokes cannot be experimentally controlled or constrained a priori, resulting in unknown vulnerability to failure of specificity and generalisability. Evaluating a compact version of Raven's Advanced Progressive Matrices (RAPM), a widely use… ▽ More The quantification of cognitive powers rests on identifying a behavioural task that depends on them. Such dependence cannot be assured, for the powers a task invokes cannot be experimentally controlled or constrained a priori, resulting in unknown vulnerability to failure of specificity and generalisability. Evaluating a compact version of Raven's Advanced Progressive Matrices (RAPM), a widely used clinical test of fluid intelligence, we show that LaMa, a self-supervised artificial neural network trained solely on the completion of partially masked images of natural environmental scenes, achieves human-level test scores a prima vista, without any task-specific inductive bias or training. Compared with cohorts of healthy and focally lesioned participants, LaMa exhibits human-like variation with item difficulty, and produces errors characteristic of right frontal lobe damage under degradation of its ability to integrate global spatial patterns. LaMa's narrow training and limited capacity -- comparable to the nervous system of the fruit fly -- suggest RAPM may be open to computationally simple solutions that need not necessarily invoke abstract reasoning. △ Less

Submitted 14 August, 2023; originally announced August 2023.

Comments: 26 pages, 5 figures

arXiv:2207.12043 [pdf, other]

Representational Ethical Model Calibration

Authors: Robert Carruthers, Isabel Straw, James K Ruffle, Daniel Herron, Amy Nelson, Danilo Bzdok, Delmiro Fernandez-Reyes, Geraint Rees, Parashkev Nachev

Abstract: Equity is widely held to be fundamental to the ethics of healthcare. In the context of clinical decision-making, it rests on the comparative fidelity of the intelligence -- evidence-based or intuitive -- guiding the management of each individual patient. Though brought to recent attention by the individuating power of contemporary machine learning, such epistemic equity arises in the context of an… ▽ More Equity is widely held to be fundamental to the ethics of healthcare. In the context of clinical decision-making, it rests on the comparative fidelity of the intelligence -- evidence-based or intuitive -- guiding the management of each individual patient. Though brought to recent attention by the individuating power of contemporary machine learning, such epistemic equity arises in the context of any decision guidance, whether traditional or innovative. Yet no general framework for its quantification, let alone assurance, currently exists. Here we formulate epistemic equity in terms of model fidelity evaluated over learnt multi-dimensional representations of identity crafted to maximise the captured diversity of the population, introducing a comprehensive framework for Representational Ethical Model Calibration. We demonstrate use of the framework on large-scale multimodal data from UK Biobank to derive diverse representations of the population, quantify model performance, and institute responsive remediation. We offer our approach as a principled solution to quantifying and assuring epistemic equity in healthcare, with applications across the research, clinical, and regulatory domains. △ Less

Submitted 18 October, 2022; v1 submitted 25 July, 2022; originally announced July 2022.

Comments: 13 pages

arXiv:2206.06120 [pdf]

doi 10.1093/braincomms/fcad118

Brain tumour segmentation with incomplete imaging data

Authors: James K Ruffle, Samia Mohinta, Robert J Gray, Harpreet Hyare, Parashkev Nachev

Abstract: The complex heterogeneity of brain tumours is increasingly recognized to demand data of magnitudes and richness only fully-inclusive, large-scale collections drawn from routine clinical care could plausibly offer. This is a task contemporary machine learning could facilitate, especially in neuroimaging, but its ability to deal with incomplete data common in real world clinical practice remains unk… ▽ More The complex heterogeneity of brain tumours is increasingly recognized to demand data of magnitudes and richness only fully-inclusive, large-scale collections drawn from routine clinical care could plausibly offer. This is a task contemporary machine learning could facilitate, especially in neuroimaging, but its ability to deal with incomplete data common in real world clinical practice remains unknown. Here we apply state-of-the-art methods to large scale, multi-site MRI data to quantify the comparative fidelity of automated tumour segmentation models replicating the various levels of sequence availability observed in the clinical reality. We compare deep learning (nnU-Net-derived) segmentation models with all possible combinations of T1, contrast-enhanced T1, T2, and FLAIR sequences, trained and validated with five-fold cross-validation on the 2021 BraTS-RSNA glioma population of 1251 patients, with further testing on a real-world 50 patient sample diverse in not only MRI scanner and field strength, but a random selection of pre- and post-operative imaging also. Models trained on incomplete imaging data segmented lesions well, often equivalently to those trained on complete data, exhibiting Dice coefficients of 0.907 (single sequence) to 0.945 (full datasets) for whole tumours, and 0.701 (single sequence) to 0.891 (full datasets) for component tissue types. Incomplete data segmentation models could accurately detect enhancing tumour in the absence of contrast imaging, quantifying its volume with an R2 between 0.95-0.97, and were invariant to lesion morphometry. Deep learning segmentation models characterize tumours well when missing data and can even detect enhancing tissue without the use of contrast. This suggests translation to clinical practice, where incomplete data is common, may be easier than hitherto believed, and may be of value in reducing dependence on contrast use. △ Less

Submitted 22 February, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: 26 pages, 8 figures, 4 supplementary tables

arXiv:2110.08904 [pdf]

Deep forecasting of translational impact in medical research

Authors: Amy PK Nelson, Robert J Gray, James K Ruffle, Henry C Watkins, Daniel Herron, Nick Sorros, Danil Mikhailov, M. Jorge Cardoso, Sebastien Ourselin, Nick McNally, Bryan Williams, Geraint E. Rees, Parashkev Nachev

Abstract: The value of biomedical research--a $1.7 trillion annual investment--is ultimately determined by its downstream, real-world impact. Current objective predictors of impact rest on proxy, reductive metrics of dissemination, such as paper citation rates, whose relation to real-world translation remains unquantified. Here we sought to determine the comparative predictability of future real-world trans… ▽ More The value of biomedical research--a $1.7 trillion annual investment--is ultimately determined by its downstream, real-world impact. Current objective predictors of impact rest on proxy, reductive metrics of dissemination, such as paper citation rates, whose relation to real-world translation remains unquantified. Here we sought to determine the comparative predictability of future real-world translation--as indexed by inclusion in patents, guidelines or policy documents--from complex models of the abstract-level content of biomedical publications versus citations and publication meta-data alone. We develop a suite of representational and discriminative mathematical models of multi-scale publication data, quantifying predictive performance out-of-sample, ahead-of-time, across major biomedical domains, using the entire corpus of biomedical research captured by Microsoft Academic Graph from 1990 to 2019, encompassing 43.3 million papers across all domains. We show that citations are only moderately predictive of translational impact as judged by inclusion in patents, guidelines, or policy documents. By contrast, high-dimensional models of publication titles, abstracts and metadata exhibit high fidelity (AUROC > 0.9), generalise across time and thematic domain, and transfer to the task of recognising papers of Nobel Laureates. The translational impact of a paper indexed by inclusion in patents, guidelines, or policy documents can be predicted--out-of-sample and ahead-of-time--with substantially higher fidelity from complex models of its abstract-level content than from models of publication meta-data or citation metrics. We argue that content-based models of impact are superior in performance to conventional, citation-based measures, and sustain a stronger evidence-based claim to the objective measurement of translational potential. △ Less

Submitted 17 October, 2021; originally announced October 2021.

Comments: 28 pages, 6 figures

Showing 1–7 of 7 results for author: Ruffle, J K