Statistical analysis of a low cost method for multiple disease prediction

Mohsen Bayati; Sonia Bhaskar; Andrea Montanari

doi:10.1177/0962280216680242


Stat Methods Med Res. 2018 Aug;27(8):2312-2328. doi: 10.1177/0962280216680242. Epub 2016 Dec 8.

Authors

Mohsen Bayati¹,², Sonia Bhaskar², Andrea Montanari²,³

Affiliations

¹ Graduate School of Business, Stanford University, Stanford, USA.
² Department of Electrical Engineering, Stanford University, Stanford, USA.
³ Department of Statistics, Stanford University, Stanford, USA.

PMID: 27932665
DOI: 10.1177/0962280216680242

Abstract

Early identification of individuals at risk for chronic diseases is of significant clinical value. Early detection provides the opportunity to slow the progression of a condition, and thus help individuals improve or maintain their quality of life. Additionally, it can lessen the financial burden on health insurers and self-insured employers. To mitigate the rise in chronic conditions and related costs, an increasing number of employers have recently begun using wellness programs, which typically involve an annual health risk assessment. Unfortunately, these risk assessments have low detection capability: because they must be low-cost, they rely on collecting relatively few basic biomarkers. Thus one may ask: how can we select a low-cost set of biomarkers that is most predictive of multiple chronic diseases? In this paper, we propose a statistical, data-driven method to address this challenge by minimizing the number of biomarkers in the screening procedure while maximizing the predictive power over a broad spectrum of diseases. Our solution uses multi-task learning and group dimensionality reduction techniques from machine learning and statistics. We provide empirical validation of the proposed solution using data from two different electronic medical record systems, with comparisons against a statistical benchmark.
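The abstract names multi-task learning with group regularization but does not spell out the model. As a minimal sketch, assuming a multi-task logistic regression in which the coefficients of one biomarker across all T diseases form a single group, one can minimize (1/n) Σ_t Σ_i logloss(y_it, x_iᵀw_t + b_t) + λ Σ_j ‖W_j‖₂, where W_j is row j of the p × T coefficient matrix. This group (L1/L2) penalty zeroes entire rows, so a biomarker is either dropped for every disease or kept. The sketch below fits that objective by proximal gradient descent. It is an illustration of the general technique, not the authors' exact estimator; the function names (`fit_multitask_group_lasso`, `group_soft_threshold`, `selected_biomarkers`), the values of `lam`, `step`, and `n_iter`, and the synthetic data are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z)) == 0.5 * (1 + tanh(z / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def group_soft_threshold(W, thresh):
    """Proximal operator of thresh * sum_j ||W[j, :]||_2.

    Shrinks each biomarker's coefficient vector (one row, spanning all diseases)
    toward zero and sets the whole row to zero when its norm is <= thresh.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return W * scale


def fit_multitask_group_lasso(X, Y, lam, step=0.5, n_iter=2000):
    """Hypothetical sketch: multi-task logistic regression with a group lasso penalty.

    X: (n, p) biomarker matrix shared by all tasks.
    Y: (n, T) binary outcomes, one column per disease.
    Returns W (p, T) coefficients and b (T,) unpenalized per-disease intercepts.
    """
    n, p = X.shape
    T = Y.shape[1]
    W = np.zeros((p, T))
    b = np.zeros(T)
    for _ in range(n_iter):
        P = sigmoid(X @ W + b)   # (n, T) predicted probabilities
        R = (P - Y) / n          # gradient of the mean log-loss w.r.t. the logits
        grad_W = X.T @ R         # (p, T)
        grad_b = R.sum(axis=0)   # (T,)
        # Gradient step on the smooth loss, then the group prox on W only.
        W = group_soft_threshold(W - step * grad_W, step * lam)
        b = b - step * grad_b
    return W, b


def selected_biomarkers(W, tol=1e-8):
    """Indices of biomarkers whose coefficient row is nonzero for at least one disease."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > tol)


# Synthetic example (hypothetical data): 10 candidate biomarkers, 3 diseases,
# and only the first 3 biomarkers actually carry signal.
rng = np.random.default_rng(0)
n, p, T = 2000, 10, 3
X = rng.standard_normal((n, p))
W_true = np.zeros((p, T))
W_true[:3] = np.array([[2.0, -2.0, 1.5],
                       [-1.5, 2.0, 2.0],
                       [2.0, 1.5, -2.0]])
Y = (rng.random((n, T)) < sigmoid(X @ W_true)).astype(float)

W_hat, b_hat = fit_multitask_group_lasso(X, Y, lam=0.1)
row_norms = np.linalg.norm(W_hat, axis=1)
print("selected biomarkers:", selected_biomarkers(W_hat))
print("row norms:", np.round(row_norms, 3))
```

Raising `lam` trades predictive power for a smaller, and therefore cheaper, shared panel; in practice one would choose it by cross-validation or by a budget on the number of biomarkers.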

Keywords: Clinical disease prediction; feature selection; group regularization; multitask learning.

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Biomarkers
Chronic Disease / trends*
Costs and Cost Analysis
Forecasting / methods
Logistic Models
Models, Statistical*

Substances

Biomarkers