Aim: We propose a method for screening full blood count metadata for evidence of communicable and noncommunicable diseases using machine learning (ML).
Materials & methods: High-dimensional hematology metadata were extracted from Sysmex hematology analyzers over an 11-month period for 43,761 patients. Predictive models for age, sex and individuality were developed to demonstrate the personalized nature of hematology data. Both numeric and raw flow cytometry data were used for supervised and unsupervised ML to predict the presence of pneumonia, urinary tract infection and COVID-19. Heart failure was used as an additional prediction target to demonstrate the generalizability of the method.
Results: Chronological age was predicted by a deep neural network with R²: 0.59 and mean absolute error: 12; sex with area under the receiver operating characteristic curve (AUROC): 0.83, phi: 0.47; individuality with 99.7% accuracy, phi: 0.97; pneumonia with AUROC: 0.74, sensitivity 58%, specificity 79%, 95% CI: 0.73-0.75, p < 0.0001; urinary tract infection with AUROC: 0.68, sensitivity 52%, specificity 79%, 95% CI: 0.67-0.68, p < 0.0001; COVID-19 with AUROC: 0.8, sensitivity 82%, specificity 75%, 95% CI: 0.79-0.8, p = 0.0006; and heart failure with AUROC: 0.78, sensitivity 72%, specificity 72%, 95% CI: 0.77-0.78, p < 0.0001.
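The metrics reported above follow standard definitions. The following is a minimal sketch, using only the Python standard library, of how they can be computed from true labels and model outputs. The function names, the toy data and the 0.5 decision threshold are illustrative assumptions and do not reproduce the authors' pipeline. For binary labels, phi is the same quantity as the Matthews correlation coefficient. AUROC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def phi_coefficient(y_true, y_pred):
    """Phi coefficient of the 2x2 table; returns 0.0 if any margin is empty."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auroc(y_true, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5.

    Pairwise and therefore quadratic in sample size: fine for illustration,
    but a rank-based formula would be preferable for tens of thousands of patients.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mae(y_true, y_pred):
    """Mean absolute error of a regression (e.g. predicted vs. chronological age)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: disease labels, model scores and a 0.5 threshold.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in scores]

print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.75)
print(phi_coefficient(y_true, y_pred))          # 0.5
print(auroc(y_true, scores))                    # 0.9375

# Hypothetical ages (years) and model predictions.
ages_true = [30, 50, 70]
ages_pred = [36, 44, 70]
print(mae(ages_true, ages_pred))                # 4.0
print(round(r_squared(ages_true, ages_pred), 4))  # 0.91
```

Sensitivity, specificity and phi depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds. This is why an abstract can report a moderate AUROC alongside an unbalanced sensitivity/specificity pair.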
Conclusion: ML applied to hematology data could predict communicable and noncommunicable diseases at both local and global levels.
Keywords: COVID-19; biological age; full blood count; heart failure; hematology; machine learning; pneumonia.
© 2021 The authors.