Identifying dynamical persistent biomarker structures for rare events using modern integrative machine learning approach

Sreejata Dutta; Andrew C Box; Yanming Li; Mihaela E Sardiu

doi:10.1002/pmic.202200290

Proteomics. 2023 Nov;23(21-22):e2200290. doi: 10.1002/pmic.202200290. Epub 2023 Mar 10.

Authors

Sreejata Dutta¹, Andrew C Box², Yanming Li¹,³, Mihaela E Sardiu¹,³,⁴

Affiliations

¹ Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA.
² Stowers Institute for Medical Research, Kansas City, Missouri, USA.
³ University of Kansas Cancer Center, Kansas City, Kansas, USA.
⁴ Kansas Institute for Precision Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA.

Abstract

The evolution of omics and computational competency has accelerated discoveries of the underlying biological processes in an unprecedented way. High-throughput methodologies, such as flow cytometry, can reveal deeper insights into cell processes, thereby allowing opportunities for scientific discoveries related to health and diseases. However, working with cytometry data often imposes complex computational challenges due to the high dimensionality, large size, and nonlinearity of the data structure. In addition, cytometry data frequently exhibit diverse patterns across biomarkers and suffer from substantial class imbalances, which can further complicate the problem. The existing methods of cytometry data analysis either predict cell populations or perform feature selection. Through this study, we propose a "wisdom of the crowd" approach to simultaneously predict rare cell populations and perform feature selection by integrating a pool of modern machine learning (ML) algorithms. Given that our approach integrates superior-performing ML models across different normalization techniques based on entropy and rank, our method can detect diverse patterns existing across the model features. Furthermore, the method identifies a dynamic biomarker structure that divides the features into persistently selected, unselected, and fluctuating assemblies, indicating the role of each biomarker in rare cell prediction, which can subsequently aid in studies of disease progression.

Keywords: biomarker importance; entropy; feature selection; machine learning; rare events prediction.
