Combining bioinformatics and machine learning to identify diagnostic biomarkers of TB associated with immune cell infiltration

Tuberculosis (Edinb). 2024 Oct 11:149:102570. doi: 10.1016/j.tube.2024.102570. Online ahead of print.

Abstract

Objective: The asymptomatic nature of tuberculosis (TB) during its latent phase, combined with limitations in current diagnostic methods, makes accurate diagnosis challenging. This study aims to identify TB diagnostic biomarkers by integrating gene expression screening with machine learning, evaluating their diagnostic potential and correlation with immune cell infiltration.

Methods: We analyzed the GSE19435, GSE19444, and GSE54992 datasets to identify differentially expressed genes (DEGs), and used Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses to characterize their functions. Three machine learning algorithms were applied to screen for candidate biomarkers, which were then validated in the GSE83456 and GSE62525 datasets and by RT-qPCR on clinical samples. Immune cell infiltration was analyzed and verified with blood data.
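The abstract does not state the cutoffs used to call DEGs. A common convention, assumed here for illustration only, is to keep genes whose absolute log2 fold change exceeds 1 and whose Benjamini-Hochberg adjusted p-value is below 0.05, with per-gene fold changes and raw p-values coming from a differential-expression tool run beforehand. The minimal, standard-library sketch below applies that filter; the function names, thresholds, and gene labels (`GENE_A` to `GENE_D`) are hypothetical, and the values are toy numbers, not data from the study.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(stats: Dict[str, Tuple[float, float]],
                lfc_cutoff: float = 1.0,    # assumed |log2FC| cutoff
                padj_cutoff: float = 0.05   # assumed adjusted p-value cutoff
                ) -> List[str]:
    """Keep genes passing both cutoffs; stats maps gene -> (log2 fold change, raw p-value)."""
    genes = list(stats)
    padj = benjamini_hochberg([stats[g][1] for g in genes])
    return [g for g, q in zip(genes, padj)
            if abs(stats[g][0]) > lfc_cutoff and q < padj_cutoff]


# Hypothetical toy statistics (not from the study).
toy_stats = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (-1.5, 0.004),
    "GENE_C": (0.3, 0.0001),  # significant, but fold change too small
    "GENE_D": (1.8, 0.2),     # large fold change, but not significant after adjustment
}
print(select_degs(toy_stats))  # ['GENE_A', 'GENE_B']
```

Filtering on both criteria keeps genes that are statistically reliable and large in effect; in practice the thresholds are study-specific, so the defaults above should be read as placeholders.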

Results: A total of 249 DEGs were identified, among which PDE7A and DOK3 emerged as potential biomarkers. RT-qPCR confirmed their expression, with individual areas under the ROC curve (AUCs) above 0.75 and a combined AUC of 0.926 for TB diagnosis. Immune infiltration analysis revealed strong correlations between PDE7A, DOK3, and immune cells.
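The abstract reports single-gene AUCs and a combined AUC but does not say how the two genes were combined. One common approach, assumed here rather than taken from the paper, is to fit a logistic regression on both expression values and compute the AUC of the fitted linear score. The standard-library sketch below does this; the AUC is computed through its equivalence with the Mann-Whitney statistic (the probability that a randomly chosen case scores higher than a randomly chosen control). All function names and the two-marker toy data are hypothetical.

```python
import math
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic: P(case score > control score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w: Sequence[float], xi: Sequence[float]) -> float:
    """Combined marker score: bias + weighted sum of the expression values."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the mean log-loss; returns [bias, w1, w2, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi  # predicted probability minus label
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical standardized expression of two markers (not study data); 1 = TB, 0 = control.
X = [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0], [1.5, 1.5],
     [0.0, 0.0], [-1.0, 1.0], [1.0, -1.0], [-0.5, -0.5]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

auc_m1 = auc([x[0] for x in X], y)  # 0.875
auc_m2 = auc([x[1] for x in X], y)  # 0.875
w = fit_logistic(X, y)
auc_both = auc([linear_score(w, x) for x in X], y)  # 1.0 on this toy set
print(auc_m1, auc_m2, auc_both)
```

Because the logistic function is monotone, ranking samples by the linear score gives the same AUC as ranking by predicted probability. An AUC computed on the same samples used for fitting is optimistic, which is why combined models are usually checked on independent data, as the study did with external datasets and RT-qPCR samples.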

Conclusion: PDE7A and DOK3 show strong diagnostic potential for TB, are closely linked to immune cell infiltration, and may serve as promising biomarkers and therapeutic targets.

Keywords: Diagnostic biomarkers; Immune cell infiltration; Machine learning; Tuberculosis.