Identifying protein biomarkers for chronic obstructive pulmonary disease (COPD) has been challenging. Most previous studies have used individual proteins or preselected protein panels measured in blood samples, and mass spectrometry proteomic studies of lung tissue have been based on small sample sizes. We used mass spectrometry proteomic approaches to discover protein biomarkers in 150 lung tissue samples from COPD cases and controls. Top COPD-associated proteins were identified by multiple linear regression analysis at a false discovery rate (FDR) < 0.05. Correlations between pairs of COPD-associated proteins were examined. Machine learning models were also evaluated to identify potential combinations of protein biomarkers related to COPD. We identified 4,407 proteins passing quality controls. Twenty-five proteins were significantly associated with COPD at FDR < 0.05, including interleukin 33, ferritin (light chain and heavy chain), and two proteins related to caveolae (CAV1 and CAVIN1). Multiple previously reported plasma protein biomarkers for COPD were not significantly associated with COPD in the lung tissue proteomic analysis, although RAGE was borderline significant. Eleven pairs of top significant proteins were highly correlated (r > 0.8), including proteins strongly correlated with RAGE (EHD2 and CAVIN1). Machine learning models using Random Forests with the top 5% of protein biomarkers demonstrated reasonable accuracy (0.707) and area under the curve (0.714) for COPD prediction. Mass spectrometry-based proteomic analysis of lung tissue is a promising approach for the identification of biomarkers for COPD.
Keywords: biomarkers; chronic obstructive pulmonary disease; machine learning; mass spectrometry; proteomics.
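The FDR < 0.05 threshold used for the protein associations can be illustrated with a minimal sketch. The abstract does not state which FDR procedure was applied; the code below assumes the Benjamini-Hochberg step-up adjustment, and the function names and the example p-values are hypothetical, not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the study's FDR procedure is not named in the abstract;
    Benjamini-Hochberg is used here only as a common choice.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone non-decreasing in the sorted order.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_at(p_values, alpha=0.05):
    """Flag tests whose adjusted p-value is below alpha."""
    return [q < alpha for q in benjamini_hochberg(p_values)]


if __name__ == "__main__":
    # Hypothetical per-protein p-values, e.g. from one regression per protein.
    example = [0.5, 0.001, 0.2, 0.03]
    print(benjamini_hochberg(example))
    print(significant_at(example))
```

In practice each input p-value would come from the regression coefficient for COPD status in a per-protein model, with one test per protein passing quality control.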