Objectives: Lung cancer remains the leading cause of cancer-related death, owing to the lack of specific symptoms at an early stage. A large-scale screening method may be key to identifying asymptomatic patients and thereby reducing mortality.
Methods: We propose an alternative method that combines a breath test with a machine learning algorithm. A total of 236 breath samples were analyzed by thermal desorption gas chromatography-mass spectrometry (TD-GCMS). The breath profile of each sample consisted of 308 features extracted from its chromatogram. A gradient boosting decision tree (GBDT) algorithm was used to identify lung cancer patients. Bootstrap resampling was performed to simulate real diagnostic practice and to evaluate the confidence of our method.
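The Methods paragraph does not state how the bootstrap marks a sample as "confident". The following is a minimal sketch under assumed details: within each of the 6 cross-validation folds, several GBDT models are trained on bootstrap resamples of the training data, each test sample receives the majority-vote label, and the sample is flagged confident when the share of agreeing models reaches a threshold. It uses scikit-learn's `GradientBoostingClassifier`, `StratifiedKFold` and `resample`. The helper name `bootstrap_cv`, the 0.9 threshold, the 10 bootstrap models and the GBDT settings are hypothetical. `make_classification` generates synthetic stand-in data of the same shape (236 samples by 308 features), so the output says nothing about the reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample


def bootstrap_cv(X, y, n_splits=6, n_boot=10, threshold=0.9, seed=0):
    """Out-of-fold GBDT predictions with bootstrap agreement (hypothetical helper).

    Returns (pred, agreement, confident): the majority-vote label of each sample,
    the fraction of bootstrap models agreeing with that label, and a mask of
    samples whose agreement reaches `threshold` (an assumed confidence rule).
    """
    rng = np.random.RandomState(seed)
    votes = np.zeros(len(y))  # fraction of bootstrap models predicting class 1
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(X, y):
        positive = np.zeros(len(test_idx))
        for _ in range(n_boot):
            # Resample the training fold with replacement, stratified by class
            # so that every bootstrap set contains both classes.
            Xb, yb = resample(X[train_idx], y[train_idx],
                              stratify=y[train_idx], random_state=rng)
            model = GradientBoostingClassifier(n_estimators=50, max_depth=3,
                                               random_state=0)
            model.fit(Xb, yb)
            positive += model.predict(X[test_idx])  # labels are 0/1
        votes[test_idx] = positive / n_boot
    pred = (votes >= 0.5).astype(int)
    # Agreement = share of bootstrap models that voted for the majority label.
    agreement = np.where(pred == 1, votes, 1 - votes)
    confident = agreement >= threshold
    return pred, agreement, confident


# Synthetic placeholder data with the abstract's dimensions (not breath data).
X, y = make_classification(n_samples=236, n_features=308, n_informative=20,
                           random_state=0)
pred, agreement, confident = bootstrap_cv(X, y)
print(f"accuracy={np.mean(pred == y):.2f}  confident fraction={confident.mean():.2f}")
```

Raising the threshold flags fewer samples as confident, typically in exchange for higher accuracy on the flagged subset.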
Results: The method achieved an accuracy of 85% in 6-fold cross-validation. In the bootstrap analysis, 72% of samples were marked as "confident", and the accuracy on these confident samples was 93% across the cross-validation folds.
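The reported figures measure different things: overall accuracy counts every sample, the confident fraction is the share of samples flagged confident, and the confident-sample accuracy is computed only over that flagged subset. A small pure-Python helper (the name `selective_summary` is hypothetical) makes the arithmetic explicit on toy labels, which are illustrative and not the study's data.

```python
def selective_summary(y_true, y_pred, confident):
    """Return (overall accuracy, confident fraction, accuracy on confident samples)."""
    n = len(y_true)
    correct = [t == p for t, p in zip(y_true, y_pred)]
    # Keep only the correctness flags of samples marked confident.
    kept = [c for c, flag in zip(correct, confident) if flag]
    accuracy = sum(correct) / n
    coverage = len(kept) / n
    confident_accuracy = sum(kept) / len(kept) if kept else float("nan")
    return accuracy, coverage, confident_accuracy


# Toy example: 10 samples, 2 misclassified, both errors left unflagged.
y_true = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
confident = [True, True, True, True, False, False, False, True, True, True]
print(selective_summary(y_true, y_pred, confident))  # → (0.8, 0.7, 1.0)
```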
Conclusion: We propose a non-invasive and accurate method with a built-in measure of confidence that may contribute to large-scale lung cancer screening and, consequently, to the detection of more asymptomatic patients with early-stage lung cancer.
Keywords: Bootstrap statistics; Exhaled breath analysis; Gradient boost decision trees algorithm; Lung cancer.
Copyright © 2021 Elsevier B.V. All rights reserved.