Inference of chronic obstructive pulmonary disease with deep learning on raw spirograms identifies new genetic loci and improves risk models

Justin Cosentino; Babak Behsaz; Babak Alipanahi; Zachary R McCaw; Davin Hill; Tae-Hwi Schwantes-An; Dongbing Lai; Andrew Carroll; Brian D Hobbs; Michael H Cho; Cory Y McLean; Farhad Hormozdiari

doi:10.1038/s41588-023-01372-4

Nat Genet. 2023 May;55(5):787-795. doi: 10.1038/s41588-023-01372-4. Epub 2023 Apr 17.

Authors

Justin Cosentino^#¹, Babak Behsaz^#², Babak Alipanahi^#³, Zachary R McCaw^#³, Davin Hill⁴ ⁵, Tae-Hwi Schwantes-An⁶ ⁷, Dongbing Lai⁶, Andrew Carroll³, Brian D Hobbs⁵ ⁸ ⁹, Michael H Cho⁵ ⁸ ⁹, Cory Y McLean², Farhad Hormozdiari¹⁰

Affiliations

¹ Google Health AI, Palo Alto, CA, USA.
² Google Health AI, Cambridge, MA, USA.
³ Google Health AI, Palo Alto, CA, USA.
⁴ Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA.
⁵ Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA.
⁶ Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA.
⁷ Division of Cardiology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA.
⁸ Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA.
⁹ Harvard Medical School, Boston, MA, USA.
¹⁰ Google Health AI, Cambridge, MA, USA.

^# Contributed equally.

PMID: 37069358
DOI: 10.1038/s41588-023-01372-4

Abstract

Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify genetic signals. Here we train a deep convolutional neural network on noisy self-reported and International Classification of Diseases labels to predict COPD case-control status from high-dimensional raw spirograms and use the model's predictions as a liability score. The machine-learning-based (ML-based) liability score accurately discriminates COPD cases and controls, and predicts COPD-related hospitalization without any domain-specific knowledge. Moreover, the ML-based liability score is associated with overall survival and exacerbation events. A genome-wide association study on the ML-based liability score replicates existing COPD and lung function loci and also identifies 67 new loci. Lastly, our method provides a general framework to use ML methods and medical-record-based labels that does not require domain knowledge or expert curation to improve disease prediction and genomic discovery for drug design.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Deep Learning*
Genetic Loci
Genome-Wide Association Study / methods
Humans
Polymorphism, Single Nucleotide / genetics
Pulmonary Disease, Chronic Obstructive* / genetics
