Modeling Unobserved Heterogeneity in Susceptibility to Ambient Benzo[a]pyrene Concentration among Children with Allergic Asthma Using an Unsupervised Learning Algorithm

Daniel Fernández; Radim J Sram; Miroslav Dostal; Anna Pastorkova; Hans Gmuender; Hyunok Choi

doi:10.3390/ijerph15010106

Int J Environ Res Public Health. 2018 Jan 10;15(1):106. doi: 10.3390/ijerph15010106.

Authors

Daniel Fernández¹ ², Radim J Sram³, Miroslav Dostal⁴, Anna Pastorkova⁵, Hans Gmuender⁶, Hyunok Choi⁷

Affiliations

¹ Research and Development Unit, Parc Sanitari Sant Joan de Déu, Fundació Sant Joan de Déu, CIBERSAM, Dr. Antoni Pujadas, 42, Sant Boi de Llobregat, 08830 Barcelona, Spain.
² School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6140, New Zealand.
³ Department of Genetic Ecotoxicology, Institute of Experimental Medicine, Academy of Sciences of the Czech Republic, v.v.i., Vídeňská 1083, 142 20 Prague 4, Czech Republic.
⁴ Department of Genetic Ecotoxicology, Institute of Experimental Medicine, Academy of Sciences of the Czech Republic, v.v.i., Vídeňská 1083, 142 20 Prague 4, Czech Republic.
⁵ Department of Genetic Ecotoxicology, Institute of Experimental Medicine, Academy of Sciences of the Czech Republic, v.v.i., Vídeňská 1083, 142 20 Prague 4, Czech Republic.
⁶ Genedata AG, Margarethenstrasse 38, CH-4053 Basel, Switzerland.
⁷ Departments of Environmental Health Sciences, Epidemiology, and Biostatistics, State University of New York at Albany School of Public Health, Rensselaer, NY 12144, USA.

Abstract

Current studies of gene × air pollution interaction typically seek to identify the unknown heritability of common complex illnesses arising from variability in the host's susceptibility to environmental pollutants of interest. Accordingly, single-component generalized linear models are often used to model the risk posed by an environmental exposure variable of interest in relation to a priori determined DNA variants. However, such an approach, which relies primarily on the modeled DNA variants, may be further optimized by reducing the phenotypic heterogeneity. Here, we reduce the phenotypic heterogeneity of asthma severity and also identify single nucleotide polymorphisms (SNPs) associated with the phenotype subgroups. Specifically, we first apply an unsupervised learning algorithm and a non-parametric regression to find a biclustering structure of children according to their allergy and asthma severity. We then identify the set of SNPs most closely correlated with each subgroup. We subsequently fit a logistic regression model for each group against the healthy controls, using benzo[a]pyrene (B[a]P) as a representative airborne carcinogen. Applying this approach to a case-control data set shows that SNP clustering may help to partly explain heterogeneity in children's asthma susceptibility in relation to ambient B[a]P concentration with greater efficiency.
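The two-stage idea in the abstract (group the cases first, then model each subgroup against the controls) can be illustrated with a minimal sketch. This is not the authors' implementation: a simple one-dimensional two-means split stands in for the biclustering step, the SNP selection step is omitted, and every name and number below (`two_means`, `fit_logistic`, the severity scores, the standardized B[a]P values) is hypothetical synthetic data chosen for illustration only.

```python
import math
import random

random.seed(0)

def two_means(scores, iters=20):
    """Split 1-D scores into two clusters (label 0 = lower-scoring cluster).
    A stand-in for the biclustering step, not the method used in the paper."""
    lo, hi = min(scores), max(scores)
    labels = []
    for _ in range(iters):
        labels = [0 if abs(s - lo) <= abs(s - hi) else 1 for s in scores]
        g0 = [s for s, lab in zip(scores, labels) if lab == 0]
        g1 = [s for s, lab in zip(scores, labels) if lab == 1]
        lo, hi = sum(g0) / len(g0), sum(g1) / len(g1)
    return labels

def fit_logistic(x, y, lr=0.5, steps=1000):
    """Fit logit P(y=1) = b0 + b1*x by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Synthetic data (assumption): each case is (severity score, standardized B[a]P).
# Only the hypothetical "severe" subgroup is given elevated exposure.
n = 150
controls_bap = [random.gauss(0.0, 1.0) for _ in range(2 * n)]
mild = [(random.gauss(2.0, 0.5), random.gauss(0.0, 1.0)) for _ in range(n)]
severe = [(random.gauss(8.0, 0.5), random.gauss(1.0, 1.0)) for _ in range(n)]
cases = mild + severe

# Stage 1: reduce phenotypic heterogeneity by grouping cases on severity.
labels = two_means([score for score, _ in cases])

# Stage 2: one logistic regression per case subgroup against the same controls.
slopes = {}
for group in (0, 1):
    case_bap = [bap for (_, bap), lab in zip(cases, labels) if lab == group]
    x = case_bap + controls_bap
    y = [1] * len(case_bap) + [0] * len(controls_bap)
    slopes[group] = fit_logistic(x, y)[1]

print("B[a]P log-odds slope per subgroup:", slopes)
```

In this toy setting, a single model pooling all cases would average the exposure effect over both subgroups, whereas the per-subgroup fits let the slope differ between them, which is the kind of heterogeneity the abstract aims to expose.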

Keywords: air pollution; asthma; gene-environment interaction; polycyclic aromatic hydrocarbon; single nucleotide polymorphism.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Air Pollutants / toxicity
Air Pollution / adverse effects
Algorithms
Asthma / chemically induced*
Asthma / genetics*
Benzo(a)pyrene / toxicity*
Case-Control Studies
Child
Environmental Exposure / adverse effects
Female
Gene-Environment Interaction
Genetic Predisposition to Disease*
Humans
Male
Multifactorial Inheritance*
Polymorphism, Single Nucleotide
Statistics as Topic
Unsupervised Machine Learning

Substances

Air Pollutants
Benzo(a)pyrene