Interpretable Machine Learning Reveals Dissimilarities Between Subtypes of Autism Spectrum Disorder

Mateusz Garbulowski; Karolina Smolinska; Klev Diamanti; Gang Pan; Khurram Maqbool; Lars Feuk; Jan Komorowski

doi:10.3389/fgene.2021.618277

Front Genet. 2021 Feb 25:12:618277. doi: 10.3389/fgene.2021.618277. eCollection 2021.

Authors

Mateusz Garbulowski¹, Karolina Smolinska¹, Klev Diamanti², Gang Pan², Khurram Maqbool², Lars Feuk², Jan Komorowski¹ ³ ⁴ ⁵

Affiliations

¹ Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden.
² Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden.
³ Swedish Collegium for Advanced Study, Uppsala, Sweden.
⁴ Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland.
⁵ Washington National Primate Research Center, Seattle, WA, United States.

Abstract

Autism spectrum disorder (ASD) is a heterogeneous neuropsychiatric disorder with a complex genetic background. Analysis of altered molecular processes in ASD patients requires linear and nonlinear methods that provide interpretable solutions. Interpretable machine learning provides legible models that help explain biological mechanisms and support the analysis of clinical subgroups. In this work, we investigated several case-control studies of gene expression measurements of ASD individuals. We constructed a rule-based learning model from three independent datasets and visualized it as a nonlinear gene-gene co-predictive network. To find dissimilarities between ASD subtypes, we scrutinized the topological structure of the network and estimated a centrality distance. Our analysis revealed that autism is the most severe subtype of ASD, while pervasive developmental disorder-not otherwise specified and Asperger syndrome are closely related and milder ASD subtypes. Furthermore, we analyzed the most important ASD-related features, which were described in terms of gene co-predictors. Among others, we found a strong co-predictive mechanism between EMC4 and TMEM30A, which may suggest co-regulation between these genes. The present study demonstrates the potential of applying interpretable machine learning in bioinformatics analyses. Although the proposed methodology was designed for transcriptomics data, it can be applied to other omics disciplines.

Keywords: autism spectrum disorder; autism spectrum disorder subtypes; data integration; gene expression; interpretable machine learning; rule-based classification; transcriptomics.
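
The abstract describes turning classification rules into a gene-gene co-predictive network and comparing subtypes by a centrality distance. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes that two genes are co-predictors when they appear in the same rule, that centrality is plain degree centrality, and that the distance is the sum of absolute centrality differences; the abstract does not specify any of these choices. The rules, the subtype labels, and the gene names `GENE_A`, `GENE_B`, and `GENE_C` are invented for illustration, so the resulting numbers carry no biological meaning.

```python
from collections import defaultdict
from itertools import combinations

# Toy rules invented for illustration: (subtype, genes in the rule's IF-part).
# EMC4 and TMEM30A are named in the abstract; GENE_A..GENE_C are placeholders.
TOY_RULES = [
    ("autism", ("EMC4", "TMEM30A")),
    ("autism", ("EMC4", "GENE_A")),
    ("autism", ("TMEM30A", "GENE_A", "GENE_B")),
    ("asperger", ("EMC4", "TMEM30A")),
    ("asperger", ("GENE_B", "GENE_C")),
    ("pdd_nos", ("EMC4", "TMEM30A")),
    ("pdd_nos", ("GENE_B", "GENE_C")),
    ("pdd_nos", ("GENE_A", "GENE_C")),
]


def build_networks(rules):
    """Return {subtype: {gene: set of co-predictor genes}}.

    Assumption: genes that share a rule are linked by an undirected edge.
    """
    nets = defaultdict(lambda: defaultdict(set))
    for subtype, genes in rules:
        for a, b in combinations(genes, 2):
            nets[subtype][a].add(b)
            nets[subtype][b].add(a)
    return nets


def degree_centrality(adjacency, nodes):
    """Degree of each node divided by (number of nodes - 1)."""
    denom = max(len(nodes) - 1, 1)
    return {g: len(adjacency.get(g, ())) / denom for g in nodes}


def centrality_distance(c1, c2):
    """Assumed distance: sum of absolute centrality differences per gene."""
    return sum(abs(c1[g] - c2[g]) for g in c1)


def pairwise_distances(rules):
    """Centrality distance for every pair of subtype networks."""
    nets = build_networks(rules)
    # Use one shared node set so the centrality vectors are comparable.
    nodes = sorted({g for _, genes in rules for g in genes})
    cent = {s: degree_centrality(nets[s], nodes) for s in nets}
    return {
        (s1, s2): centrality_distance(cent[s1], cent[s2])
        for s1, s2 in combinations(sorted(cent), 2)
    }


if __name__ == "__main__":
    for pair, dist in pairwise_distances(TOY_RULES).items():
        print(pair, dist)
```

A smaller distance means that two subtype networks give their genes similar connectivity. Other centrality measures or distance definitions could be substituted in `degree_centrality` and `centrality_distance` without changing the rest of the sketch.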