Predicting type 2 diabetes via machine learning integration of multiple omics from human pancreatic islets

Sci Rep. 2024 Jun 25;14(1):14637. doi: 10.1038/s41598-024-64846-3.

Abstract

Type 2 diabetes (T2D) is the fastest growing non-infectious disease worldwide. Impaired insulin secretion from pancreatic beta-cells is a hallmark of T2D, but the mechanisms behind this defect are insufficiently characterized. Integrating multiple layers of biomedical information, such as different Omics, may allow a more accurate understanding of complex diseases such as T2D. Our aim was to explore and use Machine Learning to integrate multiple sources of biological/molecular information (multiOmics), in our case RNA-sequencing, DNA methylation, SNP and phenotypic data from islet donors with T2D and non-diabetic controls. We exploited Machine Learning to perform multiOmics integration of DNA methylation, expression, SNPs, and phenotypes from pancreatic islets of 110 individuals, with ~30% being T2D cases. DNA methylation was analyzed using the Infinium MethylationEPIC array, expression was analyzed using RNA-sequencing, and SNPs were analyzed using HumanOmniExpress arrays. Supervised linear multiOmics integration via DIABLO, based on Partial Least Squares (PLS), achieved a T2D prediction accuracy of 91 ± 15% with an area under the curve of 0.96 ± 0.08 on the test dataset after cross-validation. Biomarkers identified by this multiOmics integration, including SACS and TXNIP DNA methylation, OPRD1 and RHOT1 expression and a SNP annotated to ANO1, provide novel insights into the interplay between different biological mechanisms contributing to T2D. This Machine Learning approach to multiOmics cross-sectional data from human pancreatic islets achieved a promising accuracy of T2D prediction, which may potentially find broad applications in clinical diagnostics. In addition, it delivered novel candidate biomarkers for T2D and links between them across the different Omics.
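
The "accuracy ± spread" and "area under the curve ± spread" figures reported above are the kind of summary obtained by scoring each held-out cross-validation fold separately and then aggregating. The sketch below illustrates only that aggregation step, not DIABLO or the authors' pipeline. It assumes the ± values denote a standard deviation across folds (the abstract does not say), uses a hypothetical 0.5 decision threshold, and runs on made-up labels and scores rather than study data.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts how often a case (label 1) is scored above a control (label 0);
    ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a hypothetical threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def summarize(folds):
    """Return (mean, sample SD) of accuracy and of AUC across held-out folds."""
    accs = [accuracy(y, s) for y, s in folds]
    aucs = [auc(y, s) for y, s in folds]
    return (mean(accs), stdev(accs)), (mean(aucs), stdev(aucs))


# Made-up held-out folds for illustration: (true labels, predicted T2D scores).
folds = [
    ([1, 0, 0, 1, 0], [0.9, 0.2, 0.4, 0.7, 0.1]),
    ([0, 1, 0, 0, 1], [0.3, 0.8, 0.6, 0.2, 0.4]),
    ([0, 0, 1, 0, 1], [0.1, 0.3, 0.9, 0.2, 0.6]),
]

(acc_mean, acc_sd), (auc_mean, auc_sd) = summarize(folds)
print(f"accuracy: {acc_mean:.2f} +/- {acc_sd:.2f}")
print(f"AUC:      {auc_mean:.2f} +/- {auc_sd:.2f}")
```

With few samples per fold, a single misclassification shifts fold-level accuracy substantially, which is one reason the spread across folds can be wide even when the mean is high.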

Keywords: Beta-cell; DNA methylation; EWAS; Epigenetics; GWAS; Genetic variation; Insulin secretion; Machine learning; Metabolic disease; MultiOmics analysis; Omics integration; RNA-sequencing.

MeSH terms

  • Adult
  • Aged
  • Biomarkers
  • DNA Methylation*
  • Diabetes Mellitus, Type 2* / genetics
  • Diabetes Mellitus, Type 2* / metabolism
  • Female
  • Humans
  • Islets of Langerhans* / metabolism
  • Machine Learning*
  • Male
  • Middle Aged
  • Polymorphism, Single Nucleotide*

Substances

  • Biomarkers