GUEST: an R package for handling estimation of graphical structure and multiclassification for error-prone gene expression data

Bioinformatics. 2024 Nov 28;40(12):btae731. doi: 10.1093/bioinformatics/btae731.

Abstract

Summary: In bioinformatics studies, understanding the network structure of gene expression variables is a central interest. In data science, graphical models are widely used to characterize the dependence structure among multivariate random variables. However, gene expression data may suffer from ultrahigh dimensionality and measurement error, which make detecting the network structure difficult. Another important application of gene expression variables is to provide information for classifying subjects into tumor or disease types. In supervised learning, linear discriminant analysis is a commonly used approach, but its conventional implementation requires precisely measured variables and the computation of their inverse covariance matrix, known as the precision matrix. To tackle these challenges and provide a reliable estimation procedure for public use, we develop the R package GUEST, which stands for Graphical models for Ultrahigh-dimensional and Error-prone data by the booSTing algorithm. The package corrects for measurement error effects in high-dimensional variables under various distributions and then applies the boosting algorithm to identify the network structure and estimate the precision matrix. Once the precision matrix is estimated, it can be used to construct the linear discriminant function and improve classification accuracy.
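To illustrate the last step, the following is a minimal sketch of how an estimated precision matrix can be read as a network and plugged into a linear discriminant function for multiclass classification. It is written in Python with NumPy and does not use or reproduce the GUEST API; the function names `network_edges` and `lda_scores`, the toy precision matrix, the class means, and the priors are all hypothetical values chosen for illustration. It assumes a common covariance matrix across classes and takes the precision matrix as already estimated, so the measurement error correction and boosting steps are not shown.

```python
import numpy as np

def network_edges(precision, tol=1e-8):
    """List variable pairs (i, j), i < j, whose off-diagonal precision entry is nonzero.

    In a Gaussian graphical model, a nonzero off-diagonal entry indicates
    conditional dependence between the two variables, i.e. an edge.
    """
    p = precision.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(precision[i, j]) > tol]

def lda_scores(x, class_means, precision, priors):
    """Linear discriminant score of x for each class k:

        x' Theta mu_k - 0.5 * mu_k' Theta mu_k + log(pi_k)

    where Theta is the (shared) precision matrix.
    """
    return np.array([
        x @ precision @ mu - 0.5 * mu @ precision @ mu + np.log(pi)
        for mu, pi in zip(class_means, priors)
    ])

# Hypothetical toy inputs (not produced by GUEST).
precision = np.array([[2.0, -1.0],
                      [-1.0, 2.0]])
class_means = [np.array([0.0, 0.0]),
               np.array([2.0, 2.0]),
               np.array([-2.0, 2.0])]
priors = [1 / 3, 1 / 3, 1 / 3]

edges = network_edges(precision)

# Classify a new observation into the class with the largest score.
x = np.array([1.8, 2.1])
scores = lda_scores(x, class_means, precision, priors)
predicted = int(np.argmax(scores))
```

In this sketch the quality of the classification depends entirely on the precision matrix supplied, which is why an estimate that accounts for measurement error matters for the discriminant function.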

Availability and implementation: The R package is available at https://cran.r-project.org/web/packages/GUEST/index.html.

MeSH terms

  • Algorithms*
  • Computational Biology* / methods
  • Discriminant Analysis
  • Gene Expression
  • Gene Expression Profiling / methods
  • Humans
  • Software*