Model selection characteristics when using MCP-Mod for dose-response gene expression data

Biom J. 2022 Jun;64(5):883-897. doi: 10.1002/bimj.202000250. Epub 2022 Feb 20.

Abstract

We extend the scope of application of MCP-Mod (Multiple Comparison Procedure and Modeling) to in vitro gene expression data and assess its model selection characteristics for concentration-gene expression curves. Specifically, we apply MCP-Mod to single genes of a high-dimensional gene expression data set in which human embryonic stem cells were exposed to eight concentration levels of the compound valproic acid (VPA). As candidate models we consider the sigmoid $E_{\max}$ (four-parameter log-logistic), linear, quadratic, $E_{\max}$, exponential, and beta models. Through simulations we investigate the impact of omitting one or more models from the candidate model set, both to uncover possibly superfluous models and to evaluate the precision and recall rates of the selected models. Each candidate model is selected by the Akaike information criterion (AIC) for a considerable number of genes. In less noisy cases the popular sigmoid $E_{\max}$ model is frequently selected. For noisier data, simpler models such as the linear model are often selected, but mostly without a relevant performance advantage over the second-best model. In addition, the commonly used standard $E_{\max}$ model shows unexpectedly low performance.
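
As a rough illustration of the ingredients named above, the following minimal sketch writes out the six candidate mean functions, an AIC-based choice among fitted models, and per-model precision and recall over simulated selections. The abstract gives no formulas, so the parameterizations are an assumption (they follow forms commonly used for these models), and all function names, the Gaussian least-squares form of the AIC, and the parameter-counting convention are hypothetical choices made for this example, not the authors' implementation.

```python
import math

# Mean functions of the six candidate models. The parameterizations are
# assumed (commonly used forms); the abstract does not state them.
def linear(d, e0, delta):
    return e0 + delta * d

def quadratic(d, e0, b1, b2):
    return e0 + b1 * d + b2 * d ** 2

def emax(d, e0, e_max, ed50):
    return e0 + e_max * d / (ed50 + d)

def sig_emax(d, e0, e_max, ed50, h):
    # Four-parameter log-logistic; reduces to emax() when h == 1.
    return e0 + e_max * d ** h / (ed50 ** h + d ** h)

def exponential(d, e0, e1, delta):
    return e0 + e1 * (math.exp(d / delta) - 1.0)

def beta_model(d, e0, e_max, delta1, delta2, scal):
    # scal is a fixed scale that must exceed the largest concentration.
    # The factor b normalizes the curve so its maximum is e0 + e_max.
    b = (delta1 + delta2) ** (delta1 + delta2) / (delta1 ** delta1 * delta2 ** delta2)
    x = d / scal
    return e0 + e_max * b * x ** delta1 * (1.0 - x) ** delta2

def gaussian_aic(rss, n, k):
    # AIC up to an additive constant for a least-squares fit with Gaussian
    # errors: rss = residual sum of squares, n = number of observations,
    # k = number of estimated parameters (assumed counting convention).
    return n * math.log(rss / n) + 2 * k

def select_by_aic(aic_by_model):
    # Return the name of the model with the smallest AIC.
    return min(aic_by_model, key=aic_by_model.get)

def precision_recall(pairs, model):
    # pairs: iterable of (true_model, selected_model) from a simulation.
    # Precision: share of selections of `model` that were correct.
    # Recall: share of data sets truly from `model` that selected it.
    pairs = list(pairs)
    tp = sum(1 for true, sel in pairs if true == model and sel == model)
    n_selected = sum(1 for _, sel in pairs if sel == model)
    n_true = sum(1 for true, _ in pairs if true == model)
    precision = tp / n_selected if n_selected else None
    recall = tp / n_true if n_true else None
    return precision, recall
```

In a simulation like the one described, each gene-level data set would be fitted with every model in the candidate set, `select_by_aic` would pick one, and `precision_recall` would summarize how often that pick matches the data-generating model. Omitting a model then amounts to dropping its key from the dictionary before selection.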

Keywords: MCP-mod; dose-response curves; gene expression; model selection; toxicology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression
  • Humans
  • Linear Models*