DSRIG: Incorporating graphical structure in the regularized modeling of SNP data

J Bioinform Comput Biol. 2019 Jun;17(3):1950017. doi: 10.1142/S0219720019500173.

Abstract

Genetic selection of farm animals plays an important role in genetic improvement programs. Regularized regression applied to single nucleotide polymorphism (SNP) data from a set of candidate genes can help to identify genes that are associated with the trait of interest. This complex task must also consider the relative effect sizes on the desired trait and account for the relationships among the candidate SNPs, so that selecting a SNP does not promote other undesirable traits through breeding. We present the Doubly Sparse Regression Incorporating Graphical structure (DSRIG), a novel regularized method for genetic selection that exploits the relationships among candidate SNPs to improve prediction. DSRIG was applied to the prediction of skatole and androstenone levels, two compounds known to be associated with boar taint. In a cross-validation procedure, DSRIG was shown to provide a predictive benefit over ordinary least squares (OLS) and the least absolute shrinkage and selection operator (LASSO). The relative sizes of the coefficient estimates across the cross-validation folds were compared to determine which SNPs may have the greatest impact on expression of the boar taint compounds, and a consensus graph was used to infer the relationships among SNPs.
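The kind of cross-validated comparison described above can be illustrated with a minimal sketch. It does not implement DSRIG, whose penalty is not specified in this abstract; it only contrasts an unpenalized fit (standing in for OLS, obtained here by running coordinate descent with a zero penalty) against a LASSO fit on simulated genotypes. The data, the penalty value 0.1, and the helper names `simulate`, `lasso_cd`, and `cv_mse` are all assumptions made for illustration, not part of the published method.

```python
import random

def soft_threshold(z, g):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] < 1e-12:  # constant (monomorphic) column: skip
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def cv_mse(X, y, lam, k=5):
    """Mean squared prediction error over k interleaved folds."""
    n, p = len(X), len(X[0])
    sq_err = 0.0
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        # Centre with training-fold means so the intercept is handled implicitly.
        x_mean = [sum(X[i][j] for i in train) / len(train) for j in range(p)]
        y_mean = sum(y[i] for i in train) / len(train)
        Xc = [[X[i][j] - x_mean[j] for j in range(p)] for i in train]
        yc = [y[i] - y_mean for i in train]
        beta = lasso_cd(Xc, yc, lam)
        for i in test:
            pred = y_mean + sum((X[i][j] - x_mean[j]) * beta[j] for j in range(p))
            sq_err += (y[i] - pred) ** 2
    return sq_err / n

def simulate(n=60, p=30, seed=0):
    """Hypothetical data: genotypes coded 0/1/2, three SNPs with true effects."""
    rng = random.Random(seed)
    X = [[float(rng.choice((0, 1, 2))) for _ in range(p)] for _ in range(n)]
    y = [1.0 * r[0] - 0.8 * r[1] + 0.5 * r[2] + rng.gauss(0.0, 1.0) for r in X]
    return X, y

X, y = simulate()
mse_unpenalized = cv_mse(X, y, lam=0.0)  # approximates OLS
mse_lasso = cv_mse(X, y, lam=0.1)        # penalty value chosen arbitrarily
print(f"CV MSE, unpenalized: {mse_unpenalized:.3f}")
print(f"CV MSE, LASSO (lam=0.1): {mse_lasso:.3f}")
```

In practice the penalty would be tuned rather than fixed, and a structured method such as DSRIG would add a term reflecting the graph among SNPs; neither is attempted here.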

Keywords: Regression; structured prediction; undirected graphical models.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Androsterone / genetics
  • Animals
  • Breeding / methods*
  • Computer Graphics*
  • Male
  • Models, Genetic*
  • Polymorphism, Single Nucleotide*
  • Reproducibility of Results
  • Selection, Genetic
  • Skatole
  • Swine / genetics*
  • Swine / physiology

Substances

  • Skatole
  • Androsterone