LEA 3: Factor models in population genetics and ecological genomics with R

Mol Ecol Resour. 2021 Nov;21(8):2738-2748. doi: 10.1111/1755-0998.13366. Epub 2021 Mar 29.

Abstract

A major objective of evolutionary biology is to understand the processes by which organisms have adapted to various environments, and to predict the response of organisms to new or future conditions. The availability of large genomic and environmental data sets provides an opportunity to address those questions, and the R package LEA has been introduced to facilitate population and ecological genomic analyses in this context. By using latent factor models, the program computes ancestry coefficients from population genetic data and performs genotype-environment association analyses with correction for unobserved confounding variables. In this study, we present new functionalities of LEA, which include imputation of missing genotypes, fast algorithms for latent factor mixed models using multivariate predictors for genotype-environment association studies, population differentiation tests for admixed or continuous populations, and estimation of genetic offset based on climate models. The new functionalities are implemented in version 3.1 and higher releases of the package. Using simulated and real data sets, our study provides evaluations and examples of applications, outlining important practical considerations when analysing ecological genomic data in R.
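As an illustration of the genotype-environment association workflow described above, the following is a minimal sketch in R, assuming LEA version 3.1 or higher is installed from Bioconductor. It fits a latent factor mixed model with `lfmm2()` and tests loci with `lfmm2.test()`. The data are simulated here purely for illustration and are not the data sets analysed in the study. The sample sizes, effect sizes, number of factors `K`, and the 5% false discovery rate threshold are all assumptions.

```r
# Sketch only: simulated haploid genotypes, not the study's data.
library(LEA)  # assumes LEA >= 3.1 (Bioconductor)

set.seed(1)
n <- 100   # individuals (assumed)
L <- 1000  # loci (assumed)
K <- 3     # number of latent factors (assumed known for this toy example)

# One environmental predictor and 10 hypothetical target loci
X <- as.matrix(rnorm(n))
B <- rep(0, L)
target <- sample(1:L, 10)
B[target] <- runif(10, -10, 10)

# Hidden factors correlated with the environment (the confounders), plus loadings
U <- X %*% t(c(-1.25, 0.5, 1.25)) + matrix(rnorm(n * K), ncol = K)
V <- matrix(rnorm(L * K), ncol = K)

# Binarize a noisy linear predictor to obtain a 0/1 genotype matrix
Y <- X %*% t(B) + U %*% t(V) + matrix(rnorm(n * L, sd = 0.5), nrow = n)
Y <- matrix(as.numeric(Y > 0), ncol = L)

# Fit the latent factor mixed model, then test each locus for association
mod <- lfmm2(input = Y, env = X, K = K)
pv <- lfmm2.test(object = mod, input = Y, env = X, linear = TRUE)

# Candidate loci after Benjamini-Hochberg correction (threshold is an assumption)
candidates <- which(p.adjust(pv$pvalues, method = "BH") < 0.05)
```

With real data, `K` is not known in advance and would typically be chosen from an analysis of population structure before fitting the model.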

Keywords: genotype-environment association tests; latent factor models; population structure; predictive ecological genomics; unsupervised machine learning.

MeSH terms

  • Adaptation, Physiological
  • Algorithms
  • Genetics, Population*
  • Genomics*
  • Genotype
  • Models, Genetic