Walking through the statistical black boxes of plant breeding

Theor Appl Genet. 2016 Oct;129(10):1933-49. doi: 10.1007/s00122-016-2750-y. Epub 2016 Jul 19.

Abstract

The main statistical procedures in plant breeding are based on Gaussian processes and can be computed through mixed linear models. Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) like black boxes. Black boxes represent complex pieces of machinery with contents that are not fully understood by the user. The user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.
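
As an illustration of aim (3), a mixed model y = Xb + Zu + e can be solved through Henderson's mixed model equations. The sketch below is not taken from the review. It assumes the variance ratio lam = sigma2_e / sigma2_u is already known (in practice it is usually estimated, for example by REML), and the names solve_mme, K, and lam, as well as the toy data, are hypothetical. K stands for a relationship matrix among the random effects, which could be pedigree-based or genomic.

```python
import numpy as np


def solve_mme(y, X, Z, K, lam):
    """Solve Henderson's mixed model equations for y = Xb + Zu + e.

    Assumes u ~ N(0, K * sigma2_u) and e ~ N(0, I * sigma2_e), with
    lam = sigma2_e / sigma2_u supplied by the caller (an assumption
    of this sketch; it is not estimated here).
    Returns (b_hat, u_hat): fixed-effect estimates and predicted
    random effects.
    """
    K_inv = np.linalg.inv(K)
    # Left-hand side: the coefficient matrix of the mixed model equations.
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * K_inv],
    ])
    # Right-hand side: X'y stacked on Z'y.
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    n_fixed = X.shape[1]
    return solution[:n_fixed], solution[n_fixed:]


# Hypothetical toy data: four unrelated genotypes, one record each.
y = np.array([10.0, 12.0, 11.0, 14.0])
X = np.ones((4, 1))   # overall mean as the only fixed effect
Z = np.eye(4)         # each record maps to one genotype
K = np.eye(4)         # identity = no relationships (assumed for simplicity)
lam = 1.0             # assumed variance ratio, i.e. heritability of 0.5

b_hat, u_hat = solve_mme(y, X, Z, K, lam)
print("fixed effect (mean):", b_hat)
print("predicted genotype effects:", u_hat)
```

With unrelated genotypes and lam = 1, each predicted effect is the deviation from the mean shrunk by one half, which shows how the variance ratio controls shrinkage. Replacing K with a pedigree or genomic relationship matrix lets records from relatives inform each prediction.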

Publication types

  • Review

MeSH terms

  • Algorithms
  • Alleles
  • Genomics / methods
  • Linear Models
  • Models, Genetic
  • Normal Distribution
  • Phenotype
  • Plant Breeding / methods*
  • Plants / genetics*
  • Selection, Genetic
  • Statistics as Topic*