Machine learning algorithms translate big data into predictive breeding accuracy

Trends Plant Sci. 2024 Oct 26:S1360-1385(24)00259-0. doi: 10.1016/j.tplants.2024.09.011. Online ahead of print.

Abstract

Statistical machine learning (ML) extracts patterns from extensive genomic, phenotypic, and environmental data. ML algorithms automatically identify relevant features and use cross-validation to ensure robust models and improve prediction reliability in new lines. Furthermore, ML analyses of genotype-by-environment (G×E) interactions can offer insights into the genetic factors that affect performance in specific environments. By leveraging historical breeding data, ML streamlines strategies and automates analyses to reveal genomic patterns. In this review we examine the transformative impact of big data, including multi-trait genomics, phenomics, and environmental covariables, on genomic-enabled prediction in plant breeding. We discuss how big data and ML are revolutionizing the field by enhancing prediction accuracy, deepening our understanding of G×E interactions, and optimizing breeding strategies through the analysis of extensive and diverse datasets.
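
As an illustration of the cross-validation idea mentioned above, the following is a minimal sketch rather than the authors' method. It assumes simulated marker data, a plain ridge-regression predictor, and an arbitrary penalty value; the function names, sample sizes, and penalty are all hypothetical. It estimates how well a model trained on some lines predicts lines it has not seen.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; returns (intercept, marker effects)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - y_mean)
    lhs = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(lhs, Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def kfold_predictive_ability(X, y, lam=100.0, k=5, seed=0):
    """Mean correlation between predicted and observed phenotypes
    of held-out lines across k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correlations = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        intercept, beta = ridge_fit(X[train_idx], y[train_idx], lam)
        y_hat = intercept + X[test_idx] @ beta
        correlations.append(np.corrcoef(y_hat, y[test_idx])[0, 1])
    return float(np.mean(correlations))

# Simulated data (assumption): 300 lines, 200 markers coded 0/1/2,
# 20 markers with nonzero effects, noise scaled for heritability near 0.5.
rng = np.random.default_rng(42)
n_lines, n_markers = 300, 200
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
true_effects = np.zeros(n_markers)
true_effects[:20] = rng.normal(0.0, 1.0, 20)
genetic_value = X @ true_effects
y = genetic_value + rng.normal(0.0, genetic_value.std(), n_lines)

accuracy = kfold_predictive_ability(X, y)
print(f"Mean predictive ability across folds: {accuracy:.2f}")
```

Random folds of this kind mimic predicting untested lines in tested environments. Assessing prediction in new environments would call for a different partitioning, for example holding out whole environments.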

Keywords: big genomics; climate change; environmental data; genomic prediction; modern breeding programs; phenomics; statistical machine learning.

Publication types

  • Review