Yield prediction in a peanut breeding program using remote sensing data and machine learning algorithms

Front Plant Sci. 2024 Feb 20;15:1339864. doi: 10.3389/fpls.2024.1339864. eCollection 2024.

Abstract

Peanut is a critical food crop worldwide, and the development of high-throughput phenotyping techniques is essential for enhancing the crop's genetic gain rate. Given the obvious challenges of directly estimating peanut yields through remote sensing, an approach that utilizes above-ground phenotypes to estimate underground yield is necessary. To that end, this study leveraged unmanned aerial vehicles (UAVs) for high-throughput phenotyping of surface traits in peanut. Using a diverse set of peanut germplasm planted in 2021 and 2022, UAV flight missions were repeatedly conducted to capture image data that were used to construct high-resolution multitemporal sigmoidal growth curves based on apparent characteristics, such as canopy cover and canopy height. Latent phenotypes extracted from these growth curves and their first derivatives informed the development of advanced machine learning models, specifically random forest and eXtreme Gradient Boosting (XGBoost), to estimate yield in the peanut plots. The random forest model exhibited exceptional predictive accuracy (R² = 0.93), while XGBoost was also reasonably effective (R² = 0.88). When using confusion matrices to evaluate the classification abilities of each model, the two models proved valuable in a breeding pipeline, particularly for filtering out underperforming genotypes. In addition, the random forest model excelled in identifying top-performing material while minimizing Type I and Type II errors. Overall, these findings underscore the potential of machine learning models, especially random forests and XGBoost, in predicting peanut yield and improving the efficiency of peanut breeding programs.
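As a rough illustration of two ideas mentioned in the abstract, the sketch below shows how latent phenotypes might be read off a sigmoidal growth curve and its first derivative, and how a confusion matrix could summarize selection of top-performing plots. It is not the authors' implementation. The abstract says only that the curves are sigmoidal, so the three-parameter logistic form is an assumption, as are the feature names, the function names, and the `top_fraction` cutoff. Curve fitting and model training are omitted.

```python
import math

def logistic(t, K, r, t0):
    """Assumed sigmoidal curve: K = asymptote, r = rate, t0 = inflection time."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def logistic_rate(t, K, r, t0):
    """First derivative of the logistic curve with respect to time."""
    y = logistic(t, K, r, t0)
    return r * y * (1.0 - y / K)

def latent_phenotypes(K, r, t0):
    """Summary traits of an already-fitted curve (hypothetical feature set)."""
    return {
        "asymptote": K,
        "inflection_day": t0,
        "max_growth_rate": K * r / 4.0,          # derivative evaluated at t0
        "day_at_90pct": t0 + math.log(9.0) / r,  # solves logistic(t) = 0.9 * K
    }

def selection_confusion(observed, predicted, top_fraction=0.2):
    """2x2 counts comparing top plots by observed yield vs. predicted yield."""
    n_top = max(1, round(len(observed) * top_fraction))

    def top_set(values):
        ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return set(ranked[:n_top])

    obs_top, pred_top = top_set(observed), top_set(predicted)
    everyone = set(range(len(observed)))
    return {
        "true_pos": len(obs_top & pred_top),
        "false_pos": len(pred_top - obs_top),   # Type I error: selected, not truly top
        "false_neg": len(obs_top - pred_top),   # Type II error: truly top, missed
        "true_neg": len(everyone - obs_top - pred_top),
    }
```

In a workflow like the one described, the dictionary returned by `latent_phenotypes` for each plot and trait (for example canopy cover and canopy height) would become that plot's input features for a regression model, and `selection_confusion` would be applied to held-out observed and predicted yields.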

Keywords: artificial intelligence; crop yield; growth curves; machine learning; peanut; plant breeding; remote sensing; unmanned aerial vehicle.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. Funding for this research was received in part from the New Mexico Peanut Research Board and the National Peanut Research Board, and the work was supported in part by the intramural research program of the U.S. Department of Agriculture, National Institute of Food and Agriculture, Hatch Capacity funds.