Maize yield prediction with trait-missing data via bipartite graph neural network

Front Plant Sci. 2024 Oct 4:15:1433552. doi: 10.3389/fpls.2024.1433552. eCollection 2024.

Abstract

The timely and accurate prediction of maize (Zea mays L.) yields prior to harvest is critical for food security and agricultural policy development. Currently, many researchers are using machine learning and deep learning to predict maize yields in specific regions with high accuracy. However, existing methods typically have two limitations. One is that they ignore the extensive correlation in maize planting data, such as the association of maize yields between adjacent planting locations and the combined effect of meteorological features and maize traits on maize yields. The other issue is that the performance of existing models may suffer significantly when some data in maize planting records is missing, or the samples are unbalanced. Therefore, this paper proposes an end-to-end bipartite graph neural network-based model for trait data imputation and yield prediction. The maize planting data is initially converted to a bipartite graph data structure. Then, a yield prediction model based on a bipartite graph neural network is developed to impute missing trait data and predict maize yield. This model can mine correlations between different samples of data, correlations between different meteorological features and traits, and correlations between different traits. Finally, to address the issue of unbalanced sample size at each planting location, we propose a loss function based on the gradient balancing mechanism that effectively reduces the impact of data imbalance on the prediction model. When compared to other data imputation and prediction models, our method achieves the best yield prediction result even when missing data is not pre-processed.

Keywords: bipartite graph; data imputation; gradient harmonization; graph neural network; yield prediction.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. The current work is supported by Biological Breeding-Major Projects (2023ZD04076).