Using the Pearson's correlation coefficient as the sole metric to measure the accuracy of quantitative trait prediction: is it sufficient?

Shouhui Pan; Zhongqiang Liu; Yanyun Han; Dongfeng Zhang; Xiangyu Zhao; Jinlong Li; Kaiyi Wang

doi:10.3389/fpls.2024.1480463

Front Plant Sci. 2024 Dec 10;15:1480463. doi: 10.3389/fpls.2024.1480463. eCollection 2024.

Authors

Shouhui Pan ¹ ²; Zhongqiang Liu ¹ ²; Yanyun Han ¹ ²; Dongfeng Zhang ¹ ²; Xiangyu Zhao ¹ ²; Jinlong Li ¹ ²; Kaiyi Wang ¹ ²

Affiliations

¹ Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China.
² National Engineering Research Center for Information Technology in Agriculture, Beijing, China.

Abstract

Evaluating the accuracy of quantitative trait prediction is crucial for choosing the best model among several candidates in plant breeding. Pearson's correlation coefficient (PCC), which quantifies the strength of the linear association between two variables, is widely used to evaluate the accuracy of quantitative trait prediction models and performs well in most circumstances. However, PCC may not always offer a comprehensive view of predictive accuracy, especially in cases involving nonlinear relationships or complex dependencies in machine learning-based methods. Many papers on quantitative trait prediction use PCC as the sole metric for evaluating the accuracy of their models, which is insufficient from a formal perspective. This study addresses the issue by presenting a typical example and by conducting a comparative analysis of PCC and nine other evaluation metrics using four traditional methods and four machine learning-based methods, thereby contributing to the practical applicability and reliability of plant quantitative trait prediction models. It is recommended that PCC be used in conjunction with other evaluation metrics, chosen according to the specific application scenario, to reduce the likelihood of drawing misleading conclusions.
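The limitation described above can be illustrated with a minimal sketch. The data below are synthetic and are not taken from the study, and the two error metrics shown (RMSE and MAE) are chosen here only as common examples; the abstract does not list which nine metrics the authors compared. The function names and the two hypothetical models are assumptions made for illustration.

```python
import math


def pearson(y_true, y_pred):
    """Pearson's correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic observed trait values (illustrative only, not from the paper).
observed = [10.0, 12.0, 14.0, 16.0, 18.0]

# Hypothetical model A: small errors around the observed values.
model_a = [10.5, 11.5, 14.5, 15.5, 18.5]

# Hypothetical model B: an exact linear function of the observed values,
# but on the wrong scale and with a large offset.
model_b = [2.0 * y + 5.0 for y in observed]

for name, pred in (("A", model_a), ("B", model_b)):
    print(
        f"model {name}: PCC={pearson(observed, pred):.4f} "
        f"RMSE={rmse(observed, pred):.3f} MAE={mae(observed, pred):.3f}"
    )
```

In this toy case model B attains a PCC of 1 (up to floating-point rounding) because its predictions are an exact linear function of the observations, yet its RMSE is about 19.2, while model A has a slightly lower PCC and an RMSE of 0.5. Ranking the two models by PCC alone would therefore favor the model whose predicted values are far from the observed ones, which is the kind of misleading conclusion the abstract warns about.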

Keywords: Pearson’s correlation coefficient; evaluation metric; genomic selection; quantitative trait prediction; regression prediction.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by the National Science and Technology Major Project (No. 2022ZD0115703).