Statistical Approach for Improving Genomic Prediction Accuracy through Efficient Diagnostic Measure of Influential Observation

Sci Rep. 2020 May 21;10(1):8408. doi: 10.1038/s41598-020-65323-3.

Abstract

The predictive performance of genomic prediction methods is expected to be adversely affected by the presence of outliers. In agricultural science, outliers may arise from incorrect data imputation, from outlying responses, or in a series of trials conducted over time or across locations. Although several statistical procedures for identifying outliers are already available in the literature, identifying true outliers remains a challenge, especially in high-dimensional genomic data. Here we propose an efficient approach for detecting outliers in high-dimensional genomic data. The approach uses p-value-based combination methods to produce a single p-value for detecting outliers. Its robustness was tested on simulated data using evaluation measures such as precision and recall. Using real data, we observed a significant improvement in genomic prediction performance when outliers were detected and handled accordingly through the proposed approach.
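
The idea of combining several p-values into a single p-value per observation, and then scoring the flagged observations by precision and recall, can be illustrated with a minimal sketch. The abstract does not name the combination rule used, so Fisher's method is assumed here purely as a stand-in, and it further assumes the individual p-values are independent. The function names `fisher_combine` and `precision_recall`, the threshold `alpha`, and the toy p-values are all hypothetical and are not taken from the paper.

```python
import math


def fisher_combine(pvalues):
    """Combine p-values into one with Fisher's method (assumed stand-in).

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom, where k is the number of p-values.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Chi-square survival function for even df = 2k has a closed form:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def precision_recall(flagged, true_outliers):
    """Precision and recall of flagged indices against known outlier indices."""
    flagged, true_outliers = set(flagged), set(true_outliers)
    tp = len(flagged & true_outliers)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(true_outliers) if true_outliers else 0.0
    return precision, recall


# Toy data (hypothetical): each row holds p-values from three different
# outlier diagnostics computed for the same observation.
per_observation_pvalues = [
    [0.60, 0.45, 0.80],    # observation 0
    [0.01, 0.03, 0.02],    # observation 1
    [0.30, 0.04, 0.50],    # observation 2
    [0.002, 0.01, 0.20],   # observation 3
]
alpha = 0.05  # hypothetical significance threshold

combined = [fisher_combine(row) for row in per_observation_pvalues]
flagged = [i for i, p in enumerate(combined) if p < alpha]
precision, recall = precision_recall(flagged, true_outliers=[1, 3])
```

In a simulation study the true outlier indices are known by construction, which is what makes precision and recall computable. With real data they are not known, so the effect of outlier handling would instead be judged by the change in prediction accuracy.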

MeSH terms

  • Bayes Theorem
  • Gene Frequency
  • Genetic Markers
  • Genomics / methods*
  • Genomics / statistics & numerical data*
  • Models, Genetic*
  • Plant Breeding / methods*
  • Plant Breeding / statistics & numerical data
  • Polymorphism, Single Nucleotide
  • Quantitative Trait Loci
  • Selection, Genetic
  • Triticum / genetics
  • Zea mays / genetics

Substances

  • Genetic Markers