Background: Two genetic marker-based methods were compared for breed prediction, using a New Zealand sheep resource. The methods were a genomic selection (GS) method, using genomic BLUP, and a regression method (Regp) using allele frequencies estimated from a subset of purebred animals. Proportions of four breeds, Romney, Coopworth, Perendale and Texel, were predicted using Illumina OvineSNP50 genotypes.
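To illustrate the idea behind a regression approach of the Regp type, the following is a minimal sketch only. It assumes unconstrained ordinary least squares of an animal's allele dosages on breed allele frequencies; the exact formulation used in the study (for example, any constraints on the coefficients) is not given here and may differ. The function name, the simulated frequencies and the example mixture are all hypothetical.

```python
import numpy as np

def regp_breed_proportions(genotype, breed_freqs):
    """Estimate breed proportions by least squares (illustrative assumption).

    genotype    : (m,) allele dosages for one animal (0, 1 or 2 per SNP)
    breed_freqs : (m, k) allele frequencies for k breeds, assumed to have
                  been estimated from purebred animals
    """
    y = np.asarray(genotype, dtype=float) / 2.0   # dosage -> allele frequency scale
    P = np.asarray(breed_freqs, dtype=float)
    # Solve y ~ P @ q for q; no intercept and no constraints in this sketch.
    q, *_ = np.linalg.lstsq(P, y, rcond=None)
    return q

# Hypothetical data: 500 SNPs, 4 breeds, noise-free expected dosages.
rng = np.random.default_rng(0)
breed_freqs = rng.uniform(0.05, 0.95, size=(500, 4))
true_props = np.array([0.5, 0.25, 0.25, 0.0])
expected_dosage = 2.0 * breed_freqs @ true_props
estimated = regp_breed_proportions(expected_dosage, breed_freqs)
```

With real genotypes the dosages are integers, so the estimates would carry sampling noise and could fall slightly outside the range 0 to 1.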
Results: Both methods worked well, with correlations between predicted and recorded proportions ranging from 0.91 to 0.97 across methods and predicted breeds, except for the Regp method for Perendales, where the correlation was 0.85. The Regp method gives predictions that appear as a gradient (when viewed on the first few principal components of the genomic relatedness matrix), decreasing away from the breed centre. In contrast, the GS method gives predictions dominated by the breeds of the closest relatives in the training set. Some Romneys appear close to the main Perendale group, which is why the Regp method worked less well for predicting Perendale proportion. The GS method works better than the Regp method when the breed groups do not form tight, distinct clusters, but it is less robust to errors in recorded breed in the training set (when predicting relatives of the affected animals). Predictions were similar to those obtained using STRUCTURE software, especially those from Regp. The methods appear to overpredict breed proportions in animals that are far removed from the training set. The training set should therefore include animals spanning the range over which predictions are to be made.
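The observation that GS predictions are dominated by the closest relatives in the training set can be illustrated with a small genomic BLUP sketch. This is not the study's implementation: it assumes a genomic relationship matrix built from centred dosages scaled by twice the sum of p(1-p), a single breed proportion as the trait, and an arbitrary shrinkage value `lam` standing in for the ratio of residual to genetic variance. All names and the simulated two-breed data are hypothetical.

```python
import numpy as np

def genomic_relationship_matrix(M):
    """G = Z Z' / (2 * sum p(1-p)) from an (n, m) matrix of 0/1/2 dosages."""
    p = M.mean(axis=0) / 2.0          # allele frequencies from the sample
    Z = M - 2.0 * p                   # centre each SNP
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train_idx, test_idx, lam=1.0):
    """Predict a breed proportion for test animals from training animals."""
    mu = y_train.mean()
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_pt = G[np.ix_(test_idx, train_idx)]
    # Weights on training records; lam is an assumed variance ratio.
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    # Each prediction is a relationship-weighted sum of training records.
    return mu + G_pt @ alpha

# Hypothetical data: two breeds, 300 SNPs, 20 training animals per breed,
# plus one test animal from each breed.
rng = np.random.default_rng(1)
freq_a = rng.uniform(0.05, 0.95, size=300)
freq_b = rng.uniform(0.05, 0.95, size=300)
M = np.vstack([
    rng.binomial(2, freq_a, size=(20, 300)),  # breed A, training
    rng.binomial(2, freq_b, size=(20, 300)),  # breed B, training
    rng.binomial(2, freq_a, size=(1, 300)),   # breed A, test
    rng.binomial(2, freq_b, size=(1, 300)),   # breed B, test
]).astype(float)
y_train = np.concatenate([np.ones(20), np.zeros(20)])  # recorded proportion of breed A
G = genomic_relationship_matrix(M)
pred = gblup_predict(G, y_train, np.arange(40), np.array([40, 41]))
```

Because each prediction is a weighted sum over training animals with weights driven by genomic relationships, a mislabelled training animal would pull the predictions of its close relatives towards the wrong breed.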
Conclusions: Breeds can be predicted using either of the two methods investigated. The choice of method will depend on the structure of the breeds in the population. The use of genomic selection methodology for breed prediction appears promising. As applied, it worked well for predicting proportions in animals that were predominantly of the breed types present in the training set, that is, animals within the range of genetic diversity represented by the training set. The training set should therefore cover the breed diversity of the animals for which predictions will be made.