Genebanks are a rich source of genetic variation. Most of this variation is absent from breeding programs but may be useful for further crop improvement. However, the lack of phenotypic information is a major obstacle to an informed choice of genebank accessions for research and breeding. A promising approach to filling this information gap is to exploit historical data gathered routinely during seed regeneration cycles. These data, however, are highly non-orthogonal, which hampers their analysis. By examining historical records for flowering time, plant height, and thousand grain weight collected during 70 years of regeneration of 6,207 winter wheat (Triticum aestivum L.) accessions at the German Federal ex situ Genebank, we aimed to develop a strategy to analyze and validate non-orthogonal historical data in order to populate genebank information platforms with high-quality, ready-to-use phenotypic information. First, a three-step quality control assessment was implemented, comprising a plausibility check of trait values, a standard outlier detection, and an outlier detection based on a weather parameter index; this resulted in heritability estimates above 0.90 for all three traits. Then, the data were analyzed by computing best linear unbiased estimates (BLUEs) with a linear mixed-model approach. An in silico resampling study mimicking different missing data patterns revealed that accessions should be regenerated in a random fashion, and not blocked by origin or acquisition date, in order to minimize estimation biases in historical data sets. Validation data were obtained from multi-environment orthogonal field trials of a random subsample of 3,083 accessions. Correlations above 0.84 between BLUEs estimated from the historical data and from the validation trials exceeded those of previous approaches and confirmed the robustness of our strategy as well as the high quality of the historical data.
The results indicate that the IPK winter wheat collection harbors an extraordinarily high phenotypic diversity compared to other collections. The quality-checked, ready-to-use phenotypic information resulting from this study is the first building block for extending traditional, conservation-driven genebanks into bio-digital resource centers.
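To illustrate why non-orthogonal regeneration records call for model-based estimates rather than raw accession means, the following is a minimal sketch, not the analysis used in this study. It assumes a simplified additive model with fixed accession and year effects fitted by ordinary least squares, in place of the linear mixed model; the function name `adjusted_accession_means` and the toy plant height values are hypothetical.

```python
import numpy as np

def adjusted_accession_means(accession, year, value):
    """Year-adjusted accession means from unbalanced (non-orthogonal) records.

    Simplifying assumption: value = accession effect + year effect + error,
    with both effects fixed and estimated by ordinary least squares.
    Requires a connected design (accessions linked through shared years).
    """
    acc_ids, a = np.unique(np.asarray(accession), return_inverse=True)
    year_ids, t = np.unique(np.asarray(year), return_inverse=True)
    y = np.asarray(value, dtype=float)
    n, n_acc, n_year = len(y), len(acc_ids), len(year_ids)

    # Design matrix: one indicator column per accession, plus indicator
    # columns for every year except the first (the reference year).
    X = np.zeros((n, n_acc + n_year - 1))
    rows = np.arange(n)
    X[rows, a] = 1.0
    later = t > 0
    X[rows[later], n_acc + t[later] - 1] = 1.0

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    acc_effect = beta[:n_acc]
    year_effect = np.concatenate([[0.0], beta[n_acc:]])

    # Report each accession at the average year effect.
    adjusted = acc_effect + year_effect.mean()
    return dict(zip(acc_ids.tolist(), adjusted.tolist()))

# Hypothetical toy records: each accession was grown in only two of three years.
accession = ["A", "A", "B", "B", "C", "C"]
year = [1950, 1951, 1951, 1952, 1950, 1952]
# Simulated without noise: true heights A=100, B=110, C=120 cm;
# year effects 1950: 0, 1951: +10, 1952: -10 cm.
height = [100.0, 110.0, 120.0, 100.0, 120.0, 110.0]

adjusted = adjusted_accession_means(accession, year, height)
raw = {k: float(np.mean([h for acc, h in zip(accession, height) if acc == k]))
       for k in ("A", "B", "C")}
```

In this toy design the raw means compress the differences between accessions because each accession experienced a different set of years, whereas the adjusted means recover the simulated accession values. A mixed model with random year effects, as named in the abstract, would additionally account for the variance structure of the data.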
Keywords: bio-digital resource center; data quality assessment; genebank; genetic resources; historical data; winter wheat.