HIBLUP: an integration of statistical models on the BLUP framework for efficient genetic evaluation using big genomic data

Lilin Yin; Haohao Zhang; Zhenshuang Tang; Dong Yin; Yuhua Fu; Xiaohui Yuan; Xinyun Li; Xiaolei Liu; Shuhong Zhao

doi:10.1093/nar/gkad074

Nucleic Acids Res. 2023 May 8;51(8):3501-3512. doi: 10.1093/nar/gkad074.

Authors

Lilin Yin¹,², Haohao Zhang³, Zhenshuang Tang¹, Dong Yin¹, Yuhua Fu¹,², Xiaohui Yuan³, Xinyun Li¹,², Xiaolei Liu¹,²,⁴, Shuhong Zhao¹,²,⁴

Affiliations

¹ Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education & College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China.
² Frontiers Science Center for Animal Breeding and Sustainable Production, Wuhan 430070, PR China.
³ School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, PR China.
⁴ Hubei Hongshan Laboratory, Wuhan 430070, PR China.

Abstract

Human diseases and agricultural traits can be predicted by modeling a random polygenic genetic effect in linear mixed models. Estimating the variance components and predicting the random effects of the model efficiently with limited computational resources has always been a primary concern, especially as the scale of genotype data grows in the current genomic era. Here, we thoroughly reviewed the development history of statistical algorithms used in genetic evaluation and theoretically compared their computational complexity and applicability for different data scenarios. Most importantly, we presented a computationally efficient, functionally enriched, multi-platform and user-friendly software package named 'HIBLUP' to address the challenges currently faced when using big genomic data. Powered by advanced algorithms, elaborate design and efficient programming, HIBLUP was the fastest and used the least memory in the analyses, and the greater the number of genotyped individuals, the greater the computational benefits from HIBLUP. We also demonstrated that HIBLUP is the only tool that can complete the analyses for a UK Biobank-scale dataset within 1 h using the proposed efficient 'HE + PCG' strategy. It is foreseeable that HIBLUP will facilitate genetic research for humans, plants and animals. The HIBLUP software and user manual can be accessed freely at https://www.hiblup.com.

Plain language summary

Both human diseases and agricultural traits can be predicted by incorporating phenotypic observations and a relationship matrix among individuals in a linear mixed model. Because of the great demand for processing massive data from genotyped individuals, the existing algorithms, which require repeated inversion of increasingly large dense matrices (e.g. the relationship matrix and the coefficient matrix of the mixed model equations), have encountered a bottleneck. Here, we presented a software tool named ‘HIBLUP’ to address these challenges. Powered by our advanced algorithms (e.g. HE + PCG), elaborate design and efficient programming, HIBLUP avoids inverting any big matrix and runs fastest with the lowest memory use, which makes it very promising for genetic evaluation using big genomic data.
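To make the 'HE + PCG' idea concrete, the following is a minimal sketch of the two generic techniques the name refers to: Haseman-Elston (HE) regression for variance components and a preconditioned conjugate gradient (PCG) solver that replaces explicit matrix inversion. It is not HIBLUP's implementation. The function names, the toy relationship matrix `G`, the toy phenotypes `y`, the Jacobi (diagonal) preconditioner, and the simplifications (phenotypes already centered, no other fixed effects, diagonal of `G` close to 1) are all assumptions made for illustration.

```python
# Illustrative sketch only; NOT HIBLUP's code. All names and data are hypothetical.

def matvec(A, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def he_regression(G, y):
    """HE moment estimate: regress y_i * y_j on G_ij over pairs i < j
    (through the origin). Assumes y is centered and diag(G) is close to 1."""
    n = len(y)
    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            num += G[i][j] * y[i] * y[j]
            den += G[i][j] ** 2
    var_g = num / den
    var_e = dot(y, y) / (n - 1) - var_g  # residual = phenotypic - genetic
    return var_g, var_e

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A using conjugate
    gradient with a Jacobi preconditioner; A is never inverted."""
    inv_diag = [1.0 / A[i][i] for i in range(len(b))]
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x, with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Toy data (hypothetical): 3 individuals, centered phenotypes.
G = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.25],
     [0.0, 0.25, 1.0]]
y = [1.0, 0.5, -1.5]

var_g, var_e = he_regression(G, y)
n = len(y)
# Phenotypic covariance V = var_g * G + var_e * I
V = [[var_g * G[i][j] + (var_e if i == j else 0.0) for j in range(n)]
     for i in range(n)]
x = pcg(V, y)                                     # x solves V x = y
u_hat = [var_g * gi for gi in matvec(G, x)]       # BLUP: var_g * G * V^{-1} y
```

In this sketch the only operations applied to the large matrices are element-wise sums and matrix-vector products, which is why the combination scales to many genotyped individuals more gracefully than methods that repeatedly invert `V`. On data this small the HE estimates are very noisy and can even be negative; the example shows the mechanics rather than a realistic analysis.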

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Genome
Genomics*
Genotype
Humans
Linear Models
Models, Genetic*