HIBLUP: an integration of statistical models on the BLUP framework for efficient genetic evaluation using big genomic data

Nucleic Acids Res. 2023 May 8;51(8):3501-3512. doi: 10.1093/nar/gkad074.

Abstract

Human diseases and agricultural traits can be predicted by modeling a random polygenic genetic effect in linear mixed models. Estimating the variance components and predicting the random effects of such models efficiently with limited computational resources has always been a primary concern, especially as the scale of genotype data keeps growing in the current genomic era. Here, we thoroughly reviewed the development history of statistical algorithms used in genetic evaluation and theoretically compared their computational complexity and applicability to different data scenarios. Most importantly, we presented a computationally efficient, functionally rich, multi-platform and user-friendly software package named 'HIBLUP' to address the challenges currently faced when using big genomic data. Powered by advanced algorithms, elaborate design and efficient programming, HIBLUP was the fastest and used the least memory in our analyses, and the more individuals that are genotyped, the greater its computational advantage. We also demonstrated that HIBLUP is the only tool that can complete the analyses of a UK Biobank-scale dataset within 1 h using the proposed efficient 'HE + PCG' strategy. HIBLUP is expected to facilitate genetic research in humans, plants and animals. The HIBLUP software and user manual are freely available at https://www.hiblup.com.
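In the 'HE + PCG' strategy, HE conventionally denotes Haseman-Elston regression, a method-of-moments estimator of variance components that needs no matrix inversion. The sketch below illustrates the general moment-matching idea only; it is not HIBLUP's implementation. It assumes y has already been centered and adjusted for fixed effects and that Var(y) = Gσ²g + Iσ²e, so that E[y'Gy] = σ²g tr(G²) + σ²e tr(G) and E[y'y] = σ²g tr(G) + σ²e n. Matching these expectations to the observed quadratic forms gives a 2×2 linear system. The function name `he_estimate` and the dense list-of-lists G are illustrative assumptions.

```python
def he_estimate(G, y):
    """Haseman-Elston-style method-of-moments estimates of (sigma_g^2, sigma_e^2).

    Illustrative sketch, not HIBLUP's code. Assumes y is already centered and
    adjusted for fixed effects, and Var(y) = G * sigma_g^2 + I * sigma_e^2.
    """
    n = len(y)
    # tr(G G) equals the sum of squared entries when G is symmetric
    tr_gg = sum(G[i][j] * G[i][j] for i in range(n) for j in range(n))
    tr_g = sum(G[i][i] for i in range(n))
    # Observed quadratic forms y'Gy and y'y
    gy = [sum(G[i][j] * y[j] for j in range(n)) for i in range(n)]
    ygy = sum(y[i] * gy[i] for i in range(n))
    yy = sum(v * v for v in y)
    # Solve [[tr_gg, tr_g], [tr_g, n]] @ [sg2, se2] = [ygy, yy] by Cramer's rule
    det = tr_gg * n - tr_g * tr_g
    if abs(det) < 1e-12:
        raise ValueError("moment equations are singular (e.g. G proportional to I)")
    sg2 = (ygy * n - tr_g * yy) / det
    se2 = (tr_gg * yy - tr_g * ygy) / det
    return sg2, se2


G = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.25], [0.0, 0.25, 1.0]]
y = [1.2, 0.8, -2.0]  # already centered (sums to zero)
sg2, se2 = he_estimate(G, y)
```

For this toy input the estimates are σ²g ≈ 0.256 and σ²e ≈ 1.771. Three individuals carry essentially no statistical information, so the example only exercises the arithmetic. The only expensive terms are tr(G²) and y'Gy, each computed once, whereas iterative REML algorithms typically refactorize or invert a dense n × n matrix in every iteration; the trade-off is that moment estimators are generally less statistically efficient than REML.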

Plain language summary

Both human diseases and agricultural traits can be predicted by combining phenotypic observations with a relationship matrix among individuals in a linear mixed model. Because of the great demand for processing massive data on genotyped individuals, existing algorithms that repeatedly invert increasingly large dense matrices (e.g. the relationship matrix and the coefficient matrix of the mixed model equations) have hit a computational bottleneck. Here, we presented a software tool named ‘HIBLUP’ to address these challenges. Powered by our advanced algorithms (e.g. HE + PCG), elaborate design and efficient programming, HIBLUP avoids inverting any large matrix and runs fastest with the lowest memory use, which makes it very promising for genetic evaluation using big genomic data.
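The claim that no large matrix needs to be inverted rests on solving linear systems iteratively. PCG conventionally denotes the preconditioned conjugate gradient method, which only needs matrix-vector products. The sketch below assumes a simplified model with centered y and no fixed effects, so the BLUP of the polygenic effect is û = σ²g G V⁻¹ y with V = Gσ²g + Iσ²e; the vector V⁻¹y is obtained by PCG instead of by inversion. The Jacobi (diagonal) preconditioner and the names `pcg` and `gblup` are illustrative assumptions; HIBLUP's actual preconditioner and system formulation may differ.

```python
def pcg(matvec, b, diag, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive definite A.

    matvec(v) must return A @ v; diag holds the diagonal of A (the preconditioner).
    """
    n = len(b)
    x = [0.0] * n
    b_norm = sum(v * v for v in b) ** 0.5
    if b_norm == 0.0:
        return x  # for SPD A, A x = 0 has the unique solution x = 0
    r = list(b)  # residual b - A x for the starting guess x = 0
    z = [r[i] / diag[i] for i in range(n)]  # apply M^{-1} = diag(A)^{-1}
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        ap = matvec(p)
        alpha = rz / sum(p[i] * ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        if sum(v * v for v in r) ** 0.5 <= tol * b_norm:
            break  # relative residual is small enough
        z = [r[i] / diag[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


def gblup(G, y, sg2, se2):
    """BLUP of the polygenic effect, u_hat = sg2 * G V^{-1} y, without forming V^{-1}."""
    n = len(y)

    def matvec(v):
        # V v = sg2 * G v + se2 * v, evaluated on the fly
        return [sg2 * sum(G[i][j] * v[j] for j in range(n)) + se2 * v[i] for i in range(n)]

    diag = [sg2 * G[i][i] + se2 for i in range(n)]
    x = pcg(matvec, y, diag)  # x approximates V^{-1} y
    return [sg2 * sum(G[i][j] * x[j] for j in range(n)) for i in range(n)]


G = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.25], [0.0, 0.25, 1.0]]
u_hat = gblup(G, [1.2, 0.8, -2.0], sg2=0.5, se2=0.5)
```

Each iteration costs one product with V. This sketch stores G densely, but for a genomic relationship matrix defined as G = ZZ'/m (Z a standardized n × m genotype matrix) the product could instead be evaluated as Z(Z'v)/m without ever forming G, which is one common reason iterative solvers scale well to large genotyped populations.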

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Genome
  • Genomics*
  • Genotype
  • Humans
  • Linear Models
  • Models, Genetic*