Integration of eQTL and machine learning to dissect causal genes with pleiotropic effects in genetic regulation networks of seed cotton yield

Cell Rep. 2023 Sep 26;42(9):113111. doi: 10.1016/j.celrep.2023.113111. Epub 2023 Sep 6.

Abstract

Dissecting the gene regulatory networks (GRNs) that complement genome-wide association study (GWAS) loci, and the crosstalk underlying multiple agronomic traits, remains a major challenge. In this study, we generate 558 transcriptional profiles of lint-bearing ovules at one day post-anthesis from a selected core cotton germplasm, from which 12,207 expression quantitative trait loci (eQTLs) are identified. Sixty-six known phenotypic GWAS loci colocalize with 1,090 eQTLs, forming 38 functional GRNs associated predominantly with seed yield. Of the eGenes, 34 exhibit pleiotropic effects. Combining the eQTLs within the seed yield GRNs significantly increases the proportion of narrow-sense heritability explained. The extreme gradient boosting (XGBoost) machine learning approach is applied to predict seed cotton yield phenotypes from gene expression. The top-ranking eGenes with pleiotropic effects on yield traits (NF-YB3, FLA2, and GRDP1) are validated, and their potential roles are examined through correlation analysis, domestication selection analysis, and transgenic plants.
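As a rough illustration of what identifying an eQTL involves, the sketch below tests a single SNP-gene pair by regressing the gene's expression level on allele dosage (0, 1, or 2 copies of the alternate allele). This is a simplified, hypothetical example and not the study's pipeline: genome-wide eQTL scans commonly add covariates or mixed models for population structure and hidden expression factors, and correct for the very large number of tests. The function name `eqtl_test` and the normal approximation to the t distribution (reasonable only for large sample sizes) are illustrative assumptions.

```python
"""Minimal single-SNP eQTL test: ordinary least squares of expression on dosage."""
import math
from typing import Sequence, Tuple


def eqtl_test(dosage: Sequence[float], expression: Sequence[float]) -> Tuple[float, float, float]:
    """Return (beta, t_stat, approx_p) for the model expression ~ a + beta * dosage.

    Hypothetical helper for illustration. The two-sided p-value uses a standard
    normal approximation to the t distribution, so it is only a rough guide
    unless the number of samples is large (e.g. hundreds of accessions).
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need matched dosage/expression vectors with at least 3 samples")
    mean_g = sum(dosage) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    if sxx == 0:
        raise ValueError("monomorphic SNP: dosage has no variance")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosage, expression))

    # OLS slope (additive allelic effect) and intercept.
    beta = sxy / sxx
    alpha = mean_e - beta * mean_g

    # Residual variance with n - 2 degrees of freedom (intercept + slope).
    rss = sum((e - alpha - beta * g) ** 2 for g, e in zip(dosage, expression))
    sigma2 = rss / (n - 2)
    se_beta = math.sqrt(sigma2 / sxx)

    if se_beta > 0:
        t_stat = beta / se_beta
    else:
        # Perfect linear fit: the statistic is unbounded in the direction of beta.
        t_stat = math.copysign(math.inf, beta) if beta != 0 else 0.0

    # Two-sided normal tail: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
    return beta, t_stat, p_approx


if __name__ == "__main__":
    # Toy data: six hypothetical accessions, expression rising with dosage.
    beta, t_stat, p = eqtl_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2])
    print(f"beta={beta:.3f} t={t_stat:.2f} p~{p:.2e}")
```

In a genome-wide scan, a test like this would be repeated for every SNP-gene pair of interest (often restricted to a cis window around each gene), and genes with at least one significant association would be reported as eGenes.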

Keywords: CP: Genomics; CP: Plants; GWAS; XGBoost; cotton; eQTL; machine learning; seed size; yield.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Regulatory Networks*
  • Genome-Wide Association Study*
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Quantitative Trait Loci / genetics