A Graph-based Goat Pangenome Reveals Structural Variations Involved in Domestication and Adaptation

Mol Biol Evol. 2024 Dec 6;41(12):msae251. doi: 10.1093/molbev/msae251.

Abstract

Pangenomes can facilitate a deeper understanding of genome complexity. Using de novo phased long-read assemblies of eight representative goat breeds, we constructed a graph-based pangenome of goats (Capra hircus) and discovered 113 Mb of novel autosomal sequence. Combining this multi-assembly pangenome with low-coverage PacBio HiFi sequences, we constructed a long-read structural variation (SV) database containing 59,325 deletions, 84,910 insertions, and 24,954 other complex SV alleles. This resource allowed reliable graph-based genotyping from short reads of 79 wild and 1,148 worldwide domestic goats. Selection signal analysis of SVs captured a novel immune-related domestication locus containing the galectin-9 gene and extra copies of the ruminant-specific galectin-9-like genes (LGALS9L), which have high tissue specificity. A segmental duplication in domestic goats generates three additional LGALS9L copies. Ancient goat genome sequences show a gradual increase in the frequency of this duplication from the Neolithic to the present. Two other newly detected SVs also have higher selection signals than adjacent SNPs: a truncated-LINE1 deletion in EDA2R associated with cashmere production and a VNTR-related insertion in PAPSS2 linked to high-altitude adaptation. In summary, the multi-assembly goat pangenome and long-read SV database facilitate the detection of complex variations that are important in evolution and selection.

Keywords: de novo assembly; domestication; graph-based pangenome; low-coverage sequencing; structural variations.

MeSH terms

  • Adaptation, Physiological / genetics
  • Animals
  • Domestication*
  • Galectins / genetics
  • Genome*
  • Genomic Structural Variation
  • Goats* / genetics
  • Selection, Genetic

Substances

  • Galectins