Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data

Sci China Life Sci. 2020 May;63(5):750-763. doi: 10.1007/s11427-019-9551-7. Epub 2019 Jul 8.

Abstract

Pigs were domesticated independently in the Near East and China, suggesting that a single reference genome from one individual cannot represent the full spectrum of divergent sequences in pigs worldwide. In this study, 12 de novo pig assemblies from Eurasia were therefore compared to identify sequences missing from the reference genome. In total, 72.5 Mb of non-redundant sequences (∼3% of the genome) were absent from the reference genome (Sscrofa11.1) and were defined as pan-sequences. Of these pan-sequences, 9.0 Mb were dominant in Chinese pigs but occurred at low frequency in European pigs. One sequence dominant in Chinese pigs contained the complete genic region of tazarotene-induced gene 3 (TIG3), which is involved in fatty acid metabolism. Using flanking sequences and Hi-C-based methods, 27.7% of the pan-sequences could be anchored to the reference genome. Adding these sequences to the reference could contribute to a more accurate interpretation of 3D chromatin structure. A web-based pan-genome database is also provided as a primary resource for exploring genetic diversity and for promoting pig breeding and biomedical research.
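The population comparison behind the Chinese-dominant pan-sequences can be illustrated with a minimal Python sketch. It assumes each pan-sequence has already been called present (1) or absent (0) in every assembly. The function names, the `high`/`low` cut-offs and the toy data are hypothetical; the abstract does not state how "dominant" was defined, so this is not the study's actual procedure.

```python
"""Hypothetical sketch: flag pan-sequences whose presence frequency differs
between two populations, given presence/absence (PAV) calls."""

from typing import Dict, List


def presence_frequency(calls: List[int]) -> float:
    """Fraction of individuals in which a sequence is present (1 = present, 0 = absent)."""
    return sum(calls) / len(calls)


def dominant_in_group(
    pav: Dict[str, Dict[str, List[int]]],
    group: str,
    other: str,
    high: float = 0.5,  # assumed minimum frequency in `group` (illustrative only)
    low: float = 0.2,   # assumed maximum frequency in `other` (illustrative only)
) -> List[str]:
    """Return IDs of sequences common in `group` but rare in `other`."""
    hits = []
    for seq_id, calls_by_group in pav.items():
        f_group = presence_frequency(calls_by_group[group])
        f_other = presence_frequency(calls_by_group[other])
        if f_group >= high and f_other <= low:
            hits.append(seq_id)
    return hits


# Toy presence/absence calls, invented for illustration only.
pav = {
    "pan_seq_1": {"Chinese": [1, 1, 1, 0], "European": [0, 0, 0, 0]},
    "pan_seq_2": {"Chinese": [1, 0, 0, 0], "European": [1, 1, 0, 0]},
    "pan_seq_3": {"Chinese": [1, 1, 0, 0], "European": [0, 1, 0, 0]},
}

print(dominant_in_group(pav, "Chinese", "European"))  # ['pan_seq_1']
```

With these assumed thresholds, only `pan_seq_1` (present in 3 of 4 Chinese and 0 of 4 European individuals) is flagged; `pan_seq_3` reaches the Chinese cut-off but is excluded because its European frequency (0.25) exceeds `low`.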

Keywords: 3D chromatin structure; pan-genome; pig; presence-absence variation; reference genome.

MeSH terms

  • Animals
  • Base Sequence
  • Chromatin / genetics*
  • Chromosome Mapping
  • Female
  • Genome / genetics*
  • High-Throughput Nucleotide Sequencing
  • Liver
  • Mutation
  • Sequence Alignment
  • Sequence Analysis, DNA*
  • Swine / genetics*

Substances

  • Chromatin