A map of canine sequence variation relative to a Greenland wolf outgroup

Anthony K Nguyen; Peter Z Schall; Jeffrey M Kidd

doi:10.1007/s00335-024-10056-1

Mamm Genome. 2024 Dec;35(4):565-576. doi: 10.1007/s00335-024-10056-1. Epub 2024 Aug 1.

Authors

Anthony K Nguyen^# ¹ ², Peter Z Schall^# ¹, Jeffrey M Kidd ³ ⁴

Affiliations

¹ Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
² Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
³ Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
⁴ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.

^# Contributed equally.

PMID: 39088040
DOI: 10.1007/s00335-024-10056-1

Abstract

For over 15 years, canine genetics research relied on a reference assembly from a Boxer breed dog named Tasha (i.e., canFam3.1). Recent advances in long-read sequencing and genome assembly have led to the development of numerous high-quality assemblies from diverse canines. These assemblies represent notable improvements in completeness, contiguity, and the representation of gene promoters and gene models. Although genome graph and pan-genome approaches have promise, most genetic analyses in canines rely upon the mapping of Illumina sequencing reads to a single reference. The Dog10K consortium and others have generated deep catalogs of genetic variation through an alignment of Illumina sequencing reads to a reference genome obtained from a German Shepherd Dog named Mischka (i.e., canFam4, UU_Cfam_GSD_1.0). However, alignment to a breed-derived genome may introduce bias in genotype calling across samples. Since the use of an outgroup reference genome may remove this effect, we have reprocessed 1929 samples analyzed by the Dog10K consortium using a Greenland wolf (mCanLor1.2) as the reference. We efficiently performed remapping and variant calling using a GPU implementation of common analysis tools. The resulting call set removes the variability in genetic differences seen across samples, and breed relationships revealed by principal component analysis are not affected by the choice of reference genome. Using these sequence data, we inferred the history of population sizes and found that village dog populations experienced a 9- to 13-fold reduction in historic effective population size relative to wolves.

MeSH terms

Animals
Chromosome Mapping
Dogs / genetics
Genetic Variation*
Genome* / genetics
Genotype
Greenland
High-Throughput Nucleotide Sequencing / methods
Polymorphism, Single Nucleotide
Wolves* / genetics