Characterization of a strain-specific CD-1 reference genome reveals potential inter- and intra-strain functional variability

Yoon Hee Jung; Hsiao-Lin V Wang; Samir Ali; Victor G Corces; Isaac Kremsky

doi:10.1186/s12864-023-09523-x

BMC Genomics. 2023 Aug 3;24(1):437. doi: 10.1186/s12864-023-09523-x.

Authors

Yoon Hee Jung^#¹, Hsiao-Lin V Wang^#¹, Samir Ali², Victor G Corces¹, Isaac Kremsky³ ⁴

Affiliations

¹ Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA.
² Department of Basic Sciences, Loma Linda University School of Medicine, Loma Linda, CA, 92350, USA.
³ Department of Basic Sciences, Loma Linda University School of Medicine, Loma Linda, CA, 92350, USA.
⁴ Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA.

^# Contributed equally.

Abstract

Background: CD-1 is an outbred mouse stock that is frequently used in toxicology, pharmacology, and fundamental biomedical research. Although inbred strains are typically better suited for such studies due to minimal genetic variability, outbred stocks confer practical advantages over inbred strains, such as improved breeding performance and low cost. Knowledge of the full genetic variability of CD-1 would make it more useful in these fields.

Results: We performed deep genomic DNA sequencing of CD-1 mice and used the data to identify genome-wide SNPs, indels, and germline transposable elements relative to the mm10 reference genome. We used multiple genome-wide sequencing data types and previously published CD-1 SNPs to validate our called variants. We used the called variants to construct a strain-specific CD-1 reference genome, which we show can improve mappability and reduce experimental biases in genome-wide sequencing data derived from CD-1 mice. Based on previously published ChIP-seq and ATAC-seq data, we find evidence that genetic variation between CD-1 mice can lead to alterations in transcription factor binding. We also identified a number of variants in the coding regions of genes that could affect translation of those genes.
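As a rough illustration of what constructing a strain-specific reference from called variants involves, the sketch below substitutes SNP and indel alleles into a toy reference sequence. It is not the authors' pipeline: the function name `apply_variants`, the 0-based `(position, ref, alt)` tuple format, and the toy sequence are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical variant format for this sketch:
# (0-based position, reference allele, alternate allele)
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Return `reference` with each variant's ref allele replaced by its alt allele."""
    pieces = []
    cursor = 0  # first reference position not yet copied to the output
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        pieces.append(reference[cursor:pos])  # unchanged sequence before the variant
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)               # skip the replaced reference bases
    pieces.append(reference[cursor:])         # unchanged tail
    return "".join(pieces)


# Toy data, not taken from the paper.
toy_reference = "ACGTACGTAC"
toy_variants = [
    (2, "G", "T"),    # SNP
    (4, "AC", "A"),   # 1-bp deletion
    (8, "A", "AGG"),  # 2-bp insertion
]
strain_specific = apply_variants(toy_reference, toy_variants)
print(strain_specific)
```

Because indels change sequence length, coordinates in the resulting sequence no longer match the original reference, so a real workflow would also need to track a coordinate mapping between the two.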

Conclusions: We have identified millions of previously unidentified CD-1 variants with the potential to confound studies involving CD-1. We used the identified variants to construct a CD-1-specific reference genome, which can improve accuracy and reduce bias when aligning genomics data derived from CD-1 mice.

Keywords: CD-1; Indels; Reference genome; SNPs; Transposable elements.

MeSH terms

Animals
Chromosome Mapping
Genome*
Genomics*
Mice
Polymorphism, Single Nucleotide
Protein Binding
