Background: CD-1 is an outbred mouse stock that is frequently used in toxicology, pharmacology, and fundamental biomedical research. Although inbred strains are typically better suited for such studies because of their minimal genetic variability, outbred stocks offer practical advantages over inbred strains, such as better breeding performance and lower cost. Knowledge of the full genetic variability of CD-1 would make this stock more useful in these fields.
Results: We performed deep genomic DNA sequencing of CD-1 mice and used the data to identify genome-wide SNPs, indels, and germline transposable elements relative to the mm10 reference genome. We validated the called variants using multiple types of genome-wide sequencing data and previously published CD-1 SNPs. We then used the called variants to construct a strain-specific CD-1 reference genome, which we show can improve mappability and reduce experimental biases in genome-wide sequencing data derived from CD-1 mice. Based on previously published ChIP-seq and ATAC-seq data, we find evidence that genetic variation among CD-1 mice can alter transcription factor binding. We also identified a number of variants in gene coding regions that could affect the translation of those genes.
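To make the idea of a strain-specific reference genome concrete, the Python sketch below shows the core operation: substituting called alternate alleles (SNPs and indels) into a reference sequence. It is an illustration under stated assumptions, not the pipeline used in this study. The `Variant` record and `apply_variants` function are hypothetical names; variants are assumed to be VCF-style (1-based position, indels anchored on a shared first base), homozygous, biallelic, on a single contig, and non-overlapping. In practice, dedicated tools such as `bcftools consensus` perform this step on whole genomes.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical VCF-like record: 1-based position, REF allele, ALT allele."""
    pos: int
    ref: str
    alt: str


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Return a strain-specific sequence with each REF allele replaced by its ALT allele.

    Assumes homozygous, biallelic, non-overlapping variants on one contig.
    """
    pieces = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for v in sorted(variants, key=lambda v: v.pos):
        start = v.pos - 1  # convert 1-based VCF position to a 0-based index
        if start < cursor:
            raise ValueError(f"overlapping variant at position {v.pos}")
        if reference[start:start + len(v.ref)] != v.ref:
            raise ValueError(f"REF allele mismatch at position {v.pos}")
        pieces.append(reference[cursor:start])  # unchanged bases before the variant
        pieces.append(v.alt)                    # strain allele replaces the reference allele
        cursor = start + len(v.ref)             # skip past the replaced reference bases
    pieces.append(reference[cursor:])           # bases after the last variant
    return "".join(pieces)


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    calls = [
        Variant(2, "C", "T"),    # SNP
        Variant(5, "ACG", "A"),  # 2-bp deletion
        Variant(9, "A", "AGG"),  # 2-bp insertion
    ]
    print(apply_variants(ref, calls))  # prints ATGTATAGGC
```

Because indels shift downstream coordinates, annotations and alignments made against such a strain-specific sequence must be converted back to reference (here mm10) coordinates before they can be compared with data aligned to the standard reference.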
Conclusions: We have identified millions of previously unreported CD-1 variants with the potential to confound studies involving CD-1 mice. We used these variants to construct a CD-1-specific reference genome, which can improve accuracy and reduce bias when aligning genomic data derived from CD-1 mice.
Keywords: CD-1; Indels; Reference genome; SNPs; Transposable elements.
© 2023. BioMed Central Ltd., part of Springer Nature.