Somatic copy number variations (CNVs) exist in the brain, but their genesis, prevalence, forms, and biological impact remain unclear, even within experimentally tractable animal models. We combined a transposase-based amplification (TbA) methodology for single-cell whole-genome sequencing with a bioinformatic approach for filtering unreliable CNVs (FUnC), developed from machine learning trained on lymphocyte V(D)J recombination. TbA-FUnC offered superior genomic coverage and removed >90% of false-positive CNV calls, allowing extensive examination of submegabase CNVs from over 500 cells throughout the neurogenic period of cerebral cortical development in Mus musculus. Thousands of previously undocumented CNVs were identified. Half were less than 1 Mb in size, with deletions 4× more common than amplification events, and were randomly distributed throughout the genome. However, CNV prevalence during embryonic cortical development was nonrandom, peaking at midneurogenesis with levels triple those found at younger ages before falling to intermediate quantities. These data identify pervasive small and large CNVs as early contributors to neural genomic mosaicism, producing genomically diverse cellular building blocks that form the highly organized, mature brain.
Keywords: CNV; brain development; genomic mosaicism; machine learning; single-cell sequencing.
Copyright © 2018 the Author(s). Published by PNAS.