Somatic copy number alterations (CNAs) drive aberrant gene expression in cancer cells. In tumors with high levels of chromosomal instability, subclonal CNAs are prevalent and often give rise to heterogeneous cancer cell populations with distinct phenotypes¹. However, the extent to which subclonal CNAs contribute to clone-specific phenotypes remains poorly understood, in part because methods to quantify how CNAs influence gene expression at the subclone level are lacking. We developed TreeAlign, which computationally integrates independently sampled single-cell DNA and RNA sequencing data from the same cell population and explicitly models gene dosage effects of subclonal alterations. Through quantitative benchmarking and application to human cancer samples profiled with both single-cell DNA and RNA libraries, we show that TreeAlign accurately encodes the clone-specific transcriptional effects of subclonal CNAs and the impact of allelic imbalance on allele-specific transcription. We also show that it obviates the need to arbitrarily define genotypic clones from a phylogenetic tree a priori. Together, these advances yield highly granular definitions of clones with distinct copy-number-driven expression programs, with greater resolution and accuracy than competing methods. The resulting improvement in assigning transcriptional phenotypes to genomic clones enables gene expression comparisons between clones, explicit inference of genes whose expression is mechanistically altered by CNAs, and identification of expression programs that are genomically independent. Our approach sets the stage for dissecting the relative contributions of fixed genomic alterations and dynamic epigenetic processes to gene expression programs in cancer.
Keywords: clonal phenotypes; genotype-phenotype; single-cell DNA; single-cell RNA.