Background: Somatic mutational signatures elucidate molecular vulnerabilities to therapy, and therefore detecting signatures and classifying tumors with respect to signatures has clinical value. However, identifying the etiology of the mutational signatures remains a statistical challenge, with both small sample sizes and high variability in classification algorithms posing barriers. As a result, few signatures have been strongly linked to particular risk factors.
Methods: Here, we develop a statistical model, Diffsig, for estimating the association of one or more continuous or categorical risk factors with DNA mutational signatures. Diffsig takes into account the uncertainty associated with assigning signatures to samples as well as multiple risk factors' simultaneous effect on observed DNA mutations.
Results: We applied Diffsig to breast cancer data to assess relationships between five established breast-relevant mutational signatures and etiologic variables, confirming known mechanisms of cancer development. In simulation, our model was capable of accurately estimating expected associations in a variety of contexts.
Conclusions: Diffsig allows researchers to quantify and perform inference on the associations of risk factors with mutational signatures.
Impact: We expect Diffsig to provide more robust associations of risk factors with signatures to lead to better understanding of the tumor development process and improved models of tumorigenesis.
©2024 American Association for Cancer Research.