Motivation: Genomic biotechnology has rapidly advanced, allowing for the inference and modification of genetic and epigenetic information at the single-cell level. While these tools hold enormous potential for basic and clinical research, they also raise difficult issues of how to design studies to deploy them most effectively. In designing a genomic study, a modern researcher might combine many sequencing modalities and sampling protocols, each with different utility, costs, and other tradeoffs. This is especially relevant for studies of somatic variation, which may involve highly heterogeneous cell populations whose differences can be probed via an extensive set of biotechnological tools. Efficiently deploying genomic technologies in this space will require principled ways to create study designs that recover desired genomic information while minimizing various measures of cost.
Results: The central problem this paper attempts to address is how one might create an optimal study design for a genomic analysis, with particular focus on studies involving somatic variation that occur most often with application to cancer genomics. We pose the study design problem as a stochastic constrained nonlinear optimization problem. We introduce a Bayesian optimization framework that iteratively optimizes for an objective function using surrogate modeling combined with pattern and gradient search. We demonstrate our procedure on several test cases to derive resource and study design allocations optimized for various goals and criteria, demonstrating its ability to optimize study designs efficiently across diverse scenarios.
Availability and implementation: https://github.com/CMUSchwartzLab/StudyDesignOptimization.
© The Author(s) 2024. Published by Oxford University Press.