Inferring sparse structure in genotype-phenotype maps

Samantha Petti; Gautam Reddy; Michael M Desai

doi:10.1093/genetics/iyad127

Genetics. 2023 Aug 31;225(1):iyad127. doi: 10.1093/genetics/iyad127.

Authors

Samantha Petti¹, Gautam Reddy¹ ² ³, Michael M Desai⁴

Affiliations

¹ NSF-Simons Center for the Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, MA 02138, USA.
² Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA 94085, USA.
³ Center for Brain Science, Harvard University, Cambridge, MA 02138, USA.
⁴ Department of Organismic and Evolutionary Biology and Department of Physics, Harvard University, Cambridge, MA 02138, USA.

Abstract

Correlation among multiple phenotypes across related individuals may reflect some pattern of shared genetic architecture: individual genetic loci affect multiple phenotypes (an effect known as pleiotropy), creating observable relationships between phenotypes. A natural hypothesis is that pleiotropic effects reflect a relatively small set of common "core" cellular processes: each genetic locus affects one or a few core processes, and these core processes in turn determine the observed phenotypes. Here, we propose a method to infer such structure in genotype-phenotype data. Our approach, sparse structure discovery (SSD), is based on a penalized matrix decomposition designed to identify latent structure that is low-dimensional (far fewer core processes than phenotypes and genetic loci), locus-sparse (each locus affects few core processes), and/or phenotype-sparse (each phenotype is influenced by few core processes). Our use of sparsity as a guide in the matrix decomposition is motivated by the results of a novel empirical test indicating evidence of sparse structure in several recent genotype-phenotype datasets. First, we use synthetic data to show that our SSD approach can accurately recover core processes if each genetic locus affects few core processes or if each phenotype is affected by few core processes. Next, we apply the method to three datasets spanning adaptive mutations in yeast, a genotoxin robustness assay in human cell lines, and genetic loci identified from a yeast cross, and evaluate the biological plausibility of the core processes identified. More generally, we propose sparsity as a guiding prior for resolving latent structure in empirical genotype-phenotype maps.

Keywords: genotype–phenotype map; penalized matrix decomposition; sparsity; structure discovery.
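
The abstract does not state the exact objective or optimizer behind SSD, so the following is only a minimal sketch of a generic L1-penalized matrix decomposition in the same spirit, not the authors' implementation. It assumes an effect matrix Y (phenotypes x loci) approximated as W @ M, where M maps loci to k core processes and W maps core processes to phenotypes. The function names (soft_threshold, sparse_factorize), the penalty weights (lam_locus, lam_pheno), and the alternating proximal-gradient scheme are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_factorize(Y, k, lam_locus=0.1, lam_pheno=0.0, n_iter=500, seed=0):
    """Approximate Y (phenotypes x loci) as W @ M with optional L1 penalties.

    Hypothetical illustration, not the published SSD code.
    W: phenotypes x k (core process -> phenotype weights)
    M: k x loci       (locus -> core process weights)
    lam_locus > 0 encourages locus sparsity (zeros in M);
    lam_pheno > 0 encourages phenotype sparsity (zeros in W).
    """
    rng = np.random.default_rng(seed)
    n_pheno, n_loci = Y.shape
    W = rng.standard_normal((n_pheno, k))
    M = rng.standard_normal((k, n_loci))
    for _ in range(n_iter):
        # Proximal gradient step on M with W fixed.
        # Step size 1/L, where L is the squared spectral norm of W.
        L = np.linalg.norm(W, 2) ** 2 + 1e-12
        M = soft_threshold(M - (W.T @ (W @ M - Y)) / L, lam_locus / L)

        # Proximal gradient step on W with M fixed.
        L = np.linalg.norm(M, 2) ** 2 + 1e-12
        W = soft_threshold(W - ((W @ M - Y) @ M.T) / L, lam_pheno / L)

        # Remove the scale ambiguity between W and M: give each column of W
        # unit norm and move the scale into the matching row of M.
        norms = np.linalg.norm(W, axis=0)
        norms[norms == 0] = 1.0
        W /= norms
        M *= norms[:, None]
    return W, M


if __name__ == "__main__":
    # Synthetic check: 3 core processes, each locus affecting few of them.
    rng = np.random.default_rng(1)
    W_true = rng.standard_normal((20, 3))
    M_true = rng.standard_normal((3, 50)) * (rng.random((3, 50)) < 0.3)
    Y = W_true @ M_true
    W, M = sparse_factorize(Y, k=3, lam_locus=0.05)
    rel_err = np.linalg.norm(Y - W @ M) / np.linalg.norm(Y)
    print("relative reconstruction error:", rel_err)
    print("fraction of exact zeros in M:", np.mean(M == 0))
```

Setting lam_pheno above zero instead of (or in addition to) lam_locus corresponds to the phenotype-sparse case described in the abstract. The column renormalization is a convenience choice for this sketch: without some constraint on scale, an L1 penalty on one factor can be evaded by shrinking that factor and inflating the other.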

Publication types

Research Support, U.S. Gov't, Non-P.H.S.
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Genotype
Humans
Mutation
Phenotype
Saccharomyces cerevisiae* / genetics

Grants and funding

R01 GM104239/GM/NIGMS NIH HHS/United States