The central dogma of molecular biology describes a unidirectional causal flow: DNA → RNA → protein → trait. Genome-wide association studies, next-generation sequencing association studies, and their meta-analyses have identified ~12,000 susceptibility genetic variants associated with a broad array of human physiological traits. However, such conventional association studies ignore the intermediate causal factors (i.e., RNA and protein) and the unidirectional causal pathway. As a result, they may not be optimally powered, and the genetic variants they identify are not necessarily genuine causal variants. In this article, we model the central dogma with a mediate causal model and prove analytically that the more remote an omics level is from a physiological trait, the smaller the magnitude of their correlation. Under both random and extreme sampling schemes, we demonstrate numerically that the proteome-trait correlation test is more powerful than the transcriptome-trait correlation test, which in turn is more powerful than the genotype-trait association test. In conclusion, integrating RNA and protein expression data with DNA data, together with causal inference, is necessary to gain a full understanding of how causal genetic variants contribute to phenotypic variation.
Keywords: associations; causations; data integration; proteomics; systems biology; transcriptomics.
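
The attenuation claim in the abstract can be illustrated with a small simulation. The sketch below is not the article's model or code: it assumes a purely linear chain with standardized variables, and the path coefficients `a`, `b`, `c`, the minor allele frequency `maf`, and the function names are hypothetical choices made only for illustration. Under these assumptions, the population correlations with the trait are `c` for protein, `b*c` for transcript, and `a*b*c` for genotype, so each step further from the trait shrinks the correlation.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_chain(n=20000, a=0.5, b=0.6, c=0.7, maf=0.3, seed=0):
    """Simulate genotype -> transcript -> protein -> trait (assumed linear).

    Each variable is built to have unit variance, so a, b, and c are the
    correlations between adjacent levels. All values are illustrative.
    """
    rng = random.Random(seed)
    sd_g = math.sqrt(2 * maf * (1 - maf))  # SD of an additive 0/1/2 genotype
    g, r, p, t = [], [], [], []
    for _ in range(n):
        alleles = (rng.random() < maf) + (rng.random() < maf)  # 0, 1, or 2
        gi = (alleles - 2 * maf) / sd_g  # standardized genotype
        ri = a * gi + math.sqrt(1 - a * a) * rng.gauss(0, 1)  # transcript
        pi = b * ri + math.sqrt(1 - b * b) * rng.gauss(0, 1)  # protein
        ti = c * pi + math.sqrt(1 - c * c) * rng.gauss(0, 1)  # trait
        g.append(gi)
        r.append(ri)
        p.append(pi)
        t.append(ti)
    return g, r, p, t


g, r, p, t = simulate_chain()
corrs = {
    "genotype": pearson(g, t),
    "transcript": pearson(r, t),
    "protein": pearson(p, t),
}
for level, value in corrs.items():
    print(f"{level}-trait correlation: {value:.3f}")
```

Because the power of a correlation test at a fixed sample size increases with the magnitude of the true correlation, the same ordering would be expected to carry over to power under this simplified setup; the extreme sampling scheme mentioned in the abstract is not covered by this sketch.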