With advances in next-generation sequencing technologies, researchers now routinely obtain collections of microbial sequences with complex phylogenetic relationships. It is often of interest to analyze the association between environmental factors and characteristics of the microbial collection. Although methods have been developed to test for association between microbial composition and environmental factors, as well as between coevolving traits, a flexible model that provides a comprehensive picture of the relationship between microbial community characteristics and environmental variables would be highly beneficial. We developed a Bayesian approach for association analysis that incorporates the phylogenetic structure to account for the dependence between observations. To overcome the computational difficulty posed by the phylogenetic tree, we developed a variational algorithm to approximate the posterior distribution. Because the posterior distribution can be readily obtained for the parameters of interest and any derived variables, the association can be examined comprehensively. With two application examples, we demonstrated that the Bayesian approach can uncover nuanced details of how the microbial assemblage relates to the environmental factor. The proposed Bayesian approach and variational algorithm can be extended to other problems involving dependence over tree-like structures.
Keywords: Bayesian; microbial community; phylogenetic trees; variational inference.
This work was authored as part of the Contributor's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. law.