Motivation: Biological assays are often carried out on tissues that contain many cell lineages and active pathways. Microarray data produced using such material therefore reflect superimpositions of biological processes. Analysing such data for shared gene function by means of well-matched assays may help to provide a better focus on specific cell types and processes. The identification of genes that behave similarly in different biological systems also has the potential to reveal new insights into preserved biological mechanisms.
Results: In this article, we propose a hierarchical Bayesian model allowing integrated analysis of several microarray data sets for shared gene function. Each gene is associated with an indicator variable that selects whether binary class labels are predicted from expression values or by a classifier which is common to all genes. Each indicator selects the component models for all involved data sets simultaneously. A quantitative measure of shared gene function is obtained by inferring a probability measure over these indicators. Through experiments on synthetic data, we illustrate potential advantages of this Bayesian approach over a standard method. A shared analysis of matched microarray experiments covering (a) a cycle of mouse mammary gland development and (b) the process of in vitro endothelial cell apoptosis is proposed as a biological gold standard. Several useful sanity checks are introduced during data analysis, and we confirm the prior biological belief that shared apoptosis events occur in both systems. We conclude that a Bayesian analysis for shared gene function has the potential to reveal new biological insights, unobtainable by other means.
Availability: An online supplement and MatLab code are available at http://www.sykacek.net/research.html#mcabf