Motivation: Clustering is widely used to group biological samples or genes when evaluating microarray data. However, problems arise when comparing heterologous data sets. Because a clustering algorithm searches for similarities between experiments, it will most likely first separate the data sets from one another, masking relationships that exist between samples from different data sets.
Results: We developed a program, Venn Mapper, to calculate the statistical significance of the number of co-occurring differentially expressed genes in any two experiments. As a proof of principle, we analysed a heterologous data set of 170 microarrays including breast and prostate cancer microarray analyses. In an unsupervised analysis, significant overlap was found among metastasized prostate cancer, metastasized breast cancer and BRCA-mutated breast cancer. We also compared single microarray data with the averaged breast and prostate data sets. This analysis suggests that genes more highly expressed in stromal cells are also implicated in metastatic prostate cancer and BRCA-mutated breast cancer. The Venn Mapper program identifies overlaps between samples from heterologous data sets and directly extracts the genes responsible for the overlap. From this information, novel biological hypotheses may be formulated.
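Illustration: The idea of scoring the overlap between two lists of differentially expressed genes can be sketched as follows. This is a minimal example that assumes a one-sided hypergeometric test, which is one common way to assess such an overlap; the abstract does not state which statistic Venn Mapper itself uses, and the function names and gene labels below are hypothetical.

```python
from math import comb


def overlap_p_value(n_total, n_a, n_b, n_overlap):
    """Probability of seeing at least n_overlap shared genes by chance.

    Assumption (not taken from Venn Mapper): experiment A flags n_a genes
    and experiment B flags n_b genes out of n_total genes measured in both,
    and under the null hypothesis the two selections are independent, so
    the overlap follows a hypergeometric distribution.
    """
    max_overlap = min(n_a, n_b)
    denominator = comb(n_total, n_b)
    # Sum the upper tail P(X = k) for k = n_overlap .. max_overlap.
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / denominator


def shared_genes(genes_a, genes_b):
    """Return the genes responsible for the overlap, sorted by name."""
    return sorted(set(genes_a) & set(genes_b))


if __name__ == "__main__":
    # Hypothetical gene identifiers, for illustration only.
    up_in_a = ["G1", "G2", "G3", "G4"]
    up_in_b = ["G3", "G4", "G5"]
    common = shared_genes(up_in_a, up_in_b)
    p = overlap_p_value(n_total=100, n_a=len(up_in_a), n_b=len(up_in_b),
                        n_overlap=len(common))
    print(common, p)
```

A small p-value indicates that two experiments share more differentially expressed genes than expected by chance, even when the experiments come from different data sets that a clustering algorithm would separate first.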
Availability: Venn Mapper is freely available at http://www.erasmusmc.nl/gatcplatform.
Supplementary information: http://www.erasmusmc.nl/gatcplatform/vennmapper.html.