Motivation: With the emergence of genome-wide expression profiling data sets, the guilt by association (GBA) principle has been a cornerstone for deriving gene functional interpretations in silico. Given the limited success of traditional methods for producing clusters of genes with great amounts of functional similarity, new data-mining algorithms are required to fully exploit the potential of high-throughput genomic approaches.
Results: Ontology-based pattern identification (OPI) is a novel data-mining algorithm that systematically identifies expression patterns that best represent existing knowledge of gene function. Instead of relying on a universal threshold of expression similarity to define functionally related groups of genes, OPI finds the optimal analysis settings that yield gene expression patterns and gene lists that best predict gene function using the principle of GBA. We applied OPI to a publicly available gene expression data set on the life cycle of the malarial parasite Plasmodium falciparum and systematically annotated genes for 320 functional categories based on current Gene Ontology annotations. An ontology-based hierarchical tree of the 320 categories provided a systems-wide biological view of this important malarial parasite.