Biclustering of linear patterns in gene expression data

Qinghui Gao; Christine Ho; Yingmin Jia; Jingyi Jessica Li; Haiyan Huang

doi:10.1089/cmb.2012.0032

J Comput Biol. 2012 Jun;19(6):619-31. doi: 10.1089/cmb.2012.0032.

Authors

Qinghui Gao¹, Christine Ho, Yingmin Jia, Jingyi Jessica Li, Haiyan Huang

Affiliation

¹ Seventh Research Division and Department of Systems and Control, Beihang University, Beijing, China.

Abstract

Identifying a bicluster, a submatrix of a gene expression dataset in which the genes exhibit similar behavior across the columns, is useful for discovering novel functional gene interactions. In this article, we introduce a new algorithm for finding biClusters with Linear Patterns (CLiP). Instead of solely maximizing Pearson correlation, we introduce a fitness function that also considers the correlation of complementary genes and conditions. This eliminates the need to determine the bicluster size a priori. We employ both greedy search and a genetic algorithm for optimization, incorporating resampling for more robust discovery. When applied to both real and simulated datasets, CLiP outperforms existing methods. In analyzing RNA-seq fly and worm time-course data from modENCODE, we uncover a set of similarly expressed genes suggesting maternal dependence. Supplementary Material is available online (at www.liebertonline.com/cmb).
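The abstract does not give the CLiP fitness function, so the following is only a toy sketch of the general idea it describes: scoring a candidate gene set by its internal Pearson correlation while penalizing correlation with the complementary (excluded) genes, so that a greedy search can stop on its own rather than needing a preset size. The names `toy_fitness` and `greedy_grow`, the "inside minus outside" scoring rule, and the simulated data are all assumptions made for illustration. The sketch handles only complementary genes, not complementary conditions, and omits the genetic algorithm and resampling steps.

```python
import numpy as np


def _unit_rows(x):
    """Center each row and scale it to unit length, so that dot products
    between rows equal Pearson correlations."""
    x = x - x.mean(axis=1, keepdims=True)
    norm = np.linalg.norm(x, axis=1, keepdims=True)
    norm[norm == 0] = 1.0  # constant rows get correlation 0 instead of NaN
    return x / norm


def toy_fitness(data, genes, conds):
    """Hypothetical score (not the published CLiP function): mean |r| among
    the selected genes minus mean |r| between selected and excluded genes,
    both computed over the selected conditions. Needs at least two genes."""
    genes = np.asarray(genes)
    rest = np.setdiff1d(np.arange(data.shape[0]), genes)
    z_in = _unit_rows(data[np.ix_(genes, conds)])
    corr_in = np.abs(z_in @ z_in.T)
    inside = corr_in[np.triu_indices(len(genes), k=1)].mean()
    if rest.size == 0:
        return inside
    z_out = _unit_rows(data[np.ix_(rest, conds)])
    outside = np.abs(z_in @ z_out.T).mean()
    return inside - outside


def greedy_grow(data, seed_genes, conds):
    """Add one gene at a time while the toy score improves; the final size
    is decided by the score, not fixed in advance."""
    genes = list(seed_genes)
    current = toy_fitness(data, genes, conds)
    while True:
        candidates = [g for g in range(data.shape[0]) if g not in genes]
        if not candidates:
            return genes
        scores = [toy_fitness(data, genes + [g], conds) for g in candidates]
        best = int(np.argmax(scores))
        if scores[best] <= current + 1e-12:
            return genes
        genes.append(candidates[best])
        current = scores[best]


# Simulated data (an assumption for illustration): 30 genes x 12 conditions
# of noise, with genes 0-4 replaced by exact linear functions of one profile.
rng = np.random.default_rng(0)
data = rng.normal(size=(30, 12))
base = rng.normal(size=12)
slopes = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
offsets = np.array([0.0, 1.0, -1.0, 2.0, 0.5])
data[:5] = slopes[:, None] * base + offsets[:, None]

conds = list(range(12))
found = greedy_grow(data, [0, 1], conds)
print(sorted(found))
```

Absolute correlation is used so that genes with negative slopes relative to the shared profile still count as part of the same linear pattern. Because excluded genes that correlate strongly with the selected set lower the score, leaving a pattern member out is penalized, which is what lets the search stop without a size parameter in this sketch.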

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Animals
Caenorhabditis elegans / genetics
Computational Biology / methods*
Drosophila melanogaster / genetics
Gene Expression Profiling
Gene Expression*
Multigene Family
Pattern Recognition, Automated / methods*
Saccharomyces cerevisiae / genetics
Sequence Analysis, RNA

Grants and funding

EY019094/EY/NEI NIH HHS/United States