Biclustering of linear patterns in gene expression data

J Comput Biol. 2012 Jun;19(6):619-31. doi: 10.1089/cmb.2012.0032.

Abstract

Identifying a bicluster, a submatrix of a gene expression dataset in which the genes exhibit similar behavior across the selected columns (conditions), is useful for discovering novel functional gene interactions. In this article, we introduce a new algorithm for finding biClusters with Linear Patterns (CLiP). Instead of solely maximizing Pearson correlation, we introduce a fitness function that also considers the correlation of complementary genes and conditions. This eliminates the need for a priori determination of the bicluster size. We employ both greedy search and a genetic algorithm for optimization, incorporating resampling for more robust discovery. When applied to both real and simulated datasets, our results show that CLiP outperforms existing methods. In analyzing RNA-seq fly and worm time-course data from modENCODE, we uncover a set of similarly expressed genes suggesting maternal dependence. Supplementary Material is available online (at www.liebertonline.com/cmb).
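The idea of scoring a bicluster by more than its internal Pearson correlation can be illustrated with a small sketch. The abstract does not give the CLiP fitness function, so the code below is not the authors' method: `toy_fitness`, its equal weighting of the two complementary terms, and the synthetic data are all assumptions chosen only to show how correlation inside a candidate bicluster might be contrasted with correlation over the complementary genes and conditions.

```python
import numpy as np

def mean_abs_corr(sub):
    """Mean absolute pairwise Pearson correlation between the rows of `sub`."""
    if sub.shape[0] < 2 or sub.shape[1] < 3:
        return 0.0
    with np.errstate(invalid="ignore", divide="ignore"):
        c = np.corrcoef(sub)
    vals = np.abs(c[np.triu_indices_from(c, k=1)])
    vals = vals[~np.isnan(vals)]  # constant rows yield NaN correlations
    return float(vals.mean()) if vals.size else 0.0

def mean_abs_corr_with_profile(sub, profile):
    """Mean absolute Pearson correlation of each row of `sub` with `profile`."""
    if sub.shape[0] < 1 or sub.shape[1] < 3:
        return 0.0
    with np.errstate(invalid="ignore", divide="ignore"):
        vals = np.array([abs(np.corrcoef(row, profile)[0, 1]) for row in sub])
    vals = vals[~np.isnan(vals)]
    return float(vals.mean()) if vals.size else 0.0

def toy_fitness(data, rows, cols):
    """Hypothetical score (an assumption, not the CLiP formula):
    correlation inside the bicluster minus the average of
    (a) the same genes over the complementary conditions and
    (b) the complementary genes against the bicluster's mean profile."""
    rows, cols = np.asarray(rows), np.asarray(cols)
    other_rows = np.setdiff1d(np.arange(data.shape[0]), rows)
    other_cols = np.setdiff1d(np.arange(data.shape[1]), cols)
    block = data[np.ix_(rows, cols)]
    inside = mean_abs_corr(block)
    comp_conditions = mean_abs_corr(data[np.ix_(rows, other_cols)])
    comp_genes = mean_abs_corr_with_profile(
        data[np.ix_(other_rows, cols)], block.mean(axis=0)
    )
    return inside - 0.5 * (comp_conditions + comp_genes)

# Synthetic data (an assumption): Gaussian noise with one planted linear pattern,
# where each planted gene is slope * base + intercept over the first 6 conditions.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 12))
base = rng.normal(size=6)
slopes = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
intercepts = np.array([0.0, 1.0, -1.0, 2.0, 0.5])
data[:5, :6] = slopes[:, None] * base[None, :] + intercepts[:, None]

planted = toy_fitness(data, range(5), range(6))
random_pick = toy_fitness(data, range(10, 15), range(6, 12))
print("planted bicluster score:", planted)
print("arbitrary submatrix score:", random_pick)
```

Because the score penalizes correlation that extends beyond the selected rows and columns, a search over row and column subsets would not need the bicluster size fixed in advance under this toy definition; how CLiP achieves this in practice is described in the full article, not here.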

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Caenorhabditis elegans / genetics
  • Computational Biology / methods*
  • Drosophila melanogaster / genetics
  • Gene Expression Profiling
  • Gene Expression*
  • Multigene Family
  • Pattern Recognition, Automated / methods*
  • Saccharomyces cerevisiae / genetics
  • Sequence Analysis, RNA