Background: The growing number of databases in human genome research has enabled integrated genome-wide studies of complex diseases such as cancer. A practical approach is to mine a global transcriptome profile of a disease from public databases. Surveying this profile may yield new insights into these diseases.
Methods: In this study, we clustered expressed sequence tags (ESTs) from human colorectal normal mucosa (N), inflammatory bowel disease (IBD), adenoma (A) and cancer (T) libraries into UniGene clusters using GetUni, an in-house software package, and analyzed the transcriptome profiles of these libraries with GOTree Machine (GOTM). Additionally, we downloaded UniGene-based cDNA libraries of colon and analyzed them with Xprofiler to cross-validate the efficiency of GetUni. Semi-quantitative RT-PCR was used to validate the expression of beta-catenin and 7 novel genes in colorectal cancers.
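The idea behind an electronic (digital) expression profile can be sketched as counting how many ESTs from each library fall into each UniGene cluster. The sketch below is not GetUni; it assumes a precomputed EST-to-cluster mapping is already available, and the function names `digital_profile`, `per_million` and `detected_genes` and the accession strings are hypothetical.

```python
from collections import Counter


def digital_profile(est_to_cluster, library_ests):
    """Count ESTs per UniGene cluster for one library.

    Assumes est_to_cluster is a hypothetical, precomputed mapping from EST
    accession to UniGene cluster ID; unmapped ESTs are skipped.
    """
    counts = Counter()
    for est in library_ests:
        cluster = est_to_cluster.get(est)
        if cluster is not None:
            counts[cluster] += 1
    return counts


def per_million(counts):
    """Normalize raw EST counts to transcripts per million ESTs in the library."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cluster: n * 1_000_000 / total for cluster, n in counts.items()}


def detected_genes(counts):
    """Set of clusters (genes) with at least one EST in the library."""
    return {cluster for cluster, n in counts.items() if n > 0}


if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    mapping = {"EST1": "Hs.A", "EST2": "Hs.A", "EST3": "Hs.B", "EST4": "Hs.C"}
    normal = digital_profile(mapping, ["EST1", "EST3"])
    tumor = digital_profile(mapping, ["EST1", "EST2", "EST3", "EST4"])
    # A containment check of this kind underlies statements such as
    # "all genes in library N were found in library T".
    print(detected_genes(normal) <= detected_genes(tumor))
```

Normalizing per million ESTs makes libraries of different sequencing depth comparable before any enrichment test is applied.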
Results: The efficiency of GetUni was validated by Xprofiler and RT-PCR. All genes in libraries N, IBD and A were also found in library T. A total of 14,879 genes were identified, 2,355 of which had at least 2 transcripts. GOTM analysis showed statistically significant differences in gene enrichment among these libraries for 50 signal transduction pathways and Pfam protein domains (P < 0.01, hypergeometric test). Genes in two pathways, ribosome and glycolysis, were more enriched in the expression profiles of A and IBD than in those of N and T. Seven-transmembrane receptor superfamily genes were particularly abundant in cancers.
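The hypergeometric test used for enrichment asks how likely it is to see at least k annotated genes among the n genes of a library when K of the N genes in the reference set carry that annotation. The following is a minimal stdlib sketch of the upper-tail P value; the function name is hypothetical, and GOTM's exact reference set and any multiple-testing correction are not reproduced here.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the reference set
    K: reference genes annotated to the pathway or domain
    n: genes detected in the library under test
    k: library genes annotated to the pathway or domain
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


if __name__ == "__main__":
    # Toy example: 10 reference genes, 4 annotated, 3 detected, 2 of them annotated.
    # The tail probability is (6*6 + 4*1) / 120 = 1/3.
    print(hypergeom_upper_tail(2, 10, 4, 3))
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for reference sets of the size reported here, so the only rounding happens in the final division.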
Conclusion: Colorectal cancers are genetically heterogeneous, and transcript variants are common in these tumors. Aberrations of the ribosome and glycolysis pathways might be early indicators of precursor lesions of colon cancer. The electronic gene expression profile could be used to highlight key molecular events in colorectal cancers.