Statistical analysis of a small set of time-ordered gene expression data using linear splines

Bioinformatics. 2002 Nov;18(11):1477-85. doi: 10.1093/bioinformatics/18.11.1477.

Abstract

Motivation: Recently, the temporal response of genes to changes in their environment has been investigated using cDNA microarray technology by measuring the gene expression levels at a small number of time points. Conventional techniques for time series analysis are not suitable for such a short series of time-ordered data. The analysis of gene expression data has therefore usually been limited to a fold-change analysis, instead of a systematic statistical approach.

Methods: We use the maximum likelihood method together with Akaike's Information Criterion to fit linear splines to a small set of time-ordered gene expression data in order to infer statistically meaningful information from the measurements. The significance of measured gene expression data is assessed using Student's t-test.
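The fitting step described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following are assumptions: measurement errors are taken to be independent Gaussian with equal variance (so maximum likelihood reduces to least squares), candidate knots are restricted to the measured time points, and AIC is computed up to an additive constant as n ln(RSS/n) + 2k. The function names (`hat_basis`, `fit_linear_spline`, `select_knots`) and the toy data are hypothetical and are not taken from the paper.

```python
import itertools
import numpy as np

def hat_basis(t, knots):
    """Design matrix of piecewise-linear 'hat' functions, one column per knot."""
    t = np.asarray(t, dtype=float)
    identity = np.eye(len(knots))
    # Column k is 1 at knot k, 0 at every other knot, and linear in between.
    return np.column_stack([np.interp(t, knots, identity[k])
                            for k in range(len(knots))])

def fit_linear_spline(t, y, knots):
    """Least-squares (Gaussian ML) fit; returns spline values at knots and AIC."""
    B = hat_basis(t, knots)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss = float(np.sum((y - B @ coef) ** 2))
    n = len(y)
    k = len(knots) + 1  # spline values at the knots plus the noise variance
    # Assumed AIC form (constant dropped); requires rss > 0, i.e. noisy replicates.
    aic = n * np.log(rss / n) + 2 * k
    return coef, aic

def select_knots(t, y):
    """Try every subset of interior time points as knots; keep the lowest AIC."""
    times = np.unique(t)
    interior = times[1:-1]
    best = None
    for r in range(len(interior) + 1):
        for subset in itertools.combinations(interior, r):
            knots = np.array([times[0], *subset, times[-1]])
            coef, aic = fit_linear_spline(t, y, knots)
            if best is None or aic < best[0]:
                best = (aic, knots, coef)
    return best

# Synthetic toy data (invented for illustration): 5 time points, 4 replicates
# each, with a response that rises until t = 2 and then stays flat.
time_points = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
truth = np.array([0.0, 1.0, 2.0, 2.0, 2.0])
offsets = np.array([-0.05, 0.0, 0.05, 0.02])  # fixed replicate scatter
t = np.repeat(time_points, len(offsets))
y = np.repeat(truth, len(offsets)) + np.tile(offsets, len(time_points))

aic, knots, coef = select_knots(t, y)
print("selected knots:", knots)
print("spline values at knots:", coef)
```

The exhaustive search over knot subsets is practical only because the series is short; with a handful of time points the number of candidate models stays small.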

Results: Previous gene expression measurements of the cyanobacterium Synechocystis sp. PCC6803 were reanalyzed using linear splines. The analysis identified the temporal response of many genes that had been missed by a fold-change analysis. Based on our statistical analysis, we found that about four or more gene expression measurements are needed at each time point.

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Cyanobacteria / classification
  • Cyanobacteria / genetics*
  • DNA, Bacterial / genetics
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / genetics*
  • Likelihood Functions
  • Linear Models
  • Models, Genetic*
  • Models, Statistical
  • Reproducibility of Results
  • Sample Size
  • Sensitivity and Specificity
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*
  • Species Specificity
  • Stochastic Processes
  • Time Factors

Substances

  • DNA, Bacterial