Revisiting the codon adaptation index from a whole-genome perspective: analyzing the relationship between gene expression and codon occurrence in yeast using a variety of models

Ronald Jansen; Harmen J Bussemaker; Mark Gerstein

doi:10.1093/nar/gkg306

Nucleic Acids Res. 2003 Apr 15;31(8):2242-51. doi: 10.1093/nar/gkg306.

Authors

Ronald Jansen¹, Harmen J Bussemaker, Mark Gerstein

Affiliation

¹ Department of Molecular Biophysics and Biochemistry, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, USA.

Abstract

Highly expressed genes in many bacteria and small eukaryotes often have a strong compositional bias in terms of codon usage. Two widely used numerical indices, the codon adaptation index (CAI) and the codon usage, use this bias to predict the expression level of genes. When these indices were first introduced, they were based on fairly simple assumptions about which genes are most highly expressed: the CAI was originally based on the codon composition of a set of only 24 highly expressed genes, and the codon usage on assumptions about which functional classes of genes are highly expressed in fast-growing bacteria. Given the recent advent of genome-wide expression data, we should be able to improve on these assumptions. Here, we measure, in yeast, the degree to which consideration of the current genome-wide expression data sets improves the performance of both numerical indices. Indeed, we find that, by changing the parameterization of each model, its correlation with actual expression levels can be somewhat improved, although both indices are fairly insensitive to the exact way they are parameterized. This insensitivity indicates a consistent codon bias amongst highly expressed genes. We also attempt direct linear regression of codon composition against genome-wide expression levels (and protein abundance data). This has some similarity with the CAI formalism and yields an alternative model for the prediction of expression levels based on the coding sequences of genes. More information is available at http://bioinfo.mbb.yale.edu/expression/codons.
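To make the CAI formalism mentioned in the abstract concrete, the following is a minimal sketch of the standard definition: each codon receives a weight equal to its frequency in a reference set of highly expressed genes divided by the frequency of the most common synonymous codon, and a gene's CAI is the geometric mean of those weights. The abstract does not give implementation details, so everything here is an assumption for illustration: the function names, the two-amino-acid toy codon table, the toy sequences, and the pseudocount used to avoid zero weights.

```python
import math
from collections import Counter

# Toy synonymous-codon table covering only two amino acids (an assumption
# for brevity; a real calculation would use the full genetic code).
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMS.items() for c in codons}


def split_codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight of each codon relative to its most frequent synonym in the reference set.

    The pseudocount is an assumption of this sketch; it keeps unseen codons
    from receiving a weight of zero.
    """
    counts = Counter()
    for seq in reference_seqs:
        counts.update(c for c in split_codons(seq) if c in CODON_TO_AA)
    weights = {}
    for codons in SYNONYMS.values():
        best = max(counts[c] + pseudocount for c in codons)
        for c in codons:
            weights[c] = (counts[c] + pseudocount) / best
    return weights


def cai(seq, weights):
    """Geometric mean of the codon weights over the codons of one gene."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons covered by the weight table")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set standing in for a set of highly expressed genes.
reference = ["AAGAAGAAGAAATTCTTCTTT"]
weights = relative_adaptiveness(reference)
print(cai("AAGTTC", weights))  # gene built from the preferred codons
print(cai("AAATTT", weights))  # gene built from the rarer codons
```

Re-parameterizing the index in the sense discussed in the abstract would correspond to changing which sequences go into `reference`, for example choosing them from genome-wide expression data rather than from a small hand-picked set.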

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Codon / genetics*
Computational Biology / methods
Gene Expression Profiling
Gene Expression Regulation, Fungal
Genome*
Genome, Fungal
Models, Genetic*
Saccharomyces cerevisiae / genetics

Substances

Codon
