A dynamic programming algorithm for binning microbial community profiles

Quansong Ruan; Joshua A Steele; Michael S Schwalbach; Jed A Fuhrman; Fengzhu Sun

doi:10.1093/bioinformatics/btl114

Bioinformatics. 2006 Jun 15;22(12):1508-14. doi: 10.1093/bioinformatics/btl114. Epub 2006 Mar 27.

Authors

Quansong Ruan¹, Joshua A Steele, Michael S Schwalbach, Jed A Fuhrman, Fengzhu Sun

Affiliation

¹ Department of Mathematics, University of Southern California, 3620 South Vermont Avenue, KAP 108, Los Angeles, California 90089-253, USA.

PMID: 16567364
DOI: 10.1093/bioinformatics/btl114

Abstract

Motivation: Several community profiling approaches are widely used in environmental ecology to study microbial community composition and its variation. Automated Ribosomal Intergenic Spacer Analysis (ARISA) is one such technique; it characterizes microbial communities at different times and places using length heterogeneity in the 16S-23S rRNA intergenic spacer. Owing to sampling errors, random mutations during PCR amplification and, probably most importantly, variation in the readings from the equipment used to analyze fragment sizes, data read directly from the fragment analyzer should not be used for downstream statistical analysis. No optimal data preprocessing methods are available. A commonly used approach is to bin the measured lengths of the 16S-23S intergenic spacer. We have developed a binning method for ARISA data analysis, based on a dynamic programming algorithm, that minimizes the overall differences between replicates from the same sampling location and time.

Results: In a test example from an ocean time series sampling program, data preprocessing identified several outliers which, upon re-examination, were found to result from systematic errors. Clustering analysis of ARISA data from different times, binned with the dynamic programming algorithm, revealed important features of the biodiversity of the microbial communities.
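
Illustrative sketch: the abstract does not spell out the objective function or the constraints of the binning method, so the following Python is only a minimal illustration of the general idea stated in the Motivation, not the authors' implementation. It assumes two replicate profiles given as abundance per fragment-length position, defines the cost of a bin as the absolute difference between the replicates' summed abundance in that bin, and limits bin width to a range (without such a limit, one giant bin would trivially minimize the cost). The function name `bin_replicates` and the parameters `min_width` and `max_width` are hypothetical.

```python
from itertools import accumulate


def bin_replicates(rep_a, rep_b, min_width=1, max_width=5):
    """Split positions 0..n-1 into contiguous bins by dynamic programming.

    Assumed cost (not taken from the paper): for each bin, the absolute
    difference between the two replicates' summed abundance in that bin.
    Returns (total_cost, bins), where bins are half-open (start, end) pairs.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicate profiles must have the same length")
    n = len(rep_a)

    # Prefix sums make every bin total an O(1) lookup.
    pa = [0.0] + list(accumulate(rep_a))
    pb = [0.0] + list(accumulate(rep_b))

    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimal cost of binning positions 0..j-1
    prev = [-1] * (n + 1)    # prev[j]: start of the last bin in that solution
    best[0] = 0.0

    for j in range(1, n + 1):
        # The last bin is [i, j) with min_width <= j - i <= max_width.
        for i in range(max(0, j - max_width), j - min_width + 1):
            if best[i] == inf:
                continue
            cost = abs((pa[j] - pa[i]) - (pb[j] - pb[i]))
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                prev[j] = i

    if best[n] == inf:
        raise ValueError("no binning satisfies the width constraints")

    # Walk the back-pointers to recover the bin boundaries.
    bins = []
    j = n
    while j > 0:
        bins.append((prev[j], j))
        j = prev[j]
    bins.reverse()
    return best[n], bins
```

For example, with `rep_a = [1, 0, 0, 1]`, `rep_b = [0, 1, 1, 0]` and `max_width=2`, a peak that is shifted by one position between replicates is absorbed into a shared bin: the sketch returns a total cost of 0 with bins `[(0, 2), (2, 4)]`, whereas forcing one position per bin (`max_width=1`) gives a cost of 4.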

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Biodiversity
Cluster Analysis
Computational Biology / methods*
DNA, Ribosomal Spacer / genetics*
Genes, Bacterial*
Models, Biological
Models, Theoretical
Mutation
Programming Languages
RNA, Ribosomal, 16S / genetics
RNA, Ribosomal, 23S / genetics
Seawater / microbiology*
Software

Substances

DNA, Ribosomal Spacer
RNA, Ribosomal, 16S
RNA, Ribosomal, 23S