A strategy for assembling the maize (Zea mays L.) genome

Scott J Emrich; Srinivas Aluru; Yan Fu; Tsui-Jung Wen; Mahesh Narayanan; Ling Guo; Daniel A Ashlock; Patrick S Schnable

doi:10.1093/bioinformatics/bth017


Bioinformatics. 2004 Jan 22;20(2):140-7. doi: 10.1093/bioinformatics/bth017.

Authors

Scott J Emrich¹, Srinivas Aluru, Yan Fu, Tsui-Jung Wen, Mahesh Narayanan, Ling Guo, Daniel A Ashlock, Patrick S Schnable

Affiliation

¹ Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA 50011, USA.

PMID: 14734303
DOI: 10.1093/bioinformatics/bth017

Abstract

Motivation: Because the bulk of the maize (Zea mays L.) genome consists of repetitive sequences, sequencing efforts are being targeted to its 'gene-rich' fraction. Traditional assembly programs are inadequate for this approach because they are optimized for a uniform sampling of the genome and inherently lack the ability to differentiate highly similar paralogs.

Results: We report the development of bioinformatics tools for the accurate assembly of the maize genome. This software, which is based on innovative parallel algorithms to ensure scalability, assembled 730,974 genomic survey sequence fragments in 4 h using 64 Pentium III 1.26 GHz processors of a commodity cluster. Algorithmic innovations significantly reduce the number of pairwise alignments without sacrificing assembly quality. Clone pair information was used to estimate the sequencing error rate, improving the differentiation of polymorphisms from sequencing errors. The assembly was also used to evaluate the effectiveness of various filtering strategies, providing information that can be used to focus subsequent sequencing efforts.
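The abstract does not say how the number of pairwise alignments is reduced. One common general technique, shown here only as an illustration and not as the authors' method, is to align only those fragment pairs that share an exact-match seed. The sketch below assumes a simple k-mer index; the function names `kmers` and `candidate_pairs` and the parameters `k`, `min_shared`, and `max_occurrence` are hypothetical. It runs serially, whereas the software described above is parallel.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq: str, k: int):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def candidate_pairs(reads: dict, k: int = 20, min_shared: int = 1,
                    max_occurrence: int = 1000) -> set:
    """Return read-ID pairs sharing at least `min_shared` distinct k-mers.

    Hypothetical filter: only the returned pairs would be passed on to the
    expensive pairwise-alignment step; every other pair is skipped.
    K-mers found in more than `max_occurrence` reads are ignored, a crude
    stand-in for masking highly repetitive sequence.
    """
    # Map each k-mer to the set of read IDs that contain it.
    index = defaultdict(set)
    for rid, seq in reads.items():
        for kmer in set(kmers(seq, k)):
            index[kmer].add(rid)

    # Count how many distinct k-mers each read pair has in common.
    shared = defaultdict(int)
    for rids in index.values():
        if len(rids) > max_occurrence:
            continue  # likely a repeat; would generate too many pairs
        for a, b in combinations(sorted(rids), 2):
            shared[(a, b)] += 1

    return {pair for pair, count in shared.items() if count >= min_shared}
```

For example, with `k=8`, the reads `ACGTACGTGGCCTTAA` and `GGCCTTAACCGGTTAA` share exactly one 8-mer (`GGCCTTAA`), so they form a candidate pair, while a poly-T read shares no 8-mer with either and is never aligned against them. Raising `min_shared` or lowering `max_occurrence` trades sensitivity for fewer alignments.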

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Computing Methodologies
Database Management Systems
Databases, Nucleic Acid
Gene Expression Profiling / methods*
Genome, Plant*
Repetitive Sequences, Nucleic Acid / genetics*
Sequence Alignment / methods*
Sequence Analysis, DNA / methods*
Software
Software Design
Zea mays / genetics*