PseudoPipe: an automated pseudogene identification pipeline

Zhaolei Zhang; Nicholas Carriero; Deyou Zheng; John Karro; Paul M Harrison; Mark Gerstein

doi:10.1093/bioinformatics/btl116

Bioinformatics. 2006 Jun 15;22(12):1437-9. doi: 10.1093/bioinformatics/btl116. Epub 2006 Mar 30.

Authors

Zhaolei Zhang¹, Nicholas Carriero, Deyou Zheng, John Karro, Paul M Harrison, Mark Gerstein

Affiliation

¹ Banting and Best Department of Medical Research, Donnelly CCBR, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada.

PMID: 16574694
DOI: 10.1093/bioinformatics/btl116

Abstract

Motivation: Mammalian genomes contain many 'genomic fossils', i.e. pseudogenes. These are disabled copies of functional genes that have been retained in the genome by gene duplication or retrotransposition events. Pseudogenes are important resources for understanding the evolutionary history of genes and genomes.

Results: We have developed a homology-based computational pipeline ('PseudoPipe') that can search a mammalian genome and identify pseudogene sequences in a comprehensive and consistent manner. The key steps in the pipeline involve using BLAST to rapidly cross-reference potential 'parent' proteins against the intergenic regions of the genome and then processing the resulting 'raw hits': eliminating redundant hits, clustering neighboring hits, and associating each cluster with a unique parent and aligning it to that parent. Finally, pseudogenes are classified based on a combination of criteria including homology, intron-exon structure, and the presence of stop codons and frameshifts.
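
The post-processing of raw hits described above can be illustrated with a minimal Python sketch. It is not the PseudoPipe implementation: the Hit record, the function names, the greedy best-E-value rule for dropping redundant hits, and the 60 bp max_gap threshold are all assumptions made for illustration. As a further simplification, hits are clustered only when they already share the same parent protein, and the alignment and classification steps are not covered.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """One raw BLAST hit (hypothetical record layout, not PseudoPipe's)."""
    chrom: str
    strand: str    # "+" or "-"
    start: int     # genomic start, inclusive
    end: int       # genomic end, inclusive
    parent: str    # ID of the query ("parent") protein
    evalue: float  # BLAST E-value; smaller is better


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two hits share chromosome and strand and their ranges intersect."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def remove_redundant(hits: List[Hit]) -> List[Hit]:
    """Greedily keep the best-scoring hit among overlapping ones (assumed rule)."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept


def cluster_neighbors(hits: List[Hit], max_gap: int = 60) -> List[List[Hit]]:
    """Group hits with the same chromosome, strand and parent that lie
    within max_gap bases of the current cluster's end (assumed threshold)."""
    clusters: List[List[Hit]] = []
    ordered = sorted(hits, key=lambda h: (h.chrom, h.strand, h.parent, h.start))
    for hit in ordered:
        if clusters:
            last = clusters[-1]
            same_group = ((last[0].chrom, last[0].strand, last[0].parent)
                          == (hit.chrom, hit.strand, hit.parent))
            if same_group and hit.start - max(h.end for h in last) <= max_gap:
                last.append(hit)
                continue
        clusters.append([hit])
    return clusters
```

Calling cluster_neighbors(remove_redundant(hits)) on a list of Hit records returns one list per candidate locus; each such cluster would then be aligned to its parent protein and classified.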

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Automation
Computational Biology / methods*
Evolution, Molecular
Genome
Humans
Models, Genetic
Pseudogenes
Reproducibility of Results
Software
