PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data

Kuo Ping Chiu; Chee-Hong Wong; Qiongyu Chen; Pramila Ariyaratne; Hong Sain Ooi; Chia-Lin Wei; Wing-Kin Ken Sung; Yijun Ruan

doi:10.1186/1471-2105-7-390

BMC Bioinformatics. 2006 Aug 25;7:390. doi: 10.1186/1471-2105-7-390.

Authors

Kuo Ping Chiu¹, Chee-Hong Wong, Qiongyu Chen, Pramila Ariyaratne, Hong Sain Ooi, Chia-Lin Wei, Wing-Kin Ken Sung, Yijun Ruan

Affiliation

¹ Information and Mathematical Sciences Group, Genome Institute of Singapore, 60 Biopolis Street, Genome #02-01, 138672, Singapore.

Abstract

Background: We recently developed the Paired-End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. Because short PET sequences are derived from the two ends of long DNA fragments, their paired-end nature raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads and how to map them to reference genome sequences both correctly and efficiently. To accommodate and streamline analysis of the large volume of PET sequences generated by each PET experiment, an automated PET data-processing pipeline is desirable.
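To make the extraction problem concrete, the following is a minimal sketch of pulling ditags out of a concatemer read. It is not PET-Tool's Extractor algorithm: the spacer sequence `GGATCC`, the 18 bp tag length and the helper name `extract_pets` are hypothetical choices made for illustration only. The sketch splits a read on the assumed spacer, keeps only fragments of exactly ditag length, and divides each into a 5' tag and a 3' tag.

```python
from typing import List, Tuple

# Hypothetical parameters, for illustration only: the real spacer/linker
# sequence and tag lengths depend on the PET library protocol.
SPACER = "GGATCC"   # assumed separator between concatenated ditags
TAG_LEN = 18        # assumed length of each 5' and 3' tag


def extract_pets(read: str, spacer: str = SPACER,
                 tag_len: int = TAG_LEN) -> List[Tuple[str, str]]:
    """Split a concatemer read on the spacer and return (5' tag, 3' tag) pairs."""
    pets = []
    for fragment in read.upper().split(spacer):
        # Drop partial ditags at the read ends, fragments of the wrong length,
        # and fragments containing ambiguous bases.
        if len(fragment) != 2 * tag_len or "N" in fragment:
            continue
        pets.append((fragment[:tag_len], fragment[tag_len:]))
    return pets


if __name__ == "__main__":
    five, three = "A" * 18, "C" * 18
    read = "TTT" + SPACER + five + three + SPACER + "GG"
    print(extract_pets(read))
```

A naive split like this would break a ditag whose own sequence happens to contain the spacer, which is one reason a production extractor needs more careful artifact handling than shown here.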

Results: We designed an integrated software package, PET-Tool, to automatically process PET sequences and map them to genome sequences. The tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in the genome sequences; and the Project Manager module for data organization. The performance of PET-Tool was evaluated by analyzing 2.7 million PET sequences, which demonstrated that PET-Tool accurately and efficiently extracts PET sequences and removes artifacts from large-volume datasets. Using optimized mapping criteria, over 70% of high-quality PET sequences were mapped specifically to the genome sequences. On a 2.4 GHz Linux machine, processing one million PETs from extraction to mapping takes approximately six hours.
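The abstract does not spell out the "optimized mapping criteria", so the following is only a hedged sketch of the kind of pairing rule a PET mapper might apply once each tag has been aligned independently. The criteria here are assumptions: both tags on the same chromosome and strand, the 5' tag upstream of the 3' tag in the direction of the strand, a hypothetical maximum span `MAX_SPAN`, and a PET reported only when exactly one pairing qualifies. The names `Hit`, `valid_pair` and `map_pet` are illustrative, not PET-Tool's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """One genomic alignment of a single tag (hypothetical representation)."""
    chrom: str
    pos: int     # leftmost coordinate of the tag alignment
    strand: str  # "+" or "-"


MAX_SPAN = 1_000_000  # assumed upper bound on the 5'-to-3' genomic distance


def valid_pair(h5: Hit, h3: Hit, max_span: int = MAX_SPAN) -> bool:
    """Check the assumed criteria: same chromosome and strand, correct order, bounded span."""
    if h5.chrom != h3.chrom or h5.strand != h3.strand:
        return False
    # On "+" the 5' tag should have the smaller coordinate; on "-" the larger one.
    span = h3.pos - h5.pos if h5.strand == "+" else h5.pos - h3.pos
    return 0 < span <= max_span


def map_pet(hits5: List[Hit], hits3: List[Hit],
            max_span: int = MAX_SPAN) -> Optional[Tuple[Hit, Hit]]:
    """Return the single qualifying (5' hit, 3' hit) pairing, or None if zero or several qualify."""
    pairs = [(a, b) for a in hits5 for b in hits3 if valid_pair(a, b, max_span)]
    return pairs[0] if len(pairs) == 1 else None


if __name__ == "__main__":
    a, b = Hit("chr1", 100, "+"), Hit("chr1", 5000, "+")
    print(map_pet([a], [b]))
```

Requiring a unique qualifying pairing is one simple way to express "specific" mapping; tightening or relaxing `max_span` and the uniqueness rule trades sensitivity against the risk of spurious pairs.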

Conclusion: Its speed, accuracy and comprehensiveness demonstrate that PET-Tool is an important and useful component of PET experiments, and it can be extended to accommodate other related analyses of paired-end sequences. The tool also provides user-friendly functions for data quality checks and a system for multi-layer data management.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Animals
Base Sequence
Computational Biology / methods
Databases, Nucleic Acid
Genome / genetics*
Humans
Molecular Sequence Data
Sequence Analysis, DNA / methods*
Software*
Transcription, Genetic / genetics*
