Motivation: Transposon insertion sequencing (Tn-seq) is an emerging technology that combines transposon mutagenesis with next-generation sequencing technologies for the identification of genes related to bacterial survival. The resulting data from Tn-seq experiments consist of sequence reads mapped to millions of potential transposon insertion sites and a large portion of insertion sites have zero mapped reads. Novel statistical method for Tn-seq data analysis is needed to infer functions of genes on bacterial growth.
Results: In this article, we propose a zero-inflated Poisson model for analyzing the Tn-seq data that are high-dimensional and with an excess of zeros. Maximum likelihood estimates of model parameters are obtained using an expectation-maximization (EM) algorithm, and pseudogenes are utilized to construct appropriate statistical tests for the transposon insertion tolerance of normal genes of interest. We propose a multiple testing procedure that categorizes genes into each of the three states, hypo-tolerant, tolerant and hyper-tolerant, while controlling the false discovery rate. We evaluate the proposed method with simulation studies and apply the proposed method to a real Tn-seq data from an experiment that studied the bacterial pathogen, Campylobacter jejuniAvailability and implementation: We provide R code for implementing our proposed method at http://github.com/ffliu/TnSeq A user's guide with example data analysis is also available there.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: [email protected].