A zero-inflated Poisson model for insertion tolerance analysis of genes based on Tn-seq data

Bioinformatics. 2016 Jun 1;32(11):1701-8. doi: 10.1093/bioinformatics/btw061. Epub 2016 Feb 1.

Abstract

Motivation: Transposon insertion sequencing (Tn-seq) is an emerging technology that combines transposon mutagenesis with next-generation sequencing to identify genes related to bacterial survival. The data from Tn-seq experiments consist of sequence reads mapped to millions of potential transposon insertion sites, and a large portion of these sites have zero mapped reads. Novel statistical methods for Tn-seq data analysis are needed to infer the roles of genes in bacterial growth.

Results: In this article, we propose a zero-inflated Poisson model for analyzing Tn-seq data, which are high-dimensional and contain an excess of zeros. Maximum likelihood estimates of the model parameters are obtained using an expectation-maximization (EM) algorithm, and pseudogenes are used to construct appropriate statistical tests for the transposon insertion tolerance of normal genes of interest. We propose a multiple testing procedure that classifies each gene into one of three states (hypo-tolerant, tolerant or hyper-tolerant) while controlling the false discovery rate. We evaluate the proposed method with simulation studies and apply it to real Tn-seq data from an experiment that studied the bacterial pathogen Campylobacter jejuni.

Availability and implementation: We provide R code for implementing our proposed method at http://github.com/ffliu/TnSeq. A user's guide with an example data analysis is also available there.
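As a rough illustration of how EM yields maximum likelihood estimates for a zero-inflated Poisson model, the sketch below fits a two-parameter version, P(Y = 0) = π + (1 - π)e^(-λ) and P(Y = y) = (1 - π)e^(-λ)λ^y / y! for y > 0, to the read counts at the insertion sites of a single gene. The function name `fit_zip_em`, the single-gene setting, the starting values and the stopping rule are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def fit_zip_em(counts, pi0=0.5, tol=1e-8, max_iter=1000):
    """Fit a two-parameter zero-inflated Poisson (pi, lam) to counts by EM.

    Illustrative assumption: each count is a structural zero with
    probability pi, and otherwise is drawn from Poisson(lam).
    Returns the estimates (pi, lam).
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    if n == 0 or n_zero == n:
        raise ValueError("need at least one positive count")

    # Start lam at the mean of the positive counts.
    lam = total / (n - n_zero)
    pi = pi0

    for _ in range(max_iter):
        # E-step: posterior probability that an observed zero is structural
        # rather than a Poisson zero. Positive counts are never structural.
        denom = pi + (1.0 - pi) * math.exp(-lam)
        z = pi / denom if denom > 0 else 0.0

        # M-step: closed-form updates using the expected number of
        # structural zeros among the observed zeros.
        expected_structural = n_zero * z
        new_pi = expected_structural / n
        new_lam = total / (n - expected_structural)

        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break

    return pi, lam


# Usage: six sites with no reads and four sites with a few reads.
pi_hat, lam_hat = fit_zip_em([0, 0, 0, 0, 0, 0, 1, 2, 3, 2])
print(f"pi = {pi_hat:.4f}, lambda = {lam_hat:.4f}")
```

At convergence the fitted zero probability π + (1 - π)e^(-λ) matches the observed fraction of zero-count sites, and λ matches the mean of the positive counts under a zero-truncated Poisson. The model in the article is applied across genes and uses pseudogenes to build its tests, which this sketch does not attempt; the authors' R code at the GitHub link is the reference implementation.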


Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • DNA Transposable Elements*
  • High-Throughput Nucleotide Sequencing
  • Likelihood Functions

Substances

  • DNA Transposable Elements