A conditional predictive p-value to compare a multinomial with an overdispersed multinomial in the analysis of T-cell populations

Biostatistics. 2014 Jan;15(1):129-39. doi: 10.1093/biostatistics/kxt039. Epub 2013 Oct 4.

Abstract

Immunological experiments that record primary molecular sequences of T-cell receptors produce moderate- to high-dimensional categorical data, some of which may be subject to extra-multinomial variation caused by technical constraints of cell-based assays. Motivated by such experiments in melanoma research, we develop a statistical procedure for testing the equality of two discrete populations, where one population delivers multinomial data and the other is subject to a specific form of overdispersion. The procedure computes a conditional-predictive p-value by splitting the data set into two pieces, obtaining a predictive distribution for one piece given the other, and using the observed predictive ordinate to generate a p-value. The procedure has a simple interpretation, requires fewer modeling assumptions than a fully Bayesian analysis would require, and has reasonable operating characteristics, as shown both empirically and by asymptotic analysis.
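
The three steps named in the abstract (split the data, form a predictive distribution for one piece given the other, turn the observed predictive ordinate into a p-value) can be illustrated with a minimal Python sketch. It is not the authors' procedure: it deliberately ignores the overdispersion in the second population and treats both samples as plain multinomial draws with a shared probability vector under the null. The symmetric Dirichlet prior (`prior=1.0`), the Monte Carlo approximation, and the function names `log_dirmult_pmf` and `conditional_predictive_pvalue` are all assumptions made for illustration.

```python
import math
import numpy as np


def log_dirmult_pmf(y, alpha):
    """Log pmf of a Dirichlet-multinomial with parameter vector alpha at counts y."""
    y = np.asarray(y, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, a = y.sum(), alpha.sum()
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for yk, ak in zip(y, alpha):
        out += math.lgamma(yk + ak) - math.lgamma(ak) - math.lgamma(yk + 1)
    return out


def conditional_predictive_pvalue(x, y, prior=1.0, n_sims=2000, seed=0):
    """Monte Carlo p-value for counts y, conditional on counts x.

    Simplifying assumption (not from the paper): no overdispersion, so the
    predictive distribution of y given x is Dirichlet-multinomial.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    alpha_post = x + prior          # Dirichlet posterior parameters given x
    n_y = int(y.sum())
    # Observed predictive ordinate (on the log scale).
    obs = log_dirmult_pmf(y, alpha_post)
    hits = 0
    for _ in range(n_sims):
        # Draw a replicate of the second piece from the predictive distribution.
        p = rng.dirichlet(alpha_post)
        y_rep = rng.multinomial(n_y, p)
        # Count replicates that are no more probable than the observed data.
        if log_dirmult_pmf(y_rep, alpha_post) <= obs + 1e-12:
            hits += 1
    return hits / n_sims


# Hypothetical category counts, invented for illustration only.
x_counts = [50, 30, 15, 5]
p_similar = conditional_predictive_pvalue(x_counts, [10, 6, 3, 1])
p_different = conditional_predictive_pvalue(x_counts, [0, 0, 2, 18])
```

In this sketch a small p-value means the second piece is unusually improbable under the predictive distribution built from the first piece, which is evidence against equality of the two populations. Handling the specific overdispersion described in the abstract would require a different predictive distribution than the one assumed here.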

Keywords: Bayesian p-value; Dirichlet multinomial; Double overdispersion; Fisher's exact test; HPRT assay; Mass culture experiments; Molecular sequence data; T-cell receptor.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cell Proliferation
  • Complementarity Determining Regions / genetics
  • Complementarity Determining Regions / immunology*
  • Data Interpretation, Statistical*
  • Humans
  • Models, Statistical*
  • Mutation / genetics
  • Mutation / immunology
  • Receptors, Antigen, T-Cell / immunology*
  • Sequence Analysis, DNA
  • T-Lymphocytes / immunology*

Substances

  • Complementarity Determining Regions
  • Receptors, Antigen, T-Cell