PhenoMan: phenotypic data exploration, selection, management and quality control for association studies of rare and common variants

Bioinformatics. 2014 Feb 1;30(3):442-4. doi: 10.1093/bioinformatics/btt682. Epub 2013 Dec 12.

Abstract

Motivation: Advances in next-generation sequencing and other high-throughput technologies have promoted great interest in detecting associations between complex traits and genetic variants. Phenotype selection, quality control (QC) and control of confounders are crucial and can have a great impact on the ability to detect associations. Although there are programs to perform association analyses, e.g. PLINK and GenABEL, they cannot be used for comprehensive management and QC of phenotype data. To address this need, PhenoMan was developed to: select individuals based on multiple phenotype criteria or population membership; control for missing covariate data; remove related individuals, duplicate samples and individuals with incorrect sex specification; recode primary traits and covariates; transform data; remove or winsorize outliers; select covariates for analysis; and create residuals. To ensure consistency and harmonization between analyses, a report is generated for every dataset. Summary statistics are also provided in graphical or text format. PhenoMan can be used for selection and manipulation of quantitative, disease and control data.
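Two of the QC steps listed above, winsorizing outliers and creating residuals, can be illustrated with a minimal sketch. This is not PhenoMan's code or interface: the function names, the standard-deviation cutoff rule and the single-covariate regression are assumptions chosen only to show the general idea.

```python
from statistics import mean, stdev


def winsorize(values, n_sd=3.0):
    """Clamp values lying more than n_sd sample SDs from the mean.

    Hypothetical helper: the SD-based cutoff is an assumed rule,
    not necessarily the one PhenoMan applies.
    """
    m, s = mean(values), stdev(values)
    lo, hi = m - n_sd * s, m + n_sd * s
    return [min(max(v, lo), hi) for v in values]


def residuals(trait, covariate):
    """Residuals from a least-squares regression of trait on one covariate.

    Hypothetical helper: a real analysis would usually adjust for
    several covariates at once.
    """
    mx, my = mean(covariate), mean(trait)
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, trait))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(covariate, trait)]


# Toy data: a quantitative trait with one extreme value, and age as covariate.
trait = [1, 2, 3, 4, 100]
age = [30, 40, 50, 60, 70]

clean = winsorize(trait, n_sd=1.0)   # the extreme value is pulled in to the upper bound
adjusted = residuals(clean, age)     # trait values with the linear age effect removed
```

The residuals, rather than the raw trait values, would then serve as the phenotype in the association test, so that the covariate's linear effect is removed before testing.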

Summary: PhenoMan is freeware that provides approaches for efficient exploration and management of phenotype data. Proper QC of phenotypes before proceeding to the association analysis is critical to ensure control of type I and type II errors, reliable effect estimates and consistent results between studies. PhenoMan is highly beneficial for the preparation of qualitative and quantitative trait data for association studies using new datasets as well as those obtained from public repositories.

Availability and implementation: code.google.com/p/phenoman

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Genetic Association Studies / methods*
  • Genetic Association Studies / standards
  • Genetic Variation*
  • High-Throughput Nucleotide Sequencing / methods
  • High-Throughput Nucleotide Sequencing / standards
  • Humans
  • Phenotype*
  • Quality Control
  • Quantitative Trait, Heritable
  • Software*