Kerfdr: a semi-parametric kernel-based approach to local false discovery rate estimation

BMC Bioinformatics. 2009 Mar 16:10:84. doi: 10.1186/1471-2105-10-84.

Abstract

Background: The use of current high-throughput genetic, genomic and post-genomic data leads to the simultaneous evaluation of a large number of statistical hypotheses and, in turn, to the multiple-testing problem. As an alternative to the overly conservative Family-Wise Error Rate (FWER), the False Discovery Rate (FDR) has emerged over the last ten years as a more appropriate way to handle this problem. However, one drawback of the FDR is that it is tied to a given rejection region for the statistics considered, so it attributes the same value to statistics close to the boundary and to those far from it. As a result, the local FDR has recently been proposed to quantify the specific probability that a given null hypothesis is true.
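The contrast between the region-wise FDR and the local FDR can be made explicit in the standard two-group mixture formulation. The notation below is an assumption introduced for illustration and does not appear in the abstract: f_0 and f_1 are the densities of the statistic under the null and alternative hypotheses, \pi_0 and \pi_1 their proportions, and \mathcal{R} a rejection region.

```latex
\begin{align*}
f(x) &= \pi_0 f_0(x) + \pi_1 f_1(x), \qquad \pi_0 + \pi_1 = 1,\\
\mathrm{FDR}(\mathcal{R}) &= \Pr(H_0 \mid X \in \mathcal{R})
  = \frac{\pi_0 \int_{\mathcal{R}} f_0(x)\,dx}{\int_{\mathcal{R}} f(x)\,dx},\\
\mathrm{lFDR}(x) &= \Pr(H_0 \mid X = x) = \frac{\pi_0 f_0(x)}{f(x)}.
\end{align*}
```

In this formulation the region-wise quantity is the average of the local one over the rejection region, FDR(\mathcal{R}) = E[lFDR(X) | X \in \mathcal{R}], which is why every statistic inside the region receives the same value regardless of its distance from the boundary.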

Results: In this context, we present a semi-parametric approach based on kernel estimators, which is applied to different types of high-throughput biological data such as patterns in DNA sequences, gene expression and genome-wide association studies.

Conclusion: Compared with existing approaches, the proposed method has the practical advantages of accommodating complex heterogeneities in the alternative hypothesis, of taking prior information (from expert judgment or previous studies) into account through a semi-supervised mode, and of handling truncated distributions such as those obtained in Monte-Carlo simulations. The method is implemented in the R package kerfdr, available from CRAN or at http://stat.genopole.cnrs.fr/software/kerfdr.
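To give a concrete feel for what a semi-parametric, kernel-based local FDR estimate can look like, here is a minimal Python sketch. It is not the kerfdr package's code and does not reproduce its interface. It assumes a two-component mixture on probit-transformed p-values, a standard normal null density, a null proportion fixed in advance by a Storey-type estimate, and a Gaussian kernel with a rule-of-thumb bandwidth. The function name `local_fdr_sketch` and its parameters are hypothetical, and the semi-supervised and truncated cases mentioned above are not covered.

```python
import math
import random
from statistics import NormalDist, stdev

STD_NORMAL = NormalDist()  # assumed null distribution on the probit scale


def local_fdr_sketch(pvalues, n_iter=30, lam=0.5):
    """Return (pi0, list of local FDR values); illustrative sketch only."""
    n = len(pvalues)
    # Probit transform: under the null, p ~ U(0, 1), hence x ~ N(0, 1).
    eps = 1e-12
    x = [STD_NORMAL.inv_cdf(min(max(p, eps), 1 - eps)) for p in pvalues]

    # Assumption: fix the null proportion with a Storey-type estimate.
    pi0 = sum(p > lam for p in pvalues) / ((1 - lam) * n)
    pi0 = min(max(pi0, 1 / n), 1 - 1 / n)  # keep both components non-empty
    pi1 = 1 - pi0

    # Gaussian kernel matrix with a rule-of-thumb bandwidth (assumption).
    h = 1.06 * stdev(x) * n ** (-0.2)
    norm = h * math.sqrt(2 * math.pi)
    kernel = [[math.exp(-0.5 * ((xi - xj) / h) ** 2) / norm for xj in x]
              for xi in x]
    f0 = [STD_NORMAL.pdf(xi) for xi in x]

    tau = [pi1] * n  # posterior probability that each hypothesis is non-null
    for _ in range(n_iter):
        total = sum(tau)
        # Weighted kernel estimate of the alternative density at each x_i.
        f1 = [sum(t * k for t, k in zip(tau, row)) / total for row in kernel]
        # Posterior update from the mixture pi0 * f0 + pi1 * f1.
        tau = [pi1 * a / (pi1 * a + pi0 * b) for a, b in zip(f1, f0)]

    return pi0, [1 - t for t in tau]


# Simulated one-sided p-values: 160 true nulls followed by 40 alternatives.
random.seed(0)
z = [random.gauss(0, 1) for _ in range(160)]
z += [random.gauss(-3, 1) for _ in range(40)]
pvals = [STD_NORMAL.cdf(v) for v in z]

pi0_hat, lfdr = local_fdr_sketch(pvals)
print("estimated null proportion:", round(pi0_hat, 3))
print("mean local FDR, true nulls:", round(sum(lfdr[:160]) / 160, 3))
print("mean local FDR, alternatives:", round(sum(lfdr[160:]) / 40, 3))
```

Fixing the null proportion beforehand is a design choice of this sketch: without some constraint of that kind, the split between null and alternative components is not uniquely determined when the alternative density is left non-parametric.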

MeSH terms

  • Computational Biology / methods*
  • False Positive Reactions
  • Gene Expression Profiling / methods*
  • Genome-Wide Association Study
  • Internet
  • Models, Statistical*
  • Software*