The false discovery rate: a key concept in large-scale genetic studies

James J Chen; Paula K Roberson; Michael J Schell

doi:10.1177/107327481001700108


Cancer Control. 2010 Jan;17(1):58-62. doi: 10.1177/107327481001700108.

Authors

James J Chen¹, Paula K Roberson, Michael J Schell

Affiliation

¹ Division of Personalized Nutrition and Medicine, National Center for Toxicological Research, Food and Drug Administration, HFT-20, Jefferson, AR 72079, USA.

PMID: 20010520

Abstract

Background: In experimental research, a statistical test is often used to make a decision about a null hypothesis, such as the hypothesis that mean gene expression is equal in the normal and tumor groups. Typically, a test statistic and its corresponding P value are calculated to assess the evidence for a difference between the two groups. The null hypothesis is rejected and a discovery is declared when the P value is less than a prespecified significance level. When more than one test is conducted, applying to each test the significance level intended for a single test typically leads to a large chance of false-positive findings.
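A one-line calculation shows how quickly the chance of a false positive grows with the number of tests. Assuming m independent tests, all of whose null hypotheses are true, each run at level alpha (assumptions introduced here for illustration, not stated in the abstract), the probability of at least one false positive is 1 - (1 - alpha)^m. A minimal Python sketch; the function name is hypothetical:

```python
def prob_at_least_one_false_positive(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent
    tests of true null hypotheses, each performed at level alpha."""
    # Each test avoids a false positive with probability (1 - alpha);
    # under independence, all m avoid one with probability (1 - alpha) ** m.
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        print(m, round(prob_at_least_one_false_positive(0.05, m), 3))
```

At alpha = 0.05 this is about 0.40 for 10 tests and exceeds 0.99 for 100 tests; when the tests are correlated, as expression levels of related genes often are, the exact figure differs.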

Methods: This paper presents an overview of the multiple testing framework and describes the false discovery rate (FDR) approach to determining the significance cutoff when a large number of tests are conducted.

Results: The FDR is the expected proportion of falsely rejected null hypotheses among all rejected null hypotheses, that is, the expected value of the number of false rejections divided by the total number of rejections. An FDR-controlling procedure is described and illustrated with a numerical example.
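In symbols, using notation introduced here for illustration (m tests, R total rejections, V false rejections among them), the two error measures discussed in this abstract can be written as below. The max(R, 1) convention, which sets the ratio to 0 when nothing is rejected, follows Benjamini and Hochberg and may differ in presentation from the paper's own.

```latex
\mathrm{FWE} = \Pr(V \ge 1), \qquad
\mathrm{FDR} = E\!\left[\frac{V}{\max(R,1)}\right]
             = E\!\left[\frac{V}{R}\,\middle|\,R>0\right]\Pr(R>0).
```

Because V/max(R, 1) is 0 when V = 0 and at most 1 when V ≥ 1, the FDR never exceeds the FWE, with equality when every null hypothesis is true (then V = R). Controlling the FDR is therefore a less stringent requirement than controlling the FWE.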

Conclusions: In multiple testing, a classical "family-wise error rate" (FWE) approach is commonly used when the number of tests is small. When a study involves a large number of tests, the FDR error measure is a more useful approach to determining a significance cutoff, as the FWE approach is too stringent. The FDR approach allows more claims of significant differences to be made, provided the investigator is willing to accept a small fraction of false-positive findings.
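The contrast between FWE and FDR control can be made concrete with a short Python sketch. It assumes the Bonferroni correction as the FWE-controlling method and the Benjamini-Hochberg step-up procedure as the FDR-controlling one; the abstract names neither, so both are stand-ins for illustration rather than the paper's exact procedure, and the P values in the example are hypothetical.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m; controls the family-wise error rate."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure. Controls the FDR at level q
    when the test statistics are independent (or positively dependent)."""
    m = len(pvalues)
    # Indices of the P values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m; 0 if none.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # Step-up: reject every hypothesis whose P value ranks at or below k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical P values for 10 genes (illustrative only, not from the paper).
    p = [0.001, 0.002, 0.004, 0.012, 0.030, 0.090, 0.200, 0.350, 0.600, 0.900]
    print("Bonferroni rejections:", sum(bonferroni(p)))   # 3
    print("BH rejections:", sum(benjamini_hochberg(p)))   # 4
```

With 10 tests at level 0.05, Bonferroni compares every P value with 0.005, whereas Benjamini-Hochberg compares the k-th smallest with k × 0.005; here the fourth smallest P value (0.012 ≤ 0.020) becomes an additional discovery, illustrating how the FDR approach allows more claims of significance.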

MeSH terms

Data Interpretation, Statistical*
Databases, Genetic
False Positive Reactions
Gene Expression Profiling / methods
Genetic Techniques*
Humans
Models, Genetic
Oligonucleotide Array Sequence Analysis / methods
Statistics as Topic / methods*