Motivation: Because microarray experiments are costly and biological materials are difficult to obtain, choosing an appropriate sample size is essential. With the growing use of the False Discovery Rate (FDR) in microarray analysis, an FDR-based sample size calculation is needed.
Method: We describe an approach that explicitly connects the sample size to the FDR and to the number of differentially expressed genes to be detected. The method fits parametric models for the degree of differential expression using the Expectation-Maximization algorithm.
Results: We illustrate the applicability of the method with simulations and studies of a lung microarray dataset. We propose using a small training set or published data from relevant biological settings to calculate the sample size of an experiment.
Availability: Code to implement the method in the statistical package R is available from the authors.
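Illustration: The link between sample size, FDR, and the fraction of differentially expressed genes detected can be sketched in a few lines of Python. This is not the authors' R code and it does not fit a mixture model by Expectation-Maximization. It is a simplified sketch that assumes a known proportion pi0 of non-differentially expressed genes, a single common standardized effect size for all differentially expressed genes, and a normal approximation to the two-sample test. All function names and parameter values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def required_alpha(fdr, power, pi0):
    """Per-gene significance level at which the approximate FDR equals `fdr`.

    Uses FDR ~ pi0*alpha / (pi0*alpha + (1 - pi0)*power), solved for alpha.
    Assumption: pi0 (proportion of non-differentially expressed genes) is known.
    """
    return fdr * (1.0 - pi0) * power / (pi0 * (1.0 - fdr))


def power_two_sample(n_per_group, effect_size, alpha):
    """Approximate power of a two-sided two-sample z-test.

    Assumption: every differentially expressed gene has the same standardized
    effect size (mean difference divided by the common standard deviation).
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * (n_per_group / 2.0) ** 0.5
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def sample_size_for_fdr(fdr, target_power, pi0, effect_size, n_max=10_000):
    """Smallest per-group sample size reaching `target_power` at the given FDR."""
    alpha = required_alpha(fdr, target_power, pi0)
    if not 0.0 < alpha < 1.0:
        raise ValueError("inputs imply a per-gene alpha outside (0, 1)")
    for n in range(2, n_max + 1):
        if power_two_sample(n, effect_size, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within n_max samples per group")


if __name__ == "__main__":
    # Hypothetical inputs: 5% FDR, detect 80% of the truly changed genes,
    # 95% of genes unchanged, standardized effect size of 1.
    n = sample_size_for_fdr(fdr=0.05, target_power=0.8, pi0=0.95, effect_size=1.0)
    print("samples per group:", n)
```

Under these assumptions, the expected number of differentially expressed genes detected among m genes is roughly m * (1 - pi0) * power, which is how a target number of detected genes translates into a target power. Replacing the single effect size with a fitted distribution of effect sizes would bring the sketch closer to the mixture-model setting described in the Method paragraph.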