Underpowered experiments have three problems: true effects are harder to detect; the true effects that are detected tend to have inflated effect sizes; and, as power decreases, so does the probability that a statistically significant result represents a true effect. Many biology experiments are underpowered, and recent calls to change the traditional 0.05 significance threshold to a more stringent value of 0.005 will further reduce the power of the average experiment. Increasing power by increasing the sample size is often the only option considered, but using more samples increases costs, makes the experiment harder to conduct and is contrary to the 3Rs principles for animal research. We show how the design of an experiment and some analytical decisions can have a surprisingly large effect on power.
Keywords: experimental design; power; reproducibility; sample size; statistics.
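Two of the claims above can be illustrated numerically: that a stricter significance threshold lowers power, and that lower power lowers the probability that a significant result reflects a true effect. The sketch below is a minimal illustration, not the analysis used in this paper. It assumes a two-sided comparison of two equal-sized groups with a normal approximation to the test statistic, and a hypothetical prior probability (`prior`) that a tested effect is real. The function names, the effect size of 1.0, the group size of 10 and the prior of 0.2 are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Uses a normal approximation (an assumption of this sketch); a t-based
    calculation gives somewhat lower power for small groups.
    `effect_size` is the standardized mean difference.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the test statistic falls beyond either critical value.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def ppv(power, alpha, prior):
    """Probability that a significant result represents a true effect.

    `prior` is the assumed proportion of tested hypotheses that are true.
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Illustrative values only: effect size 1.0, 10 samples per group.
    for alpha in (0.05, 0.005):
        p = approx_power(effect_size=1.0, n_per_group=10, alpha=alpha)
        print(f"alpha={alpha}: approximate power={p:.2f}")

    # At a fixed threshold, lower power gives a lower PPV.
    for p in (0.8, 0.5, 0.2):
        print(f"power={p}: PPV={ppv(p, alpha=0.05, prior=0.2):.2f}")
```

With the design held fixed, moving the threshold from 0.05 to 0.005 reduces the approximate power, and at a fixed threshold the PPV falls as power falls. The exact numbers depend entirely on the assumed effect size, group size and prior.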