Optimising the number of subjects in an event-related functional imaging study is critical for ensuring sufficient statistical power. We report an empirical investigation of this issue that applied a resampling approach to data from 58 subjects drawn from four previous GO/NOGO studies. Using voxelwise measures and taking the activation map from the complete sample as a "gold standard", the analyses revealed statistical power to be surprisingly low at typical sample sizes (n = 20). However, voxels that were significantly active in smaller samples tended to be true positives: they were typically active in the gold standard map and correlated well with the gold standard activation measure. The poor statistical power of the smaller samples was instead driven by the numerous false negatives that resulted from their lower signal-to-noise ratio (SNR). Splitting the sample into two groups provided a test of the reproducibility of activation maps, which was assessed with an alternative measure that quantified the distances between the centres-of-mass of activated areas. These analyses revealed that, although voxelwise overlap may be poor, the locations of activated areas provide some optimism for studies with typical sample sizes. With n = 20 in each of two groups, the centres-of-mass of 80% of activated areas fell within 25 mm of each other. By quantifying spatial reproducibility across a range of sample sizes for a typical event-related cognitive task, the reported analyses thus provide an empirical measure of the disparity to be expected when comparing activation maps.
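The resampling logic described above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the data are synthetic, the function names (`t_map`, `make_synthetic`, `resample_power`) are hypothetical, and the fixed t threshold of 3.0 applied to both the full sample and the subsamples is an assumption made for simplicity. The sketch treats the thresholded map from all 58 simulated subjects as the gold standard, repeatedly draws subsamples of size n without replacement, and reports the mean fraction of gold-standard voxels recovered (power) and the mean fraction of subsample-active voxels that are also active in the gold standard.

```python
import numpy as np


def t_map(data):
    """One-sample t statistic per voxel; data has shape (subjects, voxels)."""
    n = data.shape[0]
    return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n))


def make_synthetic(n_subjects=58, n_voxels=1000, n_active=100,
                   effect=0.5, seed=0):
    """Synthetic stand-in for subject maps (assumption, not real data):
    unit-variance noise with a constant effect in the first n_active voxels."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_subjects, n_voxels))
    data[:, :n_active] += effect
    return data


def resample_power(data, n, t_crit=3.0, n_iter=200, seed=1):
    """Return (mean power, mean true-positive fraction) for subsamples of size n,
    both measured against the thresholded full-sample map."""
    rng = np.random.default_rng(seed)
    gold = t_map(data) > t_crit  # "gold standard" active voxels
    if not gold.any():
        raise ValueError("no voxel exceeds the threshold in the full sample")
    n_gold = np.count_nonzero(gold)
    power, ppv = [], []
    for _ in range(n_iter):
        # Draw n distinct subjects from the full sample.
        idx = rng.choice(data.shape[0], size=n, replace=False)
        active = t_map(data[idx]) > t_crit
        hits = np.count_nonzero(active & gold)
        power.append(hits / n_gold)
        if active.any():
            ppv.append(hits / np.count_nonzero(active))
    mean_ppv = float(np.mean(ppv)) if ppv else float("nan")
    return float(np.mean(power)), mean_ppv


if __name__ == "__main__":
    data = make_synthetic()
    for n in (10, 20, 40):
        print(n, resample_power(data, n))
```

In a real analysis the threshold would normally depend on the degrees of freedom and on a multiple-comparisons correction, and the centre-of-mass comparison between two split groups would require cluster labelling that this sketch leaves out.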