Determining the appropriate sample size is a crucial component of positron emission tomography (PET) studies. Power calculations, the traditional method for determining sample size, were developed for hypothesis-testing approaches to data analysis. This method is challenged by the complexities of PET data analysis: the use of exploratory analysis strategies, the search for multiple correlated nodes on interlinked networks, and the analysis of large numbers of pixels whose values may be correlated through both anatomical and functional dependence. We examine the effects of variable sample size in a study of human memory, comparing large (n = 33), medium (n = 16, 17), small (n = 11, 11, 11), and very small (n = 6, 6, 7, 7, 7) samples. Results from the large sample are taken as the "gold standard." The primary criterion for assessing sample size is replicability, evaluated using a hierarchically ordered set of parameters: pattern of peaks, location of peaks, number of peaks, size (volume) of peaks, and intensity of the associated t (or z) statistic. As sample size decreases, false negatives begin to appear, with some loss of pattern and peak detection; there is no corresponding increase in false positives. The results suggest that good replicability is achieved with samples of 10-20 subjects in studies of human cognition that use paired subtraction comparisons of single experimental/baseline conditions with blood flow differences ranging from 4% to 13%.
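To make the contrast with the traditional approach concrete, the following is a minimal sketch of the kind of power calculation the abstract refers to: the sample size needed for a two-sided paired t-test, using the standard normal approximation. The function name `required_n` and the illustrative effect sizes are assumptions introduced here, not values from the study; the mapping from a 4-13% blood flow difference to a standardized effect size depends on measurement noise and is purely hypothetical.

```python
import math
from statistics import NormalDist

def required_n(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed for a two-sided paired t-test.

    effect_size: mean condition difference divided by the SD of the
    paired differences (Cohen's d for paired data). Uses the normal
    approximation, which slightly underestimates n for small samples.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    return math.ceil(((z_alpha + z_beta) / effect_size) ** 2)

# Hypothetical paired effect sizes chosen only for illustration:
print(required_n(0.7))  # → 17
print(required_n(1.0))  # → 8
```

Under these assumed effect sizes the classical calculation lands in the same 10-20 subject range that the replicability analysis arrives at empirically, which is one way to read the two approaches as broadly consistent for simple paired-subtraction designs.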