The extent and accessibility of genetic variation in large germplasm collections are of interest to biologists and breeders. Constructing core collections (CC) is a favored approach to the efficient exploration and conservation of novel variation in genetic resources. Using 4,310 Chinese accessions of Oryza sativa L. and 36 SSR markers, we investigated the genetic variation in sub-populations of different sizes, the factors that affect CC size, and different sampling strategies for establishing a CC. Our results indicated that a mathematical model could reliably simulate the relationship between genetic variation and population size, and could thus predict the variation in large germplasm collections from randomly sampled populations of 700-1,500 accessions. We recommend two principles for determining CC size: (1) compromising between genetic variation and genetic redundancy and (2) retaining the main types of alleles. Based on the most effective of 229 sampling schemes, we developed a hierarchical CC system in which different population scales and levels of genetic diversity allow flexible use of genetic resources. The CC, comprising 1.7% (932) of the accessions in the basic collection, retained more than 85% of both the SSR and phenotypic variation. A mini core collection, comprising 0.3% (189) of the accessions in the basic collection, retained 70.65% of the SSR variation and 76.97% of the phenotypic variation, thus providing a rational framework for intensive surveys of natural variation in complex traits in rice genetic resources and, in turn, for the use of that variation in rice breeding.
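
The idea of a sample "retaining" a share of a collection's variation can be illustrated with a minimal sketch. It assumes, for illustration only, that variation is measured as the number of distinct alleles per locus and that each accession carries one allele per locus; the abstract does not state the exact measure or the mathematical model used. The genotypes are synthetic, and all function names and parameters (`allele_set`, `allele_retention`, `simulate_collection`, the collection size, the allele frequencies) are hypothetical.

```python
import random

def allele_set(accessions):
    """Return the set of distinct (locus index, allele) pairs in a group."""
    return {(locus, allele)
            for genotype in accessions
            for locus, allele in enumerate(genotype)}

def allele_retention(collection, subset):
    """Fraction of the collection's distinct alleles also present in subset."""
    total = allele_set(collection)
    return len(allele_set(subset) & total) / len(total)

def simulate_collection(n_accessions, n_loci, n_alleles, rng):
    """Synthetic genotypes (assumption): one allele per locus, with
    higher-numbered alleles drawn less often to mimic rare alleles."""
    weights = [1.0 / (k + 1) for k in range(n_alleles)]
    return [tuple(rng.choices(range(n_alleles), weights=weights)[0]
                  for _ in range(n_loci))
            for _ in range(n_accessions)]

rng = random.Random(0)  # fixed seed so the run is repeatable
# 36 loci mirrors the marker count in the text; the other sizes are arbitrary.
collection = simulate_collection(2000, 36, 12, rng)

# Shuffle once and take nested prefixes as random samples of growing size.
shuffled = rng.sample(collection, len(collection))
for size in (50, 200, 700, 1500):
    print(size, round(allele_retention(collection, shuffled[:size]), 3))
```

Because the samples are nested, retention cannot decrease as sample size grows, which is the kind of relationship between variation and population size that a fitted model would describe. Real SSR data are diploid and multi-allelic, so an actual analysis would need a richer genotype representation than this sketch uses.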