Standardization procedures are commonly used to combine phenotype data that were measured using different instruments, but there is little information on how the choice of standardization method influences pooled estimates and heterogeneity. Heterogeneity is of key importance in meta-analyses of observational studies because it affects the choice of statistical model and the decision of whether it is appropriate to calculate a pooled estimate of effect. Using 2-stage individual participant data analyses, we compared 2 common methods of standardization, T-scores and category-centered scores, to create combinable memory scores using cross-sectional data from 3 Canadian population-based studies (the Canadian Study on Health and Aging (1991-1992), the Canadian Community Health Survey on Healthy Aging (2008-2009), and the Quebec Longitudinal Study on Nutrition and Aging (2004-2005)). A simulation was then conducted to assess the influence of varying the following items across population-based studies: 1) effect size, 2) distribution of confounders, and 3) the relationship between confounders and the outcome. We found that pooled estimates based on the unadjusted category-centered scores tended to be larger than those based on the T-scores, although the differences were negligible when adjusted scores were used, and that most individual participant data meta-analyses identified statistically significant heterogeneity. The results of the simulation suggested that, in terms of heterogeneity, the method of standardization played a smaller role than did differences in effect size across populations and differential confounding of the outcome measure across studies. Although the 2 standardization methods gave generally consistent results, the simulations identified a number of sources of heterogeneity, some of which are not the usual sources considered by researchers.
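
As a rough illustration of the quantities discussed above, the following Python sketch shows one way the 2 unadjusted standardizations and the second-stage pooling step could be computed. It is not the authors' code. The T-score is taken in its conventional form (mean 50, standard deviation 10 within a study). The category-centered score is assumed here to mean centering and scaling within a participant's category (for example, an age-sex stratum), which is an assumption because the abstract does not define it. The pooling step is a standard fixed-effect inverse-variance estimate with Cochran's Q and I², and the function names `t_scores`, `category_centered`, and `pool_fixed` are hypothetical.

```python
import math
from statistics import mean, stdev


def t_scores(raw):
    """Rescale raw scores within one study to mean 50 and SD 10."""
    m, s = mean(raw), stdev(raw)
    return [50 + 10 * (x - m) / s for x in raw]


def category_centered(raw, categories):
    """Center and scale each score within its category.

    Assumed definition: (score - category mean) / category SD.
    """
    groups = {}
    for x, c in zip(raw, categories):
        groups.setdefault(c, []).append(x)
    stats = {c: (mean(v), stdev(v)) for c, v in groups.items()}
    return [(x - stats[c][0]) / stats[c][1] for x, c in zip(raw, categories)]


def pool_fixed(estimates, std_errors):
    """Second stage: inverse-variance pooled estimate, its SE, Cochran's Q, I^2 (%)."""
    w = [1 / se ** 2 for se in std_errors]
    pooled = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    q = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, estimates))
    df = len(estimates) - 1
    # I^2 is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, math.sqrt(1 / sum(w)), q, i2
```

In a 2-stage analysis of this kind, the standardized score would first be regressed on the exposure within each study, and the resulting study-specific coefficients and standard errors would then be passed to a function such as `pool_fixed`. A large Q relative to its degrees of freedom, or a high I², would indicate heterogeneity of the kind examined in the simulation.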