This study investigated the utility of confirmatory factor analysis (CFA) and item response theory (IRT) models for testing the comparability of psychological measurements. Both procedures were used to investigate whether mood ratings collected in Minnesota and China were comparable. Several issues were addressed. The first issue was that of establishing a common measurement scale across groups, which involves full or partial measurement invariance of trait indicators. It is shown that, with either CFA or IRT models, test items that function differentially as trait indicators across groups need not interfere with comparing examinees on the same trait dimension. Second, the issue of model fit was addressed. It is proposed that person-fit statistics be used to judge the practical fit of IRT models. Finally, topics for future research are suggested.