The authors examined data (N = 34,108) on the differential reliability and validity of facet scales from the NEO Inventories. They evaluated the extent to which (a) psychometric properties of facet scales are generalizable across ages, cultures, and methods of measurement and (b) validity criteria are associated with different forms of reliability. Composite estimates of facet scale stability, heritability, and cross-observer validity were broadly generalizable. Two estimates of retest reliability were independent predictors of the three validity criteria; none of the three estimates of internal consistency was. Available evidence suggests the same pattern of results for other personality inventories. Internal consistency of scales can be useful as a check on data quality but appears to be of limited utility for evaluating the potential validity of developed scales, and it should not be used as a substitute for retest reliability. Further research on the nature and determinants of retest reliability is needed.