Background: Profiling or clustering individuals based on personality and other characteristics is a common statistical approach in marketing, medicine, and the social sciences. This approach simplifies data, supports data-driven decision-making, and guides intervention strategies such as personalized care. However, clustering loses information because continuous variables are discretized. Although some loss of information may be acceptable in practice, the amount of information lost and its influence on the prediction of external outcomes have not yet been systematically investigated.
Methods: We assessed the accuracy of predicting physical activity using the clustering approach and compared it with that of the dimensional approach, in which variables are used as continuous regressors. The analysis was based on survey data on physical activity and psychological traits, including the Big Five personality traits, from a sample of 20,573 individuals.
Results: A four-cluster solution, supported by the standard criterion for determining the number of clusters, achieved no more than 60-70% of the prediction accuracy of the dimensional approach, which uses the raw dimensional scales as explanatory variables.
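The kind of comparison behind this result can be illustrated with a minimal sketch on synthetic data. It is not the authors' analysis: the simulated traits and weights, the plain k-means routine, the use of in-sample R^2 as the accuracy measure, and all names (`r_squared`, `kmeans_labels`, `traits`, `activity`) are assumptions made for illustration only. The sketch fits one regression on the continuous scores and one on cluster membership, then reports the cluster model's R^2 as a fraction of the dimensional model's R^2.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid.var() / y.var()


def squared_distances(X, centers):
    """Squared Euclidean distance from every row of X to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = squared_distances(X, centers).argmin(axis=1)
        for j in range(k):
            members = labels == j
            if members.any():  # keep the old center if a cluster is empty
                centers[j] = X[members].mean(axis=0)
    # Final assignment against the last set of centers.
    return squared_distances(X, centers).argmin(axis=1)


# Synthetic stand-in for five trait scores and a continuous outcome.
rng = np.random.default_rng(42)
n, p, k = 2000, 5, 4
traits = rng.normal(size=(n, p))
weights = np.array([0.5, 0.3, -0.2, 0.4, 0.1])  # hypothetical effect sizes
activity = traits @ weights + rng.normal(size=n)

# Dimensional approach: continuous scores as regressors.
r2_dimensional = r_squared(traits, activity)

# Clustering approach: cluster membership as dummy-coded regressors,
# with the first cluster as the reference category.
labels = kmeans_labels(traits, k)
dummies = np.eye(k)[labels][:, 1:]
r2_cluster = r_squared(dummies, activity)

ratio = r2_cluster / r2_dimensional
print(f"R^2 dimensional:   {r2_dimensional:.3f}")
print(f"R^2 four clusters: {r2_cluster:.3f}")
print(f"cluster / dimensional: {ratio:.2f}")
```

Because the synthetic outcome here is linear in the traits, the ratio printed by this sketch says nothing about the 60-70% figure reported above; it only shows how such a ratio can be computed. A held-out or cross-validated accuracy measure would be a reasonable alternative to in-sample R^2.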
Conclusion: The cluster solution suggested by conventional statistical criteria may not be optimal when clusters are used to predict external outcomes.
Keywords: Clustering; Personality; Physical activity; Prediction; Profiling.
© 2024. The Author(s).