Evaluating the ability of large language models to emulate personality

Sci Rep. 2025 Jan 2;15(1):519. doi: 10.1038/s41598-024-84109-5.

Abstract

Recent advances in Large Language Models (LLMs) have the potential to revolutionize the study of human behavior in the social sciences by facilitating the creation of realistic agents characterized by a diverse range of individual differences. This research presents novel simulation studies assessing GPT-4's ability to role-play real-world individuals with diverse Big Five personality profiles. In Simulation 1, emulated personality responses exhibited superior internal consistency and a more distinct and structured factor organization than the human counterparts they were based on. Furthermore, these emulated scores exhibited remarkably high convergent validity with the human self-reported personality scale scores. Simulation 2 replicated these findings but demonstrated that the robustness of GPT-4's role-playing appears to wane as the complexity of the roles increases. Introducing supplementary demographic information in conjunction with personality affected convergent validities for certain emulated traits. However, including additional demographic characteristics enhanced the validity of emulated personality scores for predicting external criteria. Collectively, the findings underscore the promise of using LLMs to emulate realistic, real-person-based agents with varied personality traits. The broader applied implications and avenues for future research are elaborated upon.
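
The two psychometric quantities named in the abstract, internal consistency and convergent validity, can be illustrated with a minimal sketch. It assumes that internal consistency is measured with Cronbach's alpha and that convergent validity is the Pearson correlation between human and emulated scale scores; the abstract does not specify either choice. The function names and all numbers below are hypothetical and are not data from the study.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumption: alpha is used as the internal-consistency index; the abstract
    does not name the statistic the authors used.
    """
    k = len(responses[0])  # number of items on the scale
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total scale score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical item responses (4 respondents x 3 items) for one trait scale.
human_items = [[4, 5, 3], [2, 2, 3], [5, 4, 4], [1, 2, 2]]
emulated_items = [[4, 4, 4], [2, 2, 2], [5, 5, 4], [1, 1, 2]]

# Internal consistency of each set of responses.
alpha_human = cronbach_alpha(human_items)
alpha_emulated = cronbach_alpha(emulated_items)

# Convergent validity: correlate human and emulated scale totals.
human_totals = [sum(row) for row in human_items]
emulated_totals = [sum(row) for row in emulated_items]
convergent_r = pearson_r(human_totals, emulated_totals)

print(f"alpha (human) = {alpha_human:.2f}")
print(f"alpha (emulated) = {alpha_emulated:.2f}")
print(f"convergent r = {convergent_r:.2f}")
```

In this sketch, a higher alpha for the emulated responses would correspond to the "superior internal consistency" the abstract reports, and a correlation near 1 between the two sets of totals would correspond to high convergent validity.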

MeSH terms

  • Adult
  • Female
  • Humans
  • Language*
  • Male
  • Models, Psychological
  • Personality*