Despite the many use cases for large language models (LLMs) in creating personalized chatbots, there has been limited research on evaluating the extent to which the behaviors of personalized LLMs accurately and consistently reflect specific personality traits. We study the behavior of LLM-based agents, referred to as LLM personas, and present a case study with ChatGPT and GPT-4 that investigates whether LLMs can generate content aligned with their assigned personality profiles. To this end, we create distinct LLM personas based on the Big Five personality model, have them complete the 44-item Big Five Inventory (BFI) personality test and a story-writing task, and then assess their essays with automatic and human evaluations. Results show that LLM personas' self-reported BFI scores are consistent with their designated personality types, with large effect sizes observed across all five traits. In addition, the assigned personality types correlate significantly with certain psycholinguistic features of the personas' writing, as measured by the Linguistic Inquiry and Word Count (LIWC) tool. Interestingly, human evaluators perceive the stories as less personal when told they are authored by AI, yet their judgments of other aspects of the writing, such as readability, cohesiveness, redundancy, likeability, and believability, remain largely unaffected. Notably, when evaluators were informed of the AI authorship, their accuracy in identifying the intended personality traits from the stories dropped by more than 10% for some traits. This research marks a significant step forward in understanding the capabilities of LLMs to express personality traits.
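To make the setup concrete, the sketch below shows how an LLM persona could be instantiated via a system prompt and administered a single BFI item through the OpenAI chat API. The persona wording, trait assignments, and scoring prompt are illustrative assumptions, not the study's exact prompts.

```python
# Minimal sketch: build one Big Five persona and ask it one BFI-style item.
# Requires the `openai` package (v1+ API) and an OPENAI_API_KEY in the
# environment. All prompt text here is a hypothetical reconstruction.
from openai import OpenAI

client = OpenAI()

# One of 2^5 = 32 possible profiles: each Big Five trait set high or low.
# These adjective choices are assumptions for illustration.
persona_traits = [
    "extroverted",
    "agreeable",
    "conscientious",
    "neurotic",
    "open to experience",
]

system_prompt = (
    "You are a character who is "
    + ", ".join(persona_traits)
    + ". Answer all questions in character."
)

# An example item in the style of the 44-item BFI; the full inventory
# would loop over all 44 statements.
bfi_item = (
    "Rate how much you agree with the following statement on a scale of 1 "
    "(disagree strongly) to 5 (agree strongly), replying with a single "
    "number: 'I see myself as someone who is talkative.'"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": bfi_item},
    ],
)

# A high-extraversion persona would be expected to answer near 5.
print(response.choices[0].message.content)
```

Aggregating such ratings over all 44 items (reverse-scoring where the inventory requires it) would yield the self-reported trait scores that the abstract compares against each persona's designated profile.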