The lexical hypothesis posits that personality traits are encoded in language and is foundational to models like the Big Five. We created a bottom-up personality model from a classic adjective list using machine learning and compared its descriptive utility against the Big Five by analyzing one million Reddit comments. The Big Five, particularly Agreeableness, Conscientiousness, and Neuroticism, provided a far more powerful and interpretable description of these online communities. In contrast, our machine-learning clusters provided no meaningful distinctions, failed to recover the Extraversion trait, and lacked the psychometric coherence of the Big Five. These results affirm the robustness of the Big Five and suggest personality's semantic structure is context-dependent. Our findings show that while machine learning can help check the ecological validity of established psychological theories, it may not be able to replace them.