Text-based personality prediction by computational models is an emerging field with the potential to address key weaknesses of survey-based personality assessment. We investigate 3848 Twitter profiles with self-labeled personality types from the Myers-Briggs Type Indicator (MBTI), a framework closely related to the Five Factor Model of personality, to better understand how text-based digital traces of online social engagement can be used to predict user personality traits. We use BERT, a state-of-the-art deep learning architecture for NLP, to determine which sources of text hold the most predictive power for this task. We find that biographies, statuses, and liked tweets each contain significant predictive power for all dimensions of the MBTI system. We discuss our findings and their implications for the validity of the MBTI and of the lexical hypothesis, a foundational theory underlying the Five Factor Model that links language use and behavior. Our results are encouraging for personality psychologists, computational linguists, and other social scientists who aim to predict personality from observational text data and to explore the links between language and core behavioral traits.