Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training. Our results show that models published before 2022 display virtually no ability to solve ToM tasks. Yet the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003) solved 93% of ToM tasks, a performance comparable with that of nine-year-old children. These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills.