Intuitive psychology is a pillar of common-sense reasoning. Replicating this reasoning in machine intelligence is an important stepping-stone on the way to human-like artificial intelligence. Several recent tasks and benchmarks for examining this reasoning in Large Language Models (LLMs) have focused in particular on belief attribution in Theory-of-Mind (ToM) tasks. These tasks have shown both successes and failures. We consider one recent purported success case and show that small variations that maintain the principles of ToM turn the results on their head. We argue that, in general, the null hypothesis for model evaluation in intuitive psychology should be skeptical, and that outlying failure cases should outweigh average success rates. We also consider what possible future successes on Theory-of-Mind tasks by more powerful LLMs would mean for ToM tasks with people.