To achieve human-like common sense about everyday life, machine learning systems must understand and reason about the goals, preferences, and actions of other agents in the environment. By the end of their first year of life, human infants intuitively achieve such common sense, and these cognitive achievements lay the foundation for humans' rich and complex understanding of the mental states of others. Can machines achieve generalizable, commonsense reasoning about other agents, as human infants do? The Baby Intuitions Benchmark (BIB) challenges machines to predict the plausibility of an agent's behavior based on the underlying causes of its actions. Because BIB's content and paradigm are adopted from developmental cognitive science, BIB allows for direct comparison between human and machine performance. Nevertheless, recently proposed deep-learning-based agency reasoning models fail to show infant-like reasoning, leaving BIB an open challenge.