When developing AI systems that interact with humans, it is essential to design both a system that can understand humans and a system that humans can understand. Most deep-network-based agent-modeling approaches (1) are not interpretable and (2) model only external behavior, ignoring internal mental states, which potentially limits their capacity for assistance, intervention, and discovering false beliefs. To this end, we develop an interpretable modular neural framework for modeling the intentions of other observed entities. We demonstrate the efficacy of our approach with experiments on data from human participants in a search-and-rescue task in Minecraft, and show that incorporating interpretability can significantly improve predictive performance under the right conditions.