Recent years have seen an increasing amount of work on embodied AI agents that can perform tasks by following human language instructions. However, most of these agents are reactive, meaning that they simply learn and imitate behaviors encountered in the training data. These reactive agents are insufficient for long-horizon complex tasks. To address this limitation, we propose a neuro-symbolic deliberative agent that, while following language instructions, proactively applies reasoning and planning based on its neural and symbolic representations acquired from past experience (e.g., natural language and egocentric vision). We show that our deliberative agent achieves greater than 70% improvement over reactive baselines on the challenging TEACh benchmark. Moreover, the underlying reasoning and planning processes, together with our modular framework, offer impressive transparency and explainability into the agent's behaviors. This enables an in-depth understanding of the agent's capabilities, which sheds light on challenges and opportunities for future embodied agents for instruction following. The code is available at https://github.com/sled-group/DANLI.