ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

Ran Gong, Jiangyong Huang, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang

From arXiv. The first two authors contributed equally; 20 pages; 17 figures; project available at: https://arnold-benchmark.github.io/

Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete (e.g., binary) object goal states, which poses challenges for learning complex tasks and for transferring learned policies from simulated environments to the real world. Furthermore, state discretization limits a robot's ability to follow human instructions based on the grounding of actions and states. To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes. ARNOLD comprises 8 language-conditioned tasks that involve understanding object states and learning policies for continuous goals. To promote language-instructed learning, we provide expert demonstrations with template-generated language descriptions. We assess task performance using the latest language-conditioned policy learning models. Our results indicate that current models for language-conditioned manipulation still face significant challenges in generalizing to novel goal states, scenes, and objects. These findings highlight the need to develop new algorithms that address this gap and underscore the potential for further research in this area. See our project page at: https://arnold-benchmark.github.io

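To make the abstract's distinction between discrete and continuous goal states concrete, here is a minimal sketch in Python. It is not ARNOLD's evaluation code: the `GoalCheck` class, the normalization of object state to a fraction in [0, 1], the 0.5 threshold for the binary view, and the tolerance value are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GoalCheck:
    """Hypothetical success checker; not taken from the ARNOLD codebase."""

    goal: float       # target state as a fraction in [0, 1], e.g. 0.75 for "75% open"
    tolerance: float  # assumed acceptance band around the goal

    def binary_success(self, state: float) -> bool:
        # Discrete view: the object is only ever "open" or "closed",
        # so any state on the same side of 0.5 as the goal counts as success.
        return (state >= 0.5) == (self.goal >= 0.5)

    def continuous_success(self, state: float) -> bool:
        # Continuous view: the final state must land close to the requested value.
        return abs(state - self.goal) <= self.tolerance


# Instruction (illustrative): "open the drawer 75%".
check = GoalCheck(goal=0.75, tolerance=0.1)

fully_open = 1.0   # the policy overshoots and opens the drawer completely
near_goal = 0.7    # the policy stops close to the requested state

print(check.binary_success(fully_open), check.continuous_success(fully_open))
print(check.binary_success(near_goal), check.continuous_success(near_goal))
```

Under these assumptions, a fully opened drawer passes the binary check but fails the continuous one, which is the kind of gap a benchmark with continuous goal states can expose.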