The domain of Embodied AI has recently witnessed substantial progress, particularly in navigating agents within their environments. These early successes have laid the groundwork for the community to tackle tasks that require agents to actively interact with objects in their environment. Object manipulation is an established research domain within the robotics community and poses several challenges, including manipulator motion, grasping, and long-horizon planning. These challenges are especially pronounced in oft-overlooked practical setups that involve visually rich and complex scenes, manipulation using mobile agents (as opposed to tabletop manipulation), and generalization to unseen environments and objects. We propose a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework and present a new challenge to the Embodied AI community known as ArmPointNav. This task extends the popular point navigation task to object manipulation and offers new challenges, including 3D obstacle avoidance, manipulating objects in the presence of occlusion, and multi-object manipulation that necessitates long-term planning. Popular learning paradigms that are successful on PointNav challenges show promise, but leave substantial room for improvement.