Object rearrangement aims to move objects from an initial state to a goal state. Here, we focus on a more practical setting of object rearrangement, i.e., rearranging objects from shuffled layouts to a normative target distribution without explicit goal specification. This task remains challenging for AI agents, as it is hard to describe the target distribution (goal specification) for reward engineering or to collect expert trajectories as demonstrations. Hence, it is infeasible to directly employ reinforcement learning or imitation learning algorithms to address the task. This paper aims to learn a policy from only a set of examples drawn from a target distribution, instead of a handcrafted reward function. We employ a score-matching objective to train a Target Gradient Field (TarGF), which indicates a direction for each object that increases the likelihood under the target distribution. For object rearrangement, the TarGF can be used in two ways: 1) for model-based planning, we cast the target gradient into a reference control and output actions with a distributed path planner; 2) for model-free reinforcement learning, the TarGF is used not only to estimate the likelihood change as a reward but also to provide suggested actions in residual policy learning. Experimental results on ball rearrangement and room rearrangement demonstrate that our method significantly outperforms state-of-the-art methods in the quality of the terminal state, the efficiency of the control process, and scalability. The code and demo videos are available on our project website.