Effective Baselines for Multiple Object Rearrangement Planning in Partially Observable Mapped Environments

Many real-world tasks, from house-cleaning to cooking, can be formulated as multi-object rearrangement problems, in which an agent needs to get specific objects into appropriate goal states. For such problems, we focus on a setting that assumes a pre-specified goal state, perfect manipulation and object recognition capabilities, and a static map of the environment, but unknown initial locations of the objects to be rearranged. Our goal is to enable home-assistive intelligent agents to plan efficiently for rearrangement under such partial observability. This requires an efficient trade-off between exploring the environment and planning for rearrangement, which is challenging because of the long-horizon nature of the problem. To make progress on this problem, we first use classical methods to analyze how factors such as the number of objects and receptacles, the agent's carrying capacity, and the environment layout affect exploration and planning for rearrangement. We then investigate both monolithic and modular deep reinforcement learning (DRL) methods for planning in our setting. We find that monolithic DRL methods do not succeed at the long-horizon planning needed for multi-object rearrangement. Instead, modular greedy approaches perform surprisingly well and emerge as competitive baselines for planning with partial observability in multi-object rearrangement problems. We also show that our greedy modular agents are empirically optimal when the objects that need to be rearranged are uniformly distributed in the environment, thereby contributing baselines with strong performance for future work on multi-object rearrangement planning in partially observable settings.
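To make the setting concrete, the following is a minimal illustrative sketch of a greedy agent in a toy version of the problem. It is not the authors' implementation: the function name `greedy_rearrange`, the Manhattan-distance cost, the nearest-target rule, and the example receptacles and objects are all assumptions made for illustration. The agent knows the map (receptacle positions) and the goal (object to goal receptacle), but discovers where objects initially are only by visiting receptacles.

```python
def manhattan(a, b):
    """Grid distance between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_rearrange(receptacles, hidden_objects, goals, start, capacity):
    """Toy greedy baseline (hypothetical, for illustration only).

    receptacles:    name -> (x, y) position; the known static map
    hidden_objects: object -> receptacle name; true initial state, unknown to the agent
    goals:          object -> goal receptacle name; the pre-specified goal
    Returns (distance travelled, final object locations, action log).
    """
    pos = start
    location = dict(hidden_objects)  # ground-truth world state
    unvisited = set(receptacles)     # receptacles not yet observed
    known = {}                       # discovered objects not in hand -> receptacle
    seen = set()                     # goal objects discovered so far
    carrying = []
    travelled = 0
    log = []

    while True:
        candidates = []
        # Option 1: deliver any carried object to its goal receptacle.
        for obj in carrying:
            rec = goals[obj]
            candidates.append((manhattan(pos, receptacles[rec]), "place", rec, obj))
        # Option 2: pick up a discovered, misplaced object if there is room.
        if len(carrying) < capacity:
            for obj, rec in known.items():
                if rec != goals[obj]:
                    candidates.append((manhattan(pos, receptacles[rec]), "pick", rec, obj))
        # Option 3: explore, but only while some goal object is still undiscovered.
        if any(obj not in seen for obj in goals):
            for rec in unvisited:
                candidates.append((manhattan(pos, receptacles[rec]), "explore", rec, ""))
        if not candidates:
            break

        # Greedy choice: the nearest target (ties broken by tuple ordering).
        dist, action, rec, obj = min(candidates)
        travelled += dist
        pos = receptacles[rec]

        # Arriving at a receptacle reveals the goal objects located there.
        unvisited.discard(rec)
        for other, where in location.items():
            if where == rec and other in goals:
                known[other] = rec
                seen.add(other)

        if action == "pick":
            carrying.append(obj)
            del known[obj]
            location[obj] = None
        elif action == "place":
            carrying.remove(obj)
            known[obj] = rec
            location[obj] = rec
        log.append((action, rec, obj))

    return travelled, location, log


# Hypothetical example: three receptacles, two misplaced objects, capacity 1.
receptacles = {"table": (0, 0), "shelf": (3, 0), "sink": (0, 4)}
hidden_objects = {"cup": "table", "book": "sink"}
goals = {"cup": "sink", "book": "shelf"}
distance, final, log = greedy_rearrange(receptacles, hidden_objects, goals, (0, 0), 1)
```

In this sketch, exploration and rearrangement compete inside a single nearest-target rule, which is one simple way to realize the exploration-versus-planning trade-off the abstract describes; the baselines in the paper may resolve that trade-off differently.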
