Object packing by autonomous robots is an important challenge in warehouses and the logistics industry. Most conventional data-driven packing planning approaches focus on regular cuboid packing; they are usually heuristic and of limited practical use in realistic applications with everyday objects. In this paper, we propose a deep hierarchical reinforcement learning approach that simultaneously plans the packing sequence and placement for irregular object packing. Specifically, a top manager network infers the packing sequence from six principal-view heightmaps of all objects, and a bottom worker network then receives the heightmaps of the next object to predict its placement position and orientation. The two networks are trained hierarchically in a self-supervised Q-learning framework, where the rewards are derived from the packing results based on the top height, object volume, and placement stability in the box. The framework repeats sequence and placement planning iteratively until all objects have been packed into the box or no space remains for the unpacked items. We compare our approach with existing robotic packing methods for irregular objects in a physics simulator. Experiments show that our approach packs more objects in less time than state-of-the-art irregular object packing methods. We also execute our packing plans with a robotic manipulator to demonstrate generalization to the real world.
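The manager/worker loop described above can be summarized in a short sketch. This is a minimal illustration, not the authors' implementation: it assumes greedy selection over Q-values, and all names (ManagerQ, WorkerQ, pack_reward, hierarchical_packing), the toy heightmap state update, and the reward weights are hypothetical stand-ins for the learned networks and simulator feedback.

```python
import numpy as np

class ManagerQ:
    """Stand-in for the top manager network: maps an object's six
    principal-view heightmaps (shape (6, H, W)) to a packing priority."""
    def __call__(self, object_views):
        return float(np.mean(object_views))      # placeholder for a learned CNN

class WorkerQ:
    """Stand-in for the bottom worker network: scores one candidate
    (x, y, rotation) placement for the selected object."""
    def __init__(self, box_height=1.0):
        self.box_height = box_height
    def __call__(self, box_heightmap, object_views, pose):
        x, y, _rot = pose
        obj_height = float(np.max(object_views[0]))   # crude height estimate
        if box_heightmap[x, y] + obj_height > self.box_height:
            return -np.inf                            # placement does not fit
        return -float(box_heightmap[x, y])            # placeholder: prefer low spots

def pack_reward(top_height, object_volume, stable):
    """Illustrative reward combining the three terms named in the abstract:
    packed top height, placed object volume, and placement stability.
    The weights are assumptions, not values from the paper."""
    return -top_height + object_volume + (0.5 if stable else -0.5)

def hierarchical_packing(objects, box_heightmap, manager, worker, poses):
    """Alternate sequence planning (manager) and placement planning (worker)
    until every object is packed or nothing else fits in the box."""
    packed, remaining = [], list(objects)
    while remaining:
        # Manager step: greedily pick the next object by its Q-value.
        idx = max(range(len(remaining)),
                  key=lambda i: manager(remaining[i]["views"]))
        obj = remaining[idx]
        # Worker step: greedily pick the best feasible placement pose.
        q, pose = max((worker(box_heightmap, obj["views"], p), p) for p in poses)
        if q == -np.inf:
            break                                    # no space left for this object
        remaining.pop(idx)
        packed.append((obj["name"], pose))
        x, y, _ = pose
        box_heightmap[x, y] += float(np.max(obj["views"][0]))  # toy state update
        # In training, pack_reward(...) would supply the Q-learning target here.
    return packed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    objects = [{"name": f"obj{i}", "views": rng.random((6, 8, 8)) * 0.2}
               for i in range(4)]
    box = np.zeros((8, 8))
    poses = [(x, y, r) for x in range(8) for y in range(8) for r in (0, 90)]
    print(hierarchical_packing(objects, box, ManagerQ(), WorkerQ(), poses))
```

In the full method, the two placeholder scorers would be the trained manager and worker Q-networks, and the heightmap update and reward would come from the physics simulator rather than the toy arithmetic shown here.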