We study the problem of learning online packing skills for irregular 3D shapes, which is arguably the most challenging setting of bin packing problems. The goal is to consecutively move a sequence of 3D objects with arbitrary shapes into a designated container with only partial observations of the object sequence. Meanwhile, we take physical realizability into account, involving the physics dynamics and constraints of a placement. The packing policy should understand the 3D geometry of the object to be packed and make effective decisions to accommodate it in the container in a physically realizable way. We propose a Reinforcement Learning (RL) pipeline to learn the policy. The complex irregular geometry and imperfect object placement together lead to a huge solution space, and direct training in such a space is prohibitively data-intensive. We instead propose a theoretically provable method for candidate action generation to reduce the action space of RL and the learning burden. A parameterized policy is then learned to select the best placement from the candidates. Equipped with an efficient method of asynchronous RL acceleration and a data preparation process that produces simulation-ready training sequences, a mature packing policy can be trained in a physics-based environment within 48 hours. Through extensive evaluation on a variety of real-life shape datasets and comparisons with state-of-the-art baselines, we demonstrate that our method outperforms the best-performing baseline on all datasets by at least 12.8% in terms of packing utility.