We tackle the Online 3D Bin Packing Problem (Online 3D-BPP), a challenging yet practically useful variant of the classical Bin Packing Problem. In this problem, items are delivered to the agent one at a time, without the full sequence being known in advance. The agent must pack each item directly and stably into the target bin in arrival order, and no later adjustment is permitted. Online 3D-BPP can be naturally formulated as a Markov Decision Process (MDP). We adopt deep reinforcement learning (RL), in particular the on-policy actor-critic framework, to solve this MDP with a constrained action space. To learn a practically feasible packing policy, we propose three critical designs. First, we propose an online analysis of packing stability based on a novel stacking tree. It attains high analysis accuracy while reducing the computational complexity from $O(N^2)$ to $O(N \log N)$, making it especially suited to RL training. Second, we propose decoupled packing policy learning for the different dimensions of placement, which enables high-resolution spatial discretization and hence high packing precision. Third, we introduce a reward function that directs the robot to place items in a far-to-near order and therefore simplifies collision avoidance in the motion planning of the robotic arm. Furthermore, we provide a comprehensive discussion of several key implementation issues. Extensive evaluation demonstrates that our learned policy significantly outperforms state-of-the-art methods and is practically usable for real-world applications.
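To make the idea of an MDP with a constrained action space concrete, the following is a minimal sketch, not the method described above. It assumes the bin state is a discretized height map and that a placement is feasible only when the item's footprint rests on a perfectly flat region and stays below the bin's height limit. That flatness test is a deliberately crude stand-in for the stacking-tree stability analysis, and the names `feasible_mask`, `place`, and `masked_policy` are hypothetical.

```python
import math


def feasible_mask(height_map, item, bin_height):
    """Mark every grid cell where the item's corner may be placed.

    Assumption for illustration only: a placement is accepted when all cells
    under the item have the same height (flat support) and the item does not
    exceed the bin height. This is NOT the stacking-tree stability analysis.
    """
    rows, cols = len(height_map), len(height_map[0])
    l, w, h = item
    mask = [[False] * cols for _ in range(rows)]
    for x in range(rows - l + 1):
        for y in range(cols - w + 1):
            cells = [height_map[i][j]
                     for i in range(x, x + l)
                     for j in range(y, y + w)]
            top = max(cells)
            mask[x][y] = (min(cells) == top) and (top + h <= bin_height)
    return mask


def place(height_map, item, pos):
    """Return a new height map with the item placed at pos (state transition)."""
    l, w, h = item
    x, y = pos
    new_map = [row[:] for row in height_map]
    for i in range(x, x + l):
        for j in range(y, y + w):
            new_map[i][j] += h
    return new_map


def masked_policy(logits, mask):
    """Softmax restricted to feasible cells; infeasible cells get probability 0.

    Returns None when no placement is feasible (the episode would end).
    """
    feasible = [(i, j)
                for i, row in enumerate(mask)
                for j, ok in enumerate(row) if ok]
    if not feasible:
        return None
    # Subtract the largest feasible logit for numerical stability.
    peak = max(logits[i][j] for i, j in feasible)
    probs = [[0.0] * len(row) for row in mask]
    total = 0.0
    for i, j in feasible:
        probs[i][j] = math.exp(logits[i][j] - peak)
        total += probs[i][j]
    return [[p / total for p in row] for row in probs]
```

In an actor-critic setup, the logits would come from the actor network, and the mask would be recomputed after each call to `place`, so the policy only ever samples placements that the feasibility test accepts.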