This paper tackles the bin packing problem (BPP) from a learning perspective. Building on self-attention-based encoding and deep reinforcement learning algorithms, we propose a new end-to-end learning model for this task. By decomposing the combinatorial action space and by using a new training technique called prioritized oversampling, a general scheme for speeding up on-policy learning, we achieve state-of-the-art performance in a range of experimental settings. Moreover, although the proposed approach, attend2pack, targets offline-BPP, we also strip the method down to the strict online-BPP setting, where it likewise achieves state-of-the-art performance. Through a set of ablation studies and comparisons against a range of previous works, we hope to offer a valid baseline approach for this field of study.