Automated Guided Vehicles (AGVs) are widely used for material handling on flexible shop floors. Each product requires various raw materials to complete assembly during production, and AGVs automate the handling of these raw materials across different locations. An efficient AGV task allocation strategy can reduce transportation costs and improve distribution efficiency. However, traditional centralized approaches place high demands on the control center's computing power and real-time capability. In this paper, we present decentralized solutions that achieve flexible and self-organized AGV task allocation. In particular, we propose two improved multi-agent reinforcement learning algorithms, MADDPG-IPF (Information Potential Field) and BiCNet-IPF, to coordinate AGVs across different scenarios. To address the reward-sparsity issue, we propose a reward shaping strategy based on the information potential field, which provides stepwise rewards and implicitly guides the AGVs to different material targets. We conduct experiments under different settings (3 AGVs and 6 AGVs), and the results indicate that, compared with baseline methods, our approach achieves up to a 47% improvement in task response and a 22% reduction in training iterations.
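To make the reward shaping idea concrete, the following is a minimal sketch of how stepwise rewards might be derived from a potential field. The abstract does not give the exact form of the information potential field, so the inverse-distance potential, the grid-cell representation, and the names `potential` and `shaped_reward` are all assumptions for illustration. The sketch uses the standard potential-based shaping term gamma * phi(s') - phi(s), which may differ from the formulation used in MADDPG-IPF and BiCNet-IPF.

```python
from typing import Iterable, Tuple

Cell = Tuple[int, int]  # hypothetical (row, col) grid position of an AGV or target


def potential(pos: Cell, targets: Iterable[Cell]) -> float:
    """Assumed potential: 1.0 at a target, decaying with Manhattan distance.

    Returns the strongest contribution among all targets, or 0.0 if none remain.
    """
    return max(
        (1.0 / (1.0 + abs(pos[0] - tr) + abs(pos[1] - tc)) for tr, tc in targets),
        default=0.0,
    )


def shaped_reward(
    pos: Cell,
    next_pos: Cell,
    targets: Iterable[Cell],
    env_reward: float = 0.0,
    gamma: float = 0.99,
) -> float:
    """Add the shaping term gamma * phi(next_pos) - phi(pos) to the sparse reward."""
    targets = list(targets)  # allow any iterable to be scanned twice
    shaping = gamma * potential(next_pos, targets) - potential(pos, targets)
    return env_reward + shaping
```

Under these assumptions, a step that brings an AGV closer to the nearest material target yields a positive shaping term and a step away yields a negative one, so the agent receives a signal at every step even when the environment reward is zero. Guiding AGVs to different targets would additionally require removing or down-weighting targets already claimed by other agents, which this sketch leaves out.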