We consider task allocation for multi-object transport by a multi-robot system, in which each robot selects one object among multiple objects with different and unknown weights. Existing centralized methods assume a fixed number of robots and tasks, and hence cannot be applied to scenarios that differ from the learning environment. Existing distributed methods, by contrast, only require the numbers of robots and tasks to exceed a constant minimum, which makes them applicable to various numbers of robots and tasks; however, they cannot transport an object whose weight exceeds the total load capacity of the robots observing it. To handle various numbers of robots and objects with different and unknown weights, we propose a task-allocation framework based on multi-agent reinforcement learning. First, we introduce a structured policy model consisting of 1) predesigned dynamic task priorities obtained through global communication and 2) a neural-network-based distributed policy that determines the timing of coordination. The distributed policy builds consensus on the highest-priority object under local observations and selects cooperative or independent actions. The policy is then optimized by multi-agent reinforcement learning through trial and error. This structure of local learning with global communication makes our framework applicable to various numbers of robots and objects with different and unknown weights, as demonstrated by numerical simulations.
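To make the allocation idea concrete, the following is a minimal toy sketch, not the paper's learned policy: robots greedily commit to the highest-priority visible object, and a team transports it only when the team's combined load capacity covers the object's weight. All names (`allocate`, the `capacity`/`visible`/`weight` fields, and the priority values) are hypothetical illustrations.

```python
def allocate(robots, objects, priorities):
    """Toy greedy allocation (illustrative assumption, not the paper's method):
    objects are visited in descending priority; every unassigned robot that
    observes the object joins its team, and the team is committed only if
    its summed capacity meets the object's weight."""
    assignment = {}  # robot id -> object id
    for obj in sorted(objects, key=lambda o: -priorities[o["id"]]):
        # Unassigned robots whose local observation includes this object.
        team = [r for r in robots
                if obj["id"] in r["visible"] and r["id"] not in assignment]
        if sum(r["capacity"] for r in team) >= obj["weight"]:
            for r in team:
                assignment[r["id"]] = obj["id"]
    return assignment


if __name__ == "__main__":
    robots = [
        {"id": 0, "capacity": 1, "visible": {0, 1}},
        {"id": 1, "capacity": 1, "visible": {0}},
    ]
    objects = [{"id": 0, "weight": 2}, {"id": 1, "weight": 1}]
    priorities = {0: 2, 1: 1}
    # Both robots must cooperate on the heavy high-priority object 0,
    # so the lighter object 1 is left unassigned in this round.
    print(allocate(robots, objects, priorities))
```

In the proposed framework this hand-coded rule is replaced by a learned distributed policy, which is what lets the system decide the timing of cooperation rather than committing greedily.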