Scalable, Decentralized Multi-Agent Reinforcement Learning Methods Inspired by Stigmergy and Ant Colonies

Extending multi-agent learning algorithms to complex coordination and control tasks is a long-standing research challenge. Numerous methods have been proposed to reduce the effects of non-stationarity and poor scalability. In this work, we investigate a novel approach to decentralized multi-agent learning and planning that attempts to address these two challenges. The method is inspired by the cohesion, coordination, and behavior of ant colonies, and its algorithms are therefore designed to scale naturally to systems with many agents. While optimality is not guaranteed, the method is intended to work well in practice and to scale better in efficacy with the number of agents than other methods. The approach combines single-agent RL with an ant-colony-inspired, decentralized, stigmergic algorithm for multi-agent path planning and environment modification. Specifically, we apply the algorithm in a setting where agents must navigate to a goal location and learn to push rectangular boxes into holes to create new traversable pathways. We show that while the approach yields promising results in this particular environment, it may not generalize easily to others. The algorithm scales well to many agents, but its performance is limited by its relatively simple, rule-based design. Furthermore, the results call the composability of RL-trained policies into question: policies that succeed in their training environments behave unpredictably when applied in a larger-scale, multi-agent framework.
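The abstract does not spell out the stigmergic rule itself. As a rough illustration of what stigmergic coordination generally means (agents communicating only through marks left in a shared environment), here is a minimal Python sketch of a pheromone grid in the style of classic ant-colony methods. All names (`PheromoneGrid`, `choose_next_cell`), the evaporation rate, and the baseline weight are hypothetical and are not taken from the paper; the paper's actual algorithm may differ substantially.

```python
import random


class PheromoneGrid:
    """Shared 2D grid that agents read and write (hypothetical illustration).

    The grid is the only communication channel between agents, which is
    the defining property of stigmergy.
    """

    def __init__(self, width, height, evaporation=0.1):
        self.width = width
        self.height = height
        # Fraction of pheromone lost per time step (assumed value).
        self.evaporation = evaporation
        self.levels = [[0.0] * width for _ in range(height)]

    def deposit(self, x, y, amount=1.0):
        """An agent marks the cell it has just visited."""
        self.levels[y][x] += amount

    def evaporate(self):
        """Decay every cell so that stale trails fade over time."""
        for row in self.levels:
            for x in range(self.width):
                row[x] *= 1.0 - self.evaporation

    def neighbors(self, x, y):
        """Return the in-bounds 4-connected neighbors of (x, y)."""
        candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        return [
            (nx, ny)
            for nx, ny in candidates
            if 0 <= nx < self.width and 0 <= ny < self.height
        ]


def choose_next_cell(grid, x, y, rng, baseline=0.1):
    """Sample a neighbor with probability proportional to pheromone + baseline.

    The baseline (assumed) keeps unmarked cells reachable, so agents
    still explore when no trail exists yet.
    """
    cells = grid.neighbors(x, y)
    weights = [grid.levels[ny][nx] + baseline for nx, ny in cells]
    return rng.choices(cells, weights=weights, k=1)[0]
```

Each agent in this sketch needs only local reads and writes on the grid, so adding agents adds no pairwise communication. That is one common argument for why stigmergic schemes tend to scale with the number of agents.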
