Cooperative multi-robot tasks can benefit from heterogeneity in the robots' physical and behavioral traits. In spite of this, traditional Multi-Agent Reinforcement Learning (MARL) frameworks lack the ability to explicitly accommodate policy heterogeneity, and typically constrain agents to share neural network parameters. This enforced homogeneity limits application in cases where the tasks benefit from heterogeneous behaviors. In this paper, we crystallize the role of heterogeneity in MARL policies. Towards this end, we introduce Heterogeneous Graph Neural Network Proximal Policy Optimization (HetGPPO), a paradigm for training heterogeneous MARL policies that leverages a Graph Neural Network for differentiable inter-agent communication. HetGPPO allows communicating agents to learn heterogeneous behaviors while enabling fully decentralized training in partially observable environments. We complement this with a taxonomical overview that exposes more heterogeneity classes than previously identified. To motivate the need for our model, we present a characterization of techniques that homogeneous models can leverage to emulate heterogeneous behavior, and show how this "apparent heterogeneity" is brittle in real-world conditions. Through simulations and real-world experiments, we show that (i) when homogeneous methods fail due to strong heterogeneous requirements, HetGPPO succeeds, and (ii) when homogeneous methods are able to learn apparently heterogeneous behaviors, HetGPPO achieves higher resilience to both training and deployment noise.
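The abstract describes policies in which each agent keeps its own (non-shared) parameters while a Graph Neural Network provides differentiable inter-agent communication. The following is a minimal, hypothetical PyTorch sketch of that idea, not the paper's actual implementation: the class name `HetGNNPolicy`, the single mean-aggregation message-passing round, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn


class HetGNNPolicy(nn.Module):
    """Illustrative sketch (not the authors' code): per-agent encoders and
    decoders give heterogeneous parameters, while a differentiable
    message-passing step couples the agents through a communication graph."""

    def __init__(self, n_agents: int, obs_dim: int, msg_dim: int, act_dim: int):
        super().__init__()
        # Heterogeneous: agent i owns its parameters; nothing is shared across agents.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, msg_dim), nn.Tanh()) for _ in range(n_agents)
        )
        self.decoders = nn.ModuleList(
            nn.Linear(2 * msg_dim, act_dim) for _ in range(n_agents)
        )

    def forward(self, obs: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # obs: (n_agents, obs_dim); adj: (n_agents, n_agents) communication graph.
        h = torch.stack([enc(o) for enc, o in zip(self.encoders, obs)])  # (n, msg_dim)
        # One differentiable message-passing round: mean over graph neighbours.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        msg = (adj @ h) / deg                                            # (n, msg_dim)
        # Each agent decodes its own embedding concatenated with the aggregated message.
        return torch.stack(
            [dec(torch.cat([h_i, m_i])) for dec, h_i, m_i in zip(self.decoders, h, msg)]
        )


# Toy usage: 3 agents on a fully connected communication graph (no self-loops).
policy = HetGNNPolicy(n_agents=3, obs_dim=8, msg_dim=16, act_dim=2)
obs = torch.randn(3, 8)
adj = torch.ones(3, 3) - torch.eye(3)
actions = policy(obs, adj)  # (3, 2); gradients flow through the communication step
```

Because the aggregation step is differentiable, each agent's policy gradient also flows through the messages it receives, which is the mechanism that lets communicating agents learn distinct yet coordinated behaviors.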