Modelling the behaviours of other agents is essential for understanding how agents interact and for making effective decisions. Existing methods for agent modelling commonly assume knowledge of the local observations and chosen actions of the modelled agents during execution. To eliminate this assumption, we extract representations from the local information of the controlled agent using encoder-decoder architectures. Using the observations and actions of the modelled agents during training, our models learn to extract representations about the modelled agents conditioned only on the local observations of the controlled agent. The representations are used to augment the controlled agent's decision policy, which is trained via deep reinforcement learning; thus, during execution, the policy does not require access to other agents' information. We provide a comprehensive evaluation and ablation studies in cooperative, competitive and mixed multi-agent environments, showing that our method achieves significantly higher returns than baseline methods that do not use the learned representations.
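To make the encoder-decoder idea concrete, below is a minimal sketch, assuming a PyTorch implementation. All class and parameter names (`AgentModel`, `hidden`, the dimension arguments) are hypothetical and not taken from the paper; the point is only that the encoder consumes the controlled agent's own observation-action history, while the decoder heads, which require the modelled agents' data, are used solely to form the training loss.

```python
import torch
import torch.nn as nn

class AgentModel(nn.Module):
    """Sketch: encode the controlled agent's local trajectory into an
    embedding z; decode the modelled agents' observations and actions
    from z as a training-time reconstruction objective."""

    def __init__(self, obs_dim, act_dim, other_obs_dim, other_act_dim, hidden=128):
        super().__init__()
        # Encoder: recurrent net over the controlled agent's own
        # observation-action history (the only input available at execution).
        self.encoder = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        # Decoder heads: predict the modelled agents' observations and actions.
        # Their targets exist only during (centralised) training.
        self.obs_decoder = nn.Linear(hidden, other_obs_dim)
        self.act_decoder = nn.Linear(hidden, other_act_dim)

    def forward(self, obs_seq, act_seq):
        # obs_seq: (batch, T, obs_dim); act_seq: (batch, T, act_dim)
        h, _ = self.encoder(torch.cat([obs_seq, act_seq], dim=-1))
        z = h[:, -1]  # embedding summarising the modelled agents
        return z, self.obs_decoder(z), self.act_decoder(z)
```

During training, the decoder outputs would be regressed against the modelled agents' recorded observations and actions (e.g. with MSE and cross-entropy losses); at execution, only the encoder runs, and `z` is concatenated to the policy's input so no other-agent information is needed.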