Modelling the behaviours of other agents is essential for understanding how agents interact and for making effective decisions. Existing methods for agent modelling commonly assume knowledge of the local observations and chosen actions of the modelled agents during execution. To eliminate this assumption, we extract representations from the local information of the controlled agent using encoder-decoder architectures. Using the observations and actions of the modelled agents during training, our models learn to extract representations about the modelled agents conditioned only on the local observations of the controlled agent. The representations are used to augment the controlled agent's decision policy, which is trained via deep reinforcement learning; thus, during execution, the policy does not require access to other agents' information. We provide a comprehensive evaluation and ablation studies in cooperative, competitive, and mixed multi-agent environments, showing that our method achieves higher returns than baseline methods which do not use the learned representations.
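The training/execution asymmetry described above can be sketched in a few lines: an encoder maps the controlled agent's local observation to an embedding, a decoder is trained to reconstruct the modelled agent's information from that embedding, and at execution time only the encoder is used to augment the policy input. This is a minimal numpy sketch under assumed dimensions and a single linear layer per component; the actual architecture, losses, and RL training loop in the work are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM = 8       # controlled agent's local observation size (assumed)
EMBED_DIM = 4     # learned representation size (assumed)
MODELLED_DIM = 6  # modelled agent's observation+action target size (assumed)

# Encoder: controlled agent's local observation -> embedding.
W_enc = rng.normal(0, 0.1, (OBS_DIM, EMBED_DIM))
# Decoder: embedding -> modelled agent's observation/action. Used only
# during training; discarded at execution time.
W_dec = rng.normal(0, 0.1, (EMBED_DIM, MODELLED_DIM))

def encode(obs):
    return np.tanh(obs @ W_enc)

def train_step(obs, modelled_target, lr=0.01):
    """One gradient step on the MSE reconstruction loss.

    `modelled_target` (the modelled agent's observation/action) is only
    available at training time, matching the setting in the abstract.
    """
    global W_enc, W_dec
    z = np.tanh(obs @ W_enc)
    pred = z @ W_dec
    err = pred - modelled_target           # dL/dpred
    grad_W_dec = np.outer(z, err)
    grad_z = err @ W_dec.T
    grad_pre = grad_z * (1.0 - z ** 2)     # tanh derivative
    grad_W_enc = np.outer(obs, grad_pre)
    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc
    return float(np.mean(err ** 2))

def policy_input(obs):
    """Execution-time input to the RL policy: the local observation
    augmented with the learned representation. No access to other
    agents' information is needed."""
    return np.concatenate([obs, encode(obs)])

obs = rng.normal(size=OBS_DIM)
target = rng.normal(size=MODELLED_DIM)
losses = [train_step(obs, target) for _ in range(200)]
print(losses[-1] < losses[0])       # reconstruction loss decreased
print(policy_input(obs).shape)      # (OBS_DIM + EMBED_DIM,) == (12,)
```

The key design point the sketch mirrors is that the decoder (and hence the modelled agents' data) is a training-time scaffold: once the encoder is learned, the policy consumes only `policy_input(obs)`, computed from the controlled agent's own observation.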