In multi-agent deep reinforcement learning, extracting sufficient yet compact information about other agents is critical to the convergence efficiency and scalability of an algorithm. In canonical frameworks, such information is often distilled either implicitly, in an uninterpretable manner, or explicitly with cost functions that fail to capture the trade-off between information compression and representational utility. In this paper, we present Information-Bottleneck-based Other agents' behavior Representation learning for Multi-agent reinforcement learning (IBORM), which explicitly seeks a low-dimensional encoder that yields a compact and informative representation of other agents' behaviors. IBORM leverages the information bottleneck principle to compress observation information while retaining sufficient information about other agents' behaviors for cooperative decision making. Empirical results demonstrate that IBORM achieves the fastest convergence and the best-performing learned policies, compared with implicit behavior representation learning and with explicit behavior representation learning that does not account for information compression and utility.
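For context, the information bottleneck principle invoked above can be stated in its standard general form (this is the generic objective, not necessarily the paper's exact loss): given an observation X, a behavior-relevant target Y, and a learned representation Z produced by an encoder p(z|x), the encoder is chosen to

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

where I(·;·) denotes mutual information and β > 0 trades off compression of the observation against retention of behavior-relevant information.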