We present a general optimization framework for emergent belief-state representation without any supervision. We adopt the common configuration of multi-agent reinforcement learning with communication to improve exploration coverage of an environment by leveraging each agent's knowledge. In this paper, we show that recurrent neural networks (RNNs) with shared weights are highly biased in partially observable environments because of their noncooperativity. To address this, we design an unbiased version of self-play via mechanism design, also known as reverse game theory, to elicit unbiased knowledge at the Bayesian Nash equilibrium. The key idea is to add imaginary rewards using a peer prediction mechanism, i.e., a mechanism by which agents mutually evaluate each other's reported information in a decentralized environment. Numerical analyses, including StarCraft exploration tasks with up to 20 agents and off-the-shelf RNNs, demonstrate state-of-the-art performance.
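To make the "imaginary rewards" idea concrete, the following is a minimal sketch of an output-agreement-style peer prediction score. It is an illustrative simplification, not the paper's exact mechanism: the function name, the Brier-style scoring rule, and the random peer selection are all assumptions for exposition. Each agent submits a probabilistic report (e.g., a belief over world states), and its imaginary reward is how closely its report agrees with that of a randomly chosen peer, scored by a proper scoring rule.

```python
import random

def peer_prediction_rewards(reports, seed=0):
    """Sketch of an output-agreement peer prediction mechanism.

    reports: list of probability vectors, one per agent (hypothetical format).
    Each agent is paired with a random peer and scored by the negative
    Brier (squared) distance between the two reports, so the reward is
    maximal (zero) when the reports coincide. Under standard
    peer-prediction assumptions, truthful reporting is a Bayesian Nash
    equilibrium of such mechanisms.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(reports)
    rewards = []
    for i, r_i in enumerate(reports):
        # Pick a random peer other than agent i.
        j = rng.choice([k for k in range(n) if k != i])
        r_j = reports[j]
        # Negative squared distance: higher when reports agree.
        score = -sum((p - q) ** 2 for p, q in zip(r_i, r_j))
        rewards.append(score)
    return rewards
```

In a multi-agent RL setting, such a score would be added to each agent's environment reward as an imaginary reward term, penalizing reports that deviate from what peers observe.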