We present a general framework for evolutionary learning of emergent unbiased state representations without any supervision. Evolutionary frameworks such as self-play converge to poor local optima in multi-agent reinforcement learning in non-cooperative, partially observable environments with communication, owing to information asymmetry. Our proposed framework is a simple modification of self-play inspired by mechanism design, also known as {\em reverse game theory}, that elicits truthful signals and makes the agents cooperative. The key idea is to add imaginary rewards using the peer prediction method, i.e., a mechanism for evaluating the validity of information exchanged between agents in a decentralized environment. Numerical experiments on predator-prey, traffic-junction, and StarCraft tasks demonstrate the state-of-the-art performance of our framework.
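As a minimal sketch of the imaginary-reward idea, assuming the peer prediction mechanism is instantiated with a strictly proper scoring rule (the abstract does not specify the exact rule, and the symbols $\tilde{r}_i$, $r_i$, $\alpha$, $S$, $m_i$, $m_j$ below are our own illustrative notation), agent $i$'s environment reward $r_i$ could be augmented by a term that scores its transmitted signal $m_i$ against a peer agent $j$'s signal $m_j$:
\[
  \tilde{r}_i \;=\; r_i \;+\; \alpha \, S\bigl(m_i, m_j\bigr),
\]
where $\alpha > 0$ weights the imaginary reward. Under a strictly proper scoring rule $S$, truthful reporting maximizes each agent's expected imaginary reward, which is how such a mechanism can align otherwise non-cooperative agents despite information asymmetry.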