We converted the recently developed BabyAI grid world platform to a sender/receiver setup in order to test the hypothesis that established deep reinforcement learning techniques are sufficient to incentivize the emergence of a grounded, discrete communication protocol between generalized agents. This contrasts with previous experiments that relied on straight-through estimation or specialized inductive biases. Our results show that both can indeed be avoided by instead providing appropriate environmental incentives. Moreover, they show that a longer interval between communications incentivized more abstract semantics. In some cases, the communicating agents adapted to new environments more quickly than a monolithic agent, showcasing the potential of emergent communication for transfer learning and generalization more broadly.
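As a rough illustration of the kind of setup described above (a minimal sketch, not the authors' implementation), the snippet below pairs a sender that observes the full state with a receiver that acts from a partial view, exchanging a single discrete symbol every few steps. The toy environment, the class names `GridWorldStub`, `Sender`, and `Receiver`, and the `vocab_size` and `comm_interval` parameters are all illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (illustrative only) of a sender/receiver split around a
# grid-world step loop. All names and parameters here are assumptions.
import numpy as np

class GridWorldStub:
    """Stands in for a BabyAI-style grid world: full state for the sender,
    partial observation for the receiver, which alone takes actions."""
    def __init__(self, size=8):
        self.size = size

    def reset(self):
        self.goal = np.random.randint(0, self.size, 2)
        self.agent = np.random.randint(0, self.size, 2)
        return self._full_obs(), self._partial_obs()

    def step(self, action):  # 0..3 = move in one of four directions
        moves = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])
        self.agent = np.clip(self.agent + moves[action], 0, self.size - 1)
        done = bool((self.agent == self.goal).all())
        return self._full_obs(), self._partial_obs(), float(done), done

    def _full_obs(self):     # sender sees agent and goal positions
        return np.concatenate([self.agent, self.goal])

    def _partial_obs(self):  # receiver sees only its own position
        return self.agent.copy()

class Sender:
    """Emits a discrete symbol from a fixed vocabulary (random policy here)."""
    def __init__(self, vocab_size=8):
        self.vocab_size = vocab_size
    def act(self, full_obs):
        return np.random.randint(self.vocab_size)

class Receiver:
    """Conditions its action on its partial view plus the last message."""
    def act(self, partial_obs, message):
        return np.random.randint(4)

env, sender, receiver = GridWorldStub(), Sender(), Receiver()
full_obs, partial_obs = env.reset()
message, comm_interval = 0, 4   # sender speaks every `comm_interval` steps
for t in range(64):
    if t % comm_interval == 0:  # a longer interval forces fewer, coarser messages
        message = sender.act(full_obs)
    action = receiver.act(partial_obs, message)
    full_obs, partial_obs, reward, done = env.step(action)
    if done:
        full_obs, partial_obs = env.reset()
```

In a setup of this shape, the only channel between the fully observing sender and the acting receiver is the discrete symbol, so task reward alone pressures the pair to ground the symbols; a larger `comm_interval` would make each message cover more of the episode, consistent with the abstract's observation about more abstract semantics.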