We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap talk is not available -- namely, they require balancing the value of communication against its cost. We contribute a theoretically grounded approach to MCGs, based on maximum entropy reinforcement learning and minimum entropy coupling, that we call MEME. Due to recent breakthroughs in approximation algorithms for minimum entropy coupling, MEME is not merely a theoretical algorithm, but can be applied to practical settings. Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. To the latter point, we demonstrate that MEME is able to losslessly communicate binary images via trajectories of Cartpole and Pong, while simultaneously achieving maximal or near-maximal expected returns, and that it is even capable of performing well in the presence of actuator noise.
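To make the minimum-entropy-coupling ingredient concrete, the following is a minimal sketch of a greedy approximation heuristic of the kind the abstract alludes to: given two discrete marginals, it repeatedly matches the largest remaining mass in each and places their minimum on the joint. This is an illustrative sketch of the general greedy heuristic, not the paper's implementation; the function name and tolerance are our own choices.

```python
import heapq

def greedy_min_entropy_coupling(p, q):
    """Greedy heuristic for an approximate minimum entropy coupling of two
    discrete marginal distributions p and q: repeatedly pair the largest
    remaining mass in each marginal and assign their minimum to the joint.
    Returns a dict mapping (i, j) index pairs to joint probabilities."""
    # heapq is a min-heap, so store negated probabilities to pop the
    # largest remaining mass first.
    hp = [(-pi, i) for i, pi in enumerate(p) if pi > 0]
    hq = [(-qj, j) for j, qj in enumerate(q) if qj > 0]
    heapq.heapify(hp)
    heapq.heapify(hq)
    joint = {}
    while hp and hq:
        neg_pi, i = heapq.heappop(hp)
        neg_qj, j = heapq.heappop(hq)
        m = min(-neg_pi, -neg_qj)  # transferable mass for this pair
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        # Push back whichever marginal still has leftover mass.
        if -neg_pi - m > 1e-12:
            heapq.heappush(hp, (neg_pi + m, i))
        if -neg_qj - m > 1e-12:
            heapq.heappush(hq, (neg_qj + m, j))
    return joint
```

For example, coupling p = [0.5, 0.5] with q = [0.5, 0.25, 0.25] yields the joint {(0, 0): 0.5, (1, 1): 0.25, (1, 2): 0.25}, which preserves both marginals while concentrating the joint mass on few entries (low joint entropy).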