Mutual information maximization provides an appealing formalism for learning representations of data. In the context of reinforcement learning (RL), such representations can accelerate learning by discarding irrelevant and redundant information, while retaining the information necessary for control. Much of the prior work on these methods has addressed the practical difficulties of estimating mutual information from samples of high-dimensional observations, while comparatively less is understood about which mutual information objectives yield representations that are sufficient for RL from a theoretical perspective. In this paper, we formalize the sufficiency of a state representation for learning and representing the optimal policy, and study several popular mutual-information based objectives through this lens. Surprisingly, we find that two of these objectives can yield insufficient representations given mild and common assumptions on the structure of the MDP. We corroborate our theoretical results with empirical experiments on a simulated game environment with visual observations.
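For concreteness, the setup the abstract refers to can be sketched in generic notation (an illustrative sketch only; the symbols $\phi$, $z$, and $y$ below are assumptions introduced here, not the paper's exact formalism). A representation $z = \phi(s)$ of an observation $s$ is learned by maximizing a mutual information objective, and the representation is called sufficient when an optimal policy can be expressed through $\phi(s)$ alone:

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative sketch in generic notation; \phi, z, and y are assumed
% placeholders (e.g., y could be a future observation or an action),
% not the paper's exact definitions.
\begin{align}
  % Representation learning: choose \phi to maximize mutual information
  % between the representation \phi(s) and a target variable y.
  \phi^\star &\in \arg\max_{\phi} \; I\bigl(\phi(s);\, y\bigr) \\
  % Sufficiency for control: there exists a policy acting on \phi^\star(s)
  % that matches an optimal policy of the original MDP for all s, a.
  \pi\bigl(a \mid \phi^\star(s)\bigr) &= \pi^\star(a \mid s)
    \qquad \text{for all } s, a
\end{align}
\end{document}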