Biological agents learn and act intelligently despite a highly limited capacity to process and store information. Many real-world problems involve continuous control, which remains a difficult task for artificial intelligence agents. In this paper we explore the potential learning advantages that a natural constraint on information flow might confer on artificial agents in continuous control tasks. We focus on the model-free reinforcement learning (RL) setting and formalize our approach in terms of an information-theoretic constraint on the complexity of learned policies. We show that our approach emerges in a principled fashion from the application of rate-distortion theory. We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm and situate it within a broader family of RL algorithms, including Soft Actor-Critic (SAC) and Mutual Information Reinforcement Learning (MIRL). Our experiments on continuous control tasks show that, compared to alternative approaches, CLAC improves generalization between training environments and modified test environments, while retaining the high sample efficiency of comparable methods.
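As a rough illustration of the kind of objective such an information-theoretic constraint implies (a sketch for intuition, not the paper's own formulation; the trade-off coefficient \(\beta\) and the notation \(I(S;A)\) are our assumptions here), the policy can be viewed as maximizing expected return penalized by the mutual information between states and actions:

\[
J(\pi) \;=\; \mathbb{E}_{\pi}\!\Big[\sum_{t} r(s_t, a_t)\Big] \;-\; \beta\, I(S; A),
\]

where \(I(S;A)\) measures how much information the policy's action choices carry about the state, and \(\beta\) controls the severity of the capacity limit. In rate-distortion terms, the return plays the role of (negative) distortion and \(I(S;A)\) plays the role of the rate.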