We propose a novel framework for multitask reinforcement learning based on the minimum description length (MDL) principle. In this approach, which we term MDL-control (MDL-C), the agent learns the common structure among the tasks with which it is faced and then distills it into a simpler representation that facilitates faster convergence and generalization to new tasks. In doing so, MDL-C naturally balances adaptation to each task against epistemic uncertainty about the task distribution. We motivate MDL-C via formal connections between the MDL principle and Bayesian inference, derive theoretical performance guarantees, and demonstrate MDL-C's empirical effectiveness on both discrete and high-dimensional continuous control tasks.
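For reference, the MDL--Bayes connection invoked above is standardly expressed via the two-part code; the notation below is generic, a sketch of the textbook correspondence rather than the paper's own formulation:
\begin{equation}
  L(\mathcal{D}) \;=\; \min_{H \in \mathcal{H}} \bigl[\, L(H) + L(\mathcal{D} \mid H) \,\bigr],
\end{equation}
where $L(H)$ is the description length of hypothesis $H$ and $L(\mathcal{D} \mid H)$ that of the data encoded with the help of $H$. Choosing code lengths $L(H) = -\log_2 p(H)$ and $L(\mathcal{D} \mid H) = -\log_2 p(\mathcal{D} \mid H)$ makes minimizing the total description length coincide with maximum a posteriori inference under the prior $p(H) \propto 2^{-L(H)}$, which is the sense in which MDL-C's regularization can be read as Bayesian.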