One practical challenge in reinforcement learning (RL) is adapting quickly when faced with new environments. In this paper, we propose a principled framework for adaptive RL, called \textit{AdaRL}, that adapts reliably and efficiently to changes across domains using only a few samples from the target domain, even in partially observable environments. Specifically, we leverage a parsimonious graphical representation that characterizes the structural relationships over the variables in the RL system. Such graphical representations provide a compact way to encode what changes occur across domains and where they occur, and furthermore inform us of the minimal set of changes one must consider for policy adaptation. We show that by explicitly leveraging this compact representation to encode changes, we can efficiently adapt the policy to the target domain, requiring only a few samples and no further policy optimization. We illustrate the efficacy of AdaRL through a series of experiments that vary factors in the observation, transition, and reward functions for Cartpole and Atari games.
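To make the compact change encoding concrete, the following is a minimal, hypothetical sketch (not the paper's implementation; all names, such as \texttt{StructuralModel} and \texttt{theta\_dim}, are illustrative). It assumes the RL system is modeled as a dynamic Bayesian network whose edges are encoded by binary masks, with a low-dimensional domain-specific factor $\theta_k$ marking where cross-domain changes enter; only the components touched by $\theta_k$ would then need re-estimation in a new domain.

\begin{verbatim}
import numpy as np

class StructuralModel:
    """Hypothetical sketch: binary masks over a dynamic Bayesian
    network, plus a mask marking which components a domain-specific
    change factor theta_k modulates."""

    def __init__(self, state_dim, theta_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Edges s_t -> s_{t+1}: structure shared across all domains.
        self.mask_ss = rng.integers(0, 2, size=(state_dim, state_dim))
        # Which state dimensions the reward depends on.
        self.mask_sr = rng.integers(0, 2, size=state_dim)
        # Which state dimensions theta_k modulates: the "where"
        # of cross-domain change.
        self.mask_theta = rng.integers(0, 2, size=(state_dim, theta_dim))

    def minimal_change_set(self):
        # Only dimensions touched by theta_k must be re-estimated
        # from target-domain samples; the rest transfers unchanged.
        return np.flatnonzero(self.mask_theta.any(axis=1))

model = StructuralModel(state_dim=8, theta_dim=2)
print(model.minimal_change_set())  # indices needing adaptation
\end{verbatim}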