Block Contextual MDPs for Continual Learning

In reinforcement learning (RL), when defining a Markov Decision Process (MDP), the environment dynamics are implicitly assumed to be stationary. This assumption of stationarity, while simplifying, can be unrealistic in many scenarios. In the continual reinforcement learning scenario, the sequence of tasks is another source of nonstationarity. In this work, we propose to examine this continual reinforcement learning setting through the block contextual MDP (BC-MDP) framework, which enables us to relax the assumption of stationarity. This framework challenges RL algorithms to handle both nonstationarity and rich observation settings and, by additionally leveraging smoothness properties, enables us to study generalization bounds for this setting. Finally, we take inspiration from adaptive control to propose a novel algorithm that addresses the challenges introduced by this more realistic BC-MDP setting, allows for zero-shot adaptation at evaluation time, and achieves strong performance on several nonstationary environments.

