Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes

Continuously learning to solve unseen tasks with limited experience has been extensively pursued in meta-learning and continual learning, but with restricted assumptions such as accessible task distributions, independently and identically distributed tasks, and clear task delineations. However, real-world physical tasks frequently violate these assumptions, resulting in performance degradation. This paper proposes a continual online model-based reinforcement learning approach that does not require pre-training to solve task-agnostic problems with unknown task boundaries. We maintain a mixture of experts to handle nonstationarity, and represent each different type of dynamics with a Gaussian Process to efficiently leverage collected data and expressively model uncertainty. We propose a transition prior to account for the temporal dependencies in streaming data and update the mixture online via sequential variational inference. Our approach reliably handles the task distribution shift by generating new models for never-before-seen dynamics and reusing old models for previously seen dynamics. In experiments, our approach outperforms alternative methods in non-stationary tasks, including classic control with changing dynamics and decision making in different driving scenarios.
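The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of assigning streaming data to a growing set of Gaussian Process experts. Everything in it is an assumption made for illustration: the names `GPExpert`, `assign`, and `demo_stream`, the RBF kernel and its hyperparameters, the count-based prior with a `sticky` bonus for the previously active expert (a simplified stand-in for the paper's transition prior), and the hard MAP assignment (a stand-in for sequential variational inference). It is not the authors' implementation.

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

class GPExpert:
    """Hypothetical expert: exact GP regression on the data assigned to it."""
    def __init__(self, noise=0.05):
        self.x, self.y, self.noise = [], [], noise  # noise is a variance

    def add(self, x, y):
        self.x.append(x)
        self.y.append(y)

    def log_pred(self, x, y):
        """Log predictive density of y at x (GP prior if no data yet)."""
        xs = np.array([x])
        if not self.x:
            mean, var = 0.0, rbf(xs, xs)[0, 0] + self.noise
        else:
            X, Y = np.array(self.x), np.array(self.y)
            K = rbf(X, X) + self.noise * np.eye(len(X))
            k = rbf(X, xs)[:, 0]
            sol = np.linalg.solve(K, k)
            mean = sol @ Y
            var = rbf(xs, xs)[0, 0] - sol @ k + self.noise
        return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def assign(experts, prev, x, y, alpha=0.1, sticky=5.0):
    """Hard MAP assignment of one observation; may create a new expert.

    Assumed prior: proportional to each expert's data count, plus a
    'sticky' bonus for the previously active expert, plus mass alpha
    for a brand-new expert.
    """
    counts = np.array([len(e.x) for e in experts], dtype=float)
    prior = np.append(counts, alpha)
    if prev is not None:
        prior[prev] += sticky
    candidates = experts + [GPExpert()]
    log_post = np.log(prior / prior.sum())
    log_post += np.array([e.log_pred(x, y) for e in candidates])
    k = int(np.argmax(log_post))
    if k == len(experts):          # never-before-seen dynamics: new expert
        experts.append(candidates[k])
    experts[k].add(x, y)
    return k

def demo_stream():
    """Toy stream: dynamics A, then B, then A again, with no boundary labels."""
    xs = np.linspace(0.0, 3.0, 20)
    offsets = [0.0, 3.0, 0.0]      # B differs from A by a constant shift
    experts, prev, labels = [], None, []
    for off in offsets:
        for x in xs:
            prev = assign(experts, prev, float(x), float(np.sin(x) + off))
            labels.append(prev)
    return labels

if __name__ == "__main__":
    print(demo_stream())
```

In this toy stream the third segment repeats the first segment's dynamics, so a reasonable outcome is that the first expert is reused rather than a third one being created, which mirrors the reuse behavior the abstract describes.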


