Effective planning in model-based reinforcement learning (MBRL) and model-predictive control (MPC) relies on the accuracy of the learned dynamics model. In many instances of MBRL and MPC, this model is assumed to be stationary and is periodically re-trained from scratch on state-transition experience collected since the beginning of environment interaction. This implies that the time required to train the dynamics model, and hence the pause required between plan executions, grows linearly with the size of the collected experience. We argue that this is too slow for lifelong robot learning and propose HyperCRL, a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks. Our method has three main attributes: first, it includes dynamics learning sessions that do not revisit training data from previous tasks, so it only needs to store the most recent fixed-size portion of the state-transition experience; second, it uses fixed-capacity hypernetworks to represent non-stationary and task-aware dynamics; third, it outperforms existing continual-learning alternatives that rely on fixed-capacity networks and performs competitively with baselines that remember an ever-increasing coreset of past experience. We show that HyperCRL is effective in continual model-based reinforcement learning in robot locomotion and manipulation scenarios, such as tasks involving pushing and door opening. Our project website with videos is available at https://rvl.cs.toronto.edu/blog/2020/hypercrl
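To make the task-conditional hypernetwork idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes PyTorch, and all names and dimensions (HyperDynamics, STATE_DIM, ACT_DIM, HIDDEN, EMB_DIM) are illustrative. A small, fixed-capacity hypernetwork maps a learned per-task embedding to the parameters of a dynamics MLP that predicts the next state from the current state and action, so the total number of trainable parameters stays constant as tasks accumulate.

import math
import torch
import torch.nn as nn

# Illustrative sizes; the real dimensions depend on the environment.
STATE_DIM, ACT_DIM, HIDDEN, EMB_DIM = 8, 2, 64, 16

class HyperDynamics(nn.Module):
    def __init__(self, num_tasks):
        super().__init__()
        # One trainable embedding per task; the hypernetwork itself has fixed capacity.
        self.task_emb = nn.Embedding(num_tasks, EMB_DIM)
        # Parameter shapes of the target dynamics MLP: (state, action) -> next state.
        self.shapes = [
            (HIDDEN, STATE_DIM + ACT_DIM), (HIDDEN,),   # layer 1 weight, bias
            (STATE_DIM, HIDDEN), (STATE_DIM,),          # layer 2 weight, bias
        ]
        n_params = sum(math.prod(s) for s in self.shapes)
        # Hypernetwork: task embedding -> flat parameter vector of the dynamics MLP.
        self.hnet = nn.Sequential(
            nn.Linear(EMB_DIM, 128), nn.ReLU(), nn.Linear(128, n_params)
        )

    def forward(self, task_id, state, action):
        flat = self.hnet(self.task_emb(task_id))   # generated dynamics parameters
        w1, b1, w2, b2 = self._split(flat)
        x = torch.cat([state, action], dim=-1)
        h = torch.relu(x @ w1.t() + b1)
        return state + (h @ w2.t() + b2)            # predict next state as a delta

    def _split(self, flat):
        # Slice the flat parameter vector into the per-layer weights and biases.
        chunks, i = [], 0
        for s in self.shapes:
            n = math.prod(s)
            chunks.append(flat[i:i + n].view(*s))
            i += n
        return chunks

# Usage sketch: one shared model serves all tasks via the task index.
# model = HyperDynamics(num_tasks=5)
# next_state = model(torch.tensor(0), torch.zeros(STATE_DIM), torch.zeros(ACT_DIM))

In this sketch, only the task embeddings and the hypernetwork weights are trained; continual-learning regularization on the generated parameters (as used by hypernetwork-based methods) is omitted for brevity.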