部署期间的自我监督政策调整 (Self-Supervised Policy Adaptation during Deployment) - 专知论文

会员服务 ·

0

回合 · Continuity · 泛化理论 · Performer · DeepMind ·

2020 年 12 月 10 日

Self-Supervised Policy Adaptation during Deployment

翻译：部署期间的自我监督政策调整

Nicklas Hansen,Rishabh Jangir,Yu Sun,Guillem Alenyà,Pieter Abbeel,Alexei A. Efros,Lerrel Pinto,Xiaolong Wang

from arxiv, Project page: https://nicklashansen.github.io/PAD/ Code: https://github.com/nicklashansen/policy-adaptation-during-deployment

In most real world scenarios, a policy trained by reinforcement learning in one environment needs to be deployed in another, potentially quite different environment. However, generalization across different environments is known to be hard. A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal. Our work explores the use of self-supervision to allow the policy to continue training after deployment without using any rewards. While previous methods explicitly anticipate changes in the new environment, we assume no prior knowledge of those changes yet still obtain significant improvements. Empirical evaluations are performed on diverse simulation environments from DeepMind Control suite and ViZDoom, as well as real robotic manipulation tasks in continuously changing environments, taking observations from an uncalibrated camera. Our method improves generalization in 31 out of 36 environments across various tasks and outperforms domain randomization on a majority of environments.

翻译：在最真实的世界情景中,在一种环境中通过强化学习而培训的政策需要在另一种环境中部署,可能非常不同。然而,人们知道,在不同的环境中,一般化是很困难的。自然的解决办法是,在新环境部署后继续培训,但如果新环境没有提供奖励信号,则无法做到这一点。我们的工作探索使用自我监督来让政策在部署后继续培训,而不使用任何奖励。虽然以前的方法明确预见到新环境的变化,但我们假设这些变化的先前知识还没有显著改善。从DeepMind控制套房和ViZDomoom对不同的模拟环境进行了经验评估,并在不断变化的环境中进行了真正的机器人操纵任务,从一个未经校正的相机中进行观察。我们的方法改进了在36个环境中的31个环境的普及,在大多数环境中,我们的方法超越了区域随机化。

0

相关内容

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【Google】监督对比学习，Supervised Contrastive Learning

【Google】监督对比学习，Supervised Contrastive Learning

专知会员服务

75+阅读 · 2020年4月24日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【Google】视频诱导视觉不变性的自监督学习（Self-Supervised Learning of Video-Induced Visual Invariances），谷歌博士后研究员| Michael Tschannen等

【Google】视频诱导视觉不变性的自监督学习（Self-Supervised Learning of Video-Induced Visual Invariances），谷歌博士后研究员| Michael Tschannen等

专知会员服务

12+阅读 · 2019年12月8日

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

专知会员服务

14+阅读 · 2019年11月11日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

42+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

17+阅读 · 2018年12月24日

Improper Learning with Gradient-based Policy Optimization

Improper Learning with Gradient-based Policy Optimization

Arxiv

0+阅读 · 2021年2月16日

Policy-Based Federated Learning

Arxiv

0+阅读 · 2021年2月16日

Domain Adversarial Reinforcement Learning

Arxiv

0+阅读 · 2021年2月14日

Learning Off-Policy with Online Planning

Arxiv

1+阅读 · 2021年2月12日

Lifelong Incremental Reinforcement Learning with Online Bayesian Inference

Arxiv

0+阅读 · 2021年2月12日

Self-supervised Learning: Generative or Contrastive

Arxiv

19+阅读 · 2020年7月21日

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

Arxiv

17+阅读 · 2020年4月28日

Residual Policy Learning

Residual Policy Learning

Arxiv

4+阅读 · 2018年12月15日

Adversarial Transfer Learning

Adversarial Transfer Learning

Arxiv

12+阅读 · 2018年12月6日

Unsupervised Meta-Learning for Reinforcement Learning

Arxiv

8+阅读 · 2018年6月12日

VIP会员

文章信息

相关主题

相关VIP内容

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【Google】监督对比学习，Supervised Contrastive Learning

【Google】监督对比学习，Supervised Contrastive Learning

专知会员服务

75+阅读 · 2020年4月24日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【Google】视频诱导视觉不变性的自监督学习（Self-Supervised Learning of Video-Induced Visual Invariances），谷歌博士后研究员| Michael Tschannen等

【Google】视频诱导视觉不变性的自监督学习（Self-Supervised Learning of Video-Induced Visual Invariances），谷歌博士后研究员| Michael Tschannen等

专知会员服务

12+阅读 · 2019年12月8日

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

专知会员服务

14+阅读 · 2019年11月11日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《大规模供应链中断实时管理中智能决策支持系统的弹性集成》最新295页

中文版 | 强化国防以数据为中心的网络安全与零信任协作

中文版 | 领导力、杀伤力与数据素养：备战数据驱动与AI赋能未来战争的三大关键

中文版 | 21世纪武器装备中的人工智能与机器人技术：俄罗斯视角

相关资讯

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

42+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

17+阅读 · 2018年12月24日

相关论文

Improper Learning with Gradient-based Policy Optimization

Improper Learning with Gradient-based Policy Optimization

Arxiv

0+阅读 · 2021年2月16日

Policy-Based Federated Learning

Arxiv

0+阅读 · 2021年2月16日

Domain Adversarial Reinforcement Learning

Arxiv

0+阅读 · 2021年2月14日

Learning Off-Policy with Online Planning

Arxiv

1+阅读 · 2021年2月12日

Lifelong Incremental Reinforcement Learning with Online Bayesian Inference

Arxiv

0+阅读 · 2021年2月12日

Self-supervised Learning: Generative or Contrastive

Arxiv

19+阅读 · 2020年7月21日

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

Arxiv

17+阅读 · 2020年4月28日

Residual Policy Learning

Residual Policy Learning

Arxiv

4+阅读 · 2018年12月15日

Adversarial Transfer Learning

Adversarial Transfer Learning

Arxiv

12+阅读 · 2018年12月6日

Unsupervised Meta-Learning for Reinforcement Learning

Arxiv

8+阅读 · 2018年6月12日

微信扫码咨询专知VIP会员