$P^{3}O$: Transferring Visual Representations for Reinforcement Learning via Prompting - Zhuanzhi Papers


March 27, 2023

$P^{3}O$: Transferring Visual Representations for Reinforcement Learning via Prompting


Guoliang You, Xiaomeng Chu, Yifan Duan, Jie Peng, Jianmin Ji, Yu Zhang, Yanyong Zhang

From arXiv. This paper has been accepted for presentation at the IEEE International Conference on Multimedia & Expo (ICME) 2023.

It is important for deep reinforcement learning (DRL) algorithms to transfer their learned policies to new environments that have different visual inputs. In this paper, we introduce Prompt based Proximal Policy Optimization ($P^{3}O$), a three-stage DRL algorithm that transfers visual representations from a target to a source environment by applying prompting. The process of $P^{3}O$ consists of three stages: pre-training, prompting, and predicting. In particular, we specify a prompt-transformer for representation conversion and propose a two-step training process to train the prompt-transformer for the target environment, while the rest of the DRL pipeline remains unchanged. We implement $P^{3}O$ and evaluate it on the OpenAI CarRacing video game. The experimental results show that $P^{3}O$ outperforms the state-of-the-art visual transferring schemes. In particular, $P^{3}O$ allows the learned policies to perform well in environments with different visual inputs, which is much more effective than retraining the policies in these environments.
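The division of labor across the three stages can be illustrated with a toy sketch. It is not the paper's prompt-transformer or its two-step training procedure, which the abstract does not specify. Everything here is an assumption made for illustration: the linear policy and its weights, the affine visual shift in `to_target`, the per-channel affine `prompt_transform`, and the use of paired source/target observations with a squared-error loss. The only point it shows is that the pre-trained policy stays fixed while a small front-end module is trained for the target environment.

```python
import random

# Stage 1 (pre-training): stand-in for a policy already trained in the source
# environment. Its weights are fixed from here on. The values are made up.
POLICY_WEIGHTS = [0.8, -0.3, 0.5]

def policy(features):
    """Frozen policy head: a linear score over source-style features."""
    return sum(w * f for w, f in zip(POLICY_WEIGHTS, features))

def to_target(source_obs):
    """Toy visual shift (an assumption): the target shows the same scene rescaled."""
    return [2.0 * x + 1.0 for x in source_obs]

def prompt_transform(target_obs, scale, shift):
    """Hypothetical prompt module: a per-channel affine map back to source features."""
    return [s * o + b for s, o, b in zip(scale, target_obs, shift)]

def train_prompt(steps=2000, lr=0.05, seed=0):
    """Stage 2 (prompting): update only the prompt parameters, never the policy."""
    rng = random.Random(seed)
    dim = len(POLICY_WEIGHTS)
    scale, shift = [1.0] * dim, [0.0] * dim
    for _ in range(steps):
        src = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        tgt = to_target(src)
        out = prompt_transform(tgt, scale, shift)
        for i in range(dim):
            err = out[i] - src[i]          # gradient of 0.5 * err**2 w.r.t. out[i]
            scale[i] -= lr * err * tgt[i]  # chain rule through s * o + b
            shift[i] -= lr * err
    return scale, shift

def act(target_obs, scale, shift):
    """Stage 3 (predicting): prompt module followed by the unchanged policy."""
    return policy(prompt_transform(target_obs, scale, shift))

scale, shift = train_prompt()
source_obs = [0.2, -0.7, 0.4]
gap = abs(act(to_target(source_obs), scale, shift) - policy(source_obs))
print(f"score gap between source and prompted target: {gap:.6f}")
```

In this toy setting the prompt module only has to undo a known affine shift, so a few thousand gradient steps suffice. A real visual shift would need a far more expressive module and, as the abstract indicates, a dedicated training process.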

