生成性模型的目标条件强化学习学习方法 (Learning Generative Models with Goal-conditioned Reinforcement Learning) - 专知论文

会员服务 ·

0

Agent · 训练集 · 强化学习 · 特征保持 · 对数似然 ·

2023 年 3 月 26 日

Learning Generative Models with Goal-conditioned Reinforcement Learning

翻译：生成性模型的目标条件强化学习学习方法

Mariana Vargas Vieyra,Pierre Ménard

We present a novel, alternative framework for learning generative models with goal-conditioned reinforcement learning. We define two agents, a goal conditioned agent (GC-agent) and a supervised agent (S-agent). Given a user-input initial state, the GC-agent learns to reconstruct the training set. In this context, elements in the training set are the goals. During training, the S-agent learns to imitate the GC-agent while remaining agnostic of the goals. At inference we generate new samples with the S-agent. Following a similar route as in variational auto-encoders, we derive an upper bound on the negative log-likelihood that consists of a reconstruction term and a divergence between the GC-agent policy and the (goal-agnostic) S-agent policy. We empirically demonstrate that our method is able to generate diverse and high quality samples in the task of image synthesis.

翻译：我们提出了一种新颖的、使用目标条件强化学习学习生成性模型的方法。我们定义了两个智能体，一个是目标条件代理 (GC-agent)，一个是监督代理 (S-agent)。在给定用户输入的初始状态的情况下，GC-agent 学习重构训练集。在这个背景下，训练集中的元素是目标。在训练期间，S-agent 学习模仿 GC-agent，但对目标的特征保持不知情。在推理的过程中，我们使用 S-agent 生成新样本。在运用模型变分自编码器的类似转化方法的基础上，我们得出了负对数似然的上界，它由重构项和 GC-agent 策略与（对目标不知情的） S-agent 策略之间的差异组成。我们通过实验证明，我们的方法在图像合成任务中能够生成多样化和高质量的样本。

0

相关内容

Agent

生成对抗网络，10页pdf

生成对抗网络，10页pdf

专知会员服务

32+阅读 · 2022年11月23日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

【CVPR2022】视频对比学习的概率表示，Probabilistic Representations for Video Contrastive Learning

【CVPR2022】视频对比学习的概率表示，Probabilistic Representations for Video Contrastive Learning

专知会员服务

16+阅读 · 2022年4月11日

【深度学习中的不确定性-贝叶斯CNN | TensorFlow概率】Uncertainty In Deep Learning — Bayesian CNN | TensorFlow Probability

【深度学习中的不确定性-贝叶斯CNN | TensorFlow概率】Uncertainty In Deep Learning — Bayesian CNN | TensorFlow Probability

专知会员服务

40+阅读 · 2022年3月19日

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

23+阅读 · 2022年3月19日

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

生成对抗网络，10页pdf

生成对抗网络，10页pdf

专知

2+阅读 · 2022年11月23日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

入门 | 通过 Q-learning 深入理解强化学习

入门 | 通过 Q-learning 深入理解强化学习

机器之心

12+阅读 · 2018年4月17日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

微生物菌体对铝的吸附行为及其改良酸性土壤的应用潜力

国家自然科学基金

0+阅读 · 2015年12月31日

复杂数据模型中的分布逼近方法

国家自然科学基金

3+阅读 · 2014年12月31日

基于强化学习的前列腺癌蛋白质间相互作用网络的模型及方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

长效抗中毒低铂/掺杂型TiN催化剂的可控合成及甲醇氧化催化性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

Markov决策过程值函数逼近的基函数自动构造

国家自然科学基金

1+阅读 · 2012年12月31日

Eulerian bond-cubic 模型渗流性质的数值研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

随机微分方程的逼近

国家自然科学基金

0+阅读 · 2009年12月31日

多尺度高斯过程模型及其学习曲线研究

国家自然科学基金

2+阅读 · 2009年12月31日

基于支持向量机的复杂连续系统强化学习控制研究

国家自然科学基金

11+阅读 · 2008年12月31日

Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning

Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning

Arxiv

0+阅读 · 2023年5月17日

Goal-Conditioned Supervised Learning with Sub-Goal Prediction

Arxiv

0+阅读 · 2023年5月17日

RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs

Arxiv

0+阅读 · 2023年5月15日

Physics-Informed Model-Based Reinforcement Learning

Arxiv

0+阅读 · 2023年5月14日

S-REINFORCE: A Neuro-Symbolic Policy Gradient Approach for Interpretable Reinforcement Learning

Arxiv

0+阅读 · 2023年5月12日

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

Arxiv

5+阅读 · 2023年5月12日

Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning

Arxiv

0+阅读 · 2023年5月11日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Deep Reinforcement Learning for List-wise Recommendations

Arxiv

13+阅读 · 2018年1月5日

VIP会员

文章信息

相关主题

相关VIP内容

生成对抗网络，10页pdf

生成对抗网络，10页pdf

专知会员服务

32+阅读 · 2022年11月23日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

【CVPR2022】视频对比学习的概率表示，Probabilistic Representations for Video Contrastive Learning

【CVPR2022】视频对比学习的概率表示，Probabilistic Representations for Video Contrastive Learning

专知会员服务

16+阅读 · 2022年4月11日

【深度学习中的不确定性-贝叶斯CNN | TensorFlow概率】Uncertainty In Deep Learning — Bayesian CNN | TensorFlow Probability

【深度学习中的不确定性-贝叶斯CNN | TensorFlow概率】Uncertainty In Deep Learning — Bayesian CNN | TensorFlow Probability

专知会员服务

40+阅读 · 2022年3月19日

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

23+阅读 · 2022年3月19日

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《无人机战争时代的战时法：大国竞争中的区分原则、相称性原则与行动建议》最新75页

《构建强健军事力量的设计挑战：提升海军兵力支持系统效能的多分辨率建模方法》69页

正视无人机心理战：恐惧效应与战略反思

《精确反蜂群防御系统：三维运动探测与定向空爆拦截技术融合》最新24页

相关资讯

生成对抗网络，10页pdf

生成对抗网络，10页pdf

专知

2+阅读 · 2022年11月23日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

入门 | 通过 Q-learning 深入理解强化学习

入门 | 通过 Q-learning 深入理解强化学习

机器之心

12+阅读 · 2018年4月17日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning

Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning

Arxiv

0+阅读 · 2023年5月17日

Goal-Conditioned Supervised Learning with Sub-Goal Prediction

Arxiv

0+阅读 · 2023年5月17日

RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs

Arxiv

0+阅读 · 2023年5月15日

Physics-Informed Model-Based Reinforcement Learning

Arxiv

0+阅读 · 2023年5月14日

S-REINFORCE: A Neuro-Symbolic Policy Gradient Approach for Interpretable Reinforcement Learning

Arxiv

0+阅读 · 2023年5月12日

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

Arxiv

5+阅读 · 2023年5月12日

Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning

Arxiv

0+阅读 · 2023年5月11日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Deep Reinforcement Learning for List-wise Recommendations

Arxiv

13+阅读 · 2018年1月5日

相关基金

微生物菌体对铝的吸附行为及其改良酸性土壤的应用潜力

国家自然科学基金

0+阅读 · 2015年12月31日

复杂数据模型中的分布逼近方法

国家自然科学基金

3+阅读 · 2014年12月31日

基于强化学习的前列腺癌蛋白质间相互作用网络的模型及方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

长效抗中毒低铂/掺杂型TiN催化剂的可控合成及甲醇氧化催化性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

Markov决策过程值函数逼近的基函数自动构造

国家自然科学基金

1+阅读 · 2012年12月31日

Eulerian bond-cubic 模型渗流性质的数值研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

随机微分方程的逼近

国家自然科学基金

0+阅读 · 2009年12月31日

多尺度高斯过程模型及其学习曲线研究

国家自然科学基金

2+阅读 · 2009年12月31日

基于支持向量机的复杂连续系统强化学习控制研究

国家自然科学基金

11+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员