Maximum Entropy Reinforcement Learning with Mixture Policies - 专知 Papers

March 18, 2021

Nir Baram, Guy Tennenholtz, Shie Mannor

Mixture models are an expressive hypothesis class that can approximate a rich set of policies. However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward. The entropy of a mixture model is not equal to the sum of its components, nor does it have a closed-form expression in most cases. Using such policies in MaxEnt algorithms, therefore, requires constructing a tractable approximation of the mixture entropy. In this paper, we derive a simple, low-variance mixture-entropy estimator. We show that it is closely related to the sum of marginal entropies. Equipped with our entropy estimator, we derive an algorithmic variant of Soft Actor-Critic (SAC) to the mixture policy case and evaluate it on a series of continuous control tasks.
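To make the entropy claim concrete, here is a minimal sketch that is not the paper's estimator. It uses a plain Monte Carlo estimate of the entropy of a one-dimensional, two-component Gaussian mixture and compares it with the weighted sum of the component entropies. The function names, mixture parameters, and sample count are all illustrative assumptions.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gaussian_entropy(std):
    """Closed-form differential entropy (in nats) of a 1-D Gaussian."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std * std)


def mixture_entropy_mc(weights, means, stds, n_samples=20000, seed=0):
    """Naive Monte Carlo estimate of the mixture entropy, -E[log p(x)].

    This is an assumed baseline for illustration, not the low-variance
    estimator derived in the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Sample a component index, then a point from that component.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        x = rng.gauss(means[k], stds[k])
        # Evaluate the full mixture density at the sampled point.
        density = sum(
            w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)
        )
        total -= math.log(density)
    return total / n_samples


# Hypothetical mixture: two well-separated, equally weighted components.
weights = [0.5, 0.5]
means = [-3.0, 3.0]
stds = [1.0, 1.0]

weighted_sum = sum(w * gaussian_entropy(s) for w, s in zip(weights, stds))
weight_entropy = -sum(w * math.log(w) for w in weights)
estimate = mixture_entropy_mc(weights, means, stds)

print("weighted sum of component entropies:", weighted_sum)
print("Monte Carlo mixture entropy:", estimate)
print("upper bound (weighted sum + entropy of weights):", weighted_sum + weight_entropy)
```

The mixture entropy always lies between the weighted sum of component entropies and that sum plus the entropy of the mixing weights. For well-separated components it sits near the upper end of that range, so treating the mixture entropy as a simple combination of component entropies can be noticeably off.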

