Q 美元- Munchausen 强化学习 ($q$-Munchausen Reinforcement Learning) - 专知论文

会员服务 ·

0

学成 · 正则化项 · 强化学习 · Performer · 奖励函数 ·

2022 年 5 月 16 日

$q$-Munchausen Reinforcement Learning

翻译：Q 美元- Munchausen 强化学习

Lingwei Zhu,Zheng Chen,Eiji Uchibe,Takamitsu Matsubara

The recently successful Munchausen Reinforcement Learning (M-RL) features implicit Kullback-Leibler (KL) regularization by augmenting the reward function with logarithm of the current stochastic policy. Though significant improvement has been shown with the Boltzmann softmax policy, when the Tsallis sparsemax policy is considered, the augmentation leads to a flat learning curve for almost every problem considered. We show that it is due to the mismatch between the conventional logarithm and the non-logarithmic (generalized) nature of Tsallis entropy. Drawing inspiration from the Tsallis statistics literature, we propose to correct the mismatch of M-RL with the help of $q$-logarithm/exponential functions. The proposed formulation leads to implicit Tsallis KL regularization under the maximum Tsallis entropy framework. We show such formulation of M-RL again achieves superior performance on benchmark problems and sheds light on more general M-RL with various entropic indices $q$.

翻译：最近成功的M-RL(M-RL)在Munchausen Servicen Securement Leiber(KL)中表现了隐含的Kullback-Leibler(KL)的正规化,通过对当前随机政策的对数来增加奖励功能。虽然在Boltzmann软式马克思政策方面已经显示出了显著的改进,但当考虑Tsallis 稀释式马克思政策时,扩增导致几乎考虑的每一个问题都有一个平坦的学习曲线。我们表明,这是由于传统对数与Tsallis entropy(通用)性质Tsallis 之间的不匹配。我们从Tsallis统计文献中汲取了灵感,我们建议用$logarithm/Explential功能来纠正M-RL的不匹配。拟议配对Tsallis KL的正规化导致在最大 Tsallis entropy框架下隐含Tsalis 。我们再次展示M-RL的这种配方在基准问题上取得优异性表现,并在较普通的M-RL与各种昆虫指数上以$q$。

0

相关内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

机器学习组合优化

机器学习组合优化

专知会员服务

110+阅读 · 2021年2月16日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

97+阅读 · 2019年12月23日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

动脉粥样硬化中oxLDL/CD36受体介导VSMC炎症表型转化的作用和机制

国家自然科学基金

0+阅读 · 2014年12月31日

基于改变Michael加成反应活性的策略研究新型可克服耐药的EGFR[T790M]靶向小分子抑制剂

国家自然科学基金

0+阅读 · 2013年12月31日

血管紧张素II受体基因多态性与原发性醛固酮增多症发病风险、亚型及预后的相关性研究

国家自然科学基金

0+阅读 · 2012年12月31日

非线性系统全局输出反馈：镇定、跟踪和应用

国家自然科学基金

0+阅读 · 2012年12月31日

EGFR受体介导的肺肿瘤靶向siRNA类脂质载体的构建及转运调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

IL-22介导S1P生成经S1PR1持续活化STAT3促进胰腺癌细胞增殖的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

受体毛细管电泳对赤芍中扩血管活性成分的筛选研究

国家自然科学基金

0+阅读 · 2012年12月31日

Pharicin B稳定维甲酸受体的机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

Langmuir环流在上层海洋混合中的作用

国家自然科学基金

0+阅读 · 2008年12月31日

Two-Sample Testing in Reinforcement Learning

Arxiv

0+阅读 · 2022年7月6日

On Effective Scheduling of Model-based Reinforcement Learning

Arxiv

0+阅读 · 2022年7月5日

Computational-Statistical Gaps in Reinforcement Learning

Arxiv

0+阅读 · 2022年7月3日

Lifelong Inverse Reinforcement Learning

Arxiv

0+阅读 · 2022年7月1日

Performative Reinforcement Learning

Arxiv

0+阅读 · 2022年6月30日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Arxiv

34+阅读 · 2019年10月24日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

16+阅读 · 2018年6月27日

VIP会员

文章信息

相关主题

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

机器学习组合优化

机器学习组合优化

专知会员服务

110+阅读 · 2021年2月16日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

97+阅读 · 2019年12月23日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

GPT-5如何对齐？从硬性拒绝到安全完成：走向以输出为中心的安全训练

【伯克利博士论文】超越人类监督的视觉智能

【ICCV2025】SO(3) 上连续非保守动力系统的预测

2025年中国数据要素行业发展研究报告

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Two-Sample Testing in Reinforcement Learning

Arxiv

0+阅读 · 2022年7月6日

On Effective Scheduling of Model-based Reinforcement Learning

Arxiv

0+阅读 · 2022年7月5日

Computational-Statistical Gaps in Reinforcement Learning

Arxiv

0+阅读 · 2022年7月3日

Lifelong Inverse Reinforcement Learning

Arxiv

0+阅读 · 2022年7月1日

Performative Reinforcement Learning

Arxiv

0+阅读 · 2022年6月30日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Arxiv

34+阅读 · 2019年10月24日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

16+阅读 · 2018年6月27日

相关基金

动脉粥样硬化中oxLDL/CD36受体介导VSMC炎症表型转化的作用和机制

国家自然科学基金

0+阅读 · 2014年12月31日

基于改变Michael加成反应活性的策略研究新型可克服耐药的EGFR[T790M]靶向小分子抑制剂

国家自然科学基金

0+阅读 · 2013年12月31日

血管紧张素II受体基因多态性与原发性醛固酮增多症发病风险、亚型及预后的相关性研究

国家自然科学基金

0+阅读 · 2012年12月31日

非线性系统全局输出反馈：镇定、跟踪和应用

国家自然科学基金

0+阅读 · 2012年12月31日

EGFR受体介导的肺肿瘤靶向siRNA类脂质载体的构建及转运调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

IL-22介导S1P生成经S1PR1持续活化STAT3促进胰腺癌细胞增殖的分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

受体毛细管电泳对赤芍中扩血管活性成分的筛选研究

国家自然科学基金

0+阅读 · 2012年12月31日

Pharicin B稳定维甲酸受体的机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

Langmuir环流在上层海洋混合中的作用

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员