Normality-Guided Distributional Reinforcement Learning for Continuous Control


August 28, 2022


Ju-Seung Byun, Andrew Perrault

Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) methods instead model the value distribution, which has been shown to improve performance in many settings. In this paper, we model the value distribution as approximately normal using the Markov chain central limit theorem. We analytically compute quantile bars to provide a new DRL target that is informed by the decrease in standard deviation that occurs over the course of an episode. In addition, we suggest an exploration strategy based on how closely the learned value distribution resembles the target normal distribution to make the value function more accurate for better policy improvement. The approach we outline is compatible with many DRL structures. We use proximal policy optimization as a testbed and show that both the normality-guided target and exploration bonus produce performance improvements. We demonstrate that our method outperforms DRL baselines on a number of continuous control tasks.
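
The idea of analytically computing quantiles of an approximately normal value distribution can be illustrated with a small sketch. This is not the authors' implementation: the function names, the midpoint quantile levels, and the square-root schedule for the standard deviation are all assumptions made here for illustration, chosen only to show a target whose spread narrows as an episode progresses.

```python
from math import sqrt
from statistics import NormalDist


def normal_quantile_targets(mean, std, num_quantiles):
    """Return quantiles of N(mean, std^2) at midpoints tau_i = (2i + 1) / (2N).

    Hypothetical helper: the midpoint levels are an assumption, not taken
    from the paper.
    """
    if std <= 0.0:
        # Degenerate case: all probability mass sits on the mean.
        return [mean] * num_quantiles
    dist = NormalDist(mu=mean, sigma=std)
    taus = [(2 * i + 1) / (2 * num_quantiles) for i in range(num_quantiles)]
    return [dist.inv_cdf(tau) for tau in taus]


def remaining_std(initial_std, step, horizon):
    """Assumed schedule: std shrinks with the square root of the steps left.

    This mirrors the intuition that a sum of fewer remaining rewards has
    less spread; the exact form used in the paper may differ.
    """
    remaining = max(horizon - step, 0)
    return initial_std * sqrt(remaining / horizon)


if __name__ == "__main__":
    for step in (0, 50, 100):
        sigma = remaining_std(initial_std=2.0, step=step, horizon=100)
        targets = normal_quantile_targets(mean=10.0, std=sigma, num_quantiles=5)
        print(step, [round(q, 3) for q in targets])
```

Under these assumptions the quantile targets stay centered on the predicted mean return while their spread collapses toward it at the end of the episode, which is the qualitative behavior the abstract describes.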

