Distributional Reinforcement Learning with Unconstrained Monotonic Neural Networks

June 3, 2022

Thibaut Théate, Antoine Wehenkel, Adrien Bolland, Gilles Louppe, Damien Ernst

The distributional reinforcement learning (RL) approach advocates for representing the complete probability distribution of the random return instead of only modelling its expectation. A distributional RL algorithm may be characterised by two main components, namely the representation of the distribution together with its parameterisation and the probability metric defining the loss. The present research work considers the unconstrained monotonic neural network (UMNN) architecture, a universal approximator of continuous monotonic functions which is particularly well suited for modelling different representations of a distribution (PDF, CDF, QF). This property enables the efficient decoupling of the effect of the function approximator class from that of the probability metric. The research paper firstly introduces a methodology for learning different representations of the random return distribution. Secondly, a novel distributional RL algorithm named unconstrained monotonic deep Q-network (UMDQN) is presented. Lastly, in light of this new algorithm, an empirical comparison is performed between three probability quasimetrics, namely the Kullback-Leibler divergence, Cramer distance, and Wasserstein distance. The results highlight the main strengths and weaknesses associated with each probability metric together with an important limitation of the Wasserstein distance. This research concludes by calling for a reconsideration of all probability metrics in distributional RL, contrasting with the clear dominance of the Wasserstein distance in recent publications.
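
To make the three representations and the three probability metrics named in the abstract concrete, the following is a minimal numerical sketch, not the authors' UMDQN implementation. It assumes two Gaussian return distributions evaluated on a fixed grid; the grid, the Gaussian choice, and all function names are hypothetical and chosen only for illustration. Pairing the Kullback-Leibler divergence with the PDF, the Cramer distance with the CDF, and the Wasserstein distance with the QF is also an assumption made here because each metric has a simple form in that representation.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution evaluated on the grid x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))


def pdf_to_cdf(pdf, dx):
    """CDF as a cumulative Riemann sum of the PDF, rescaled to end at 1."""
    cdf = np.cumsum(pdf) * dx
    return cdf / cdf[-1]


def cdf_to_qf(cdf, x, taus):
    """Quantile function: invert the CDF by linear interpolation."""
    return np.interp(taus, cdf, x)


def kl_divergence(pdf_p, pdf_q, dx, eps=1e-12):
    """KL(p || q) computed from two PDFs on a uniform grid."""
    p = np.clip(pdf_p, eps, None)  # avoid log(0) in the tails
    q = np.clip(pdf_q, eps, None)
    return float(np.sum(p * np.log(p / q)) * dx)


def cramer_distance(cdf_p, cdf_q, dx):
    """Cramer distance: L2 norm of the difference between two CDFs."""
    return float(np.sqrt(np.sum((cdf_p - cdf_q) ** 2) * dx))


def wasserstein_1(qf_p, qf_q):
    """1-Wasserstein distance: mean absolute gap between two quantile
    functions sampled at evenly spaced quantile levels in (0, 1)."""
    return float(np.mean(np.abs(qf_p - qf_q)))


# Hypothetical return distributions: N(0, 1) and N(1, 1).
x = np.linspace(-8.0, 9.0, 3401)
dx = x[1] - x[0]
taus = (np.arange(1000) + 0.5) / 1000  # midpoints of 1000 quantile bins

pdf_p, pdf_q = gaussian_pdf(x, 0.0, 1.0), gaussian_pdf(x, 1.0, 1.0)
cdf_p, cdf_q = pdf_to_cdf(pdf_p, dx), pdf_to_cdf(pdf_q, dx)
qf_p, qf_q = cdf_to_qf(cdf_p, x, taus), cdf_to_qf(cdf_q, x, taus)

kl = kl_divergence(pdf_p, pdf_q, dx)
cramer = cramer_distance(cdf_p, cdf_q, dx)
w1 = wasserstein_1(qf_p, qf_q)
print(f"KL={kl:.4f}  Cramer={cramer:.4f}  Wasserstein-1={w1:.4f}")
```

The CDF and the QF are both monotonic functions of their argument, which is the property the abstract points to when it describes a monotonic function approximator as well suited to these representations.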
