Continuous-time reinforcement learning offers an appealing formalism for describing control problems in which the passage of time is not naturally divided into discrete increments. Here we consider the problem of predicting the distribution of returns obtained by an agent interacting with a continuous-time, stochastic environment. Accurate return predictions have proven useful for determining optimal policies for risk-sensitive control, learning state representations, multiagent coordination, and more. We begin by establishing the distributional analogue of the Hamilton-Jacobi-Bellman (HJB) equation for It\^o diffusions and the broader class of Feller-Dynkin processes. We then specialize this equation to the setting in which the return distribution is approximated by $N$ uniformly-weighted particles, a common design choice in distributional algorithms. Our derivation highlights additional terms due to statistical diffusivity which arise from the proper handling of distributions in the continuous-time setting. Based on this, we propose a tractable algorithm for approximately solving the distributional HJB equation via a JKO scheme, which can be implemented in an online control algorithm. We demonstrate the effectiveness of such an algorithm on a synthetic control problem.
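As a brief illustration of the particle approximation referred to above (the notation is illustrative and not taken verbatim from the paper), the return distribution at a state $x$ is represented by the empirical measure of $N$ equally-weighted particles:
$$\eta(x) \;\approx\; \frac{1}{N}\sum_{i=1}^{N}\delta_{z_i(x)},$$
where each $z_i(x)$ is a scalar particle location (e.g., a quantile-like statistic of the return) whose evolution in continuous time is governed by the distributional HJB equation.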