Distributional reinforcement learning~(RL) is a class of state-of-the-art algorithms that estimate the entire distribution of the total return rather than only its expectation. The empirical success of distributional RL hinges on the representation of return distributions and the choice of distribution divergence. In this paper, we propose a new class of \textit{Sinkhorn distributional RL~(SinkhornDRL)} algorithms that learn a finite set of statistics, i.e., deterministic samples, from each return distribution and then use Sinkhorn iterations to evaluate the Sinkhorn distance between the current and target Bellman distributions. The Sinkhorn divergence interpolates between the Wasserstein distance and Maximum Mean Discrepancy~(MMD). SinkhornDRL thus finds a sweet spot, combining the geometry of optimal-transport-based distances with the unbiased gradient estimates of MMD. Finally, compared to state-of-the-art algorithms, SinkhornDRL's competitive performance is demonstrated on the suite of 55 Atari games.
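To make the distance computation concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of Sinkhorn iterations computing the entropic optimal-transport cost and the debiased Sinkhorn divergence between two sets of scalar return samples; the sample values, the regularization strength \texttt{eps}, and the iteration count are illustrative assumptions.
\begin{verbatim}
import numpy as np

def sinkhorn_cost(x, y, eps=1.0, n_iters=50):
    """Entropic-regularized OT cost between two 1-D sample sets
    with uniform weights, via Sinkhorn fixed-point iterations."""
    a = np.full(len(x), 1.0 / len(x))      # uniform weights on x samples
    b = np.full(len(y), 1.0 / len(y))      # uniform weights on y samples
    C = (x[:, None] - y[None, :]) ** 2     # squared-distance cost matrix
    K = np.exp(-C / eps)                   # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):               # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]        # transport plan
    return np.sum(P * C)

def sinkhorn_divergence(x, y, eps=1.0, n_iters=50):
    """Debiased Sinkhorn divergence: interpolates between the
    Wasserstein distance (eps -> 0) and MMD (eps -> infinity)."""
    return sinkhorn_cost(x, y, eps, n_iters) - 0.5 * (
        sinkhorn_cost(x, x, eps, n_iters)
        + sinkhorn_cost(y, y, eps, n_iters))

# Illustrative current vs. target return samples (made-up numbers)
current = np.array([0.1, 0.4, 0.7, 1.2])
target = np.array([0.2, 0.5, 0.9, 1.5])
print(sinkhorn_divergence(current, target, eps=0.5))
\end{verbatim}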