We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $\gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory, and the critic is a discounted value function. Practitioners, however, usually ignore the discounting ($\gamma^t$) for the actor while using a discounted critic. We investigate this mismatch in two scenarios. In the first scenario, we consider optimizing an undiscounted objective $(\gamma = 1)$, where $\gamma^t$ disappears naturally $(1^t = 1)$. We then propose to interpret the discounting in the critic in terms of a bias-variance-representation trade-off and provide supporting empirical results. In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective, again providing supporting empirical results.
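The following is a minimal sketch (not the paper's code) of the mismatch described above: the theoretical actor update weights the transition at time $t$ by $\gamma^t$, whereas the common implementation drops this weight while still using a discounted critic. The functions `advantage` and `log_prob_grad` are illustrative stand-ins for a learned critic and the policy's score function.

```python
def advantage(state, action):
    # placeholder for a discounted-critic advantage estimate A_gamma(s, a)
    return 1.0

def log_prob_grad(state, action):
    # placeholder for (a scalar projection of) grad log pi(a | s)
    return 1.0

def actor_gradient(trajectory, gamma, use_gamma_t):
    """Policy-gradient estimate from one trajectory of (state, action) pairs."""
    grad = 0.0
    for t, (s, a) in enumerate(trajectory):
        # gamma**t appears only in the theoretical update; practitioners set it to 1
        weight = gamma ** t if use_gamma_t else 1.0
        grad += weight * advantage(s, a) * log_prob_grad(s, a)
    return grad

# With gamma = 1 the two variants coincide (1**t == 1); with gamma < 1 they differ.
traj = [((0,), 0), ((1,), 1), ((2,), 0)]
print(actor_gradient(traj, gamma=0.99, use_gamma_t=True))
print(actor_gradient(traj, gamma=0.99, use_gamma_t=False))
```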