Direct Advantage Estimation

February 6, 2023

Hsiao-Ru Pan, Nico Gürtler, Alexander Neitz, Bernhard Schölkopf

From arXiv; published at NeurIPS 2022.

The predominant approach in reinforcement learning is to assign credit to actions based on the expected return. However, we show that the return may depend on the policy in a way which could lead to excessive variance in value estimation and slow down learning. Instead, we show that the advantage function can be interpreted as causal effects and shares similar properties with causal representations. Based on this insight, we propose Direct Advantage Estimation (DAE), a novel method that can model the advantage function and estimate it directly from on-policy data while simultaneously minimizing the variance of the return without requiring the (action-)value function. We also relate our method to Temporal Difference methods by showing how value functions can be seamlessly integrated into DAE. The proposed method is easy to implement and can be readily adapted by modern actor-critic methods. We evaluate DAE empirically on three discrete control domains and show that it can outperform generalized advantage estimation (GAE), a strong baseline for advantage estimation, on a majority of the environments when applied to policy optimization.
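For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of two standard textbook constructions: the advantage function as A(s, a) = Q(s, a) - V(s), and generalized advantage estimation (GAE), the baseline the paper compares against. It is not the DAE method itself, which this page does not describe in detail. The function names, the default values of `gamma` and `lam`, and the toy inputs are assumptions made for illustration only.

```python
from typing import List, Sequence


def centered_advantages(q_values: Sequence[float],
                        policy: Sequence[float]) -> List[float]:
    """Advantage A(s, a) = Q(s, a) - V(s) for a single state.

    V(s) is the policy-weighted mean of Q(s, .), so the advantages
    average to zero under the policy.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    state_value = sum(p * q for p, q in zip(policy, q_values))
    return [q - state_value for q in q_values]


def gae_advantages(rewards: Sequence[float],
                   values: Sequence[float],
                   gamma: float = 0.99,
                   lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one trajectory.

    `values` must hold one more entry than `rewards`: the last entry is
    the bootstrap value of the state reached after the final reward
    (use 0.0 if the episode terminated there).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        td_error = rewards[t] + gamma * values[t + 1] - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    # Hypothetical toy inputs, chosen only to make the arithmetic easy to follow.
    print(centered_advantages([1.0, 2.0, 3.0], [0.5, 0.25, 0.25]))
    print(gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=1.0))
```

With `lam=1.0` the GAE estimate reduces to the discounted return minus the value baseline, and with `lam=0.0` it reduces to the one-step temporal-difference error, which is the bias-variance trade-off that advantage estimators of this kind tune.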

