Direct Advantage Estimation - Zhuanzhi Papers


Tags: Estimation/Estimator · Denoising Autoencoder · Functional · Directed · Learning

August 15, 2022

Direct Advantage Estimation


Hsiao-Ru Pan, Nico Gürtler, Alexander Neitz, Bernhard Schölkopf

The predominant approach in reinforcement learning is to assign credit to actions based on the expected return. However, we show that the return may depend on the policy in a way which could lead to excessive variance in value estimation and slow down learning. Instead, we show that the advantage function can be interpreted as causal effects and shares similar properties with causal representations. Based on this insight, we propose Direct Advantage Estimation (DAE), a novel method that can model the advantage function and estimate it directly from on-policy data while simultaneously minimizing the variance of the return without requiring the (action-)value function. We also relate our method to Temporal Difference methods by showing how value functions can be seamlessly integrated into DAE. The proposed method is easy to implement and can be readily adapted by modern actor-critic methods. We evaluate DAE empirically on three discrete control domains and show that it can outperform generalized advantage estimation (GAE), a strong baseline for advantage estimation, on a majority of the environments when applied to policy optimization.
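For context on the baseline named in the abstract, here is a minimal sketch of generalized advantage estimation (GAE) over a single trajectory. It estimates the advantage A(s, a) = Q(s, a) − V(s) from rewards and a learned state-value function. This is the standard GAE recursion, not the DAE method itself, whose objective is not spelled out in the abstract. The function name `gae_advantages`, the parameter names `gamma` and `lam`, and their default values are illustrative assumptions.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Standard GAE over one trajectory (illustrative sketch, not DAE).

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`.
    The last entry is the bootstrap value (0.0 if s_T is terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` the estimate reduces to the one-step TD error, and with `lam=1` it becomes the discounted return minus the value baseline. GAE therefore depends on a separately learned value function, which is the requirement the abstract says DAE avoids.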

