平均非政策政策评价回报率和职能接近率 (Average-Reward Off-Policy Policy Evaluation with Function Approximation) - 专知论文

会员服务 ·

0

估计/估计量 · 策略评估 · 泛函 · 近似 · 价值函数 ·

2021 年 1 月 8 日

Average-Reward Off-Policy Policy Evaluation with Function Approximation

翻译：平均非政策政策评价回报率和职能接近率

Shangtong Zhang,Yi Wan,Richard S. Sutton,Shimon Whiteson

We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function. For this problem, bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly triad (Sutton & Barto, 2018). To address the deadly triad, we propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting. In terms of estimating the differential value function, the algorithms are the first convergent off-policy linear function approximation algorithms. In terms of estimating the reward rate, the algorithms are the first convergent off-policy linear function approximation algorithms that do not require estimating the density ratio. We demonstrate empirically the advantage of the proposed algorithms, as well as their nonlinear variants, over a competitive density-ratio-based approach, in a simple domain as well as challenging robot simulation tasks.

翻译：我们考虑在平均回报 MDP 中以函数近似值( FA) 进行政策外评估,在平均回报 MDP 中,目标是估算奖励率和差值功能。对于这个问题,靴式是必需的,并且与政策外学习和 FA 一起,导致致命的三合体( Sutton & Barto, 2018年) 。为了解决致命的三合体,我们建议了两种新奇的算法,在平均回报环境中复制了渐进式TD 算法的可喜成功率。在估计差值函数方面,算法是第一个趋同的离政策直线函数近似算法。在估计奖励率方面,算法是第一个不需要估计密度比例的趋同的离政策性线性函数近效算法。我们从经验上展示了拟议算法及其非线性变法的优势,在简单领域和具有挑战性的机器人模拟任务中,超越了竞争性的密度比法。

0

相关内容

估计/估计量

估计/估计量

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

专知会员服务

76+阅读 · 2021年1月23日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

专知会员服务

134+阅读 · 2020年3月2日

【综述】自动驾驶领域中的强化学习，附18页论文下载

【综述】自动驾驶领域中的强化学习，附18页论文下载

专知会员服务

176+阅读 · 2020年2月8日

【深度估计| 2019最新综述】单目深度估计方法综述（Monocular Depth Estimation: A Survey）

专知会员服务

69+阅读 · 2019年11月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Provably Efficient Cooperative Multi-Agent Reinforcement Learning with Function Approximation

Arxiv

0+阅读 · 2021年3月8日

Greedy Multi-step Off-Policy Reinforcement Learning

Arxiv

0+阅读 · 2021年3月7日

An approximation algorithm for approximation rank

Arxiv

0+阅读 · 2021年3月7日

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization

Arxiv

0+阅读 · 2021年3月6日

Over-the-Air Statistical Estimation

Arxiv

0+阅读 · 2021年3月6日

An overview on deep learning-based approximation methods for partial differential equations

Arxiv

0+阅读 · 2021年3月5日

Unsupervised Learning for Robust Fitting:A Reinforcement Learning Approach

Arxiv

0+阅读 · 2021年3月5日

Inverse Reinforcement Learning with Explicit Policy Estimates

Arxiv

0+阅读 · 2021年3月4日

Policy Decomposition: Approximate Optimal Control with Suboptimality Estimates

Arxiv

0+阅读 · 2021年3月3日

Stochastic optimization for numerical evaluation of imprecise probabilities

Arxiv

0+阅读 · 2021年3月3日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

斯坦福最新《强化学习》2021课程，Emma Brunskill主讲，附PPT下载

专知会员服务

76+阅读 · 2021年1月23日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

【阿里巴巴-达摩院】深度学习的时间序列数据增强综述，Time Series Data Augmentation for Deep Learning: A Survey

专知会员服务

134+阅读 · 2020年3月2日

【综述】自动驾驶领域中的强化学习，附18页论文下载

【综述】自动驾驶领域中的强化学习，附18页论文下载

专知会员服务

176+阅读 · 2020年2月8日

【深度估计| 2019最新综述】单目深度估计方法综述（Monocular Depth Estimation: A Survey）

专知会员服务

69+阅读 · 2019年11月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

模型提取攻击与防御的系统综述：最新进展与展望

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

【CMU博士论文】用于物理模拟的高效深度学习模型

大模型解决方案白皮书：社交陪伴场景全流程落地指南

相关资讯

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Provably Efficient Cooperative Multi-Agent Reinforcement Learning with Function Approximation

Arxiv

0+阅读 · 2021年3月8日

Greedy Multi-step Off-Policy Reinforcement Learning

Arxiv

0+阅读 · 2021年3月7日

An approximation algorithm for approximation rank

Arxiv

0+阅读 · 2021年3月7日

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization

Arxiv

0+阅读 · 2021年3月6日

Over-the-Air Statistical Estimation

Arxiv

0+阅读 · 2021年3月6日

An overview on deep learning-based approximation methods for partial differential equations

Arxiv

0+阅读 · 2021年3月5日

Unsupervised Learning for Robust Fitting:A Reinforcement Learning Approach

Arxiv

0+阅读 · 2021年3月5日

Inverse Reinforcement Learning with Explicit Policy Estimates

Arxiv

0+阅读 · 2021年3月4日

Policy Decomposition: Approximate Optimal Control with Suboptimality Estimates

Arxiv

0+阅读 · 2021年3月3日

Stochastic optimization for numerical evaluation of imprecise probabilities

Arxiv

0+阅读 · 2021年3月3日

微信扫码咨询专知VIP会员