We study minimax methods for off-policy evaluation (OPE) using value functions and marginalized importance weights. Although these methods hold the promise of overcoming the exponential variance of traditional importance sampling, several key problems remain: (1) They require function approximation and are generally biased. For the sake of trustworthy OPE, is there any way to quantify the biases? (2) They are split into two styles ("weight-learning" vs. "value-learning"). Can we unify them? In this paper we answer both questions positively. By slightly altering the derivations of previous methods (one from each style; Uehara et al., 2020), we unify them into a single value interval that comes with a special type of double robustness: when either the value-function class or the importance-weight class is well specified, the interval is valid, and its length quantifies the misspecification of the other class. Our interval also provides a unified view of and new insights into several recent methods, and we further explore the implications of our results for exploration and exploitation in off-policy policy optimization with insufficient data coverage.
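For readers unfamiliar with these minimax objectives, the following is a hedged sketch of how such a value interval can arise from the standard marginalized-importance-sampling Lagrangian (as in Uehara et al., 2020); the symbols $L$, $\mathcal{Q}$, $\mathcal{W}$ and the particular one-sided derivation below are illustrative, not the paper's exact construction. Writing
\[
  L(w, q) \;=\; (1-\gamma)\,\mathbb{E}_{s_0 \sim d_0}\!\big[q(s_0,\pi)\big]
  \;+\; \mathbb{E}_{(s,a,r,s') \sim \mu}\!\Big[w(s,a)\,\big(r + \gamma\, q(s',\pi) - q(s,a)\big)\Big],
\]
where $q(s,\pi) := \mathbb{E}_{a \sim \pi(\cdot\mid s)}[q(s,a)]$, $d_0$ is the initial-state distribution, and $\mu$ is the data distribution, we have $L(w, q^\pi) = J(\pi)$ for every $w$ and $L(w^\pi, q) = J(\pi)$ for every $q$, with $J(\pi)$ the (normalized) value of $\pi$. If, for instance, the weight class $\mathcal{W}$ contains the true marginalized importance weights $w^\pi$, then for every $q$,
\[
  \inf_{w \in \mathcal{W}} L(w,q) \;\le\; J(\pi) \;\le\; \sup_{w \in \mathcal{W}} L(w,q),
\]
so the interval
\[
  \Big[\, \sup_{q \in \mathcal{Q}} \inf_{w \in \mathcal{W}} L(w,q), \;\;
  \inf_{q \in \mathcal{Q}} \sup_{w \in \mathcal{W}} L(w,q) \,\Big]
\]
contains $J(\pi)$. A symmetric argument with the roles of $\mathcal{Q}$ and $\mathcal{W}$ swapped yields a valid interval when the value-function class is well specified; the contribution summarized above is a single interval enjoying both guarantees simultaneously.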