Bootstrapping Fitted Q-Evaluation for Off-Policy Inference - 专知 Papers


Bootstrap · Inference · Estimation/Estimator · Subsampling · Policy evaluation

May 22, 2022

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference


Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang

From arXiv; accepted at ICML 2021

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less well understood. In this paper, we study the use of bootstrapping in off-policy evaluation (OPE); in particular, we focus on fitted Q-evaluation (FQE), which is known to be minimax-optimal in the tabular and linear-model cases. We propose a bootstrapping FQE method for inferring the distribution of the policy evaluation error and show that this method is asymptotically efficient and distributionally consistent for off-policy statistical inference. To overcome the computational limits of bootstrapping, we further adapt a subsampling procedure that improves the runtime by an order of magnitude. We numerically evaluate the bootstrapping method in classical RL environments on three tasks: confidence interval estimation, estimating the variance of an off-policy evaluator, and estimating the correlation between multiple off-policy evaluators.
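To make the idea in the abstract concrete, here is a minimal sketch of bootstrapped FQE for confidence interval estimation. It is an illustration under stated assumptions, not the authors' implementation: it assumes a tabular MDP (where each FQE regression step reduces to per-state-action averages), resamples whole episodes with replacement, and forms a basic (pivotal) bootstrap interval from the resampled evaluation errors. The function names `fqe_tabular` and `bootstrap_fqe`, the toy environment, and all default parameters are hypothetical. The subsampling speed-up mentioned in the abstract is not shown.

```python
import numpy as np


def fqe_tabular(episodes, pi, n_states, n_actions, gamma=0.9, n_iters=200):
    """Tabular fitted Q-evaluation (illustrative sketch).

    episodes: list of episodes, each a list of (s, a, r, s_next) tuples.
    pi: array of shape (n_states, n_actions), target policy pi(a | s).
    """
    counts = np.zeros((n_states, n_actions))
    r_sum = np.zeros((n_states, n_actions))
    trans = np.zeros((n_states, n_actions, n_states))
    for ep in episodes:
        for s, a, r, s2 in ep:
            counts[s, a] += 1
            r_sum[s, a] += r
            trans[s, a, s2] += 1
    safe = np.maximum(counts, 1)          # avoid division by zero for unvisited pairs
    r_hat = r_sum / safe                  # empirical mean reward
    p_hat = trans / safe[:, :, None]      # empirical transition probabilities
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        v = (pi * q).sum(axis=1)          # V(s) = sum_a pi(a|s) Q(s, a)
        q = r_hat + gamma * p_hat @ v     # one fitted Bellman backup
    return q


def bootstrap_fqe(episodes, pi, init_dist, n_states, n_actions,
                  gamma=0.9, n_boot=200, alpha=0.1, seed=0):
    """Return the FQE value estimate and a (1 - alpha) bootstrap interval."""
    rng = np.random.default_rng(seed)

    def value(eps):
        q = fqe_tabular(eps, pi, n_states, n_actions, gamma)
        return float(init_dist @ (pi * q).sum(axis=1))

    v_hat = value(episodes)
    n = len(episodes)
    errs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample episodes with replacement
        errs[b] = value([episodes[i] for i in idx]) - v_hat
    lo_q, hi_q = np.quantile(errs, [alpha / 2, 1 - alpha / 2])
    # Basic bootstrap interval: reflect the error quantiles around v_hat.
    return v_hat, (v_hat - hi_q, v_hat - lo_q)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical 2-state, 2-action MDP; the behavior policy is uniform.
    episodes = []
    for _ in range(100):
        s, ep = 0, []
        for _ in range(10):
            a = int(rng.integers(2))
            r = float(s == 1)                                   # reward 1 in state 1
            s2 = int(rng.random() < (0.8 if a == 1 else 0.2))   # action 1 favors state 1
            ep.append((s, a, r, s2))
            s = s2
        episodes.append(ep)
    pi = np.array([[0.1, 0.9], [0.1, 0.9]])   # target policy prefers action 1
    init = np.array([1.0, 0.0])               # always start in state 0
    v_hat, ci = bootstrap_fqe(episodes, pi, init, 2, 2)
    print(v_hat, ci)
```

The same array of bootstrap errors `errs` could also be used to estimate the variance of the evaluator (its sample variance) or, if two target policies are evaluated on the same resamples, the correlation between their estimates.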

