We consider evaluating and training a new policy for the evaluation data using historical data obtained from a different policy. The goal of off-policy evaluation (OPE) is to estimate the expected reward of a new policy over the evaluation data, and that of off-policy learning (OPL) is to find a new policy that maximizes the expected reward over the evaluation data. Although standard OPE and OPL assume that the covariate distribution is the same between the historical and evaluation data, a covariate shift often arises in practice, i.e., the covariate distribution of the historical data differs from that of the evaluation data. In this paper, we derive the efficiency bound of OPE under a covariate shift. We then propose doubly robust and efficient estimators for OPE and OPL under a covariate shift by using an estimator of the density ratio between the distributions of the historical and evaluation data. We also discuss other possible estimators and compare their theoretical properties. Finally, we confirm the effectiveness of the proposed estimators through experiments.
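To fix ideas, a doubly robust estimator of this type can be sketched as follows; the notation ($\hat{r}$, $\pi_e$, $\pi_b$, $\hat{f}$, sample sizes $n_{\mathrm{hst}}$, $n_{\mathrm{evl}}$) is illustrative and not taken from the paper. Writing $\hat{r}(x)$ for an estimate of the density ratio $p_{\mathrm{evl}}(x)/p_{\mathrm{hst}}(x)$, $\pi_e$ for the new (evaluation) policy, $\pi_b$ for the behavior policy that generated the historical data, and $\hat{f}(a, x)$ for an estimated reward model, a doubly robust estimator of the policy value under a covariate shift typically takes the form
\[
\hat{R}(\pi_e)
= \frac{1}{n_{\mathrm{hst}}} \sum_{i=1}^{n_{\mathrm{hst}}}
\hat{r}(x_i)\,\frac{\pi_e(a_i \mid x_i)}{\pi_b(a_i \mid x_i)}
\bigl(y_i - \hat{f}(a_i, x_i)\bigr)
\;+\;
\frac{1}{n_{\mathrm{evl}}} \sum_{j=1}^{n_{\mathrm{evl}}}
\sum_{a} \pi_e\bigl(a \mid x_j^{\mathrm{evl}}\bigr)\,\hat{f}\bigl(a, x_j^{\mathrm{evl}}\bigr),
\]
where the first term corrects for both the covariate shift (through $\hat{r}$) and the policy shift (through the importance weight) on the historical sample, and the second term averages the reward model over the evaluation covariates. The double robustness refers to the estimator remaining consistent when either the reward model or the weighting components are correctly specified.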