Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits

Estimation/Estimators · Weight · Contextual Bandits · Bandits · Variance

June 3, 2021


Ruohan Zhan, Vitor Hadad, David A. Hirshberg, Susan Athey

It has become increasingly common for data to be collected adaptively, for example using contextual bandits. Historical data of this type can be used to evaluate other treatment assignment policies to guide future innovation or experiments. However, policy evaluation is challenging if the target policy differs from the one used to collect data, and popular estimators, including doubly robust (DR) estimators, can be plagued by bias, excessive variance, or both. In particular, when the pattern of treatment assignment in the collected data looks little like the pattern generated by the policy to be evaluated, the importance weights used in DR estimators explode, leading to excessive variance. In this paper, we improve the DR estimator by adaptively weighting observations to control its variance. We show that a t-statistic based on our improved estimator is asymptotically normal under certain conditions, allowing us to form confidence intervals and test hypotheses. Using synthetic data and public benchmarks, we provide empirical evidence for our estimator's improved accuracy and inferential properties relative to existing alternatives.
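To make the abstract's description concrete, the following is a minimal sketch, not the authors' implementation, of a doubly robust score and a weighted average of those scores. It assumes a finite action set, known logging probabilities, and an already-fitted outcome model. All function and variable names (`dr_scores`, `weighted_estimate`, `stabilizing_weights`, `mu_hat`) are hypothetical, and the weight rule shown is only one illustrative variance-damping heuristic; the abstract does not specify the paper's weights.

```python
import numpy as np


def dr_scores(rewards, actions, behavior_probs, target_probs, mu_hat):
    """Doubly robust score for each observation.

    rewards:        shape (T,)   observed outcomes
    actions:        shape (T,)   integer index of the action actually taken
    behavior_probs: shape (T, K) logging probability of each action at step t
    target_probs:   shape (T, K) target-policy probability of each action
    mu_hat:         shape (T, K) outcome-model prediction for each action
    """
    idx = np.arange(len(rewards))
    # Direct-method term: model prediction averaged under the target policy.
    direct = np.sum(target_probs * mu_hat, axis=1)
    # Importance weight: large when the logger rarely took the action
    # that the target policy favours.
    iw = target_probs[idx, actions] / behavior_probs[idx, actions]
    # Correction term: importance-weighted residual of the outcome model.
    return direct + iw * (rewards - mu_hat[idx, actions])


def weighted_estimate(scores, weights):
    """Normalized weighted average of scores; uniform weights give plain DR."""
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * scores) / np.sum(weights))


def stabilizing_weights(behavior_probs, target_probs):
    """Illustrative heuristic (an assumption, not taken from the abstract):
    down-weight steps where sum_a pi(a)^2 / e_t(a) is large, i.e. steps
    whose importance weights are likely to be large."""
    return 1.0 / np.sqrt(np.sum(target_probs ** 2 / behavior_probs, axis=1))


# Tiny hand-checkable example: two steps, two actions, zero outcome model.
rewards = np.array([1.0, 0.0])
actions = np.array([0, 1])
behavior_probs = np.array([[0.5, 0.5], [0.1, 0.9]])
target_probs = np.array([[1.0, 0.0], [1.0, 0.0]])  # target always picks action 0
mu_hat = np.zeros((2, 2))

scores = dr_scores(rewards, actions, behavior_probs, target_probs, mu_hat)
plain_dr = weighted_estimate(scores, np.ones_like(scores))
h = stabilizing_weights(behavior_probs, target_probs)
adaptive = weighted_estimate(scores, h)
```

With uniform weights the second function reduces to the ordinary DR average. Any non-uniform weights used with adaptively collected data would presumably need to depend only on information available before each observation's outcome is seen; the heuristic above uses only the logging and target probabilities, which is one way to respect that constraint.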

