We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using data collected by a logging policy. Most popular approaches to OPE are variants of the doubly robust (DR) estimator, obtained by combining a direct method (DM) estimator with a correction term involving the inverse propensity score (IPS). Existing algorithms primarily focus on strategies to reduce the variance of the DR estimator arising from large IPS weights. We propose a new approach called the Doubly Robust with Information borrowing and Context-based switching (DR-IC) estimator that focuses on reducing both bias and variance. The DR-IC estimator replaces the standard DM estimator with a parametric reward model that borrows information from 'closer' contexts through a correlation structure that depends on the IPS. The DR-IC estimator also adaptively interpolates between this modified DM estimator and a modified DR estimator based on a context-specific switching rule. We give provable guarantees on the performance of the DR-IC estimator. We also demonstrate the superior performance of the DR-IC estimator compared to state-of-the-art OPE algorithms on a number of benchmark problems.
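As background for the abstract, the standard DR estimator it builds on combines the DM prediction with an IPS-weighted correction on the logged actions. A minimal illustrative sketch of that standard estimator (not the paper's DR-IC estimator; all argument names are assumptions for the example):

```python
import numpy as np

def dr_estimate(q_hat_target, q_hat_logged, target_prob, logging_prob, reward):
    """Standard doubly robust off-policy value estimate.

    q_hat_target : reward model's prediction under the target policy's action
    q_hat_logged : reward model's prediction for the logged action
    target_prob  : target policy's probability of the logged action
    logging_prob : logging policy's probability of the logged action
    reward       : observed reward for the logged action

    Returns the DM term plus the IPS-weighted residual correction,
    averaged over the logged samples.
    """
    ips = target_prob / logging_prob  # inverse propensity score weights
    return float(np.mean(q_hat_target + ips * (reward - q_hat_logged)))

# Example with two logged samples:
v = dr_estimate(
    q_hat_target=np.array([0.5, 0.6]),
    q_hat_logged=np.array([0.4, 0.7]),
    target_prob=np.array([0.5, 0.5]),
    logging_prob=np.array([0.5, 0.25]),
    reward=np.array([1.0, 0.0]),
)
```

When the reward model is accurate the correction term is small; when the model is biased but propensities are correct, the IPS term repairs the bias, which is the "doubly robust" property the abstract refers to.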