On Instrumental Variable Regression for Deep Offline Policy Evaluation - 专知 (Zhuanzhi) Papers

Policy evaluation · Q-network · Estimation/estimator · Deep Q-network · state-of-the-art

November 23, 2022

On Instrumental Variable Regression for Deep Offline Policy Evaluation

Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

From arXiv; accepted by the Journal of Machine Learning Research in 11/2022.

We show that the popular reinforcement learning (RL) strategy of estimating the state-action value (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding, the inputs and output noise being correlated. Hence, direct minimization of the Bellman error can result in significantly biased Q-function estimates. We explain why fixing the target Q-network in Deep Q-Networks and Fitted Q Evaluation provides a way of overcoming this confounding, thus shedding new light on this popular but not well understood trick in the deep RL literature. An alternative approach to address confounding is to leverage techniques developed in the causality literature, notably instrumental variables (IV). We bring together here the literature on IV and RL by investigating whether IV approaches can lead to improved Q-function estimates. This paper analyzes and compares a wide range of recent IV methods in the context of offline policy evaluation (OPE), where the goal is to estimate the value of a policy using logged data only. By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques. We find empirically that state-of-the-art OPE methods are closely matched in performance by some IV methods such as AGMM, which were not developed for OPE. We open-source all our code and datasets at https://github.com/liyuan9988/IVOPEwithACME.
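The confounding described in the abstract can be illustrated with a small sketch. It is not taken from the paper or its repository. It assumes a toy tabular Markov reward process (the evaluated policy is folded into the transition matrix `P`), one-hot features, and a state-value function instead of a Q-function. The helper names `make_mrp`, `sample_transitions`, and `compare_estimators` are hypothetical. The sketch compares three estimators on the same logged transitions: direct least squares on the Bellman error, a plain linear IV estimate that uses the current-state features as the instrument, and a fixed-target iteration in the spirit of Fitted Q Evaluation.

```python
import numpy as np


def make_mrp(n_states=5, seed=0):
    """Toy Markov reward process (assumption: the policy is folded into P)."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(n_states), size=n_states)  # each row sums to 1
    r = np.arange(n_states, dtype=float)                 # deterministic reward per state
    return P, r


def sample_transitions(P, n_samples, rng):
    """Logged data: uniformly drawn states s and next states s' ~ P[s]."""
    n = P.shape[0]
    s = rng.integers(n, size=n_samples)
    s_next = np.empty(n_samples, dtype=int)
    for i in range(n):
        mask = s == i
        s_next[mask] = rng.choice(n, size=mask.sum(), p=P[i])
    return s, s_next


def compare_estimators(P, r, gamma=0.9, n_samples=200_000, seed=1, n_iters=300):
    rng = np.random.default_rng(seed)
    n = len(r)
    s, s_next = sample_transitions(P, n_samples, rng)
    phi = np.eye(n)                  # one-hot (tabular) features
    Z = phi[s]                       # instrument: features of the current state
    X = Z - gamma * phi[s_next]      # regressor: depends on the random next state
    y = r[s]

    # (1) Direct minimization of the mean squared Bellman error: OLS of y on X.
    #     X is correlated with the noise gamma * (V(s') - E[V(s') | s]).
    w_brm = np.linalg.lstsq(X, y, rcond=None)[0]

    # (2) Linear instrumental variable estimate: solve Z^T X w = Z^T y.
    w_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

    # (3) Fixed-target iteration: regress y + gamma * V_old(s') on Z only.
    ZtZ = Z.T @ Z
    w_fqe = np.zeros(n)
    for _ in range(n_iters):
        target = y + gamma * w_fqe[s_next]   # target built from frozen weights
        w_fqe = np.linalg.solve(ZtZ, Z.T @ target)

    v_true = np.linalg.solve(np.eye(n) - gamma * P, r)
    estimates = {"brm": w_brm, "iv": w_iv, "fqe": w_fqe}
    errors = {k: float(np.max(np.abs(w - v_true))) for k, w in estimates.items()}
    return errors, estimates


if __name__ == "__main__":
    P, r = make_mrp()
    errors, _ = compare_estimators(P, r)
    for name, err in errors.items():
        print(f"{name}: max abs error = {err:.4f}")
```

In this toy setting the direct least-squares fit stays biased however many transitions are logged, because its regressor contains the sampled next state. The IV estimate and the fixed-target iteration both regress only on current-state features and converge to the same solution. The IV methods compared in the paper, such as AGMM, are more general than this linear estimator.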
