非政策(或反事实)政策(或反事实)差异估计 (High-Confidence Off-Policy (or Counterfactual) Variance Estimation) - 专知论文

会员服务 ·

0

估计/估计量 · 方差 · 评论员 · 总回报 · 期望回报 ·

2021 年 1 月 25 日

High-Confidence Off-Policy (or Counterfactual) Variance Estimation

翻译：非政策(或反事实)政策(或反事实)差异估计

Yash Chandak,Shiv Shankar,Philip S. Thomas

from arxiv, Thirty-fifth AAAI Conference on Artificial Intelligence (AAAI 2021)

Many sequential decision-making systems leverage data collected using prior policies to propose a new policy. For critical applications, it is important that high-confidence guarantees on the new policy's behavior are provided before deployment, to ensure that the policy will behave as desired. Prior works have studied high-confidence off-policy estimation of the expected return, however, high-confidence off-policy estimation of the variance of returns can be equally critical for high-risk applications. In this paper, we tackle the previously open problem of estimating and bounding, with high confidence, the variance of returns from off-policy data

翻译：许多相继决策系统利用先前政策收集的数据来利用以往政策收集的数据来提出新政策。对于关键应用,重要的是在部署之前对新政策的行为提供高度信心保障,以确保政策按预期行事。先前的工作研究了对预期回报的高度信心非政策性估计,然而,对回报差异的高度信心非政策性估计对于高风险应用同样至关重要。在本文件中,我们以高度信心解决了以前尚未解决的问题,即估算和约束从非政策性数据得到的回报差异。

0

相关内容

估计/估计量

估计/估计量

【AAAI2021】自校正Q学习，Self-correcting Q-Learning

专知会员服务

17+阅读 · 2020年12月4日

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

专知会员服务

21+阅读 · 2020年6月13日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

专知会员服务

22+阅读 · 2020年3月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【CGAN论文笔记强烈推荐】基于CGAN的人脸深度图估计： Face Depth Estimation With Conditional Generative Adversarial Networks

专知会员服务

24+阅读 · 2020年1月8日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

大神一年100篇论文

大神一年100篇论文

CreateAMind

15+阅读 · 2018年12月31日

人体姿态估计资源大列表（Human Pose Estimation）

人体姿态估计资源大列表（Human Pose Estimation）

专知

9+阅读 · 2018年10月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Scalable Estimation and Inference with Large-scale or Online Survival Data

Arxiv

0+阅读 · 2021年3月19日

Estimating FARIMA models with uncorrelated but non-independent error terms

Arxiv

0+阅读 · 2021年3月18日

Constructing confidence sets after lasso selection by randomized estimator augmentation

Arxiv

0+阅读 · 2021年3月17日

Regularized Behavior Value Estimation

Arxiv

0+阅读 · 2021年3月17日

GeCo: Quality Counterfactual Explanations in Real Time

Arxiv

0+阅读 · 2021年3月17日

Unifying Online and Counterfactual Learning to Rank

Arxiv

6+阅读 · 2020年12月8日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

Prime Sample Attention in Object Detection

Arxiv

13+阅读 · 2019年4月9日

Joint Face Detection and Facial Motion Retargeting for Multiple Faces

Joint Face Detection and Facial Motion Retargeting for Multiple Faces

Arxiv

4+阅读 · 2019年2月27日

Automatic Face Aging in Videos via Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年11月27日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【AAAI2021】自校正Q学习，Self-correcting Q-Learning

专知会员服务

17+阅读 · 2020年12月4日

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

专知会员服务

21+阅读 · 2020年6月13日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

专知会员服务

22+阅读 · 2020年3月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【CGAN论文笔记强烈推荐】基于CGAN的人脸深度图估计： Face Depth Estimation With Conditional Generative Adversarial Networks

专知会员服务

24+阅读 · 2020年1月8日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

军事行动中人工智能系统目标交战的附带损伤评估模型 | 最新文献

NeurIPS 2025 | 自动化所新作速览（一）

美陆军协会（AUSA）2025 年会公布的美国十大武器与防务产品创新

NeurIPS 2025 | 自动化所新作速览（二）

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

大神一年100篇论文

大神一年100篇论文

CreateAMind

15+阅读 · 2018年12月31日

人体姿态估计资源大列表（Human Pose Estimation）

人体姿态估计资源大列表（Human Pose Estimation）

专知

9+阅读 · 2018年10月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Scalable Estimation and Inference with Large-scale or Online Survival Data

Arxiv

0+阅读 · 2021年3月19日

Estimating FARIMA models with uncorrelated but non-independent error terms

Arxiv

0+阅读 · 2021年3月18日

Constructing confidence sets after lasso selection by randomized estimator augmentation

Arxiv

0+阅读 · 2021年3月17日

Regularized Behavior Value Estimation

Arxiv

0+阅读 · 2021年3月17日

GeCo: Quality Counterfactual Explanations in Real Time

Arxiv

0+阅读 · 2021年3月17日

Unifying Online and Counterfactual Learning to Rank

Arxiv

6+阅读 · 2020年12月8日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

Prime Sample Attention in Object Detection

Arxiv

13+阅读 · 2019年4月9日

Joint Face Detection and Facial Motion Retargeting for Multiple Faces

Joint Face Detection and Facial Motion Retargeting for Multiple Faces

Arxiv

4+阅读 · 2019年2月27日

Automatic Face Aging in Videos via Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年11月27日

微信扫码咨询专知VIP会员