Doubly Doubust 在线学习最佳政策评价的间际估计 (Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning) - 专知论文

会员服务 ·

0

估计/估计量 · 策略评估 · 优化器 · 在线 · Learning ·

2023 年 1 月 29 日

Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning

翻译：Doubly Doubust 在线学习最佳政策评价的间际估计

Ye Shen,Hengrui Cai,Rui Song

Evaluating the performance of an ongoing policy plays a vital role in many areas such as medicine and economics, to provide crucial instruction on the early-stop of the online experiment and timely feedback from the environment. Policy evaluation in online learning thus attracts increasing attention by inferring the mean outcome of the optimal policy (i.e., the value) in real-time. Yet, such a problem is particularly challenging due to the dependent data generated in the online environment, the unknown optimal policy, and the complex exploration and exploitation trade-off in the adaptive experiment. In this paper, we aim to overcome these difficulties in policy evaluation for online learning. We explicitly derive the probability of exploration that quantifies the probability of exploring the non-optimal actions under commonly used bandit algorithms. We use this probability to conduct valid inference on the online conditional mean estimator under each action and develop the doubly robust interval estimation (DREAM) method to infer the value under the estimated optimal policy in online learning. The proposed value estimator provides double protection on the consistency and is asymptotically normal with a Wald-type confidence interval provided. Extensive simulations and real data applications are conducted to demonstrate the empirical validity of the proposed DREAM method.

翻译：评估现行政策的业绩在许多领域,例如医学和经济学,对于早期停止在线试验和环境及时反馈提供关键指导,提供关键指导,说明如何及早停止在线试验和环境的及时反馈。在线学习的政策评价通过实时推断最佳政策(即价值)的平均值,从而吸引越来越多的注意力。然而,由于在线环境中产生的依赖性数据、未知的最佳政策以及适应性实验中复杂的探索和开发交易,这一问题特别具有挑战性。在本文中,我们的目标是克服在线学习政策评价中的这些困难。我们明确推断探索的可能性,以量化在常用的匪帮算法下探索非最佳行动的可能性。我们利用这种可能性,对每项行动下的在线有条件平均估计值(即价值)进行合理推论,并制订加倍可靠的间隔估计方法,以推断网上学习的估计最佳政策的价值。拟议的价值估计为一致性提供了双重保护,并且与瓦尔德式信心模型中的拟议数据模拟和模拟期相比,是正常的。模拟和模拟数据与模拟期数据,以模拟方式显示模拟数据的真实性。

0

相关内容

估计/估计量

估计/估计量

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

Cystatin C对脑缺血后海马神经发生的影响及机制

国家自然科学基金

0+阅读 · 2014年12月31日

膜蛋白介导受IRES调控的cyclin B1促进食管癌转移的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

非线性偏微分方程的非线性微分约束

国家自然科学基金

1+阅读 · 2013年12月31日

饱和脂肪酸对胚胎神经系统发育的影响及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

近日节律与代谢疾病

国家自然科学基金

0+阅读 · 2011年12月31日

静脉血栓脱落的危险因素MRI预警实验研究

国家自然科学基金

0+阅读 · 2011年12月31日

早衰因子对β-分泌酶(BACE1)的调控与阿尔兹海默症致病机理的研究

国家自然科学基金

0+阅读 · 2011年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

用dsDNA微阵列筛选NF-κDNA靶点及靶基因

国家自然科学基金

0+阅读 · 2008年12月31日

Optimal Partitions for Nonparametric Multivariate Entropy Estimation

Arxiv

0+阅读 · 2023年3月22日

When Doubly Robust Methods Meet Machine Learning for Estimating Treatment Effects from Real-World Data: A Comparative Study

Arxiv

0+阅读 · 2023年3月22日

Truly Bayesian Entropy Estimation

Arxiv

0+阅读 · 2023年3月21日

Cross-Domain Evaluation of a Deep Learning-Based Type Inference System

Arxiv

0+阅读 · 2023年3月21日

Practical Realization of Bessel's Correction for a Bias-Free Estimation of the Auto-Covariance and the Cross-Covariance Functions

Arxiv

0+阅读 · 2023年3月20日

Active Exploration for Inverse Reinforcement Learning

Arxiv

0+阅读 · 2023年3月20日

Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Arxiv

0+阅读 · 2023年3月20日

Estimating optimal treatment regimes in survival contexts using an instrumental variable

Arxiv

0+阅读 · 2023年3月18日

Confidence-aware 3D Gaze Estimation and Evaluation Metric

Arxiv

0+阅读 · 2023年3月17日

Bayesian Deep Learning via Subnetwork Inference

Arxiv

10+阅读 · 2021年2月18日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

操作系统智能体：基于多模态大模型（MLLM）的通用计算设备智能体综述

《美国太空军系统全生命周期建模、仿真与分析效能提升方案》最新84页报告

【博士论文】推进数据高效的深度学习：非参数 Transformer、主动测试与上下文学习

自主人工智能：未来战争是否将是自主化的？

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

Optimal Partitions for Nonparametric Multivariate Entropy Estimation

Arxiv

0+阅读 · 2023年3月22日

When Doubly Robust Methods Meet Machine Learning for Estimating Treatment Effects from Real-World Data: A Comparative Study

Arxiv

0+阅读 · 2023年3月22日

Truly Bayesian Entropy Estimation

Arxiv

0+阅读 · 2023年3月21日

Cross-Domain Evaluation of a Deep Learning-Based Type Inference System

Arxiv

0+阅读 · 2023年3月21日

Practical Realization of Bessel's Correction for a Bias-Free Estimation of the Auto-Covariance and the Cross-Covariance Functions

Arxiv

0+阅读 · 2023年3月20日

Active Exploration for Inverse Reinforcement Learning

Arxiv

0+阅读 · 2023年3月20日

Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Arxiv

0+阅读 · 2023年3月20日

Estimating optimal treatment regimes in survival contexts using an instrumental variable

Arxiv

0+阅读 · 2023年3月18日

Confidence-aware 3D Gaze Estimation and Evaluation Metric

Arxiv

0+阅读 · 2023年3月17日

Bayesian Deep Learning via Subnetwork Inference

Arxiv

10+阅读 · 2021年2月18日

相关基金

Cystatin C对脑缺血后海马神经发生的影响及机制

国家自然科学基金

0+阅读 · 2014年12月31日

膜蛋白介导受IRES调控的cyclin B1促进食管癌转移的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

非线性偏微分方程的非线性微分约束

国家自然科学基金

1+阅读 · 2013年12月31日

饱和脂肪酸对胚胎神经系统发育的影响及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

近日节律与代谢疾病

国家自然科学基金

0+阅读 · 2011年12月31日

静脉血栓脱落的危险因素MRI预警实验研究

国家自然科学基金

0+阅读 · 2011年12月31日

早衰因子对β-分泌酶(BACE1)的调控与阿尔兹海默症致病机理的研究

国家自然科学基金

0+阅读 · 2011年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

用dsDNA微阵列筛选NF-κDNA靶点及靶基因

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员