In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits. In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will obtain when executed in an environment formalized as a multi-armed bandit. In this paper, we focus on the linear bandit setting with heteroscedastic reward noise. This is the first work that focuses on such an optimal data collection strategy for policy evaluation involving heteroscedastic reward noise in the linear bandit setting. We first formulate an optimal design for weighted least squares estimates in the heteroscedastic linear bandit setting that reduces the MSE of the estimate of the target policy's value. We term this policy-weighted least squares estimation and use this formulation to derive the optimal behavior policy for data collection. We then propose a novel algorithm, SPEED (Structured Policy Evaluation Experimental Design), that tracks the optimal behavior policy, and we derive its regret with respect to the optimal behavior policy. Finally, we empirically validate that SPEED leads to policy evaluation with mean squared error comparable to the oracle strategy and significantly lower than simply running the target policy.
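To make the setting concrete, the following is a minimal sketch (not the paper's implementation) of policy-weighted least squares value estimation in a heteroscedastic linear bandit. The arm features `X`, per-arm noise levels `sigmas`, target policy `pi`, and behavior policy `b` are illustrative placeholders; the quantity `phi @ A_inv @ phi` at the end is the variance term that an optimal design over behavior policies would seek to minimize.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n = 3, 5, 2000

X = rng.normal(size=(K, d))              # arm feature vectors (illustrative)
theta_true = rng.normal(size=d)          # unknown reward parameter
sigmas = rng.uniform(0.1, 1.0, size=K)   # heteroscedastic per-arm noise std
pi = np.full(K, 1.0 / K)                 # target policy (uniform, for illustration)
b = np.full(K, 1.0 / K)                  # behavior policy used to collect data

# Collect data by sampling arms from the behavior policy.
arms = rng.choice(K, size=n, p=b)
rewards = X[arms] @ theta_true + rng.normal(size=n) * sigmas[arms]

# Weighted least squares: weight each sample by its inverse noise variance.
w = 1.0 / sigmas[arms] ** 2
A = (X[arms] * w[:, None]).T @ X[arms]   # sum_t x_t x_t^T / sigma_t^2
c = (X[arms] * w[:, None]).T @ rewards   # sum_t x_t r_t / sigma_t^2
theta_hat = np.linalg.solve(A, c)

# Policy-weighted value estimate of the target policy.
phi = pi @ X                             # sum_a pi(a) x_a
v_hat = phi @ theta_hat
v_true = phi @ theta_true

# Variance of the value estimate, phi^T A^{-1} phi: the term an optimal
# data-collection design over behavior policies would minimize.
var_hat = phi @ np.linalg.solve(A, phi)
print(v_hat, v_true, var_hat)
```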