Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse Probability Weighting Estimator in Two-Armed Gaussian Bandits with Unknown Variances - Zhuanzhi Papers

January 21, 2022


Masahiro Kato, Kaito Ariu, Masaaki Imaizumi, Masatoshi Uehara, Masahiro Nomura, Chao Qin

We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances. The tightest lower bound on the complexity and an algorithm whose performance guarantee matches the lower bound have long been open problems when the variances are unknown and when the algorithm is agnostic to the optimal proportion of the arm draws. In this paper, we propose a strategy comprising a sampling rule with randomized sampling (RS) following the estimated target allocation probabilities of arm draws and a recommendation rule using the augmented inverse probability weighting (AIPW) estimator, which is often used in the causal inference literature. We refer to our strategy as the RS-AIPW strategy. In the theoretical analysis, we first derive a large deviation principle for martingales, which can be used when the second moment converges in mean, and apply it to our proposed strategy. Then, we show that the proposed strategy is asymptotically optimal in the sense that the probability of misidentification achieves the lower bound by Kaufmann et al. (2016) when the sample size becomes infinitely large and the gap between the two arms goes to zero.
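The two components named in the abstract, a randomized sampling rule and an AIPW-based recommendation rule, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give the form of the target allocation, so the code assumes arm draws proportional to the estimated standard deviations; the clipping threshold `clip`, the uniform sampling used until each arm has two observations, and the function name `rs_aipw` are likewise illustrative choices.

```python
import math
import random


def rs_aipw(means, stds, budget, rng, clip=0.05):
    """Simulate one two-armed Gaussian experiment.

    Returns (recommended_arm, aipw_estimates). The true means and stds are
    used only to generate rewards; the strategy itself never reads them.
    """
    n = [0, 0]          # number of pulls per arm
    s = [0.0, 0.0]      # running sum of rewards per arm
    ss = [0.0, 0.0]     # running sum of squared rewards per arm
    aipw = [0.0, 0.0]   # running sum of per-round AIPW scores per arm

    for _ in range(budget):
        # Plug-in estimates built only from earlier rounds.
        mu_hat = [s[a] / n[a] if n[a] > 0 else 0.0 for a in range(2)]

        if min(n) >= 2:
            # Assumed target allocation: proportional to estimated std devs.
            sd_hat = [
                math.sqrt(max(ss[a] / n[a] - mu_hat[a] ** 2, 1e-12))
                for a in range(2)
            ]
            w0 = sd_hat[0] / (sd_hat[0] + sd_hat[1])
            # Clipping (an assumption) keeps the inverse weights bounded.
            w0 = min(max(w0, clip), 1.0 - clip)
        else:
            w0 = 0.5  # uniform sampling until variances can be estimated
        w = [w0, 1.0 - w0]

        # Randomized sampling: draw the arm from the estimated allocation.
        a = 0 if rng.random() < w0 else 1
        y = rng.gauss(means[a], stds[a])

        # AIPW score: plug-in mean for both arms, plus an inverse-probability
        # weighted residual for the arm that was actually drawn.
        for b in range(2):
            aipw[b] += mu_hat[b]
        aipw[a] += (y - mu_hat[a]) / w[a]

        n[a] += 1
        s[a] += y
        ss[a] += y * y

    estimates = [v / budget for v in aipw]
    recommended = 0 if estimates[0] >= estimates[1] else 1
    return recommended, estimates


if __name__ == "__main__":
    arm, est = rs_aipw([1.0, 0.0], [1.0, 2.0], budget=2000, rng=random.Random(0))
    print("recommended arm:", arm, "AIPW estimates:", est)
```

Because the plug-in mean and the sampling probability in each round depend only on earlier rounds, each per-round AIPW score has conditional expectation equal to the true arm mean, which is the martingale structure the abstract's large deviation analysis relies on.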

