Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling under Posterior Weighted Policy Regret


October 20, 2021


Kaito Ariu, Masahiro Kato, Junpei Komiyama, Kenichiro McAlinn, Chao Qin

We consider the "policy choice" problem -- otherwise known as best arm identification in the bandit literature -- proposed by Kasy and Sautmann (2021) for adaptive experimental design. Theorem 1 of Kasy and Sautmann (2021) provides three asymptotic results that give theoretical guarantees for exploration sampling developed for this setting. We first show that the proof of Theorem 1 (1) has technical issues, and the proof and statement of Theorem 1 (2) are incorrect. We then show, through a counterexample, that Theorem 1 (3) is false. For the former two, we correct the statements and provide rigorous proofs. For Theorem 1 (3), we propose an alternative objective function, which we call posterior weighted policy regret, and derive its asymptotic optimality.
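
The abstract does not spell out the exploration sampling rule itself. As a rough illustration only, the sketch below assumes the form commonly attributed to Kasy and Sautmann (2021): each arm d receives an assignment share proportional to p_d(1 - p_d), where p_d is the posterior probability that arm d is the best. The Beta-Bernoulli model, the uniform prior, the Monte Carlo estimate of p_d, the fallback for a degenerate posterior, and both function names are assumptions made for this example and are not taken from the paper.

```python
import random


def optimal_arm_probabilities(successes, failures, draws=20000, seed=0):
    """Monte Carlo estimate of p_d = P(arm d has the highest mean | data).

    Assumes independent Bernoulli arms with uniform Beta(1, 1) priors, so the
    posterior of arm d is Beta(1 + successes[d], 1 + failures[d]).
    """
    rng = random.Random(seed)
    k = len(successes)
    wins = [0] * k
    for _ in range(draws):
        # One joint posterior draw of all arm means.
        sample = [rng.betavariate(1 + s, 1 + f)
                  for s, f in zip(successes, failures)]
        best = max(range(k), key=sample.__getitem__)
        wins[best] += 1
    return [w / draws for w in wins]


def exploration_sampling_shares(p):
    """Assignment shares q_d proportional to p_d * (1 - p_d)."""
    weights = [pd * (1.0 - pd) for pd in p]
    total = sum(weights)
    if total == 0:
        # Degenerate posterior (some p_d equals 1): the rule is undefined, so
        # this sketch simply returns p. That fallback is an assumption.
        return list(p)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical data: three arms, 20 observations each.
    p = optimal_arm_probabilities([12, 9, 3], [8, 11, 17])
    q = exploration_sampling_shares(p)
    print("posterior probability of being best:", p)
    print("exploration sampling shares:", q)
```

When the p_d sum to one, the weighting keeps every share at or below one half (for any other arm j, 1 - p_j is at least p_d), so an apparent leader cannot absorb the whole sample. This is the intuition for why such a rule keeps exploring; the asymptotic guarantees discussed in the abstract are a separate matter and are not demonstrated by this sketch.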

