We consider a continuous-time multi-arm bandit problem (CTMAB), in which the learner can sample arms any number of times in a given interval and obtains a random reward from each sample; however, increasing the sampling frequency incurs an additive penalty/cost. Thus, there is a tradeoff between obtaining a large reward and incurring the sampling cost as a function of the sampling frequency. The goal is to design a learning algorithm that minimizes regret, defined as the difference between the payoff of the oracle policy and that of the learning algorithm. The CTMAB is fundamentally different from the usual multi-arm bandit problem (MAB); e.g., even the single-arm case is non-trivial in the CTMAB, since the optimal sampling frequency depends on the mean of the arm, which needs to be estimated. We first establish lower bounds on the regret achievable by any algorithm, and then propose algorithms that achieve these lower bounds up to logarithmic factors. For the single-arm case, we show that the lower bound on the regret is $\Omega((\log T)^2/\mu)$, where $\mu$ is the mean of the arm and $T$ is the time horizon. For the multi-arm case, we show that the lower bound on the regret is $\Omega((\log T)^2 \mu/\Delta^2)$, where $\mu$ now denotes the mean of the best arm and $\Delta$ is the difference between the means of the best and second-best arms. We then propose an algorithm that achieves this bound up to constant terms.
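To make the single-arm tradeoff concrete, the following is a minimal sketch, assuming an illustrative cost model (a quadratic per-unit-time penalty $c f^2$ in the sampling frequency $f$; the paper's actual cost function may differ). The point it illustrates is from the abstract: the payoff-maximizing frequency depends on the unknown mean $\mu$, so even the single-arm case requires learning. The function name `net_payoff` and the constant `c` are hypothetical, introduced only for this example.

```python
def net_payoff(mu, f, T, c=1.0):
    """Expected net payoff from sampling an arm of mean `mu` at
    frequency `f` over horizon `T`, under an ASSUMED quadratic
    sampling cost c*f**2 per unit time (illustrative model only)."""
    reward = mu * f * T   # roughly f*T samples, each worth mu in expectation
    cost = c * f**2 * T   # additive penalty growing with sampling frequency
    return reward - cost

# Under this model the optimal frequency is f* = mu / (2c), obtained by
# setting d/df [mu*f - c*f^2] = 0. Since f* depends on mu, the learner
# must estimate mu before it can sample at (near-)optimal frequency.
mu, T, c = 0.8, 100.0, 1.0
f_star = mu / (2 * c)
payoff_at_opt = net_payoff(mu, f_star, T, c)
```

Sampling slightly faster or slower than `f_star` yields a strictly smaller net payoff in this model, which is the frequency tradeoff the abstract describes.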