如何与非固定的 $\ mathcal{X} $X} 有武装的土匪赌博毫无遗憾 (How to gamble with non-stationary $\mathcal{X}$-armed bandits and have no regrets) - 专知论文

会员服务 ·

0

赌博机/老虎机 · 回合 · INTERACT · CASE · 向量化 ·

2021 年 1 月 17 日

How to gamble with non-stationary $\mathcal{X}$-armed bandits and have no regrets

翻译：如何与非固定的 $\ mathcal{X} $X} 有武装的土匪赌博毫无遗憾

Valeriy Avanesov

from arxiv, The algorithm is optimized, the theoretical result is more detailed now

In $\mathcal{X}$-armed bandit problem an agent sequentially interacts with environment which yields a reward based on the vector input the agent provides. The agent's goal is to maximise the sum of these rewards across some number of time steps. The problem and its variations have been a subject of numerous studies, suggesting sub-linear and some times optimal strategies. The given paper introduces a novel variation of the problem. We consider an environment, which can abruptly change its behaviour an unknown number of times. To that end we propose a novel strategy and prove it attains sub-linear cumulative regret. Moreover, in case of highly smooth relation between an action and the corresponding reward, the method is nearly optimal. The theoretical result are supported by experimental study.

翻译：在 $\ mathcal{X} $x} 有武装的土匪问题中, 代理人依次与环境发生互动, 根据代理人提供的矢量输入, 产生奖赏。代理人的目标是在一定时间步骤中将这些奖赏的总数最大化。这个问题及其差异已经成为许多研究的主题, 暗示了亚线性和一些时候的最佳战略。上一张论文提出了这个问题的新变异。我们认为环境可以突然改变其行为, 次数不详。为此, 我们提出了一个新战略, 并证明它取得了亚线性累积遗憾。此外, 如果行动与相应奖赏的关系非常顺利, 方法几乎是最佳的。实验性研究为理论结果提供了支持。

0

相关内容

赌博机/老虎机

赌博机/老虎机

【干货书】金融数学概念和计算方法的导论，290页pdf

【干货书】金融数学概念和计算方法的导论，290页pdf

专知会员服务

65+阅读 · 2020年11月16日

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

专知会员服务

33+阅读 · 2020年8月14日

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

专知会员服务

78+阅读 · 2020年2月3日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Short minimal codes and covering codes via strong blocking sets in projective spaces

Arxiv

0+阅读 · 2021年3月12日

GA for feature selection of EEG heterogeneous data

Arxiv

0+阅读 · 2021年3月12日

Improved Exploration in Factored Average-Reward MDPs

Arxiv

0+阅读 · 2021年3月11日

Distributed Zero-Order Optimization under Adversarial Noise

Arxiv

0+阅读 · 2021年3月10日

On Modeling Human Perceptions of Allocation Policies with Uncertain Outcomes

Arxiv

0+阅读 · 2021年3月10日

On Information Gain and Regret Bounds in Gaussian Process Bandits

Arxiv

0+阅读 · 2021年3月9日

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Arxiv

4+阅读 · 2020年6月20日

Energy-Based Hindsight Experience Prioritization

Arxiv

3+阅读 · 2018年10月8日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

VIP会员

文章信息

相关主题

赌博机/老虎机

相关VIP内容

【干货书】金融数学概念和计算方法的导论，290页pdf

【干货书】金融数学概念和计算方法的导论，290页pdf

专知会员服务

65+阅读 · 2020年11月16日

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

最新《非光滑优化》十讲硬核课程，剑桥大学梁经纬博士主讲

专知会员服务

33+阅读 · 2020年8月14日

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

专知会员服务

78+阅读 · 2020年2月3日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《物联网（IoT）中的无人机通信高效控制》135页

《在GNSS信号降级环境中利用共识实现无人机集群稳健协调》

中程单向攻击无人机的战略意义：俄乌战争启示

《面向无人机集群的避障动态传感器覆盖算法》最新38页

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Short minimal codes and covering codes via strong blocking sets in projective spaces

Arxiv

0+阅读 · 2021年3月12日

GA for feature selection of EEG heterogeneous data

Arxiv

0+阅读 · 2021年3月12日

Improved Exploration in Factored Average-Reward MDPs

Arxiv

0+阅读 · 2021年3月11日

Distributed Zero-Order Optimization under Adversarial Noise

Arxiv

0+阅读 · 2021年3月10日

On Modeling Human Perceptions of Allocation Policies with Uncertain Outcomes

Arxiv

0+阅读 · 2021年3月10日

On Information Gain and Regret Bounds in Gaussian Process Bandits

Arxiv

0+阅读 · 2021年3月9日

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Arxiv

4+阅读 · 2020年6月20日

Energy-Based Hindsight Experience Prioritization

Arxiv

3+阅读 · 2018年10月8日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

微信扫码咨询专知VIP会员