We consider a stochastic bandit problem with countably many arms that belong to a finite set of types, each characterized by a unique mean reward. In addition, there is a fixed distribution over types which determines the proportion of each type in the population of arms. The decision maker is oblivious to the type of any arm and to the aforementioned distribution over types, but knows exactly the total number of types occurring in the population of arms. We propose a fully adaptive online learning algorithm that achieves O(log n) distribution-dependent expected cumulative regret after any number of plays n, and show that this order of regret is best possible. The analysis of our algorithm relies on newly discovered concentration and convergence properties of optimism-based policies like UCB in finite-armed bandit problems with "zero gap," which may be of independent interest.
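As a point of reference for the optimism-based policies mentioned above, the classical finite-armed UCB1 index can be sketched as follows. This is a minimal illustration of the standard UCB rule, not the paper's adaptive algorithm for countably many arms; the arm means, horizon, and Bernoulli reward model are illustrative assumptions.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run the classical UCB1 policy on a finite-armed Bernoulli bandit.

    `means` holds the success probabilities (unknown to the learner).
    Returns the number of times each arm was pulled over `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k       # pull counts per arm
    totals = [0.0] * k    # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialize by playing each arm once
        else:
            # optimistic index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i]
                + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls

pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

Because the exploration bonus shrinks like sqrt(log t / pulls), suboptimal arms are sampled only O(log n) times, which is the source of the logarithmic regret rate; the "zero gap" regime studied in the paper is precisely where this standard per-arm argument breaks down and finer concentration properties are needed.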