We consider the infinitely many-armed bandit problem with rotting rewards, in which the mean reward of an arm decreases with each pull according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. We show that this learning problem has a worst-case regret lower bound of $\Omega(\max\{\varrho^{1/3}T,\sqrt{T}\})$, where $T$ is the time horizon. We then show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T,\sqrt{T}\})$, up to a poly-logarithmic factor, is achieved by an algorithm that maintains a UCB index for each arm and compares it against a threshold to decide whether to keep pulling the arm or remove it from further consideration, provided the algorithm knows the maximum rotting rate $\varrho$. Finally, we show that a regret upper bound of $\tilde{O}(\max\{\varrho^{1/3}T,T^{3/4}\})$ is achievable by an algorithm that does not know $\varrho$, using an adaptive UCB index along with an adaptive threshold.
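To make the UCB-index-with-threshold idea concrete, the following is a minimal Python sketch of such a policy, not the paper's exact algorithm: the uniform $U[0,1]$ reservoir of initial means, the Gaussian reward noise, the use of the full per-arm history in the index, and the threshold margin $\delta=\max\{\varrho^{1/3},1/\sqrt{T}\}$ (motivated by the regret bound, with untuned constants) are all illustrative assumptions.

```python
import math
import random

def ucb_threshold_bandit(T, varrho, sigma=1.0, seed=0):
    """Hedged sketch of a UCB-index / threshold policy for the rotting
    infinitely many-armed bandit. Constants, the reservoir distribution,
    and the exact index are illustrative assumptions, not tuned values."""
    rng = random.Random(seed)
    # Illustrative threshold: discard an arm once its UCB index falls more
    # than delta below the best achievable mean (assumed here to be 1).
    delta = max(varrho ** (1.0 / 3.0), 1.0 / math.sqrt(T))
    total_reward = 0.0
    t = 0
    while t < T:
        mu = rng.random()            # fresh arm's initial mean from U[0, 1]
        pulls, reward_sum = 0, 0.0
        while t < T:
            # Observe a noisy reward; the mean rots by at most varrho per pull.
            r = mu + rng.gauss(0.0, sigma)
            mu = max(0.0, mu - rng.uniform(0.0, varrho))
            pulls += 1
            reward_sum += r
            total_reward += r
            t += 1
            # UCB index from the empirical mean of this arm's own pulls.
            ucb = reward_sum / pulls + math.sqrt(2.0 * math.log(T) / pulls)
            if ucb < 1.0 - delta:    # index fell below threshold: drop the arm
                break
    return total_reward
```

The design point the sketch illustrates is that, since each arm rots by at most $\varrho$ per pull, an arm whose index stays above the threshold remains near-optimal to keep pulling, while a sub-threshold arm can be discarded in favor of a fresh draw from the infinite reservoir.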