令人后悔的、最佳的 " 互不互不信任的多武装多武装强盗合作社 " 组织 (On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits) - 专知论文

会员服务 ·

0

赌博机/老虎机 · 正则化项 · Agent · CASES · 分解的 ·

2022 年 12 月 1 日

On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits

翻译：令人后悔的、最佳的 " 互不互不信任的多武装多武装强盗合作社 " 组织

Jialin Yi,Milan Vojnović

We consider the nonstochastic multi-agent multi-armed bandit problem with agents collaborating via a communication network with delays. We show a lower bound for individual regret of all agents. We show that with suitable regularizers and communication protocols, a collaborative multi-agent \emph{follow-the-regularized-leader} (FTRL) algorithm has an individual regret upper bound that matches the lower bound up to a constant factor when the number of arms is large enough relative to degrees of agents in the communication graph. We also show that an FTRL algorithm with a suitable regularizer is regret optimal with respect to the scaling with the edge-delay parameter. We present numerical experiments validating our theoretical results and demonstrate cases when our algorithms outperform previously proposed algorithms.

翻译：我们认为,通过通信网络合作的代理商存在非随机多试剂多臂强盗问题,但拖延了。我们对所有代理商的个人悔恨度较低。我们显示,如果有合适的规范者和通信协议,合作的多试剂\emph{Follow-colent-leader}(FTRL)算法具有个人遗憾的上限,在与通信图中的代理商程度相比武器数量足够大时,与较低约束的固定系数相符。我们还表明,配有适当常规化器的FTRL算法对于使用边缘涂料参数进行缩放最为理想。我们展示了验证理论结果的数字实验,并展示了在我们的算法超过先前提议的算法时出现的情况。

0

相关内容

赌博机/老虎机

赌博机/老虎机

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

机器学习组合优化

机器学习组合优化

专知会员服务

110+阅读 · 2021年2月16日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

肝细胞肝癌中高表达的PRC1基因功能及其受CTCF调控的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

IGF-1R糖基化修饰通过调控内质网应激影响卵巢癌细胞生物学行为的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

亨廷顿结合蛋白HYPB与脯氨酸丰富区的相互作用和调节机制

国家自然科学基金

0+阅读 · 2012年12月31日

Schrodinger-Poisson方程的若干问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

Eulerian bond-cubic 模型渗流性质的数值研究

国家自然科学基金

0+阅读 · 2012年12月31日

颗粒材料中的偶应力效应及Cosserat介质本构模拟研究

国家自然科学基金

0+阅读 · 2011年12月31日

CA-rich顺式元件及其相互作用的反式因子对可变剪接的调控机制

国家自然科学基金

0+阅读 · 2009年12月31日

PI3K/AKT通路介导FAS与HER2相互作用调控大肠癌恶性表型及其分子机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback

Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback

Arxiv

0+阅读 · 2023年2月2日

High-dimensional variable clustering based on sub-asymptotic maxima of a weakly dependent random process

Arxiv

0+阅读 · 2023年2月2日

Rethinking Warm-Starts with Predictions: Learning Predictions Close to Sets of Optimal Solutions for Faster $\text{L}$-/$\text{L}^\natural$-Convex Function Minimization

Arxiv

0+阅读 · 2023年2月2日

Deterministic algorithms for the Lovasz Local Lemma: simpler, more general, and more parallel

Arxiv

0+阅读 · 2023年2月1日

Delayed Feedback in Kernel Bandits

Arxiv

0+阅读 · 2023年2月1日

Max-Quantile Grouped Infinite-Arm Bandits

Arxiv

0+阅读 · 2023年2月1日

Agility and Target Distribution in the Dynamic Stochastic Traveling Salesman Problem

Arxiv

0+阅读 · 2023年2月1日

Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits

Arxiv

0+阅读 · 2023年1月31日

A Dynamic Programming Algorithm for Finding an Optimal Sequence of Informative Measurements

Arxiv

0+阅读 · 2023年1月31日

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

Arxiv

13+阅读 · 2021年12月20日

VIP会员

文章信息

相关主题

赌博机/老虎机

相关VIP内容

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

机器学习组合优化

机器学习组合优化

专知会员服务

110+阅读 · 2021年2月16日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《人与智能体在系统工程建模语言V2任务中的性能表现：基于用户中心化的评估方法》308页

《数据安全国家标准体系（2025版）》征求意见稿

AlphaMosaic：人工智能赋能的作战管理系统

《军事行动中通信平台的战略价值：提升战术效能与作战优势》

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback

Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback

Arxiv

0+阅读 · 2023年2月2日

High-dimensional variable clustering based on sub-asymptotic maxima of a weakly dependent random process

Arxiv

0+阅读 · 2023年2月2日

Rethinking Warm-Starts with Predictions: Learning Predictions Close to Sets of Optimal Solutions for Faster $\text{L}$-/$\text{L}^\natural$-Convex Function Minimization

Arxiv

0+阅读 · 2023年2月2日

Deterministic algorithms for the Lovasz Local Lemma: simpler, more general, and more parallel

Arxiv

0+阅读 · 2023年2月1日

Delayed Feedback in Kernel Bandits

Arxiv

0+阅读 · 2023年2月1日

Max-Quantile Grouped Infinite-Arm Bandits

Arxiv

0+阅读 · 2023年2月1日

Agility and Target Distribution in the Dynamic Stochastic Traveling Salesman Problem

Arxiv

0+阅读 · 2023年2月1日

Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits

Arxiv

0+阅读 · 2023年1月31日

A Dynamic Programming Algorithm for Finding an Optimal Sequence of Informative Measurements

Arxiv

0+阅读 · 2023年1月31日

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

Arxiv

13+阅读 · 2021年12月20日

相关基金

肝细胞肝癌中高表达的PRC1基因功能及其受CTCF调控的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

IGF-1R糖基化修饰通过调控内质网应激影响卵巢癌细胞生物学行为的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

亨廷顿结合蛋白HYPB与脯氨酸丰富区的相互作用和调节机制

国家自然科学基金

0+阅读 · 2012年12月31日

Schrodinger-Poisson方程的若干问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

Eulerian bond-cubic 模型渗流性质的数值研究

国家自然科学基金

0+阅读 · 2012年12月31日

颗粒材料中的偶应力效应及Cosserat介质本构模拟研究

国家自然科学基金

0+阅读 · 2011年12月31日

CA-rich顺式元件及其相互作用的反式因子对可变剪接的调控机制

国家自然科学基金

0+阅读 · 2009年12月31日

PI3K/AKT通路介导FAS与HER2相互作用调控大肠癌恶性表型及其分子机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员