We study multiplayer stochastic multi-armed bandit problems in which the players cannot communicate and, if two or more players pull the same arm, a collision occurs and the colliding players receive zero reward. We consider two feedback models: a model in which the players can observe whether a collision has occurred, and a more difficult setup in which no collision information is available. We give the first theoretical guarantees for the second model: an algorithm with logarithmic regret, and an algorithm with square-root-type regret that does not depend on the gaps between the means. For the first model, we give the first square-root regret bounds that do not depend on the gaps. Building on these ideas, we also give an algorithm for reaching approximate Nash equilibria quickly in stochastic anti-coordination games.
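To make the model concrete, here is a minimal simulation sketch (not part of the paper) of the collision mechanism and the two feedback models described above. All names and parameters are illustrative assumptions: arms are taken to be Bernoulli, and the `collision_info` flag distinguishes the setting where collisions are observed from the one where a zero reward is indistinguishable from a genuine zero draw.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_round(means, chosen_arms, collision_info=True):
    """Simulate one round of the multiplayer bandit model (illustrative sketch).

    means        : array of shape (K,), Bernoulli means of the K arms.
    chosen_arms  : array of shape (M,), arm index pulled by each of M players.
    Returns per-player rewards, plus collision indicators if collision_info is True.
    """
    draws = rng.binomial(1, means[chosen_arms])               # outcome of each pulled arm
    counts = np.bincount(chosen_arms, minlength=len(means))   # players per arm
    collided = counts[chosen_arms] >= 2                       # True if >= 2 players share the arm
    rewards = np.where(collided, 0, draws)                    # colliding players receive zero reward
    if collision_info:
        return rewards, collided    # first feedback model: collisions are observed
    return rewards, None            # second model: only the (possibly zeroed) reward is seen

# Example: 3 players on 5 arms; players 0 and 1 collide on arm 2.
means = np.array([0.9, 0.8, 0.7, 0.5, 0.3])
print(play_round(means, np.array([2, 2, 0]), collision_info=False))
```

In the no-collision-information case the second return value is absent, which is exactly what makes that setting harder: a player cannot tell whether a zero reward came from the arm or from another player.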