Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams

Multi-armed bandit · Extensibility · Upper confidence bound · TEAM · Confidence

October 2, 2021

Erdem Bıyık, Anusha Lalitha, Rajarshi Saha, Andrea Goldsmith, Dorsa Sadigh

From arXiv; 14 pages, 14 figures. To be presented at "Artificial Intelligence for Human-Robot Interaction (AI-HRI)" at the AAAI Fall Symposium Series.

When humans collaborate with each other, they often make decisions by observing others and considering the consequences that their actions may have on the entire team, instead of greedily doing what is best for just themselves. We would like our AI agents to collaborate effectively in a similar way by capturing a model of their partners. In this work, we propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration. We demonstrate that naive extensions of single-agent optimal MAB algorithms fail when applied to decentralized bandit teams. Instead, we propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm. We analytically show that our proposed strategy achieves logarithmic regret, and provide extensive experiments involving human-AI and human-robot collaboration to validate our theoretical findings. Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
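For readers unfamiliar with the single-agent baseline mentioned in the abstract, the following is a minimal sketch of the standard UCB1 rule on Bernoulli arms. It is not the paper's Partner-Aware strategy, and it does not model the decentralized team or the coupled rewards. The function name `ucb1`, the arm means, the horizon, and the seed are all assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Single-agent UCB1 on Bernoulli arms (illustrative baseline only).

    Returns the per-arm pull counts and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total observed reward per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # UCB index: empirical mean plus an exploration bonus that
            # shrinks as the arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the chosen arm's mean.
        regret += best_mean - arm_means[arm]
    return counts, regret


# Hypothetical three-armed problem; the last arm is the best one.
counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, round(regret, 1))
```

In this single-agent setting the pulls concentrate on the best arm and the pseudo-regret grows roughly logarithmically with the horizon. The abstract's claim is that running such a rule independently per agent breaks down once rewards are coupled across teammates, which motivates the partner-aware extension.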

