Bandit Social Learning: Exploration under Myopic Behavior - 专知 Papers


February 15, 2023


Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins

We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms, and receive associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While collectively the agents face an exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including the "unbiased" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing "moderately optimistic" agents. As a special case of independent interest, we obtain a general result on the failure of the greedy algorithm in multi-armed bandits. This is the first such result in the literature, to the best of our knowledge.
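The protocol described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or code: it assumes Bernoulli rewards, one warm-up sample per arm, and a hypothetical index of the form empirical mean plus `eta / sqrt(pulls)`, where `eta = 0` stands in for the greedy ("unbiased") agent and `eta > 0` for an optimistic one. The names `simulate`, `failure_rate`, and `eta` are illustrative, and the "failure" event counted here (the best arm never chosen after its warm-up sample) is only a narrow proxy for the exploration failures the paper analyzes.

```python
import math
import random


def simulate(means, n_agents, eta, seed=0):
    """Run one sequence of myopic agents on a Bernoulli bandit.

    Every agent sees the full history and picks the arm with the largest
    index  mean_hat + eta / sqrt(pulls)  (an assumed parameterization).
    Returns the number of times each arm was pulled, warm-up included.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    total = [0.0] * k

    def pull(arm):
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        total[arm] += reward

    # Assumed warm-up: one initial sample per arm.
    for arm in range(k):
        pull(arm)

    for _ in range(n_agents):
        index = [total[a] / pulls[a] + eta / math.sqrt(pulls[a])
                 for a in range(k)]
        # Ties are broken in favor of the lowest-numbered arm.
        best = max(range(k), key=lambda a: index[a])
        pull(best)
    return pulls


def failure_rate(means, n_agents, eta, runs):
    """Fraction of runs in which the best arm is never chosen by an agent."""
    best_arm = max(range(len(means)), key=lambda a: means[a])
    failures = 0
    for seed in range(runs):
        pulls = simulate(means, n_agents, eta, seed)
        if pulls[best_arm] <= 1:  # only the warm-up sample
            failures += 1
    return failures / runs


if __name__ == "__main__":
    means = [0.6, 0.4]
    print("greedy     :", failure_rate(means, 50, 0.0, 2000))
    print("optimistic :", failure_rate(means, 50, 2.0, 2000))
```

Under these assumptions, the greedy agents abandon the better arm whenever its warm-up reward is 0 and the other arm's is 1, because the other arm's empirical mean then stays strictly positive forever. For means 0.6 and 0.4 this happens with probability 0.4 × 0.4 = 0.16, independent of the number of agents, which is the kind of constant-probability failure that yields linear regret. With `eta = 2.0` the bonus term forces the better arm to be retried within a few rounds in this two-arm setup.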


