Bandit Social Learning: Exploration under Myopic Behavior - Zhuanzhi Papers

Multi-armed bandits · Social · Learning dynamics · Greedy algorithm · Parameterization

April 28, 2023

Bandit Social Learning: Exploration under Myopic Behavior

Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins

We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms and receive associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While collectively the agents face an exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including the "unbiased" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing "moderately optimistic" agents. As a special case of independent interest, we obtain a general result on the failure of the greedy algorithm in multi-armed bandits. This is the first such result in the literature, to the best of our knowledge.
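
The protocol described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model or proof: the function names `simulate` and `failure_rate`, the Bernoulli rewards, the arm means 0.6 and 0.4, the single initial sample per arm, and the index `mean + eta * sqrt(log(T) / n)` are all hypothetical choices made for illustration. Here `eta = 0` gives a purely greedy agent, and a larger `eta` gives a more optimistic, UCB-like agent.

```python
import math
import random


def simulate(means, num_agents, eta, seed):
    """Run one sequence of myopic agents; return how often each arm was chosen."""
    rng = random.Random(seed)
    counts = [0] * len(means)   # times each arm has been chosen so far
    sums = [0.0] * len(means)   # total reward observed for each arm

    def pull(arm):
        # Bernoulli reward (assumption): 1 with probability means[arm], else 0.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    # Assumption: one initial sample of every arm seeds the shared history.
    for arm in range(len(means)):
        pull(arm)

    def index(arm):
        # Hypothetical confidence-interval index; eta = 0 is the greedy rule.
        mean = sums[arm] / counts[arm]
        bonus = eta * math.sqrt(math.log(num_agents) / counts[arm])
        return mean + bonus

    for _ in range(num_agents - len(means)):
        # Each agent sees the full history and myopically picks the best index.
        best = max(range(len(means)), key=index)
        pull(best)
    return counts


def failure_rate(eta, runs=200, num_agents=2000, means=(0.6, 0.4)):
    """Fraction of runs in which the better arm (arm 0) gets under half the pulls."""
    failures = 0
    for seed in range(runs):
        counts = simulate(means, num_agents, eta, seed)
        if counts[0] < num_agents / 2:
            failures += 1
    return failures / runs


if __name__ == "__main__":
    for eta in (0.0, 0.1, 1.0):
        print(f"eta={eta}: failure rate {failure_rate(eta):.2f}")
```

In this toy setting, a greedy run gets stuck whenever the better arm's first reward is 0 and the worse arm's first reward is 1: the worse arm's empirical mean then stays above 0 forever, so no later agent returns to the better arm. That event has constant probability, which is the intuition behind regret that grows linearly in the number of agents.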
