多武装强盗的适应性定值,配有复合和匿名反馈 (Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback) - 专知论文

会员服务 ·

0

赌博机/老虎机 · CASE · INFORMS · 周期的 · CASES ·

2020 年 12 月 15 日

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

翻译：多武装强盗的适应性定值,配有复合和匿名反馈

Siwei Wang,Haoyun Wang,Longbo Huang

We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (we call this period as reward interval) and the player receives partial rewards of the action, convoluted with rewards from pulling other arms, successively. Existing results on this model require prior knowledge about the reward interval size as an input to their algorithms. In this paper, we propose adaptive algorithms for both the stochastic and the adversarial cases, without requiring any prior information about the reward interval. For the stochastic case, we prove that our algorithm guarantees a regret that matches the lower bounds (in order). For the adversarial case, we propose the first algorithm to jointly handle non-oblivious adversary and unknown reward interval size. We also conduct simulations based on real-world dataset. The results show that our algorithms outperform existing benchmarks.

翻译：我们用复合和匿名反馈来研究多武装土匪(MAB)问题。在这个模型中,拉手臂的奖励在一段时间内扩散(我们称这个时期为奖赏间隔),玩家从动作中得到部分回报,连续地从拉其他手臂得到报酬。这个模型的现有结果要求事先了解奖赏间隔大小,作为算法的输入。在本文中,我们建议对随机和对立案例采用适应性算法,而无需事先了解奖励间隔。对于这个随机案例,我们证明我们的算法保证了与较低界限(按顺序)相匹配的遗憾。对于对抗性案例,我们建议采用第一个算法,共同处理非明显的对立和未知的奖赏间隔大小。我们还根据真实世界数据集进行模拟。结果显示,我们的算法超过了现有的基准。

0

相关内容

赌博机/老虎机

赌博机/老虎机

【AAAI2021】元学习器的冷启动序列推荐

【AAAI2021】元学习器的冷启动序列推荐

专知会员服务

41+阅读 · 2020年12月19日

【KDD2020】动态图的拉普拉斯变换点检测，Laplacian Change Point Detection for Dynamic Graphs

【KDD2020】动态图的拉普拉斯变换点检测，Laplacian Change Point Detection for Dynamic Graphs

专知会员服务

38+阅读 · 2020年7月3日

【SIGIR2020】策略感知的无偏排序学习—Top-K排序，Policy-Aware Unbiased Learning to Rank for Top-𝑘 Rankings

【SIGIR2020】策略感知的无偏排序学习—Top-K排序，Policy-Aware Unbiased Learning to Rank for Top-𝑘 Rankings

专知会员服务

27+阅读 · 2020年6月10日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

专知会员服务

32+阅读 · 2020年2月1日

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

专知会员服务

23+阅读 · 2020年1月28日

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

专知会员服务

15+阅读 · 2020年1月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

专知

13+阅读 · 2018年5月26日

已删除

将门创投

7+阅读 · 2018年4月18日

【论文推荐】最新七篇推荐系统相关论文—影响兴趣、知识Embeddings、音乐推荐、非结构化、一致性、显式和隐式特征、知识图谱

【论文推荐】最新七篇推荐系统相关论文—影响兴趣、知识Embeddings、音乐推荐、非结构化、一致性、显式和隐式特征、知识图谱

专知

14+阅读 · 2018年3月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Boosting for Online Convex Optimization

Arxiv

0+阅读 · 2021年2月18日

Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation

Arxiv

0+阅读 · 2021年2月17日

COMBO: Conservative Offline Model-Based Policy Optimization

COMBO: Conservative Offline Model-Based Policy Optimization

Arxiv

1+阅读 · 2021年2月16日

Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy

Arxiv

0+阅读 · 2021年2月15日

Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent

Arxiv

0+阅读 · 2021年2月15日

Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation

Arxiv

0+阅读 · 2021年2月15日

Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization

Arxiv

0+阅读 · 2021年2月15日

Multi-Agent Multi-Armed Bandits with Limited Communication

Arxiv

0+阅读 · 2021年2月10日

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+阅读 · 2020年4月2日

A Modern Introduction to Online Learning

A Modern Introduction to Online Learning

Arxiv

21+阅读 · 2019年12月31日

VIP会员

文章信息

相关主题

赌博机/老虎机

相关VIP内容

【AAAI2021】元学习器的冷启动序列推荐

【AAAI2021】元学习器的冷启动序列推荐

专知会员服务

41+阅读 · 2020年12月19日

【KDD2020】动态图的拉普拉斯变换点检测，Laplacian Change Point Detection for Dynamic Graphs

【KDD2020】动态图的拉普拉斯变换点检测，Laplacian Change Point Detection for Dynamic Graphs

专知会员服务

38+阅读 · 2020年7月3日

【SIGIR2020】策略感知的无偏排序学习—Top-K排序，Policy-Aware Unbiased Learning to Rank for Top-𝑘 Rankings

【SIGIR2020】策略感知的无偏排序学习—Top-K排序，Policy-Aware Unbiased Learning to Rank for Top-𝑘 Rankings

专知会员服务

27+阅读 · 2020年6月10日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

专知会员服务

32+阅读 · 2020年2月1日

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

专知会员服务

23+阅读 · 2020年1月28日

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

专知会员服务

15+阅读 · 2020年1月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《巡飞弹药（爆炸性无人机）威胁态势分析》最新24页报告

《军用后勤无人机：破解战场运输挑战的创新方案》

人工智能战争：以色列、伊朗与新型AI战争形态

《俄乌战争：现代战争未来的启示与经验》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

专知

13+阅读 · 2018年5月26日

已删除

将门创投

7+阅读 · 2018年4月18日

【论文推荐】最新七篇推荐系统相关论文—影响兴趣、知识Embeddings、音乐推荐、非结构化、一致性、显式和隐式特征、知识图谱

【论文推荐】最新七篇推荐系统相关论文—影响兴趣、知识Embeddings、音乐推荐、非结构化、一致性、显式和隐式特征、知识图谱

专知

14+阅读 · 2018年3月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Boosting for Online Convex Optimization

Arxiv

0+阅读 · 2021年2月18日

Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation

Arxiv

0+阅读 · 2021年2月17日

COMBO: Conservative Offline Model-Based Policy Optimization

COMBO: Conservative Offline Model-Based Policy Optimization

Arxiv

1+阅读 · 2021年2月16日

Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy

Arxiv

0+阅读 · 2021年2月15日

Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent

Arxiv

0+阅读 · 2021年2月15日

Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation

Arxiv

0+阅读 · 2021年2月15日

Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization

Arxiv

0+阅读 · 2021年2月15日

Multi-Agent Multi-Armed Bandits with Limited Communication

Arxiv

0+阅读 · 2021年2月10日

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+阅读 · 2020年4月2日

A Modern Introduction to Online Learning

A Modern Introduction to Online Learning

Arxiv

21+阅读 · 2019年12月31日

微信扫码咨询专知VIP会员