The conservative mechanism is a desirable property in decision-making problems, where it balances the tradeoff between exploration and exploitation. We propose the novel \emph{conservative contextual combinatorial cascading bandit ($C^4$-bandit)}, a cascading online learning game that incorporates the conservative mechanism. At each time step, the learning agent is given some contexts, must recommend a list of items that is no worse than the base strategy, and then observes the reward according to some stopping rule. We design the $C^4$-UCB algorithm to solve this problem and prove its $n$-step upper regret bound in two settings: known baseline reward and unknown baseline reward. The regret in both settings can be decomposed into two terms: (a) the upper bound for the general contextual combinatorial cascading bandit; and (b) a constant term for the regret incurred by the conservative mechanism. As a by-product, we also improve the regret bound of the conservative contextual combinatorial bandit. Experiments on synthetic data demonstrate the advantages of the algorithm and validate our theoretical analysis.
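The two-term decomposition mentioned above can be sketched schematically as follows; the symbols $R^{C^4\text{-UCB}}(n)$, $R_{C^3}(n)$, and $\Delta_{\mathrm{cons}}$ are illustrative shorthand for this abstract rather than the formal statement proved later in the paper:
\[
R^{C^4\text{-UCB}}(n) \;\le\; \underbrace{R_{C^3}(n)}_{\substack{\text{contextual combinatorial cascading term,}\\ \text{sublinear in } n}} \;+\; \underbrace{\Delta_{\mathrm{cons}}}_{\substack{\text{constant term from the}\\ \text{conservative constraint}}} .
\]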