A conservative mechanism is a desirable property in decision-making problems that balance the tradeoff between exploration and exploitation. We propose the novel \emph{conservative contextual combinatorial cascading bandit ($C^4$-bandit)}, a cascading online learning game that incorporates a conservative mechanism. At each time step, the learning agent receives contexts and must recommend a list of items that performs no worse than a base strategy; it then observes the reward according to some stopping rule. We design the $C^4$-UCB algorithm to solve this problem and prove its $n$-step upper regret bound in two settings: known baseline reward and unknown baseline reward. In both settings the regret decomposes into two terms: (a) the upper bound for the general contextual combinatorial cascading bandit; and (b) a constant term for the regret incurred by the conservative mechanism. The algorithm can be applied directly to search engines and recommender systems. Experiments on synthetic data demonstrate its advantages and validate our theoretical analysis.
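To make the conservative mechanism concrete, the following is a minimal sketch of the kind of per-step safety check such algorithms use: the agent plays the optimistic (UCB-selected) action only when a pessimistic estimate of the resulting cumulative reward still stays above a $(1-\alpha)$ fraction of the cumulative baseline reward, and otherwise falls back to the base strategy. The function name, arguments, and the exact form of the constraint here are illustrative assumptions, not the paper's precise formulation.

```python
def conservative_choice(lcb_cum_reward, lcb_candidate, baseline_reward, t, alpha):
    """Decide between the optimistic action and the baseline at step t.

    Illustrative conservative constraint (simplified, single-action form):
    play the optimistic action only if a lower-confidence estimate of the
    cumulative reward after playing it is at least (1 - alpha) times the
    cumulative reward the baseline would have earned over t + 1 steps.

    lcb_cum_reward  -- lower-confidence bound on reward accumulated so far
    lcb_candidate   -- lower-confidence bound on the candidate action's reward
    baseline_reward -- (known) per-step reward of the base strategy
    alpha           -- tolerated fractional loss relative to the baseline
    """
    if lcb_cum_reward + lcb_candidate >= (1 - alpha) * (t + 1) * baseline_reward:
        return "optimistic"
    return "baseline"
```

For example, with a per-step baseline reward of 0.4 and $\alpha = 0.1$, a candidate whose pessimistic reward estimate is 0.5 passes the check at $t = 0$, while one estimated at 0.1 does not, so the agent would fall back to the base strategy.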