Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes in developing statistically and computationally efficient methods, the practical behavior of these algorithms is still poorly understood. We leverage the availability of large numbers of supervised learning datasets to empirically evaluate contextual bandit algorithms, focusing on practical methods that learn by relying on optimization oracles from supervised learning. We find that a recent method (Foster et al., 2018) using optimism under uncertainty works best overall. A surprisingly close second is a simple greedy baseline that explores only implicitly through the diversity of contexts, followed by a variant of Online Cover (Agarwal et al., 2014), which tends to be more conservative but is robust to problem specification by design. Along the way, we also evaluate several components of contextual bandit algorithm design, such as loss estimators. Overall, this is a thorough study and review of contextual bandit methodology.