Contextual multi-armed bandits (CMAB) have been widely used for learning to filter and prioritize information according to a user's interest. In this work, we analyze top-K ranking under the CMAB framework, where the top-K arms are chosen iteratively to maximize a reward. The context, which represents a set of observable factors related to the user, is used to increase prediction accuracy compared to a standard multi-armed bandit. Contextual bandit methods have mostly been studied under strict linearity assumptions; we drop this restriction and learn non-linear stochastic reward functions with deep neural networks. We introduce a novel algorithm called the Deep Upper Confidence Bound (UCB) algorithm. Deep UCB balances exploration and exploitation with a separate neural network that models the learning convergence. We compare the performance of many bandit algorithms, varying K, on real-world data sets with high-dimensional data and non-linear reward functions. Empirical results show that Deep UCB often outperforms the other algorithms, although its performance is sensitive to the problem and reward setup. Additionally, we prove theoretical regret bounds for Deep UCB that guarantee convergence to optimality for a weak class of CMAB problems.
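The abstract only names the mechanism at a high level; a minimal sketch of the top-K selection rule it describes, assuming a reward network and a separate uncertainty network (the network architectures, names, and exact bonus form here are illustrative assumptions, not the paper's specification), might look like:

```python
# Sketch of a Deep UCB-style top-K selection step. Assumptions: the MLP
# shapes and the additive mean-plus-bonus score are illustrative, not
# the paper's exact formulation.
import torch
import torch.nn as nn


def make_mlp(in_dim: int, out_dim: int = 1) -> nn.Sequential:
    """Small MLP used for both the reward and uncertainty models."""
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))


def select_top_k(reward_net, uncertainty_net, contexts, k: int):
    """Score each arm's context by predicted mean reward plus an
    exploration bonus from the separate uncertainty network, then
    return the indices of the k highest-scoring arms."""
    with torch.no_grad():
        mean = reward_net(contexts).squeeze(-1)        # predicted reward per arm
        bonus = uncertainty_net(contexts).squeeze(-1)  # learned confidence width
        ucb = mean + bonus                             # optimism under uncertainty
    return torch.topk(ucb, k).indices


# Usage: 10 candidate arms with 8-dimensional contexts, pick the top 3.
contexts = torch.randn(10, 8)
reward_net, uncertainty_net = make_mlp(8), make_mlp(8)
print(select_top_k(reward_net, uncertainty_net, contexts, k=3))
```

In this reading, the second network plays the role of the closed-form confidence width in linear UCB, which is what lets the method drop the linearity assumption.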