We study a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the reward generating function is unknown. We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network (deep representation learning), and uses an upper confidence bound (UCB) approach to explore in the last linear layer (shallow exploration). We prove that under standard assumptions, our proposed algorithm achieves $\tilde{O}(\sqrt{T})$ finite-time regret, where $T$ is the learning time horizon. Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.
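To make the "deep representation, shallow exploration" idea concrete, below is a minimal sketch of the two components described above: a feature map given by the last hidden layer of a ReLU network, and a LinUCB-style rule that explores only in the final linear layer. All dimensions, hyperparameters, and the reward simulation are illustrative assumptions, and the network weights are held fixed here for brevity, whereas the actual algorithm learns them from observed rewards; this is not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): raw feature dim d, hidden width m.
d, m, n_actions, T = 8, 32, 5, 200
lam, alpha = 1.0, 1.0            # ridge parameter and UCB exploration width

# Stand-in for a deep ReLU network; in the actual algorithm its weights are
# trained on observed rewards, here they are fixed random weights for brevity.
W1 = rng.normal(size=(m, d)) / np.sqrt(d)
W2 = rng.normal(size=(m, m)) / np.sqrt(m)

def representation(x):
    """Last hidden layer of the ReLU network: the learned feature map phi(x)."""
    h = np.maximum(W1 @ x, 0.0)
    return np.maximum(W2 @ h, 0.0)

# Linear UCB statistics over the learned representation (the "shallow" layer).
A = lam * np.eye(m)              # regularized design matrix
b = np.zeros(m)                  # reward-weighted feature sum

theta_true = rng.normal(size=m)  # hypothetical reward parameter, for simulation only

for t in range(T):
    contexts = rng.normal(size=(n_actions, d))   # one raw feature vector per action
    feats = np.array([representation(x) for x in contexts])

    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    # UCB score: estimated reward plus an exploration bonus in the last layer only.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', feats, A_inv, feats))
    a = int(np.argmax(feats @ theta_hat + alpha * bonus))

    reward = feats[a] @ theta_true + 0.1 * rng.normal()  # simulated noisy reward
    A += np.outer(feats[a], feats[a])
    b += reward * feats[a]
```

Because the confidence set is maintained only over the m-dimensional last-layer features rather than over all network parameters, each round costs a single m x m linear solve, which is the source of the computational savings claimed above.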