Tight Regret Bounds for Infinite-armed Linear Contextual Bandits

Contextual bandits · Bandits · Linear · Infinite · Machine Learning

January 26, 2021


Yingkai Li,Yining Wang,Xi Chen,Yuan Zhou

From arXiv, 10 pages; accepted for presentation at AISTATS 2021.

Linear contextual bandits are an important class of sequential decision-making problems with a wide range of applications in recommender systems, online advertising, healthcare, and many other machine-learning tasks. Despite extensive prior research, tight regret bounds for linear contextual bandits with infinite action sets have remained open. In this paper, we address this open problem by considering the linear contextual bandit with (changing) infinite action sets. We prove a regret upper bound on the order of $O(\sqrt{d^2 T\log T})\times \text{poly}(\log\log T)$, where $d$ is the domain dimension and $T$ is the time horizon. Our upper bound matches the previous lower bound of $\Omega(\sqrt{d^2 T\log T})$ in [Li et al., 2019] up to iterated logarithmic terms.
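To get a feel for what "up to iterated logarithmic terms" means, the following is a minimal numeric sketch rather than anything taken from the paper. It evaluates the term $\sqrt{d^2 T\log T}$ shared by the upper and lower bounds, next to a factor $(\log\log T)^k$. The function names and the exponent `k` are hypothetical: the abstract only says the extra factor is $\text{poly}(\log\log T)$ and does not give its degree, and constants hidden by the $O(\cdot)$ and $\Omega(\cdot)$ notation are ignored.

```python
import math


def leading_term(d: int, T: int) -> float:
    """Return sqrt(d^2 * T * log T), the term common to both bounds."""
    if T <= 1:
        raise ValueError("T must exceed 1 so that log T is positive")
    return math.sqrt(d * d * T * math.log(T))


def iterated_log_factor(T: int, k: float = 1.0) -> float:
    """Return (log log T)^k.

    The exponent k is an illustrative placeholder; the abstract does not
    state the degree of the poly(log log T) factor.
    """
    if T <= math.e:
        raise ValueError("T must exceed e so that log log T is positive")
    return math.log(math.log(T)) ** k


if __name__ == "__main__":
    d = 10  # assumed dimension, chosen only for illustration
    for T in (10**3, 10**6, 10**9):
        print(T, round(leading_term(d, T), 1), round(iterated_log_factor(T), 3))
```

Under these assumptions, the shared term grows by orders of magnitude as $T$ goes from $10^3$ to $10^9$, while $\log\log T$ stays a small single-digit number, which is why the two bounds are regarded as nearly matching.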

