In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box reward function is allowed to vary over time within a reproducing kernel Hilbert space (RKHS). We develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is coping with infinite-dimensional feature maps; we address it by leveraging kernel approximation techniques to prove a sublinear regret bound, the first frequentist sublinear regret guarantee for weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and the standard GP-UCB algorithm. In addition, we establish a novel concentration inequality for weighted Gaussian process regression with general weights, together with universal and weight-dependent upper bounds on the weighted maximum information gain. These results are potentially of independent interest for applications such as news ranking and adaptive pricing, where weights can capture the importance or quality of data. Finally, experiments show that the proposed algorithm often compares favorably to existing methods.
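To make the idea concrete, here is a minimal sketch of weighted kernel ridge regression with a UCB acquisition step. This is an illustrative implementation under simple assumptions, not the paper's exact WGP-UCB algorithm: it uses an RBF kernel, geometric discount weights w_i = gamma^(t-i) (one common choice for non-stationary settings), and a fixed exploration constant beta rather than the theoretically calibrated value.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def weighted_gp_posterior(X, y, Xq, weights, lam=1.0, lengthscale=1.0):
    """Weighted kernel ridge regression: minimizes
        sum_i w_i (f(x_i) - y_i)^2 + lam * ||f||_H^2,
    whose solution has mean k(x)^T (K + lam W^{-1})^{-1} y; the matching
    posterior variance replaces K by the same weighted-regularized matrix."""
    K = rbf_kernel(X, X, lengthscale)
    kq = rbf_kernel(Xq, X, lengthscale)
    A = K + lam * np.diag(1.0 / weights)  # W = diag(weights)
    alpha = np.linalg.solve(A, y)
    mean = kq @ alpha
    # k(x, x) = 1 for the RBF kernel; clip guards tiny negative round-off.
    var = np.clip(1.0 - np.sum(kq * np.linalg.solve(A, kq.T).T, axis=1),
                  0.0, None)
    return mean, var

# Geometric discount weights down-weight stale observations so the
# regression tracks a time-varying reward function.
rng = np.random.default_rng(0)
t = 30
X = rng.uniform(-1, 1, size=(t, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(t)
gamma = 0.9
w = gamma ** np.arange(t - 1, -1, -1)  # most recent point gets weight 1

# UCB rule over a candidate grid: query where mean + beta * std is largest.
Xq = np.linspace(-1, 1, 50)[:, None]
mu, var = weighted_gp_posterior(X, y, Xq, w)
beta = 2.0  # exploration constant (a hand-picked value for illustration)
ucb = mu + beta * np.sqrt(var)
x_next = Xq[np.argmax(ucb)]
```

The weighted regularizer `lam * W^{-1}` is the only change relative to ordinary kernel ridge regression: small weights on old points inflate their effective noise, so the posterior forgets them gracefully.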