Discounted Thompson Sampling for Non-Stationary Bandit Problems

May 22, 2023

Han Qi, Yue Wang, Li Zhu

Non-stationary multi-armed bandit (NS-MAB) problems have recently received significant attention. NS-MAB problems are typically modelled in two scenarios: abruptly changing, where reward distributions remain constant for a certain period and change at unknown time steps, and smoothly changing, where reward distributions evolve smoothly according to unknown dynamics. In this paper, we propose Discounted Thompson Sampling (DS-TS) with Gaussian priors to address both non-stationary settings. Our algorithm adapts to changes passively by incorporating a discount factor into Thompson Sampling. The DS-TS method has been validated experimentally, but an analysis of its regret upper bound has so far been lacking. Under mild assumptions, we show that DS-TS with Gaussian priors can achieve a nearly optimal regret bound on the order of $\tilde{O}(\sqrt{TB_T})$ in the abruptly changing setting and $\tilde{O}(T^{\beta})$ in the smoothly changing setting, where $T$ is the number of time steps, $B_T$ is the number of breakpoints, $\beta$ is associated with the smoothly changing environment, and $\tilde{O}$ hides parameters independent of $T$ as well as logarithmic terms. Furthermore, empirical comparisons between DS-TS and other non-stationary bandit algorithms demonstrate its competitive performance. Specifically, when prior knowledge of the maximum expected reward is available, DS-TS has the potential to outperform state-of-the-art algorithms.
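To make the idea of discounting inside Thompson Sampling concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one common form of discounting: every arm's pull count and reward sum are multiplied by a factor `gamma` at each step, and each arm's sample is drawn from a Gaussian centred on the discounted mean with standard deviation `sigma / sqrt(discounted count)`. The class name `DiscountedGaussianTS`, the parameters `gamma` and `sigma`, the helper `simulate`, and the two-arm environment with a single breakpoint are all hypothetical choices made for illustration; the paper's exact update rule and posterior variance may differ.

```python
import math
import random


class DiscountedGaussianTS:
    """Illustrative Thompson Sampling with discounted Gaussian statistics."""

    def __init__(self, n_arms, gamma=0.95, sigma=1.0, seed=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must lie in (0, 1]")
        self.gamma = gamma              # discount factor (assumed name)
        self.sigma = sigma              # assumed reward noise scale
        self.counts = [0.0] * n_arms    # discounted pull counts
        self.sums = [0.0] * n_arms      # discounted reward sums
        self.rng = random.Random(seed)

    def select_arm(self):
        # Play any arm whose discounted count is (numerically) zero first.
        for arm, n in enumerate(self.counts):
            if n < 1e-12:
                return arm
        # Sample each arm from N(discounted mean, sigma^2 / discounted count).
        samples = [
            self.rng.gauss(s / n, self.sigma / math.sqrt(n))
            for s, n in zip(self.sums, self.counts)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Discount the statistics of every arm, then add the new observation.
        for i in range(len(self.counts)):
            self.counts[i] *= self.gamma
            self.sums[i] *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def simulate(horizon=2000, gamma=0.95, seed=0):
    """Two arms whose means swap at the midpoint (one abrupt breakpoint)."""
    env_rng = random.Random(seed + 1)
    agent = DiscountedGaussianTS(n_arms=2, gamma=gamma, sigma=1.0, seed=seed)
    pulled_best = []
    for t in range(horizon):
        means = (0.9, 0.1) if t < horizon // 2 else (0.1, 0.9)
        arm = agent.select_arm()
        agent.update(arm, env_rng.gauss(means[arm], 0.1))
        pulled_best.append(arm == max(range(2), key=means.__getitem__))
    return pulled_best


if __name__ == "__main__":
    tail = simulate()[-500:]
    print("fraction of best-arm pulls after the change:", sum(tail) / len(tail))
```

Because an unplayed arm's discounted count shrinks geometrically, its sampling variance grows, so the arm is eventually tried again. This is the sense in which discounting adapts passively: no explicit change detector is needed. In this sketch, a smaller `gamma` forgets faster at the cost of more exploration of inferior arms.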
