Adaptive KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings - 专知 (Zhuanzhi) Papers

Multi-armed bandit · ARM · Identifiable · Markov chain · Scenario

July 31, 2021

Adaptive KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings

Arghyadip Roy, Sanjay Shakkottai, R. Srikant

In the regret-based formulation of multi-armed bandit (MAB) problems, except in rare instances, much of the literature focuses on arms with i.i.d. rewards. In this paper, we consider the problem of obtaining regret guarantees for MAB problems in which the rewards of each arm form a Markov chain which may not belong to a single parameter exponential family. To achieve logarithmic regret in such problems is not difficult: a variation of standard KL-UCB does the job. However, the constants obtained from such an analysis are poor for the following reason: i.i.d. rewards are a special case of Markov rewards and it is difficult to design an algorithm that works well independent of whether the underlying model is truly Markovian or i.i.d. To overcome this issue, we introduce a novel algorithm that identifies whether the rewards from each arm are truly Markovian or i.i.d. using a Hellinger distance-based test. Our algorithm then switches from using a standard KL-UCB to a specialized version of KL-UCB when it determines that the arm reward is Markovian, thus resulting in low regret for both i.i.d. and Markovian settings.
