We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a class of bandit algorithms that leverages adaptive windowing techniques from the data stream community. We first provide new guarantees on the quality of the estimators produced by adaptive windowing techniques, which are of independent interest to the data mining community. Furthermore, we conduct a finite-time analysis of ADR-bandit in two typical environments: an abrupt environment, where changes occur instantaneously, and a gradual environment, where changes occur progressively. We demonstrate that ADR-bandit has nearly optimal performance when the abrupt or gradual changes occur in a coordinated manner that we call global changes, and that forced exploration is unnecessary once attention is restricted to such global changes. Unlike existing nonstationary bandit algorithms, ADR-bandit performs optimally in stationary environments as well as in nonstationary environments with global changes. Our experiments show that the proposed algorithms outperform existing approaches in synthetic and real-world environments.
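To make the mechanism concrete, the following is a minimal sketch of the general idea only: a UCB1 index policy whose reward estimates are kept in adaptive windows, with a joint (global) reset of all arms whenever any window reports a change. The names (`AdaptiveWindow`, `ucb_with_global_reset`, `pull_arm`), the confidence parameter `delta`, the minimum window length, and the single midpoint-split test are illustrative assumptions and simplifications; this is not the ADR-bandit algorithm or the adaptive windowing test analyzed in the paper.

```python
import math
from collections import deque


class AdaptiveWindow:
    """Reward buffer with a simplified ADWIN-style mean-shift check.

    Illustrative assumption: a full adaptive windowing test examines every
    split point of the window against a Hoeffding-type threshold; this sketch
    checks only the single midpoint split with confidence parameter `delta`.
    """

    def __init__(self, delta=0.002):
        self.delta = delta
        self.rewards = deque()

    def add(self, reward):
        """Append a reward in [0, 1] and report whether a shift was detected."""
        self.rewards.append(reward)
        return self._change_detected()

    def _change_detected(self):
        n = len(self.rewards)
        if n < 10:  # too few samples for a meaningful comparison
            return False
        data = list(self.rewards)
        old, new = data[: n // 2], data[n // 2:]
        gap = abs(sum(old) / len(old) - sum(new) / len(new))
        # Hoeffding-style threshold on the admissible gap between the halves.
        m = 1.0 / (1.0 / len(old) + 1.0 / len(new))
        eps = math.sqrt(math.log(4.0 / self.delta) / (2.0 * m))
        return gap > eps

    def mean(self):
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.0

    def count(self):
        return len(self.rewards)

    def reset(self):
        self.rewards.clear()


def ucb_with_global_reset(pull_arm, n_arms, horizon, delta=0.002):
    """UCB1 over adaptively windowed estimates with a joint reset of all arms
    whenever any arm's window reports a change (the global-change setting)."""
    arms = [AdaptiveWindow(delta) for _ in range(n_arms)]
    for t in range(1, horizon + 1):
        untried = [i for i, a in enumerate(arms) if a.count() == 0]
        if untried:
            choice = untried[0]  # initialize (or re-initialize after a reset)
        else:
            choice = max(
                range(n_arms),
                key=lambda i: arms[i].mean()
                + math.sqrt(2.0 * math.log(t) / arms[i].count()),
            )
        if arms[choice].add(pull_arm(choice)):
            for arm in arms:  # coordinated change detected: drop all history
                arm.reset()
    return [arm.mean() for arm in arms]
```

For example, `ucb_with_global_reset(lambda i: float(random.random() < 0.4 + 0.1 * i), n_arms=3, horizon=5000)` (with `import random`) runs the sketch on three Bernoulli arms; the joint reset mirrors the coordinated, global changes discussed above, whereas a per-arm reset would correspond to changes that affect arms independently.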