Multi-armed bandit (MAB) problems are mainly studied under two extreme settings known as stochastic and adversarial. These two settings, however, do not capture realistic environments such as search engines, marketing, and advertising, in which rewards change stochastically over time. Motivated by this, we introduce and study a dynamic MAB problem with stochastic temporal structure, where the expected reward of each arm is governed by an auto-regressive (AR) model. Due to the dynamic nature of the rewards, simple "explore and commit" policies fail, as all arms must be explored continuously over time. We formalize this by characterizing a per-round regret lower bound, where the regret is measured against a strong (dynamic) benchmark. We then present an algorithm whose per-round regret nearly matches this lower bound. Our algorithm relies on two mechanisms: (i) alternating between recently pulled arms and unpulled arms with potential, and (ii) restarting. These mechanisms enable the algorithm to adapt dynamically to changes and to discard irrelevant past information at a suitable rate. Numerical studies further demonstrate the strength of our algorithm in non-stationary settings.
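To make the reward dynamics concrete, a first-order instance of such an auto-regressive structure could be written as below; here $\mu_t(i)$ denotes the expected reward of arm $i$ at round $t$, and the coefficient $\gamma$ and noise term $\varepsilon_t(i)$ are illustrative placeholders rather than the paper's exact notation:
$$
\mu_{t}(i) \;=\; \gamma\,\mu_{t-1}(i) \;+\; \varepsilon_{t}(i), \qquad \varepsilon_{t}(i)\ \text{zero-mean i.i.d. noise.}
$$
Under dynamics of this kind, an estimate of $\mu_t(i)$ becomes stale unless arm $i$ is pulled again, which is why every arm must be explored continuously and why restarting helps discard outdated observations.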