In many real-world sequential decision-making problems, the effect of an action is not immediately reflected in the feedback; instead, it spreads over a long time frame. For instance, in online advertising, investing in a platform increases awareness, but the actual reward, i.e., a conversion, might occur far in the future. Furthermore, whether a conversion takes place depends on how fast awareness grows, how quickly it vanishes, and the synergy or interference with other advertising platforms. Previous work has investigated the Multi-Armed Bandit framework with delayed and aggregated feedback, but without imposing a particular structure on how an action propagates into the future, thus disregarding possible dynamical effects. In this paper, we introduce a novel setting, the Dynamical Linear Bandits (DLB), an extension of linear bandits characterized by a hidden state. When an action is performed, the learner observes a noisy reward whose mean is a linear function of the hidden state and of the action. Then, the hidden state evolves according to a linear dynamics that is also affected by the performed action. We start by introducing the setting, discussing the notion of optimal policy, and deriving a lower bound on the expected regret. Then, we provide an anytime optimistic regret minimization algorithm, Dynamical Linear Upper Confidence Bound (DynLin-UCB), that suffers an expected regret of order O(c d sqrt(T)), where c is a constant depending on the properties of the linear dynamical evolution and d is the dimension of the action vector. Finally, we conduct a numerical validation on a synthetic environment and on real-world data to show the effectiveness of DynLin-UCB in comparison with several baselines.
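To make the interaction protocol described above concrete, the following is a minimal sketch of a DLB-style environment, assuming a hidden state that evolves as h_{t+1} = A h_t + B a_t and a noisy reward whose mean is linear in both the action and the hidden state. The specific names (A, B, theta, omega), dimensions, and the placeholder policy are illustrative assumptions, not the paper's notation or the DynLin-UCB algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 4                      # action dimension d, hidden-state dimension n (arbitrary choices)
A = 0.8 * np.eye(n)              # stable linear dynamics (spectral radius < 1), an assumption
B = rng.standard_normal((n, d))  # how the action feeds into the hidden state
theta = rng.standard_normal(d)   # direct (instantaneous) reward parameter on the action
omega = rng.standard_normal(n)   # reward parameter on the hidden state
sigma = 0.1                      # reward noise level

h = np.zeros(n)                  # hidden state, never observed by the learner
T = 1000
for t in range(T):
    a = rng.standard_normal(d)   # placeholder policy; DynLin-UCB would choose a optimistically
    a /= np.linalg.norm(a)       # keep actions on the unit ball
    # Observed feedback: noisy reward, linear in the action and in the hidden state.
    r = theta @ a + omega @ h + sigma * rng.standard_normal()
    # Hidden state evolves according to linear dynamics, affected by the performed action.
    h = A @ h + B @ a
```

In this sketch, the constant c in the regret bound would relate to how slowly the effect of an action dissipates through the dynamics matrix A: the closer its spectral radius is to one, the longer an action keeps influencing future rewards.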