There is rising interest in industrial online applications where data becomes available sequentially. Inspired by the recommendation of playlists to users, whose preferences can be collected while they listen to the entire playlist, we study a novel bandit setting, namely the Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB), in which the stochastic reward associated with the pull of an arm is partitioned over a finite number of consecutive rounds following the pull. This setting, unexplored so far to the best of our knowledge, is a natural extension of delayed-feedback bandits to the case in which rewards may be spread over a finite time span after the pull instead of being fully disclosed in a single, potentially delayed round. We provide two algorithms to address TP-MAB problems, namely TP-UCB-FR and TP-UCB-EW, which exploit the partial information disclosed by the reward as it is collected over time. We show that our algorithms provide better asymptotic regret upper bounds than delayed-feedback bandit algorithms when a property characterizing a broad set of reward structures of practical interest, namely α-smoothness, holds. We also empirically evaluate their performance across a wide range of settings, both synthetically generated and from a real-world media recommendation problem.
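To make the setting concrete, the following is a minimal Python sketch of a TP-MAB reward process, assuming a uniform partition of each pull's cumulative reward over the rounds that follow it. The class and parameter names (TPMABArm, tau_max) are hypothetical illustrations introduced here, not the paper's notation, and the sketch implements only the environment, not the TP-UCB-FR or TP-UCB-EW algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

class TPMABArm:
    """Toy TP-MAB arm: each pull draws a stochastic cumulative reward
    and partitions it over the tau_max rounds following the pull, so
    the learner observes it piece by piece rather than all at once."""

    def __init__(self, mean, tau_max):
        self.mean = mean          # expected cumulative reward in [0, 1]
        self.tau_max = tau_max    # reward is spread over tau_max rounds

    def pull(self):
        # Cumulative reward of this pull (a Beta draw is one simple
        # choice; the setting allows general bounded reward structures).
        total = rng.beta(10 * self.mean, 10 * (1 - self.mean))
        # Partition the total uniformly over the next tau_max rounds.
        # Under alpha-smoothness, the reward falling in each of alpha
        # equally sized groups of consecutive rounds is bounded; a
        # uniform split trivially satisfies such a constraint.
        return np.full(self.tau_max, total / self.tau_max)

# Usage: observe one pull's partial rewards round by round.
arm = TPMABArm(mean=0.7, tau_max=5)
for delay, r in enumerate(arm.pull(), start=1):
    print(f"round t+{delay}: observed partial reward {r:.3f}")
```

The uniform split is only for illustration: the point of the model is that partial observations arrive before the pull's full reward is known, which is exactly the information a TP-MAB algorithm can exploit and a delayed-feedback algorithm discards.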