Nonstochastic Bandits with Composite Anonymous Feedback - Zhuanzhi Papers

December 6, 2021

Nonstochastic Bandits with Composite Anonymous Feedback

Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour

We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that the transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order $\sqrt{(d+1)KT}$, where $d$ is the maximum delay, $K$ is the number of arms, and $T$ is the time horizon. Finally, we show that our results cannot be improved in general by exhibiting a matching (up to a log factor) lower bound on the regret of any algorithm operating in this setting.
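To make the feedback model concrete, the following is a minimal sketch of how a composite anonymous observation could be formed, based only on the description in the abstract. The function names (`make_components`, `observed_loss`), the data layout, and the random splitting of each loss are assumptions made for illustration; in the paper's setting the split is chosen adversarially, and the reduction and the FTRL algorithm themselves are not shown here.

```python
import random


def make_components(T, K, d, rng):
    """Build illustrative loss components (hypothetical helper).

    components[s][a][j] is the part of the loss of arm a, played in round s,
    that is charged in round s + j, for j = 0..d. Each arm's total loss lies
    in [0, 1]. The split is random here; the paper allows an adversarial one.
    """
    components = []
    for _ in range(T):
        row = []
        for _ in range(K):
            total = rng.random()
            weights = [rng.uniform(0.1, 1.0) for _ in range(d + 1)]
            norm = sum(weights)
            row.append([total * w / norm for w in weights])
        components.append(row)
    return components


def observed_loss(components, actions, t):
    """Composite anonymous feedback at the end of round t (0-indexed).

    The player sees only this single sum of components coming from the
    actions played in rounds t-d..t, not the individual terms.
    """
    d = len(components[0][0]) - 1
    total = 0.0
    for s in range(max(0, t - d), t + 1):
        total += components[s][actions[s]][t - s]
    return total


rng = random.Random(0)
T, K, d = 6, 3, 2
components = make_components(T, K, d, rng)
actions = [rng.randrange(K) for _ in range(T)]  # placeholder policy
feedback = [observed_loss(components, actions, t) for t in range(T)]
```

With `d = 0` every loss has a single component, so `observed_loss` returns exactly the loss of the action just played, which is the standard bandit feedback. For larger `d` the observed value can be as large as `d + 1`, since up to `d + 1` past actions contribute to it.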
