No Weighted-Regret Learning in Adversarial Bandits with Delays - 专知论文



May 13, 2022

No Weighted-Regret Learning in Adversarial Bandits with Delays


Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet

From arXiv. Accepted to JMLR. This is an extended journal version of the preliminary conference paper "Online EXP3 Learning in Adversarial Bandits with Delayed Feedback", published at NeurIPS 2019.

Consider a scenario where a player chooses an action in each round $t$ out of $T$ rounds and observes the incurred cost after a delay of $d_{t}$ rounds. The cost functions and the delay sequence are chosen by an adversary. We show that in a non-cooperative game, the expected weighted ergodic distribution of play converges to the set of coarse correlated equilibria (CCE) if players use algorithms that have "no weighted-regret" in the above scenario, even if they incur linear regret because the delays are too large. For a two-player zero-sum game, we show that no weighted-regret is sufficient for the weighted ergodic average of play to converge to the set of Nash equilibria. We prove that the FKM algorithm with $n$ dimensions achieves an expected regret of $O\left(nT^{\frac{3}{4}}+\sqrt{n}T^{\frac{1}{3}}D^{\frac{1}{3}}\right)$ and the EXP3 algorithm with $K$ arms achieves an expected regret of $O\left(\sqrt{\log K\left(KT+D\right)}\right)$, even when $D=\sum_{t=1}^{T}d_{t}$ and $T$ are unknown. These bounds use a novel doubling trick that, under mild assumptions, provably retains the regret bound that holds when $D$ and $T$ are known. Using these bounds, we show that FKM and EXP3 have no weighted-regret even for $d_{t}=O\left(t\log t\right)$. Therefore, algorithms with no weighted-regret can be used to approximate a CCE of a finite or convex unknown game that can only be simulated with bandit feedback, even if the simulation involves significant delays.
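
To make the delayed-feedback setting concrete, here is a minimal Python sketch of EXP3 in which the cost of round $t$ only reaches the player at round $t+d_{t}$. It is an illustration under stated assumptions, not the authors' implementation: the function name `delayed_exp3`, its arguments, and the fixed step size `eta` are hypothetical, and the doubling trick the abstract describes for unknown $D$ and $T$ is deliberately left out. Feedback that would arrive after the final round is simply never used.

```python
import math
import random


def delayed_exp3(costs, delays, eta, seed=0):
    """EXP3 where the cost of round t only arrives at round t + delays[t].

    costs[t][a] is the cost in [0, 1] of arm a at round t.
    Assumption: eta is a fixed step size chosen by the caller (no doubling trick).
    Returns (chosen_arms, final_distribution).
    """
    rng = random.Random(seed)
    T, K = len(costs), len(costs[0])
    cum_est = [0.0] * K   # cumulative importance-weighted cost estimates
    arrivals = {}         # arrival round -> list of (arm, estimated cost)
    chosen = []

    def distribution():
        # Exponential weights; subtracting the minimum avoids underflow.
        low = min(cum_est)
        w = [math.exp(-eta * (c - low)) for c in cum_est]
        total = sum(w)
        return [x / total for x in w]

    for t in range(T):
        p = distribution()
        arm = rng.choices(range(K), weights=p)[0]
        chosen.append(arm)
        # The estimate uses the probability with which the arm was played,
        # even though the update itself happens delays[t] rounds later.
        est = costs[t][arm] / p[arm]
        arrivals.setdefault(t + delays[t], []).append((arm, est))
        # Apply all feedback that arrives at the end of this round.
        for a, e in arrivals.pop(t, []):
            cum_est[a] += e

    return chosen, distribution()
```

Calling `delayed_exp3(costs, delays, eta)` with a list of per-round cost vectors and a list of integer delays returns the arms played and the final sampling distribution. When $D$ and $T$ are known, a step size proportional to $\sqrt{\log K/(KT+D)}$ matches the shape of the EXP3 bound quoted above; the exact constant is an assumption left to the reader.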


