Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes - Zhuanzhi Papers


December 12, 2022

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes


Chenlu Ye,Wei Xiong,Quanquan Gu,Tong Zhang

From arXiv: We study corruption-robust MDPs and contextual bandits with general function approximation.

Despite significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, current works are either confined to the linear setting or lead to an undesired $\tilde{O}(\sqrt{T}\zeta)$ regret bound, where $T$ is the number of rounds and $\zeta$ is the total amount of corruption. In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm that achieves a regret of $\tilde{O}(\sqrt{T}+\zeta)$. The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandits \citep{he2022nearly} and a new weighted estimator of uncertainty for the general function class. In contrast to existing analyses, which rely heavily on the linear structure, we develop a novel technique to control the sum of weighted uncertainty, thus establishing the final regret bounds. We then generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$ under general function approximation. Notably, our algorithms achieve regret bounds that either nearly match the performance lower bound or improve on existing methods for all corruption levels, in both the known and unknown $\zeta$ cases.
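To make the idea of uncertainty-weighted least-squares regression concrete, here is a minimal sketch for the linear special case only. It is not the paper's algorithm for general function classes. The function name, the threshold `alpha`, the ridge parameter `lam`, and the weight rule `min(1, alpha / uncertainty)` are illustrative assumptions: each sample is down-weighted when its uncertainty under the current weighted Gram matrix is large, so a single (possibly corrupted) observation in a poorly explored direction cannot dominate the estimate.

```python
import numpy as np


def uncertainty_weighted_ridge(X, y, alpha=1.0, lam=1.0):
    """Sequential weighted ridge regression (illustrative sketch).

    X: (n, d) array of feature vectors, processed in order.
    y: (n,) array of observed (possibly corrupted) rewards.
    alpha: assumed threshold controlling how aggressively samples are down-weighted.
    lam: ridge regularization strength.
    Returns the weighted ridge estimate and the per-sample weights.
    """
    n, d = X.shape
    Sigma = lam * np.eye(d)  # weighted, regularized Gram matrix
    b = np.zeros(d)          # weighted sum of y_t * x_t
    weights = np.empty(n)

    for t in range(n):
        x = X[t]
        # Uncertainty of x under the data seen so far: sqrt(x^T Sigma^{-1} x).
        uncertainty = np.sqrt(x @ np.linalg.solve(Sigma, x))
        # Assumed weight rule: full weight when uncertainty is small,
        # otherwise shrink so that weight * uncertainty <= alpha.
        w = min(1.0, alpha / uncertainty) if uncertainty > 0 else 1.0
        weights[t] = w
        Sigma += w * np.outer(x, x)
        b += w * y[t] * x

    theta = np.linalg.solve(Sigma, b)
    return theta, weights
```

With a large `alpha` every weight equals 1 and the estimate reduces to ordinary ridge regression; a smaller `alpha` caps the influence of samples that lie in directions the data has not yet covered.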

