A Best-of-Both-Worlds Algorithm for Constrained MDPs with Long-Term Constraints - 专知论文

会员服务 ·

0

约束 · Processing（编程语言） · Learning · 情景 · 学习器 ·

2023 年 4 月 27 日

A Best-of-Both-Worlds Algorithm for Constrained MDPs with Long-Term Constraints

翻译：暂无翻译

Jacopo Germano,Francesco Emanuele Stradi,Gianmarco Genalti,Matteo Castiglioni,Alberto Marchesi,Nicola Gatti

We study online learning in episodic constrained Markov decision processes (CMDPs), where the goal of the learner is to collect as much reward as possible over the episodes, while guaranteeing that some long-term constraints are satisfied during the learning process. Rewards and constraints can be selected either stochastically or adversarially, and the transition function is not known to the learner. While online learning in classical unconstrained MDPs has received considerable attention over the last years, the setting of CMDPs is still largely unexplored. This is surprising, since in real-world applications, such as, e.g., autonomous driving, automated bidding, and recommender systems, there are usually additional constraints and specifications that an agent has to obey during the learning process. In this paper, we provide the first best-of-both-worlds algorithm for CMDPs with long-term constraints. Our algorithm is capable of handling settings in which rewards and constraints are selected either stochastically or adversarially, without requiring any knowledge of the underling process. Moreover, our algorithm matches state-of-the-art regret and constraint violation bounds for settings in which constraints are selected stochastically, while it is the first to provide guarantees in the case in which they are chosen adversarially.

翻译：暂无翻译

0

相关内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

Partial Spread Bent函数与Bent-Negabent函数的构造及密码学性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

量子自旋格子系统的拓扑序、量子动力学和量子quench

国家自然科学基金

0+阅读 · 2012年12月31日

拓扑绝缘体薄膜与异质结制备及表面拓扑电子态研究

国家自然科学基金

0+阅读 · 2011年12月31日

Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations

Arxiv

0+阅读 · 2023年6月13日

POD-based reduced order methods for optimal control problems governed by parametric partial differential equation with varying boundary control

Arxiv

0+阅读 · 2023年6月13日

Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances

Arxiv

0+阅读 · 2023年6月13日

Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes

Arxiv

0+阅读 · 2023年6月12日

Viewpoint Generation using Feature-Based Constrained Spaces for Robot Vision Systems

Arxiv

0+阅读 · 2023年6月12日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《利用人工智能对军事行动进行建模》

《利用人工智能学习、优化与推演美国海军作战部队的战略布局与分散（续文）》

机器人、无人机与实时影像：应对城市爆炸威胁的三大技术方案

《指挥官意图消息中关键概念自动提取》最新47页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations

Arxiv

0+阅读 · 2023年6月13日

POD-based reduced order methods for optimal control problems governed by parametric partial differential equation with varying boundary control

Arxiv

0+阅读 · 2023年6月13日

Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances

Arxiv

0+阅读 · 2023年6月13日

Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes

Arxiv

0+阅读 · 2023年6月12日

Viewpoint Generation using Feature-Based Constrained Spaces for Robot Vision Systems

Arxiv

0+阅读 · 2023年6月12日

相关基金

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

Partial Spread Bent函数与Bent-Negabent函数的构造及密码学性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

量子自旋格子系统的拓扑序、量子动力学和量子quench

国家自然科学基金

0+阅读 · 2012年12月31日

拓扑绝缘体薄膜与异质结制备及表面拓扑电子态研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员