Real-world reinforcement learning (RL) is often severely limited because typical RL algorithms rely heavily on a reset mechanism to sample proper initial states. In practice, resets are expensive to implement, requiring human intervention or heavily engineered environments. To make learning more practical, we propose a generic no-regret reduction that systematically designs reset-free RL algorithms. Our reduction turns reset-free RL into a two-player game. We show that achieving sublinear regret in this two-player game implies learning a policy with both sublinear performance regret and a sublinear total number of resets in the original RL problem. In other words, the agent eventually learns to perform optimally while avoiding resets. Using this reduction, we design an instantiation for linear Markov decision processes, which is, to our knowledge, the first provably correct reset-free RL algorithm.
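As a rough, illustrative sketch of what this guarantee means formally (the notation below is an assumption for exposition, not the paper's exact definitions):

```latex
% Illustrative notation only; the paper defines the precise quantities.
% Over K episodes, let \pi_k be the policy played in episode k and \pi^*
% an optimal policy that never needs to reset. Write
\[
  \mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}\bigl( V^{\pi^*}(s_1) - V^{\pi_k}(s_1) \bigr),
  \qquad
  \mathrm{Resets}(K) \;=\; \sum_{k=1}^{K}\mathbb{E}\bigl[\#\text{ resets incurred by } \pi_k \bigr].
\]
% The reduction says: if both players in the constructed two-player game
% achieve sublinear regret, then
\[
  \mathrm{Regret}(K) = o(K)
  \quad\text{and}\quad
  \mathrm{Resets}(K) = o(K),
\]
% so the per-episode suboptimality and the per-episode number of resets
% both vanish as K grows.
```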