Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical foundation for understanding RPO methods applied to single tasks, with connections to natural gradient, trust region, and variational approaches. However, there is limited formal understanding of desirable properties for default policies in the multitask setting, an increasingly important domain as the field shifts towards training more generally capable agents. Here, we take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization. Using these results, we then derive a principled RPO algorithm for multitask learning with strong performance guarantees.
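For readers unfamiliar with the setup, a common form of the KL-regularized objective underlying RPO methods is sketched below. The notation is standard but assumed here rather than quoted from the abstract: $\pi$ is the learned policy, $\pi_0$ the reference (default) policy, $\alpha$ a regularization weight, and $\gamma$ the discount factor.

$$
J_\alpha(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\Big( r(s_t, a_t) \;-\; \alpha\,\mathrm{KL}\big(\pi(\cdot \mid s_t)\,\big\|\,\pi_0(\cdot \mid s_t)\big) \Big)\right]
$$

The first term rewards cumulative return, while the second penalizes behavioral deviation from the default policy, matching the trade-off described above.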