Leverage the Average: an Analysis of KL Regularization in RL


January 6, 2021


Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

From arXiv; NeurIPS 2020

Recent Reinforcement Learning (RL) algorithms that use Kullback-Leibler (KL) regularization as a core component have shown outstanding performance. Yet so far, little is understood theoretically about why KL regularization helps. We study KL regularization within an approximate value iteration scheme and show that it implicitly averages q-values. Leveraging this insight, we provide a very strong performance bound, the first to combine two desirable aspects: a linear dependency on the horizon (instead of quadratic) and an error propagation term involving an averaging effect of the estimation errors (instead of an accumulation effect). We also study the more general case of an additional entropy regularizer. The resulting abstract scheme encompasses many existing RL algorithms. Some of our assumptions do not hold with neural networks, so we complement this theoretical analysis with an extensive empirical study.
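
The claim that KL regularization implicitly averages q-values can be illustrated with a small numerical sketch. It is not the authors' algorithm or code. It assumes a single state, a uniform initial policy, and the standard closed form of the KL-regularized greedy step, where the new policy is proportional to the previous policy times exp(q / lam). The names `kl_greedy_step`, `true_q`, and `lam`, and the Gaussian noise standing in for estimation errors, are hypothetical choices made for illustration. Under these assumptions, repeating the step is the same as taking a softmax of the scaled average of all past q-estimates, so zero-mean errors can cancel instead of accumulating.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def kl_greedy_step(prev_policy, q, lam):
    """Closed form of argmax_pi <pi, q> - lam * KL(pi || prev_policy).

    The maximizer is proportional to prev_policy * exp(q / lam).
    """
    logits = np.log(prev_policy) + q / lam
    return softmax(logits)


# Hypothetical single-state problem with noisy q-value estimates.
rng = np.random.default_rng(0)
n_actions, n_iters, lam = 4, 50, 2.0
true_q = np.array([1.0, 0.5, 0.0, -0.5])
qs = [true_q + rng.normal(scale=1.0, size=n_actions) for _ in range(n_iters)]

# Iterate the KL-regularized greedy step, starting from a uniform policy.
policy = np.full(n_actions, 1.0 / n_actions)
for q in qs:
    policy = kl_greedy_step(policy, q, lam)

# Equivalent one-shot form: softmax of the scaled average of all q-estimates.
q_mean = np.mean(qs, axis=0)
policy_from_average = softmax(n_iters * q_mean / lam)

print(np.allclose(policy, policy_from_average))
```

The two policies agree up to floating-point error because the log of the iterated policy is, up to a constant, the sum of all past q-estimates divided by `lam`. The linear horizon dependency and the error propagation bound stated in the abstract are results of the paper's analysis and are not demonstrated by this sketch.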

