Experience replay \citep{lin1993reinforcement, mnih2015human} is a widely used technique for improving data efficiency and performance in RL algorithms. In experience replay, past transitions are stored in a memory buffer and re-used during learning. Previous works have proposed various schemes for sampling from the replay buffer, attempting to select the experiences that contribute most to convergence to an optimal policy. Here, we give conditions on the replay sampling scheme that ensure convergence, focusing on the well-known Q-learning algorithm in the tabular setting. After establishing sufficient conditions for convergence, we suggest a somewhat different use of experience replay: replaying memories in a deliberately biased manner as a means to shape the properties of the resulting policy. We initiate a rigorous study of experience replay as a tool to control and modify these properties. In particular, we show that an appropriately biased sampling scheme allows us to obtain a \emph{safe} policy. We believe that using experience replay as a biasing mechanism for steering the resulting policy in desirable ways is an idea with promising potential for many applications.
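For concreteness, here is a minimal sketch of the replay-based update we have in mind, using standard tabular Q-learning notation (step size $\alpha$, discount factor $\gamma$) and writing $\mathcal{B}$ for the replay buffer and $p$ for a, possibly biased, sampling distribution over its contents; these symbols are our notational assumptions here rather than fixed elsewhere in the text. A single replay step draws a stored transition and applies the usual Q-learning update to it:
\begin{equation*}
(s, a, r, s') \sim p(\mathcal{B}),
\qquad
Q(s,a) \leftarrow Q(s,a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s,a) \right).
\end{equation*}
The choice of $p$ thus determines which experiences drive the updates, and it is precisely this choice that we analyze for convergence and later exploit as a biasing mechanism.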