Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL, where the agent's objective is to compute an optimal policy from data generated by a given policy (known as the behavior policy). Because the optimal policy can be very different from the behavior policy, learning optimal behavior is much harder in the off-policy setting than in the on-policy setting, where fresh data generated by the updated policy is used for learning. This work proposes an off-policy natural actor-critic algorithm that uses state-action distribution correction to handle off-policy data and the natural policy gradient for sample efficiency. Existing natural-gradient-based actor-critic algorithms with convergence guarantees require fixed features for approximating both the policy and the value function, which often leads to sub-optimal learning in many RL applications. In contrast, our proposed algorithm uses compatible features, which allow arbitrary neural networks to approximate the policy and the value function while guaranteeing convergence to a locally optimal policy. We illustrate the benefit of the proposed off-policy natural gradient algorithm by comparing it with the vanilla-gradient actor-critic algorithm on benchmark RL tasks.
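To make the ingredients concrete, the following is a minimal, hypothetical NumPy sketch of a single natural actor-critic update with a linear-softmax policy and compatible features. It is not the algorithm proposed here: the advantage estimates and the distribution-correction ratios rho (estimates of d^pi/d^mu) are assumed to be supplied by external routines, and the per-state feature matrix phi_s is an illustrative stand-in for whatever representation is actually used.

import numpy as np

def softmax_policy(theta, phi_s):
    # pi(.|s) for a linear-softmax policy; phi_s has shape (n_actions, d).
    logits = phi_s @ theta
    logits = logits - logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def compatible_features(theta, phi_s, a):
    # psi(s, a) = grad_theta log pi(a|s) = phi(s, a) - E_{a'~pi}[phi(s, a')].
    pi = softmax_policy(theta, phi_s)
    return phi_s[a] - pi @ phi_s

def natural_ac_update(theta, batch, step_size=1e-3):
    # One actor update from an off-policy batch of tuples
    # (phi_s, a, advantage_estimate, rho), where rho is an estimate of the
    # state-action distribution-correction ratio d^pi(s,a)/d^mu(s,a),
    # assumed to come from some external estimation routine.
    d = theta.shape[0]
    G = np.zeros((d, d))
    b = np.zeros(d)
    for phi_s, a, adv, rho in batch:
        psi = compatible_features(theta, phi_s, a)
        G += rho * np.outer(psi, psi)       # correction-weighted least squares
        b += rho * adv * psi
    w = np.linalg.solve(G + 1e-6 * np.eye(d), b)   # compatible critic weights
    # With compatible features, the natural policy gradient equals the critic
    # weights w (Kakade, 2002), so the actor takes a plain step along w.
    return theta + step_size * w

# Toy usage on random data (2 actions, 3-dimensional features):
rng = np.random.default_rng(0)
theta = np.zeros(3)
batch = [(rng.normal(size=(2, 3)), int(rng.integers(2)),
          float(rng.normal()), 1.0) for _ in range(32)]
theta = natural_ac_update(theta, batch)

The sketch only illustrates why compatible features matter: once the critic is fit with psi(s, a) = grad_theta log pi(a|s), the natural-gradient actor step requires no Fisher-matrix inversion, since the critic weights themselves are the natural gradient.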