Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning - Zhuanzhi Papers


April 22, 2018

Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning


Sergio Valcarcel Macua, Aleksi Tukiainen, Daniel García-Ocaña Hernández, David Baldazo, Enrique Munoz de Cote, Santiago Zazo

We propose a fully distributed actor-critic algorithm approximated by deep neural networks, named Diff-DAC, with application to single-task and to average multitask reinforcement learning (MRL). Each agent has access to data from its local task only, but it aims to learn a policy that performs well on average over the whole set of tasks. During the learning process, agents communicate their value and policy parameters to their neighbors, diffusing the information across the network, so that they converge to a common policy with no need for a central node. The method is scalable, since the computational and communication costs per agent grow with its number of neighbors. We derive Diff-DAC from duality theory and provide novel insights into the standard actor-critic framework, showing that it is actually an instance of the dual ascent method that approximates the solution of a linear program. Experiments suggest that Diff-DAC can outperform the only previous distributed MRL approach (i.e., Dist-MTLPS) and even the centralized architecture.
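
The parameter-sharing idea in the abstract can be illustrated with a toy diffusion loop. The sketch below is not the Diff-DAC algorithm itself: it replaces the actor-critic networks with a single scalar parameter per agent, and each agent's "local task" with a hypothetical quadratic loss J_k(w) = 0.5 * (w - t_k)^2. The ring topology, the uniform neighborhood averaging, the step size, and all function names are assumptions made for illustration only. It shows how agents that each take a local gradient step and then average with their neighbors can end up near the minimizer of the average loss without any central node.

```python
def ring_neighbors(n):
    """Hypothetical topology: each agent sees itself and its two ring neighbors."""
    return {k: [(k - 1) % n, k, (k + 1) % n] for k in range(n)}


def diffusion_step(w, targets, neighbors, step_size):
    """One adapt-then-combine iteration over all agents."""
    # Adapt: every agent takes a gradient step on its own local loss
    # J_k(w) = 0.5 * (w - t_k)^2, using only its local target t_k.
    psi = [w_k - step_size * (w_k - t_k) for w_k, t_k in zip(w, targets)]
    # Combine: every agent averages the intermediate values received
    # from its neighborhood (uniform weights, an assumption of this sketch).
    return [
        sum(psi[l] for l in neighbors[k]) / len(neighbors[k])
        for k in range(len(w))
    ]


def run_diffusion(targets, steps=2000, step_size=0.01):
    """Run the toy diffusion loop from a zero initialization."""
    n = len(targets)
    neighbors = ring_neighbors(n)
    w = [0.0] * n
    for _ in range(steps):
        w = diffusion_step(w, targets, neighbors, step_size)
    return w


# Five agents, each with a different local optimum.
targets = [0.0, 1.0, 2.0, 3.0, 4.0]
estimates = run_diffusion(targets)
average_target = sum(targets) / len(targets)
print(estimates, average_target)
```

With a constant step size, the agents in this toy setup settle close to (not exactly at) the average of the local optima, and the gap shrinks as the step size decreases. Each agent's work per iteration depends only on the size of its neighborhood, which mirrors the scalability argument in the abstract.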



