与有限时间分析有关的传播效率分析样本和 (Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis) - 专知论文

会员服务 ·

0

样本 · 样本复杂度 · INFORMS · state-of-the-art · SGD ·

2021 年 9 月 8 日

Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

翻译：与有限时间分析有关的传播效率分析样本和

Ziyi Chen,Yi Zhou,Rongrong Chen,Shaofeng Zou

from arxiv, 40 pages, 2 figures

Actor-critic (AC) algorithms have been widely adopted in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either do not preserve the privacy of agents or are not sample and communication-efficient. In this work, we develop two decentralized AC and natural AC (NAC) algorithms that are private, and sample and communication-efficient. In both algorithms, agents share noisy information to preserve privacy and adopt mini-batch updates to improve sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities $\mathcal{O}\big(\epsilon^{-2}\ln(\epsilon^{-1})\big)$ and $\mathcal{O}\big(\epsilon^{-3}\ln(\epsilon^{-1})\big)$, respectively, and the same small communication complexity $\mathcal{O}\big(\epsilon^{-1}\ln(\epsilon^{-1})\big)$. Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithm.

翻译：在分散的多试剂系统中广泛采用Acor-critic(AC)算法,以学习最佳的联合控制政策。然而,现有的分散的AC算法要么不保护代理人的隐私,要么没有抽样和通信效率。在这项工作中,我们开发了两种分散的AC和自然AC(NAC)算法,它们是私人的,以及抽样和通信效率。在这两种算法中,代理共享噪音信息以维护隐私,并采用小型批量更新以提高抽样和通信效率。特别是在分散的NAC中,我们开发了一种分散的Markovian SGD算法,具有适应性微型批量大小,以高效地计算自然政策梯度。在Markovian抽样和线性功能接近下,我们证明拟议的分散的AC和自然AC(NAC)算法实现了最先进的样本复杂性$\macal{O ⁇ big(\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

0

相关内容

【Google-Marco Cuturi】最优传输，339页ppt，Optimal Transport

【Google-Marco Cuturi】最优传输，339页ppt，Optimal Transport

专知会员服务

48+阅读 · 2021年10月26日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【经典书】矩阵流形优化算法，237页pdf，普林斯顿大学出版社

【经典书】矩阵流形优化算法，237页pdf，普林斯顿大学出版社

专知会员服务

115+阅读 · 2021年3月3日

【经典书】机器学习黑客秘笈(Machine Learning for Hackers)，322页pdf

专知会员服务

46+阅读 · 2021年2月8日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

【斯坦福大学博士论文】大规模和高维统计学习方法和算法，147页pdf， Large-scale and high-dimensional statistical learning methods and algorithms

专知会员服务

26+阅读 · 2020年6月13日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

专知会员服务

5+阅读 · 2020年1月5日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

笔记 | Sentiment Analysis

笔记 | Sentiment Analysis

黑龙江大学自然语言处理实验室

10+阅读 · 2018年5月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

已删除

将门创投

4+阅读 · 2017年12月12日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Decentralized Personalized Federated Min-Max Problems

Arxiv

0+阅读 · 2021年11月1日

FeO2: Federated Learning with Opt-Out Differential Privacy

Arxiv

0+阅读 · 2021年10月28日

FedDR -- Randomized Douglas-Rachford Splitting Algorithms for Nonconvex Federated Composite Optimization

Arxiv

0+阅读 · 2021年10月28日

(Almost) Free Incentivized Exploration from Decentralized Learning Agents

(Almost) Free Incentivized Exploration from Decentralized Learning Agents

Arxiv

0+阅读 · 2021年10月27日

V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL

Arxiv

0+阅读 · 2021年10月27日

Parallel Bayesian Optimization of Multiple Noisy Objectives with Expected Hypervolume Improvement

Arxiv

0+阅读 · 2021年10月26日

Information-theoretic generalization bounds for black-box learning algorithms

Arxiv

12+阅读 · 2021年10月4日

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Arxiv

3+阅读 · 2019年9月25日

DP-ADMM: ADMM-based Distributed Learning with Differential Privacy

Arxiv

3+阅读 · 2019年3月25日

Optimal Algorithms for Distributed Optimization

Arxiv

3+阅读 · 2017年12月1日

VIP会员

文章信息

相关主题

样本复杂度

state-of-the-art

相关VIP内容

【Google-Marco Cuturi】最优传输，339页ppt，Optimal Transport

【Google-Marco Cuturi】最优传输，339页ppt，Optimal Transport

专知会员服务

48+阅读 · 2021年10月26日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【经典书】矩阵流形优化算法，237页pdf，普林斯顿大学出版社

【经典书】矩阵流形优化算法，237页pdf，普林斯顿大学出版社

专知会员服务

115+阅读 · 2021年3月3日

【经典书】机器学习黑客秘笈(Machine Learning for Hackers)，322页pdf

专知会员服务

46+阅读 · 2021年2月8日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

【斯坦福大学博士论文】大规模和高维统计学习方法和算法，147页pdf， Large-scale and high-dimensional statistical learning methods and algorithms

专知会员服务

26+阅读 · 2020年6月13日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

专知会员服务

5+阅读 · 2020年1月5日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

笔记 | Sentiment Analysis

笔记 | Sentiment Analysis

黑龙江大学自然语言处理实验室

10+阅读 · 2018年5月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

已删除

将门创投

4+阅读 · 2017年12月12日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Decentralized Personalized Federated Min-Max Problems

Arxiv

0+阅读 · 2021年11月1日

FeO2: Federated Learning with Opt-Out Differential Privacy

Arxiv

0+阅读 · 2021年10月28日

FedDR -- Randomized Douglas-Rachford Splitting Algorithms for Nonconvex Federated Composite Optimization

Arxiv

0+阅读 · 2021年10月28日

(Almost) Free Incentivized Exploration from Decentralized Learning Agents

(Almost) Free Incentivized Exploration from Decentralized Learning Agents

Arxiv

0+阅读 · 2021年10月27日

V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL

Arxiv

0+阅读 · 2021年10月27日

Parallel Bayesian Optimization of Multiple Noisy Objectives with Expected Hypervolume Improvement

Arxiv

0+阅读 · 2021年10月26日

Information-theoretic generalization bounds for black-box learning algorithms

Arxiv

12+阅读 · 2021年10月4日

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Arxiv

3+阅读 · 2019年9月25日

DP-ADMM: ADMM-based Distributed Learning with Differential Privacy

Arxiv

3+阅读 · 2019年3月25日

Optimal Algorithms for Distributed Optimization

Arxiv

3+阅读 · 2017年12月1日

微信扫码咨询专知VIP会员