We present Distributed Deep Deterministic Policy Gradient (3DPG), a multi-agent actor-critic (MAAC) algorithm for Markov games. Unlike previous MAAC algorithms, 3DPG is fully distributed during both training and deployment. 3DPG agents compute local policy gradients based on the most recently available local data (states, actions) and the most recently received local policies of the other agents. During training, this information is exchanged over a potentially lossy and delaying communication network, which therefore induces Age of Information (AoI) for both data and policies. We prove the asymptotic convergence of 3DPG even in the presence of potentially unbounded AoI. This is an important step towards practical online and distributed multi-agent learning, since 3DPG does not assume that information is available deterministically. We analyze 3DPG under mild practical assumptions on the transfer of data and policies. Our analysis shows that 3DPG agents converge to a local Nash equilibrium of the Markov game in terms of utility functions expressed as the expected values of the agents' local approximate action-value functions (Q-functions). These expectations are taken with respect to limiting distributions over the global state-action space shaped by the agents' accumulated local experiences. Our results also shed light on the policies obtained by general MAAC algorithms. We show through a heuristic argument and numerical experiments that 3DPG improves convergence over previous MAAC algorithms that use old actions instead of old policies during training. Further, we show that 3DPG is robust to AoI: it learns competitive policies even with large AoI and low data availability.
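To make the local update described above concrete, the following minimal Python/PyTorch sketch illustrates one possible form of a 3DPG-style actor step: each agent ascends its own local Q-function estimate, using actions from its current local policy and from the most recently received (possibly aged) copies of the other agents' policies. This is an illustrative sketch under our own assumptions, not the paper's reference implementation; names such as `local_policy_gradient_step`, `aged_policies`, `actor_i`, and `critic_i` are hypothetical.

```python
# Minimal sketch of a 3DPG-style local actor update (illustrative only).
# Assumes actor_i, critic_i, and the entries of aged_policies are torch.nn.Modules.
import torch

def local_policy_gradient_step(actor_i, critic_i, actor_opt, aged_policies, batch, agent_idx):
    """One local actor step for agent `agent_idx`.

    aged_policies: last policy copies received from the other agents over the
                   (lossy, delaying) network; entry `agent_idx` is ignored.
    batch: dict with 'state' of shape (B, state_dim) from the local replay buffer.
    """
    state = batch['state']
    actions = []
    for j, pi_j in enumerate(aged_policies):
        if j == agent_idx:
            actions.append(actor_i(state))       # own action, differentiable w.r.t. actor_i
        else:
            with torch.no_grad():
                actions.append(pi_j(state))      # other agents' actions from aged policy copies
    joint_action = torch.cat(actions, dim=-1)

    # Deterministic policy gradient: ascend the local Q-function estimate.
    actor_loss = -critic_i(state, joint_action).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return actor_loss.item()
```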