We study multi-agent reinforcement learning (MARL) in a stochastic network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are static, fixed and local, e.g., between neighbors in a fixed, time-invariant underlying graph. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies can be non-local and stochastic, and provide a finite-time error bound that shows how the convergence rate depends on the speed of information spread in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation, which apply beyond the setting of MARL in networked systems.