Scalable Reinforcement Learning for Multi-Agent Networked Systems

From arXiv. Accepted to Operations Research. The conference version appeared at the 2nd Learning for Dynamics and Control Conference under the title "Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems". This journal version includes more examples, discussion, and simulations.

We study reinforcement learning (RL) in a network of agents whose states and actions interact locally, where the objective is to find localized policies that maximize the (discounted) global reward. A fundamental challenge in this setting is that the size of the state-action space scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a Scalable Actor Critic (SAC) framework that exploits the network structure and finds a localized policy that is an $O(\rho^{\kappa})$-approximation of a stationary point of the objective for some $\rho\in(0,1)$, with complexity that scales with the local state-action space size of the largest $\kappa$-hop neighborhood of the network. We illustrate our model and approach using examples from wireless communication, epidemics, and traffic.
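To make the scaling claim concrete, the following minimal sketch compares the global state-action space size with the local size of the largest $\kappa$-hop neighborhood. It is an illustration only, not the paper's SAC algorithm. The ring topology, the assumption that every agent has the same number of states and actions, and the function names `k_hop_neighborhood` and `largest_local_space` are all hypothetical choices made for this example.

```python
from collections import deque


def k_hop_neighborhood(adj, source, kappa):
    """Return the set of nodes within kappa hops of source (source included).

    adj is an adjacency dict {node: [neighbors]}; this is plain BFS with a
    depth cutoff, an illustrative helper rather than anything from the paper.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == kappa:
            continue  # do not expand beyond kappa hops
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def largest_local_space(adj, kappa, n_states, n_actions):
    """Size of the joint state-action space of the largest kappa-hop neighborhood.

    Assumes (for illustration) that every agent has n_states local states
    and n_actions local actions.
    """
    largest = max(len(k_hop_neighborhood(adj, i, kappa)) for i in adj)
    return (n_states * n_actions) ** largest


# Hypothetical example: 20 agents on a ring, 2 states and 2 actions each.
n = 20
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

local_size = largest_local_space(ring, kappa=2, n_states=2, n_actions=2)
global_size = (2 * 2) ** n

print(f"largest 2-hop neighborhood space: {local_size}")
print(f"global state-action space:        {global_size}")
```

In this toy setting the local size depends only on $\kappa$ and the node degree, while the global size grows exponentially with the number of agents, which is the gap the abstract refers to.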
