Autonomous driving has advanced considerably over the past several decades, and Multi-Agent Reinforcement Learning (MARL) promises to meet the essential need for autonomous vehicle control in wirelessly connected vehicle networks. In MARL, how to effectively decompose a global feedback signal into the relative contributions of individual agents is one of the most fundamental problems. However, environment volatility caused by vehicle movement and wireless disturbance can significantly reshape the time-varying topological relationships among agents, which makes Value Decomposition (VD) challenging. To cope with this volatility, a dynamic VD framework is needed. In this paper, we propose a novel Stochastic VMIX (SVMIX) methodology that takes dynamic topological features into account during VD and incorporates the corresponding components into a multi-agent actor-critic architecture. In particular, a Stochastic Graph Neural Network (SGNN) is leveraged to effectively capture the underlying dynamics in topological features and improve the flexibility of VD against environment volatility. Finally, the superiority of SVMIX is verified through extensive simulations.