AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning

This paper investigates age of information (AoI)-aware radio resource management for a platooning system. Multiple autonomous platoons use cellular wireless vehicle-to-everything (C-V2X) communication to disseminate cooperative awareness messages (CAMs) to their followers while ensuring timely delivery of safety-critical messages to the road-side unit (RSU). Because channel conditions are dynamic, centralized resource management schemes that require global information are inefficient and incur large signaling overhead. Hence, we adopt a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy. Existing MARL algorithms consider a holistic reward function for the group's collective success, which often yields unsatisfactory results and cannot guarantee an optimal policy for each agent. Consequently, motivated by the existing RL literature, we propose a novel MARL framework that trains two critics: a global critic, which estimates the global expected reward and motivates the agents toward cooperative behavior, and an exclusive local critic for each agent, which estimates the local individual reward. Furthermore, based on the tasks each agent has to accomplish, the individual reward of each agent is decomposed into multiple sub-reward functions, and task-wise value functions are learned separately. Numerical results indicate the effectiveness of the proposed algorithm compared with conventional RL methods applied in this area.
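
The task-wise decomposition mentioned above relies on a general property of value functions: when a reward is a sum of sub-rewards, the value of the total reward under a fixed policy equals the sum of the values learned for each sub-reward. The following is a minimal sketch of that property only, not the paper's algorithm or environment. The two-state transition matrix, the discount factor, and the task names `cam_delivery` and `aoi_to_rsu` are all hypothetical, and the global critic is not modeled here.

```python
# Illustrative sketch only: a tiny 2-state Markov reward process under a fixed
# policy. All numbers and names below are assumptions, not taken from the paper.
GAMMA = 0.9

# Hypothetical transition probabilities under a fixed policy: P[s][s_next].
P = [[0.7, 0.3],
     [0.4, 0.6]]

# Hypothetical per-task sub-rewards for one agent, indexed by state.
SUB_REWARDS = {
    "cam_delivery": [1.0, 0.0],   # e.g. intra-platoon message delivery
    "aoi_to_rsu": [-0.5, -2.0],   # e.g. penalty growing with age of information
}


def evaluate(reward, sweeps=500):
    """Iterative policy evaluation: V(s) = r(s) + gamma * sum_t P[s][t] * V(t)."""
    v = [0.0] * len(reward)
    for _ in range(sweeps):
        v = [reward[s] + GAMMA * sum(P[s][t] * v[t] for t in range(len(v)))
             for s in range(len(v))]
    return v


# One value function per task, learned separately from its own sub-reward.
task_values = {name: evaluate(r) for name, r in SUB_REWARDS.items()}
sum_of_task_values = [sum(vs) for vs in zip(*task_values.values())]

# Value function of the undecomposed (summed) individual reward.
total_reward = [sum(rs) for rs in zip(*SUB_REWARDS.values())]
value_of_total = evaluate(total_reward)

# The two agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(sum_of_task_values, value_of_total))
print(f"max gap between summed task values and total value: {max_gap:.2e}")
```

Because the evaluation operator is linear in the reward, the per-task estimates can be trained independently and summed afterwards, which is what makes separate task-wise value functions a consistent replacement for a single value function over the combined reward.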
