In wireless communication systems, efficient and adaptive resource allocation plays a crucial role in enhancing overall Quality of Service (QoS). Centralized Multi-Agent Reinforcement Learning (MARL) frameworks rely on a central coordinator for policy training and resource scheduling, and therefore suffer from scalability issues and privacy risks. In contrast, the Distributed Training with Decentralized Execution (DTDE) paradigm enables distributed learning and decision-making, but struggles with non-stationarity and limited inter-agent cooperation, which can severely degrade system performance. To overcome these challenges, we propose the Multi-Agent Conditional Diffusion Model Planner (MA-CDMP) for decentralized communication resource management. Built on the Model-Based Reinforcement Learning (MBRL) paradigm, MA-CDMP employs Diffusion Models (DMs) to capture environment dynamics and plan future trajectories, while an inverse dynamics model guides action generation, thereby alleviating the sample inefficiency and slow convergence of conventional DTDE methods. Moreover, to approximate large-scale agent interactions, a Mean-Field (MF) mechanism is introduced to assist the classifier in the DMs. This design mitigates inter-agent non-stationarity and enhances cooperation with minimal communication overhead in distributed settings. We further establish a theoretical upper bound on the distributional approximation error introduced by the MF-based diffusion generation, guaranteeing convergence stability and reliable modeling of multi-agent stochastic dynamics. Extensive experiments show that MA-CDMP consistently outperforms existing MARL baselines in average reward and QoS metrics, demonstrating its scalability and practicality for real-world wireless network optimization.
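To make the pipeline concrete, the sketch below illustrates (in PyTorch) one plausible reading of the three components named in the abstract: a conditional DM that denoises a state trajectory, classifier guidance conditioned on a mean-field (average) of neighboring agents' actions, and an inverse dynamics model that recovers an executable action from consecutive planned states. This is not the authors' implementation; all architectures, dimensions (HORIZON, STATE_DIM, ACT_DIM), and the denoising/guidance schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

HORIZON, STATE_DIM, ACT_DIM, N_STEPS = 8, 4, 2, 50  # hypothetical sizes

# Noise predictor eps_theta(tau_k, k) over a flattened state trajectory.
denoiser = nn.Sequential(
    nn.Linear(HORIZON * STATE_DIM + 1, 128), nn.ReLU(),
    nn.Linear(128, HORIZON * STATE_DIM))

# Return/QoS classifier, additionally conditioned on the mean-field action.
classifier = nn.Sequential(
    nn.Linear(HORIZON * STATE_DIM + ACT_DIM + 1, 128), nn.ReLU(),
    nn.Linear(128, 1))

# Inverse dynamics model: (s_t, s_{t+1}) -> a_t.
inv_dyn = nn.Sequential(
    nn.Linear(2 * STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACT_DIM))

def plan(mean_field_action, guidance_scale=1.0):
    """One reverse-diffusion pass: returns a planned state trajectory
    and the first executable action."""
    tau = torch.randn(1, HORIZON * STATE_DIM)          # tau_K ~ N(0, I)
    for k in reversed(range(N_STEPS)):
        t = torch.full((1, 1), k / N_STEPS)
        with torch.no_grad():
            eps = denoiser(torch.cat([tau, t], dim=-1))
            tau = tau - 0.5 * eps / N_STEPS            # crude denoising step

        # Classifier guidance: step along the gradient of the predicted
        # return; feeding in the mean-field action lets the neighbors'
        # average behavior steer each agent's plan toward cooperation.
        tau = tau.requires_grad_(True)
        score = classifier(torch.cat([tau, mean_field_action, t], dim=-1))
        grad = torch.autograd.grad(score.sum(), tau)[0]
        tau = (tau + guidance_scale * grad / N_STEPS).detach()

    states = tau.view(HORIZON, STATE_DIM)
    # The inverse dynamics model recovers the action linking s_0 to s_1.
    action = inv_dyn(torch.cat([states[0], states[1]], dim=-1))
    return states, action

# Each agent only needs the average of its neighbors' actions, so the
# per-step communication cost stays at a single ACT_DIM-sized vector.
neighbor_actions = torch.randn(5, ACT_DIM)             # placeholder neighbor data
states, action = plan(neighbor_actions.mean(dim=0, keepdim=True))
print(states.shape, action.shape)                      # (8, 4) and (2,)
```

Under this reading, the mean-field conditioning is what keeps execution decentralized: the classifier gradient, not a joint policy, carries the inter-agent coupling, which matches the abstract's claim of cooperation at minimal communication overhead.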