Multi-agent deep reinforcement learning (MADRL) is a promising approach to challenging problems in wireless environments involving multiple decision-makers (or actors) with high-dimensional continuous action spaces. In this paper, we present a MADRL-based approach that jointly optimizes precoders to achieve the outer boundary, called the Pareto boundary, of the achievable rate region for a multiple-input single-output (MISO) interference channel (IFC). To address the two main challenges of the MISO IFC setup, namely multiple actors (or agents) with partial observability and a multi-dimensional continuous action space, we adopt the multi-agent deep deterministic policy gradient (MA-DDPG) framework, in which decentralized actors with partial observability can learn a multi-dimensional continuous policy in a centralized manner with the aid of a shared critic with global information. We also address a phase ambiguity issue arising from the conventional complex baseband representation of signals widely used in radio communications. To mitigate the impact of phase ambiguity on training performance, we propose a training method, called phase ambiguity elimination (PAE), that leads to faster learning and better performance of MA-DDPG in wireless communication systems. Simulation results show that MA-DDPG is capable of learning a near-optimal precoding strategy in a MISO IFC environment. To the best of our knowledge, this is the first work to demonstrate that the MA-DDPG framework can jointly optimize precoders to achieve the Pareto boundary of the achievable rate region in a multi-cell multi-user multi-antenna system.
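To illustrate the phase ambiguity issue, the sketch below shows one plausible canonicalization in the spirit of PAE: because the achievable rate depends on the channel only through magnitudes of inner products, rotating a complex channel vector by a common phase leaves the rate unchanged, so two phase-rotated observations describe the same underlying state. The function name `eliminate_phase_ambiguity` and the specific convention (making the first entry real and non-negative) are illustrative assumptions, not necessarily the paper's exact method.

```python
import numpy as np

def eliminate_phase_ambiguity(h):
    """Illustrative PAE-style step (assumed convention): rotate a complex
    channel vector so its first entry is real and non-negative, removing
    the common phase that the rate expression is invariant to."""
    phase = np.angle(h[0])
    return h * np.exp(-1j * phase)

# The effective channel gain |h^H w| is unchanged by a common phase
# rotation of h, so the canonicalized channel yields the same rate
# while giving the agent a unique observation per physical state.
h = np.array([1 + 1j, 2 - 1j])
w = np.array([0.6 + 0.2j, 0.1 - 0.3j])  # an arbitrary precoder
g = eliminate_phase_ambiguity(h)
print(np.allclose(abs(np.vdot(h, w)), abs(np.vdot(g, w))))  # gain preserved
```

Under this (assumed) convention, every channel realization maps to a unique representative, which is the kind of canonicalization the abstract credits with faster, more stable MA-DDPG training.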