Multi-agent deep reinforcement learning (MADRL) is a promising approach to challenging problems in wireless environments involving multiple decision-makers (or actors) with high-dimensional continuous action spaces. In this paper, we present a MADRL-based approach that jointly optimizes precoders to achieve the outer boundary, called the Pareto boundary, of the achievable rate region for a multiple-input single-output (MISO) interference channel (IFC). To address the two main challenges of the MISO IFC setup, namely multiple actors (or agents) with partial observability and a multi-dimensional continuous action space, we adopt the multi-agent deep deterministic policy gradient (MA-DDPG) framework, in which decentralized actors with partial observability can learn a multi-dimensional continuous policy in a centralized manner with the aid of a shared critic with global information. We also address a phase ambiguity issue arising from the conventional complex baseband representation of signals widely used in radio communications. To mitigate the impact of phase ambiguity on training performance, we propose a training method, called phase ambiguity elimination (PAE), that leads to faster learning and better performance of MA-DDPG in wireless communication systems. Simulation results show that MA-DDPG is capable of learning a near-optimal precoding strategy in a MISO IFC environment. To the best of our knowledge, this is the first work to demonstrate that the MA-DDPG framework can jointly optimize precoders to achieve the Pareto boundary of the achievable rate region in a multi-cell multi-user multi-antenna system.
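To illustrate the phase ambiguity issue, the sketch below shows one plausible canonicalization in the spirit of PAE: because the achievable rate depends on the channel only through magnitudes of inner products, rotating a complex channel vector by a common phase leaves the rate unchanged, so two phase-rotated observations describe the same underlying state. The function name `eliminate_phase_ambiguity` and the specific convention (making the first entry real and non-negative) are illustrative assumptions, not necessarily the paper's exact method.

```python
import numpy as np

def eliminate_phase_ambiguity(h):
    """Illustrative PAE-style step (assumed convention): rotate a complex
    channel vector so its first entry is real and non-negative, removing
    the common phase that the rate expression is invariant to."""
    phase = np.angle(h[0])
    return h * np.exp(-1j * phase)

# The effective channel gain |h^H w| is unchanged by a common phase
# rotation of h, so the canonicalized channel yields the same rate
# while giving the agent a unique observation per physical state.
h = np.array([1 + 1j, 2 - 1j])
w = np.array([0.6 + 0.2j, 0.1 - 0.3j])  # an arbitrary precoder
g = eliminate_phase_ambiguity(h)
print(np.allclose(abs(np.vdot(h, w)), abs(np.vdot(g, w))))  # gain preserved
```

Under this (assumed) convention, every channel realization maps to a unique representative, which is the kind of canonicalization the abstract credits with faster, more stable MA-DDPG training.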