Millimeter wave (mmWave) and terahertz MIMO systems rely on pre-defined beamforming codebooks for both initial access and data transmission. Being pre-defined, however, these codebooks are typically not optimized for the specific environment, user distribution, or hardware impairments of a given deployment. This leads to large codebooks with high beam training overhead, which increases the initial access/tracking latency and makes it hard for these systems to support highly mobile applications. To overcome these limitations, this paper develops a deep reinforcement learning framework that learns how to iteratively optimize the codebook beam patterns (shapes) relying only on receive power measurements. The developed model learns to autonomously adapt the beam patterns to best match the surrounding environment, user distribution, hardware impairments, and array geometry, without requiring any explicit knowledge of the channel, array geometry, RF hardware, or user positions. To reduce the learning time, the proposed model adopts a novel Wolpertinger-variant architecture that can efficiently search for an optimal policy in a large discrete action space, which is important for large antenna arrays with quantized phase shifters. This complex-valued neural network architecture respects practical RF hardware constraints such as the constant-modulus and quantized phase shifter constraints. Simulation results based on the publicly available DeepMIMO dataset confirm the ability of the developed framework to learn near-optimal beam patterns for both line-of-sight (LOS) and non-LOS scenarios, and for arrays with hardware impairments, without requiring any channel knowledge.
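To make the action-space idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a uniform linear array with half-wavelength spacing, a toy LOS channel, and a direct receive power measurement standing in for the learned critic that a Wolpertinger-style agent would use. All function names (`nearest_candidates`, `beam_from_phases`, `receive_power`, `select_beam`) are hypothetical. A continuous "proto-action" of phases is mapped to a few nearby points on the quantized phase grid, and the candidate with the highest measured power is kept.

```python
import numpy as np


def nearest_candidates(proto_phases, num_bits, k):
    """Return up to k quantized phase vectors close to a continuous proto-action.

    The first candidate rounds every phase to the nearest grid level. Each
    further candidate moves one element (largest rounding residual first) to
    its second-nearest level. This only approximates a k-nearest-neighbour
    search over the discrete action space.
    """
    levels = 2 ** num_bits
    step = 2 * np.pi / levels
    proto = np.mod(np.asarray(proto_phases, dtype=float), 2 * np.pi)
    base_idx = np.round(proto / step)
    residual = proto / step - base_idx  # signed rounding error, in grid steps
    candidates = [np.mod(base_idx, levels) * step]
    order = np.argsort(-np.abs(residual))
    for n in order[: max(k - 1, 0)]:
        alt = base_idx.copy()
        alt[n] += 1 if residual[n] >= 0 else -1
        candidates.append(np.mod(alt, levels) * step)
    return candidates


def beam_from_phases(phases):
    """Constant-modulus, unit-norm beamforming vector for the given phases."""
    phases = np.asarray(phases, dtype=float)
    return np.exp(1j * phases) / np.sqrt(phases.size)


def receive_power(beam, channel):
    """Receive power |w^H h|^2 (np.vdot conjugates its first argument)."""
    return float(np.abs(np.vdot(beam, channel)) ** 2)


def select_beam(proto_phases, num_bits, k, channel):
    """Pick the candidate with the highest power.

    Assumption: the measured power replaces the learned critic here.
    """
    candidates = nearest_candidates(proto_phases, num_bits, k)
    powers = [receive_power(beam_from_phases(c), channel) for c in candidates]
    best = int(np.argmax(powers))
    return candidates[best], powers[best]


if __name__ == "__main__":
    num_antennas, num_bits = 16, 2
    # Toy LOS channel for a half-wavelength ULA (illustrative assumption).
    angle = 0.3
    channel = np.exp(1j * np.pi * np.arange(num_antennas) * np.sin(angle))
    proto = np.angle(channel)  # ideal, unquantized phases
    phases, power = select_beam(proto, num_bits, k=4, channel=channel)
    print("selected phase levels:", np.round(phases / (2 * np.pi / 2 ** num_bits)))
    print("receive power:", power)
```

Every candidate satisfies the constant-modulus and quantized phase constraints by construction, so the receive power can never exceed that of the ideal unquantized beam (equal to the number of antennas for this toy channel).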