Reinforcement Learning for Optimized Beam Training in Multi-Hop Terahertz Communications

Communication at terahertz (THz) frequency bands is a promising solution for achieving extremely high data rates in next-generation wireless networks. Although THz communication is conventionally envisioned for short-range wireless applications because of the high atmospheric absorption at THz frequencies, multi-hop directional transmissions can extend the communication range. However, in multi-hop THz communications, conventional beam training schemes, such as exhaustive search or hierarchical methods with a fixed number of training levels, can incur a very large time overhead. To address this challenge, this paper proposes a novel hierarchical beam training scheme with dynamic training levels to optimize the performance of multi-hop THz links. Specifically, an optimization problem is formulated to maximize the overall spectral efficiency of the multi-hop THz link by dynamically and jointly selecting the number of beam training levels across all the constituent single-hop links. To solve this problem in the presence of unknown channel state information, noise, and path loss, a new reinforcement learning solution based on the multi-armed bandit (MAB) framework is developed. Simulation results show that the proposed scheme converges quickly in the presence of random channels and noise. The results also show that the proposed scheme can yield up to a 75% gain in spectral efficiency compared to conventional hierarchical beam training with a fixed number of training levels.
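
The abstract does not say which MAB algorithm or reward model the paper uses, so the following is only a minimal sketch of the general idea, under stated assumptions. Each arm is one joint choice of training levels across the hops, and the learner picks arms with the standard UCB1 rule. The reward function `toy_spectral_efficiency`, its parameters `slot_len` and `base_snr`, and the helper `run` are hypothetical stand-ins invented for illustration; they are not taken from the paper.

```python
import itertools
import math
import random


def toy_spectral_efficiency(levels, rng, slot_len=100.0, base_snr=2.0):
    """Hypothetical reward model (NOT the paper's): deeper training narrows
    the beam and raises gain, but spends more of the slot on training."""
    # Assumed overhead: two beam probes per training level on every hop.
    overhead = sum(2 * k for k in levels) / slot_len
    rates = []
    for k in levels:
        gain = 2.0 ** k                # assumed: gain doubles per level
        fading = rng.expovariate(1.0)  # random channel power with mean 1
        rates.append(math.log2(1.0 + base_snr * gain * fading))
    # Assumed: the end-to-end rate is limited by the weakest hop.
    return max(0.0, 1.0 - overhead) * min(rates)


def ucb1(arms, reward_fn, rounds, rng):
    """Standard UCB1: play every arm once, then pick the arm with the
    largest empirical mean plus exploration bonus."""
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for t in range(1, rounds + 1):
        if t <= len(arms):
            i = t - 1  # initial pass over all arms
        else:
            # The bonus scale assumes rewards of order 1; it is not tuned.
            i = max(
                range(len(arms)),
                key=lambda a: means[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        r = reward_fn(arms[i], rng)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # incremental mean update
    return counts, means


def run(hops=2, max_levels=4, rounds=5000, seed=0):
    """Return the most-played joint level choice, plus per-arm statistics."""
    rng = random.Random(seed)
    # One arm per joint assignment of training levels to the hops.
    arms = list(itertools.product(range(1, max_levels + 1), repeat=hops))
    counts, means = ucb1(arms, toy_spectral_efficiency, rounds, rng)
    best = max(range(len(arms)), key=lambda a: counts[a])
    return arms[best], counts, means


if __name__ == "__main__":
    best_levels, _, _ = run()
    print("most-played training levels per hop:", best_levels)
```

Treating every joint assignment as a single arm keeps the sketch short, but the number of arms grows exponentially with the number of hops, so this form is only practical for a few hops and levels.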
