The simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) is a promising passive device that contributes to full-space coverage by transmitting and reflecting the incident signal simultaneously. As a new paradigm in wireless communications, analyzing the coverage and capacity performance of STAR-RISs becomes essential but challenging. To solve the coverage and capacity optimization (CCO) problem in STAR-RIS-assisted networks, a multi-objective proximal policy optimization (MO-PPO) algorithm is proposed, which handles the long-term benefits better than conventional optimization algorithms. To strike a balance between the two objectives, the MO-PPO algorithm provides a set of optimal solutions that form a Pareto front (PF), where any solution on the PF is regarded as an optimal result. Moreover, to improve the performance of the MO-PPO algorithm, two update strategies are investigated: the action-value-based update strategy (AVUS) and the loss-function-based update strategy (LFUS). The AVUS integrates the action values of both coverage and capacity and then updates the loss function, whereas the LFUS only assigns dynamic weights to the coverage and capacity loss functions, with the weights calculated by a min-norm solver at every update. Numerical results demonstrate that the investigated update strategies outperform fixed-weight multi-objective optimization algorithms in different cases, including different numbers of sample grids, numbers of STAR-RISs, numbers of elements per STAR-RIS, and STAR-RIS sizes. Additionally, STAR-RIS-assisted networks achieve better performance than conventional wireless networks without STAR-RISs. Finally, with the same bandwidth, millimeter wave is able to provide higher capacity than sub-6 GHz, but at the cost of smaller coverage.
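For illustration, the dynamic weights used by the LFUS can be obtained from the standard two-objective min-norm closed form (as in Sener and Koltun's multiple-gradient-descent formulation). The PyTorch sketch below shows one way such a solver could be wired into a PPO update; it is a minimal sketch under that assumption, not the paper's exact implementation, and the names lfus_weights, loss_cov, loss_cap, and policy are hypothetical.

    import torch

    def lfus_weights(loss_cov, loss_cap, params):
        # Flatten the per-parameter gradients of each objective's PPO loss
        # into single vectors; retain_graph so the final backward still works.
        g_cov = torch.cat([g.reshape(-1) for g in
                           torch.autograd.grad(loss_cov, params, retain_graph=True)])
        g_cap = torch.cat([g.reshape(-1) for g in
                           torch.autograd.grad(loss_cap, params, retain_graph=True)])
        diff = g_cov - g_cap
        # Two-task min-norm closed form:
        # alpha* = clip(((g_cap - g_cov) . g_cap) / ||g_cov - g_cap||^2, 0, 1)
        alpha = (torch.dot(-diff, g_cap) / (torch.dot(diff, diff) + 1e-12)).clamp(0.0, 1.0)
        # Weight on the coverage loss, weight on the capacity loss.
        return alpha.detach(), (1.0 - alpha).detach()

    # Usage at every PPO update (hypothetical names):
    #   w_cov, w_cap = lfus_weights(loss_cov, loss_cap, list(policy.parameters()))
    #   (w_cov * loss_cov + w_cap * loss_cap).backward()

Because alpha is recomputed from the current gradients at every update, neither objective's loss can permanently dominate, which is the behavior the LFUS targets in contrast to fixed-weight scalarization.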