In a virtualized radio access network (vRAN), the base station (BS) functions are decomposed into virtualized components that can be hosted at the centralized unit or the distributed units through functional splits. This flexibility has many benefits; however, it also requires solving the problem of selecting the optimal functional split for each BS so as to minimize the total network cost. The underlying vRAN system is complex, and modelling it precisely is non-trivial. Formulating the functional split problem to minimize the cost yields a combinatorial problem that is provably NP-hard, and solving it is computationally expensive. In this paper, a constrained deep reinforcement learning (RL) approach is proposed to solve the problem with minimal assumptions about the underlying system. Since action selection in deep RL is the outcome of neural network inference, it can be performed in real time, while training to update the neural network can run in the background. However, because the problem is combinatorial, the action space of the RL problem becomes large even for a small number of functions. To handle this large action space, a chain-rule-based stochastic policy is exploited, in which a long short-term memory (LSTM)-based sequence-to-sequence model estimates the policy that selects the functional split actions. This policy alone, however, addresses only the unconstrained problem, whereas each split decision must satisfy the vRAN's constraint requirements. Hence, a constrained policy gradient method is leveraged to train the policy and guide it toward constraint satisfaction. Further, a search strategy based on greedy decoding or temperature sampling is utilized to improve optimality at test time. Simulations using synthetic and real network datasets are performed to evaluate the performance of the proposed solution.
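To make the chain-rule-based policy concrete, the sketch below (PyTorch; all names such as SplitPolicy, num_splits, and the <start> token are illustrative assumptions, not the paper's implementation) shows an LSTM decoder that picks one split per BS in sequence, so the joint probability of a full split configuration factorizes by the chain rule into a product of per-BS conditionals. Temperature sampling and greedy decoding, the two test-time search strategies mentioned above, differ only in how an action is drawn from each step's logits.

```python
# Minimal sketch of a chain-rule factorized stochastic policy:
#   pi(a_1, ..., a_N) = prod_i pi(a_i | a_1, ..., a_{i-1}),
# with one LSTM decoding step per BS. Hypothetical, not the paper's code.
import torch
import torch.nn as nn

class SplitPolicy(nn.Module):
    def __init__(self, num_splits: int, hidden: int = 64):
        super().__init__()
        # +1 embedding slot for a <start> token that seeds the first step.
        self.embed = nn.Embedding(num_splits + 1, hidden)
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, num_splits)
        self.num_splits = num_splits

    def forward(self, num_bs: int, temperature: float = 1.0):
        """Sample one split per BS; return the actions and the summed log-probability."""
        h = torch.zeros(1, self.head.in_features)
        c = torch.zeros_like(h)
        token = torch.tensor([self.num_splits])  # <start> token
        actions, log_prob = [], torch.zeros(1)
        for _ in range(num_bs):
            h, c = self.lstm(self.embed(token), (h, c))
            logits = self.head(h) / temperature  # temperature sampling; at test
            # time, greedy decoding would instead take logits.argmax(dim=-1).
            dist = torch.distributions.Categorical(logits=logits)
            a = dist.sample()
            log_prob = log_prob + dist.log_prob(a)  # chain rule: sum of log conditionals
            actions.append(a.item())
            token = a  # condition the next BS's split on this choice
        return actions, log_prob
```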
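One common way to realize a constrained policy gradient, sketched below under the assumption of a Lagrangian-relaxation formulation (the abstract does not pin down the exact method), is a REINFORCE-style update on a penalized objective: a multiplier lam scales the constraint violation into the loss, and lam itself is raised by dual ascent whenever constraints are violated, steering the policy toward feasibility. The helpers cost_fn and violation_fn are assumed to be supplied by the vRAN cost model.

```python
# Hypothetical constrained policy-gradient step (Lagrangian relaxation).
import torch

def train_step(policy, optimizer, cost_fn, violation_fn, lam, lr_lam=1e-2, num_bs=8):
    actions, log_prob = policy(num_bs)
    cost = cost_fn(actions)            # total network cost of this split (assumed given)
    violation = violation_fn(actions)  # > 0 iff vRAN constraints are violated (assumed given)
    # REINFORCE on the Lagrangian L = cost + lam * violation.
    loss = (cost + lam * violation) * log_prob.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Dual ascent: increase the multiplier while constraints are violated.
    lam = max(0.0, lam + lr_lam * violation)
    return lam
```

Because training happens in the background, this loop can run continuously while the already-trained policy serves real-time split decisions by a single forward pass.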