Virtualized Radio Access Networks (vRANs) are fully configurable and can be implemented at low cost on commodity platforms, offering unprecedented network management flexibility. This paper formulates a novel vRAN reconfiguration problem and proposes a deep reinforcement learning (RL)-based framework that jointly reconfigures the functional splits of the base stations (BSs), the resources and locations of the virtualized central units (vCUs) and distributed units (vDUs), and the routing of each BS flow. The objective is to minimize the long-term total network operation cost while adapting to traffic demands and resource availability. Testbed measurements of the relation between traffic demand and computing resource utilization reveal that this relation has high variance and depends on the platform and the platform load. Hence, acquiring a perfect model of the underlying system is non-trivial. A comprehensive cost function is formulated that accounts for resource overprovisioning, instantiation and reconfiguration, and declined demands; because these costs are significant, reconfigurations must be performed prudently. Motivated by these insights, the solution framework is developed using model-free RL. Since the formulated RL problem has a semi-continuous state space and a discrete action space, a Dueling Double Deep Q-network (D3QN)-based approach is proposed. However, the system consists of multiple BSs that share the same resources and have highly coupled configurations, which yields a multi-dimensional discrete action space in which the number of possible actions grows combinatorially. To overcome this curse of dimensionality, action branching, an action decomposition method with a shared decision module followed by neural network branches, is tailored for use with D3QN. Finally, simulations are performed using an O-RAN compliant model and real traces from the testbed.
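As a rough illustration of the action-branching idea, the following minimal sketch computes dueling Q-values with one branch per action dimension on top of a shared module. It is not the implementation used in this work: the state size, hidden width, branch sizes, random weights and function names are all hypothetical, and the Double-DQN target network and training loop are omitted. It only shows why the number of network outputs grows with the sum, rather than the product, of the per-dimension action counts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM = 8               # length of the state vector
HIDDEN = 16                 # width of the shared module
BRANCH_SIZES = [4, 3, 3]    # number of choices in each action dimension


def init_params():
    """Random weights for the shared module, the value head and each branch."""
    return {
        "shared": rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN)),
        "value": rng.normal(scale=0.1, size=(HIDDEN, 1)),
        "adv": [rng.normal(scale=0.1, size=(HIDDEN, n)) for n in BRANCH_SIZES],
    }


def branch_q_values(params, state):
    """Return one Q-vector per branch: Q_d = V + A_d - mean(A_d)."""
    hidden = np.maximum(state @ params["shared"], 0.0)  # shared module (ReLU)
    value = hidden @ params["value"]                    # state value, shape (1,)
    q_per_branch = []
    for weights in params["adv"]:
        adv = hidden @ weights                          # advantages of one branch
        q_per_branch.append(value + adv - adv.mean())   # dueling aggregation
    return q_per_branch


def greedy_action(params, state):
    """Pick the best sub-action independently in every branch."""
    return [int(np.argmax(q)) for q in branch_q_values(params, state)]


params = init_params()
state = rng.normal(size=STATE_DIM)
action = greedy_action(params, state)

n_outputs = sum(BRANCH_SIZES)          # outputs needed with branching
n_joint = int(np.prod(BRANCH_SIZES))   # outputs needed for a flat joint action space
```

With these hypothetical sizes the branched network needs 10 advantage outputs, whereas a single flat Q-head over the joint action space would need 36; the gap widens quickly as more coupled decision dimensions are added.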