One of the key benefits of virtualized radio access networks (vRANs) is network management flexibility. However, this versatility raises previously unseen network management challenges. In this paper, a learning-based zero-touch vRAN orchestration framework (LOFV) is proposed to jointly select the functional splits and allocate the virtualized resources so as to minimize the long-term management cost. First, testbed measurements of the relationship between users' demand and virtualized resource utilization are collected using a centralized RAN system. The collected data reveal that the relationship between demand and resource utilization is non-linear and non-monotonic. Then, a comprehensive cost model is proposed that accounts for resource overprovisioning, declined demand, instantiation, and reconfiguration. Moreover, the proposed cost model also captures the different routing and computing costs of each split. Motivated by our measurement insights and cost model, LOFV is developed using a model-free reinforcement learning paradigm. The proposed solution combines deep Q-learning with a regression-based neural network that maps the network state and users' demand into split and resource control decisions. Our numerical evaluations show that LOFV can offer cost savings of up to 69\% over the optimal static policy and 45\% over the optimal fully dynamic policy.
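To make the control loop concrete, the following is a minimal sketch of a deep Q-learning agent that maps an observed network state and demand into a joint (functional split, virtualized resource) decision, in the spirit of the framework described above. It is not the authors' implementation: the state features, the sizes of the discrete action space, and the use of the negative management cost as the reward are assumptions made purely for illustration.

```python
# Illustrative sketch only (not the LOFV code): a deep Q-network mapping
# (network state, demand) -> Q-values over joint (split, CPU-level) actions.
# All dimensions, names, and the cost signal are hypothetical placeholders.
import random
import torch
import torch.nn as nn

N_SPLITS = 4       # assumed number of candidate functional splits
N_CPU_LEVELS = 8   # assumed discretized virtualized-resource allocation levels
STATE_DIM = 6      # assumed features: current split, utilization, demand, ...

class QNetwork(nn.Module):
    """Regression-style network: state -> one Q-value per (split, CPU-level) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_SPLITS * N_CPU_LEVELS),
        )

    def forward(self, state):
        return self.net(state)

def select_action(qnet, state, epsilon=0.1):
    """Epsilon-greedy choice over the joint split/resource action space."""
    if random.random() < epsilon:
        idx = random.randrange(N_SPLITS * N_CPU_LEVELS)
    else:
        with torch.no_grad():
            idx = int(qnet(state).argmax())
    return divmod(idx, N_CPU_LEVELS)   # -> (split_id, cpu_level)

def td_update(qnet, optimizer, state, action, cost, next_state, gamma=0.95):
    """One temporal-difference step; the reward is the negative management cost."""
    split_id, cpu_level = action
    idx = split_id * N_CPU_LEVELS + cpu_level
    q_sa = qnet(state)[idx]
    with torch.no_grad():
        target = -cost + gamma * qnet(next_state).max()
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Toy usage with random tensors standing in for testbed measurements.
qnet = QNetwork()
optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
state, next_state = torch.rand(STATE_DIM), torch.rand(STATE_DIM)
action = select_action(qnet, state)
td_update(qnet, optimizer, state, action, cost=1.0, next_state=next_state)
```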