Recently, network operators and equipment vendors have made tremendous efforts to bring intelligence and openness to the next-generation radio access network (RAN). The goal is a RAN that can self-optimize in a highly complex setting with multiple platforms, technologies, and vendors in a converged compute and connect architecture. In this paper, we propose two nested actor-critic learning-based techniques to optimize both the placement of the resource allocation function and the resource allocation decisions themselves. With these techniques, we investigate the impact of observability on the performance of reinforcement learning-based resource allocation. We show that dynamically relocating a network function (NF) based on service requirements, using reinforcement learning techniques, yields latency and throughput gains.