Over the last few years, the Deep Reinforcement Learning (DRL) paradigm has been widely adopted for network optimization in 5G and beyond because of its adaptability to a wide range of scenarios. However, collecting and processing learning data entail a significant cost in communication and computational resources, which the networking literature often disregards. In this work, we analyze the cost of learning in a resource-constrained system, defining an optimization problem in which training a DRL agent improves the resource allocation strategy but also reduces the amount of resources available to the system. Our simulation results show that the cost of learning can be critical when evaluating DRL schemes at the network edge, and that assuming a cost-free learning model can lead to a significant overestimation of performance.
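The trade-off described above can be illustrated with a toy model. The sketch below is not the paper's system model or optimization problem: the function name `simulate`, the saturating quality curve, and all parameter values are assumptions chosen only to show how a cost-free learning assumption inflates the measured performance.

```python
import math


def simulate(learning_share, steps=100, capacity=1.0,
             learning_rate=0.05, free_learning=False):
    """Cumulative utility of a toy resource-constrained system.

    All quantities are hypothetical. At every step a fraction
    `learning_share` of the capacity is spent on collecting and
    processing training data. Policy quality grows with the
    accumulated training resources along an assumed saturating curve.
    If `free_learning` is True, the resources spent on training are
    not subtracted from the capacity (the cost-free assumption).
    """
    experience = 0.0  # accumulated resources invested in training
    total = 0.0       # cumulative utility delivered to users
    for _ in range(steps):
        # Assumed learning curve: 0 with no experience, tends to 1.
        quality = 1.0 - math.exp(-learning_rate * experience)
        if free_learning:
            usable = capacity
        else:
            usable = capacity * (1.0 - learning_share)
        total += usable * quality
        experience += capacity * learning_share
    return total


if __name__ == "__main__":
    for share in (0.05, 0.2, 0.5):
        real = simulate(share)
        ideal = simulate(share, free_learning=True)
        print(f"share={share:.2f} with cost={real:.2f} cost-free={ideal:.2f}")
```

In this toy model the cost-free estimate exceeds the cost-aware one by a factor of 1 / (1 - learning_share), so the gap widens as more resources are devoted to training.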