In delay-sensitive industrial Internet of Things (IIoT) applications, the age of information (AoI) is employed to characterize the freshness of information. Meanwhile, the emerging network function virtualization (NFV) paradigm gives service providers the flexibility and agility to deliver a given network service as a sequence of virtual network functions (VNFs). However, finding a suitable VNF placement and scheduling in such schemes is NP-hard, and obtaining a globally optimal solution with traditional approaches is complex. Recently, deep reinforcement learning (DRL) has emerged as a viable way to solve such problems. In this paper, we first employ a low-complexity, single-agent compound-action actor-critic RL scheme that covers both discrete and continuous actions and jointly minimizes VNF cost and AoI over the network resources under end-to-end Quality of Service constraints. To overcome the learning-capacity limitation of a single agent, we then extend our solution to a multi-agent DRL scheme in which the agents collaborate with each other. Simulation results demonstrate that the single-agent scheme significantly outperforms the greedy algorithm in terms of average network cost and AoI. Moreover, the multi-agent solution further decreases the average cost by dividing the tasks among the agents, although it needs more iterations to learn because of the required collaboration among the agents.
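To make the compound-action idea concrete, the following is a minimal sketch (in PyTorch, not the authors' implementation) of an actor-critic policy whose actor factorizes into a categorical head for the discrete decision (e.g., which node hosts a VNF) and a Gaussian head for the continuous decision (e.g., the resource share allocated to it), with a shared critic. All dimensions, names, and the one-step update are illustrative assumptions.

```python
# Hedged sketch of a compound-action actor-critic: the joint action
# (discrete node choice, continuous resource share) has a log-probability
# that factorizes into the sum of the two heads' log-probabilities.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class CompoundActorCritic(nn.Module):
    def __init__(self, state_dim: int, n_nodes: int, cont_dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 128), nn.Tanh())
        self.disc_head = nn.Linear(128, n_nodes)     # logits over placement nodes
        self.cont_mu = nn.Linear(128, cont_dim)      # mean of resource allocation
        self.cont_log_std = nn.Parameter(torch.zeros(cont_dim))
        self.value_head = nn.Linear(128, 1)          # critic: state-value estimate

    def forward(self, state: torch.Tensor):
        h = self.body(state)
        disc = Categorical(logits=self.disc_head(h))
        cont = Normal(self.cont_mu(h), self.cont_log_std.exp())
        return disc, cont, self.value_head(h).squeeze(-1)

def actor_critic_loss(model, state, node, share, reward, next_value, gamma=0.99):
    """One-step advantage actor-critic loss for the compound action."""
    disc, cont, value = model(state)
    advantage = reward + gamma * next_value - value
    # Joint log-probability of the compound action factorizes.
    log_prob = disc.log_prob(node) + cont.log_prob(share).sum(-1)
    actor_loss = -(log_prob * advantage.detach()).mean()
    critic_loss = advantage.pow(2).mean()
    return actor_loss + 0.5 * critic_loss

# Example usage with hypothetical dimensions:
model = CompoundActorCritic(state_dim=16, n_nodes=5, cont_dim=2)
state = torch.randn(4, 16)                       # batch of 4 network states
disc, cont, value = model(state)
node, share = disc.sample(), cont.sample()       # sample a compound action
```

In a multi-agent extension, one would instantiate such a policy per agent and couple them through a shared reward or critic; that coupling is what makes the multi-agent scheme cheaper on average but slower to train, as the abstract notes.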