Recent advances in distributed artificial intelligence (AI) have led to tremendous breakthroughs in various communication services, ranging from fault-tolerant factory automation to smart cities. When distributed learning runs over a set of wirelessly connected devices, random channel fluctuations and the incumbent services running on the same network affect the performance of both the distributed learning and the coexisting service. In this paper, we investigate a mixed-service scenario in which a distributed AI workflow and ultra-reliable low-latency communication (URLLC) services run concurrently over a network. For this scenario, we propose a risk-sensitivity-based formulation for device selection that minimizes the AI training delay over its convergence period while ensuring that the operational requirements of the URLLC service are met. To address this challenging coexistence problem, we transform it into a deep reinforcement learning problem and solve it with a framework based on the soft actor-critic (SAC) algorithm. We evaluate our solution with a realistic, 3GPP-compliant simulator for factory automation use cases. Our simulation results confirm that our solution can significantly decrease the training delay of the distributed AI service while keeping the URLLC availability above its required threshold and close to that of the scenario in which URLLC alone consumes all network resources.