The Time Slotted Channel Hopping (TSCH) behavioural mode was introduced in the IEEE 802.15.4e standard to address the ultra-high reliability and ultra-low power communication requirements of Industrial Internet of Things (IIoT) networks. Scheduling packet transmissions in IIoT networks is difficult owing to limited resources and a dynamic topology. In this paper, we propose a phasic policy gradient (PPG) based TSCH schedule learning algorithm. The proposed PPG-based scheduling algorithm overcomes the drawbacks of fully distributed and fully centralized deep reinforcement learning-based scheduling algorithms by employing an actor-critic policy gradient method that learns the scheduling policy in two phases, namely a policy phase and an auxiliary phase.