Remote state estimation of large-scale distributed dynamic processes plays an important role in Industry 4.0 applications. In this paper, we focus on the transmission scheduling problem of a remote estimation system. First, we derive some structural properties of the optimal sensor scheduling policy over fading channels. Then, building on these theoretical guidelines, we develop a structure-enhanced deep reinforcement learning (DRL) framework for optimal scheduling of the system to achieve the minimum overall estimation mean-square error (MSE). In particular, we propose a structure-enhanced action selection method, which tends to select actions that obey the policy structure. This explores the action space more effectively and improves the learning efficiency of the DRL agent. Furthermore, we introduce a structure-enhanced loss function that penalizes actions violating the policy structure. The new loss function guides the DRL agent to converge quickly to the optimal policy structure. Our numerical experiments illustrate that the proposed structure-enhanced DRL algorithms reduce training time by 50% and the remote estimation MSE by 10% to 25% compared with benchmark DRL algorithms. In addition, we show that the derived structural properties hold for a wide range of dynamic scheduling problems beyond remote state estimation.
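To make the structure-enhanced loss idea concrete, the following is a minimal sketch, not the paper's implementation, of how a penalty on structure-violating actions could be added to a standard DQN temporal-difference loss in PyTorch. The names `violates_structure` (a hypothetical callable encoding the derived policy structure, e.g., a threshold or monotonicity check) and the penalty weight `lam` are illustrative assumptions.

```python
"""Sketch: DQN loss augmented with a structure-violation penalty."""
import torch
import torch.nn.functional as F

def structure_enhanced_loss(q_net, target_net, batch, violates_structure,
                            gamma=0.99, lam=0.5):
    # batch: tensors (states, actions, rewards, next_states)
    states, actions, rewards, next_states = batch

    # Standard DQN temporal-difference loss.
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * target_net(next_states).max(1).values
    td_loss = F.smooth_l1_loss(q, target)

    # Structure penalty: fraction of states whose greedy action breaks
    # the derived policy structure (violates_structure is hypothetical).
    greedy_actions = q_net(states).argmax(dim=1)
    penalty = violates_structure(states, greedy_actions).float().mean()

    return td_loss + lam * penalty
```

A structure-enhanced action selection rule could reuse the same check during exploration, e.g., by masking out or resampling structure-violating actions in an epsilon-greedy step.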