Remote state estimation of large-scale distributed dynamic processes plays an important role in Industry 4.0 applications. In this paper, by leveraging theoretical results on the structural properties of optimal scheduling policies, we develop a structure-enhanced deep reinforcement learning (DRL) framework for optimal scheduling of a multi-sensor remote estimation system that minimizes the overall estimation mean-square error (MSE). In particular, we propose a structure-enhanced action selection method, which preferentially selects actions that obey the policy structure. This explores the action space more effectively and improves the learning efficiency of the DRL agent. Furthermore, we introduce a structure-enhanced loss function that penalizes actions violating the policy structure. The new loss function guides the DRL agent to converge quickly to the optimal policy structure. Our numerical results show that the proposed structure-enhanced DRL algorithms reduce training time by 50% and the remote estimation MSE by 10% to 25% compared with benchmark DRL algorithms.
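
To make the two ideas concrete, below is a minimal PyTorch sketch of a DQN-style agent with (i) structure-enhanced action selection and (ii) a structure-enhanced loss. It is illustrative only: the structure check `obeys_structure` (here, "schedule the sensor with the largest error/age component," a common threshold-type structure in remote estimation) and the parameters `p_struct` and `lam` are assumptions for the sketch, not the paper's exact formulation.

```python
# Illustrative sketch only; the structural property assumed in obeys_structure
# and all hyperparameters are placeholders, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Small Q-network mapping a system state to per-action values."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def obeys_structure(state: torch.Tensor, action: int) -> bool:
    """Hypothetical structure check: assume the known policy structure says
    'schedule the sensor whose error/age component is largest'. The real
    structural property comes from the paper's theoretical results."""
    return action == int(torch.argmax(state).item())

def structured_epsilon_greedy(q_net, state, num_actions, eps=0.1, p_struct=0.8):
    """Structure-enhanced action selection: when exploring, draw a
    structure-obeying action with higher probability (p_struct) instead of
    sampling uniformly over all actions."""
    if torch.rand(1).item() > eps:
        return int(q_net(state).argmax().item())
    good = [a for a in range(num_actions) if obeys_structure(state, a)]
    bad = [a for a in range(num_actions) if a not in good]
    if good and (not bad or torch.rand(1).item() < p_struct):
        return good[torch.randint(len(good), (1,)).item()]
    return bad[torch.randint(len(bad), (1,)).item()]

def structure_enhanced_loss(q_net, q_target, batch, gamma=0.99, lam=0.5):
    """Standard TD loss plus a penalty whenever the greedy action in a
    state violates the assumed policy structure."""
    s, a, r, s_next = batch  # states (B,d), actions (B,) long, rewards (B,), next states (B,d)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * q_target(s_next).max(dim=1).values
    td_loss = F.mse_loss(q_sa, target)
    # Penalty: margin by which a structure-violating greedy action's Q-value
    # exceeds the best structure-obeying action's Q-value in the same state.
    q_all = q_net(s)
    penalty = 0.0
    for i in range(s.shape[0]):
        a_star = int(q_all[i].argmax().item())
        if not obeys_structure(s[i], a_star):
            good = [a for a in range(q_all.shape[1]) if obeys_structure(s[i], a)]
            penalty = penalty + F.relu(q_all[i, a_star] - q_all[i, good].max())
    return td_loss + lam * penalty / s.shape[0]
```

Writing the penalty as a hinge on the Q-value margin means the extra gradient is nonzero only when the greedy action actually violates the structure, so the term vanishes once the learned policy matches the structural property and the loss reduces to the ordinary TD objective.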