We investigate the coexistence of task-oriented and data-oriented communications in an IoT system that shares a group of channels, and study the scheduling problem of jointly optimizing the weighted age of incorrect information (AoII) and the throughput, which are the performance metrics of the two types of communications, respectively. This problem is formulated as a Markov decision process, which is difficult to solve due to the large discrete action space and the time-varying action constraints induced by the stochastic availability of channels. By exploiting the intrinsic properties of this problem and reformulating the reward function based on channel statistics, we first simplify the solution space, the state space, and the optimality criteria, and convert the problem into an equivalent Markov game, for which the large discrete action space issue is greatly alleviated. Then, we propose a Whittle's index guided multi-agent proximal policy optimization (WI-MAPPO) algorithm to solve the considered game, where the embedded Whittle's index module further shrinks the action space, and the proposed offline training algorithm extends the training kernel of conventional MAPPO to handle the time-varying action constraints. Finally, numerical results validate that the proposed algorithm significantly outperforms state-of-the-art age of information (AoI) based algorithms in scenarios with insufficient channel resources.
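As a rough illustration of the joint objective described above, consider the following sketch (the notation here is assumed for illustration and is not taken from the paper): let $w_n$ and $\Delta_n(t)$ denote the weight and AoII of task-oriented user $n$, let $R_m(t)$ denote the throughput of data-oriented user $m$, let $\lambda$ be a trade-off coefficient, and let $a_i(t)\in\{0,1\}$ indicate whether user $i$ is scheduled in slot $t$. A generic long-run-average formulation of the weighted AoII/throughput trade-off under stochastic channel availability could then read
\[
\min_{\pi}\ \limsup_{T\to\infty}\ \frac{1}{T}\,\mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T}\Big(\sum_{n} w_n\,\Delta_n(t)\;-\;\lambda\sum_{m} R_m(t)\Big)\right]
\quad \text{s.t.}\quad \sum_{i} a_i(t)\le M(t)\ \ \forall t,
\]
where $M(t)$ is the (random) number of channels available in slot $t$. This is only a sketch of the problem family; the paper's exact reward reformulation based on channel statistics differs.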