Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning

The age of information metric fails to correctly describe the intrinsic semantics of a status update. For an intelligent reflecting surface-aided cooperative relay communication system, we propose the age of semantics (AoS) to measure the semantic freshness of status updates. Specifically, we focus on status updating from a source node (SN) to the destination, which we formulate as a Markov decision process (MDP). The objective of the SN is to maximize the expected satisfaction of AoS and energy consumption under a maximum transmit power constraint. To seek the optimal control policy, we first derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal difference learning framework. However, implementing the online DAC in practice poses a key challenge: it requires infinitely repeated interactions between the SN and the system, which can be dangerous, particularly during exploration. We then put forward a novel offline DAC scheme, which estimates the optimal control policy from a previously collected dataset without any further interactions with the system. Numerical experiments verify the theoretical results and show that our offline DAC scheme significantly outperforms the online DAC scheme and the most representative baselines in terms of mean utility, while demonstrating strong robustness to dataset quality.
