Scheduling the transmission of time-sensitive information from a source node to multiple users over error-prone communication channels is studied with the goal of minimizing the long-term average age of information (AoI) at the users. A long-term average resource constraint is imposed on the source, which limits the average number of transmissions. The source can transmit to only a single user in each time slot; after each transmission, it receives instantaneous ACK/NACK feedback from the intended receiver and decides when, and to which user, to transmit the next update. Assuming the channel statistics are known, the optimal scheduling policy is studied for both the standard automatic repeat request (ARQ) and hybrid ARQ (HARQ) protocols. Then, a reinforcement learning (RL) approach is introduced to find a near-optimal policy that does not assume any a priori information on the random processes governing the channel states. Different RL methods, including average-cost SARSA with linear function approximation (LFA), upper confidence reinforcement learning (UCRL2), and deep Q-network (DQN), are applied and compared through numerical simulations.
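To make the system model concrete, the sketch below simulates AoI dynamics under standard ARQ only (HARQ is not modeled). It rests on several assumptions that are not taken from the abstract: each user's age grows by one per slot and resets to 1 after an ACKed delivery, each transmission to user i succeeds independently with a known probability `success_probs[i]`, and the scheduler is a hypothetical "max-age with threshold" rule. It is neither the optimal policy nor any of the RL methods described above. The helper names `simulate_aoi` and `best_threshold` are illustrative. The resource constraint is handled by comparing the empirical transmission rate against a budget.

```python
import random
from typing import List, Optional, Sequence, Tuple


def simulate_aoi(
    success_probs: List[float],
    threshold: int,
    num_slots: int,
    seed: int = 0,
) -> Tuple[float, float]:
    """Simulate a hypothetical max-age threshold policy under standard ARQ.

    Returns (average AoI per user per slot, average transmissions per slot).
    """
    rng = random.Random(seed)
    n = len(success_probs)
    ages = [1] * n  # assumed initial age of every user
    total_age = 0
    transmissions = 0
    for _ in range(num_slots):
        # Only one user can be served per slot: pick the oldest (ties -> lowest index).
        user = max(range(n), key=lambda i: ages[i])
        transmit = ages[user] >= threshold
        ack = False
        if transmit:
            transmissions += 1
            # Standard ARQ (assumed): each attempt is an independent Bernoulli
            # trial, and the ACK/NACK feedback reveals its outcome instantly.
            ack = rng.random() < success_probs[user]
        # Every user's age grows by one slot; the ACKed user resets to 1.
        for i in range(n):
            ages[i] = 1 if (ack and i == user) else ages[i] + 1
        total_age += sum(ages)
    return total_age / (n * num_slots), transmissions / num_slots


def best_threshold(
    success_probs: List[float],
    budget: float,
    candidates: Sequence[int],
    num_slots: int = 100_000,
) -> Optional[Tuple[int, float, float]]:
    """Brute-force search over thresholds: lowest AoI whose rate meets the budget."""
    best = None
    for th in candidates:
        aoi, rate = simulate_aoi(success_probs, th, num_slots)
        if rate <= budget and (best is None or aoi < best[1]):
            best = (th, aoi, rate)
    return best


if __name__ == "__main__":
    # Two users with unequal (assumed) channel success probabilities and a
    # budget of at most 0.5 transmissions per slot on average.
    print(best_threshold([0.9, 0.6], budget=0.5, candidates=range(1, 11)))
```

The exhaustive threshold search is only a stand-in for the constrained average-cost problem. It needs the channel success probabilities, which the RL methods described above are designed to do without.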