We consider a scenario where multiple users, powered by energy harvesting, send version updates over a fading multiple access channel (MAC) to an access point (AP). Version updates with random importance weights arrive at a user according to an exogenous arrival process, and a new version renders all previous versions obsolete. As energy harvesting imposes a time-varying peak power constraint, it is not possible to deliver all the bits of a version instantaneously. Accordingly, the AP adopts the objective of minimizing a finite-horizon time-averaged expectation of the product of the importance weight and a convex increasing function of the number of remaining bits of a version to be transmitted at each time instant. This objective enables importance-aware delivery of as many bits as possible, as soon as possible. In this setup, the AP optimizes the objective subject to an achievable-rate-region constraint of the MAC and energy constraints at the users, by deciding the transmit power and the number of bits transmitted by each user. We obtain a Markov decision process (MDP)-based optimal online policy for the problem and derive structural properties of the policy. We then develop a neural network (NN)-based online heuristic policy, training the NN on optimal offline policies derived for different sample paths of the energy, version arrival, and channel power gain processes. Via numerical simulations, we observe that the NN-based online policy performs competitively with respect to the MDP-based online policy.
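In symbols, and purely as a hedged sketch (the notation $w_k(t)$, $b_k(t)$, $p_k(t)$, $r_k(t)$, $h_k(t)$, $E_k(t)$, the convex increasing function $f$, the number of users $K$, and the horizon $T$ are our illustrative choices, not necessarily the paper's), the optimization described above might read:
\[
\min_{\{p_k(t),\, r_k(t)\}} \; \frac{1}{T}\, \mathbb{E}\!\left[ \sum_{t=1}^{T} \sum_{k=1}^{K} w_k(t)\, f\big(b_k(t)\big) \right]
\]
subject to, at every slot $t$, an achievable rate-region constraint of the fading MAC, e.g. for every subset $\mathcal{S} \subseteq \{1,\dots,K\}$,
\[
\sum_{k \in \mathcal{S}} r_k(t) \le \log_2\!\Big(1 + \sum_{k \in \mathcal{S}} h_k(t)\, p_k(t)\Big),
\]
and energy-causality constraints at each user, $\sum_{\tau=1}^{t} p_k(\tau) \le \sum_{\tau=1}^{t} E_k(\tau)$, where $h_k(t)$ denotes the channel power gain and $E_k(\tau)$ the harvested energy.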
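The MDP-based online policy lends itself to finite-horizon backward induction. Below is a minimal, self-contained Python sketch for a single-user toy version of the problem with discretized battery and remaining-bit states; the uniform harvest distribution, unit channel gain, and cost f(b) = b^2 are all illustrative assumptions rather than the paper's exact model (in particular, fading, version arrivals, and the multi-user rate region are omitted).

```python
# Toy sketch of finite-horizon MDP backward induction for one energy-harvesting
# user. State: (battery level, remaining bits); action: transmit power.
# All dynamics and the cost function are illustrative assumptions.
import numpy as np

T = 10                    # finite horizon (slots)
B_MAX, E_MAX = 5, 2       # battery capacity and max harvest per slot (energy units)
BITS_MAX = 8              # maximum remaining bits of a version
W = 1.0                   # importance weight (held fixed in this toy example)

def rate(p):
    """Bits deliverable with transmit power p (unit channel gain assumed)."""
    return int(np.log2(1 + p))

def cost(b):
    """Convex increasing per-slot cost on remaining bits (assumed f(b) = b^2)."""
    return W * b**2

# Value function V[t, battery, bits]: expected cost-to-go from slot t onward.
V = np.zeros((T + 1, B_MAX + 1, BITS_MAX + 1))
policy = np.zeros((T, B_MAX + 1, BITS_MAX + 1), dtype=int)

for t in reversed(range(T)):
    for batt in range(B_MAX + 1):
        for bits in range(BITS_MAX + 1):
            best_val, best_p = np.inf, 0
            for p in range(batt + 1):          # peak power limited by stored energy
                nb = max(bits - rate(p), 0)    # remaining bits after transmission
                # Expected cost-to-go, averaging over uniform harvest arrivals.
                exp_next = np.mean([V[t + 1, min(batt - p + e, B_MAX), nb]
                                    for e in range(E_MAX + 1)])
                val = cost(nb) + exp_next
                if val < best_val:
                    best_val, best_p = val, p
            V[t, batt, bits] = best_val
            policy[t, batt, bits] = best_p

print("Optimal power at t=0, full battery, 8 bits pending:", policy[0, B_MAX, BITS_MAX])
```

In this sketch, the tabulated policy plays the role of the MDP-based online policy; the NN-based heuristic described above would instead be trained to imitate optimal offline decisions computed on sampled energy, arrival, and channel traces, avoiding the state-space enumeration that makes backward induction costly as the number of users grows.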