Deep reinforcement learning has been applied to a variety of wireless tasks, but is known to incur high training and inference complexity. In this paper, we resort to the deep deterministic policy gradient (DDPG) algorithm to optimize predictive power allocation among K mobile users requesting video streaming, aiming to minimize the energy consumption of the network under the no-stalling constraint of each user. To reduce the sampling complexity and model size of the DDPG, we exploit a kind of symmetric prior inherent in the actor and critic networks, namely the permutation invariance and permutation equivariance of their input-output relationships, in designing the neural networks. Our analysis shows that the free model parameters of the DDPG can be compressed to 2/K^2 of those of the vanilla networks. Simulation results demonstrate that, to achieve the same performance as the vanilla policy, the number of training episodes required by the learning model with the symmetric prior is reduced by about one third when K = 10.
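To make the 2/K^2 compression ratio concrete, below is a minimal sketch (not the authors' implementation) of a Deep-Sets-style permutation-equivariant linear layer: each user's features are processed by one shared per-user weight plus one shared pooling weight, so the layer needs 2*d^2 weights instead of the (K*d)^2 weights of a vanilla fully connected layer over all K users, a ratio of exactly 2/K^2. The class name PermEquivLinear and the feature dimension d are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PermEquivLinear(nn.Module):
    """Permutation-equivariant linear layer (Deep Sets style, an
    illustrative assumption, not the paper's exact architecture):
    y_k = W1 x_k + W2 * mean_j(x_j), applied identically to each of
    the K users, so permuting the users permutes the outputs."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w1 = nn.Linear(d_in, d_out)               # per-user weight
        self.w2 = nn.Linear(d_in, d_out, bias=False)   # shared pooling weight

    def forward(self, x):
        # x: (batch, K, d_in) -> (batch, K, d_out)
        return self.w1(x) + self.w2(x.mean(dim=1, keepdim=True))

# Weight-count comparison for K users with d features each:
K, d = 10, 4
dense = nn.Linear(K * d, K * d)      # vanilla fully connected layer
equiv = PermEquivLinear(d, d)        # equivariant counterpart
n_dense = sum(p.numel() for p in dense.parameters() if p.dim() > 1)
n_equiv = sum(p.numel() for p in equiv.parameters() if p.dim() > 1)
print(n_dense, n_equiv, n_equiv / n_dense)   # 1600 32 0.02 = 2/K^2
```

Stacking such layers in the actor, and ending the critic with a permutation-invariant pooling (e.g., a mean over users), preserves these symmetries end to end, which is the structural prior the abstract refers to.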