We investigate the applicability of deep reinforcement learning algorithms to the adaptive initial access beam alignment problem for mmWave communications, using the state-of-the-art proximal policy optimization (PPO) algorithm as an example. In comparison to recent approaches based on unsupervised learning that were developed to tackle this problem, deep reinforcement learning has the potential to address a wider range of applications, since, in principle, no (differentiable) model of the channel and/or the whole system is required for training; only agent-environment interactions are necessary to learn an algorithm (be it online or from a recorded dataset). We show that, although the chosen off-the-shelf deep reinforcement learning agent fails to perform well when trained on realistic problem sizes, introducing action space shaping in the form of beamforming modules vastly improves performance without sacrificing much generalizability. With this add-on, the agent delivers performance competitive with various state-of-the-art methods in simulated environments, even at realistic problem sizes. This demonstrates that, through well-directed modification, deep reinforcement learning may have a chance to compete with other approaches in this area, opening up many straightforward extensions to similar scenarios.
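As a rough illustration of what such action space shaping can look like (the abstract does not specify the implementation, so the DFT-style codebook construction, the array size N, the codebook size K, and the function names below are assumptions made purely for this sketch), the following Python snippet maps a discrete agent action to a beamforming vector drawn from a fixed beam codebook, instead of letting the agent output raw complex antenna weights:

```python
import numpy as np

# Hypothetical sketch (not the authors' code): action space shaping replaces
# the raw continuous action -- a length-N complex beamforming vector -- with
# a discrete index into a structured beam codebook.

N = 64  # number of antenna elements (assumed)
K = 16  # codebook size, i.e. number of discrete actions (assumed)

# DFT-style codebook for a uniform linear array: column k steers the beam
# toward a distinct direction, parameterized by sin of the steering angle.
n = np.arange(N)[:, None]                              # element indices, (N, 1)
sin_angles = np.linspace(-1.0, 1.0, K, endpoint=False) # sin of steering angles
codebook = np.exp(-1j * np.pi * n * sin_angles[None, :]) / np.sqrt(N)

def shaped_action_to_beam(action: int) -> np.ndarray:
    """Map the agent's discrete action (0..K-1) to a unit-norm beam vector."""
    return codebook[:, action]

# The agent now explores K structured beams instead of a 2N-dimensional
# continuous space, which is the kind of reduction the abstract attributes
# to the beamforming-module add-on.
beam = shaped_action_to_beam(3)
print(beam.shape, np.abs(np.vdot(beam, beam)))  # (64,) and ~1.0 (unit norm)
```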