We propose to control handoffs (HOs) in user-centric cell-free massive MIMO networks through a partially observable Markov decision process (POMDP), where the state space represents discretized versions of the large-scale fading (LSF) and the action space represents the user's association decisions with the access points. Our formulation accounts for the temporal evolution and the partial observability of the channel states, which allows us to consider future rewards when making HO decisions and hence to obtain a robust HO policy. To alleviate the high complexity of solving our POMDP, we follow a divide-and-conquer approach, breaking the POMDP formulation into sub-problems that are each solved individually. The policy and the candidate cluster of access points of the best-solved sub-problem are then used to perform HOs within a specific time horizon. We control the number of HOs by determining when to invoke the HO policy. Our simulation results show that our proposed solution reduces HOs by 47% compared to time-triggered LSF-based HOs and by 70% compared to data-rate-threshold-triggered LSF-based HOs. This number can be further reduced by increasing the time horizon of the POMDP.
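As a concrete illustration of the formulation, the sketch below shows the belief-filtering loop that a POMDP-based HO controller of this kind would run online: a Bayes filter tracks a belief over the discretized LSF states, and a policy maps each belief to an AP-association action. This is a minimal sketch under stated assumptions, not the paper's method: the state/action/observation sizes, the randomly generated transition, observation, and reward model, and the one-step-lookahead policy (standing in for the solved sub-problem policies) are all illustrative placeholders.

```python
import numpy as np

# Illustrative sizes only; the paper does not specify these.
N_STATES = 4      # discretized LSF levels (hypothetical quantization)
N_ACTIONS = 3     # candidate AP-cluster association decisions
N_OBS = 4         # noisy LSF measurements, quantized to the same levels

rng = np.random.default_rng(0)

# Random but valid POMDP model (last axis sums to 1):
# T[a, s, s'] = transition prob., O[a, s', o] = observation likelihood.
T = rng.dirichlet(np.ones(N_STATES), size=(N_ACTIONS, N_STATES))
O = rng.dirichlet(np.ones(N_OBS), size=(N_ACTIONS, N_STATES))
R = rng.normal(size=(N_ACTIONS, N_STATES))  # e.g., rate reward minus HO cost

def belief_update(b, a, o):
    """Bayes filter: b'(s') ∝ O[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    b_pred = b @ T[a]               # predict through the transition model
    b_new = O[a, :, o] * b_pred     # weight by the observation likelihood
    return b_new / b_new.sum()

def myopic_action(b):
    """One-step lookahead on the belief (placeholder for a solved POMDP policy)."""
    return int(np.argmax(R @ b))

# Roll the filter forward over a short horizon of example observations.
b = np.full(N_STATES, 1.0 / N_STATES)   # uniform prior over LSF levels
for o in [2, 1, 3, 0]:                  # example quantized LSF observations
    a = myopic_action(b)
    b = belief_update(b, a, o)
    print(f"action={a}, belief={np.round(b, 3)}")
```

In the paper's setting, the reward would additionally penalize each HO event so that maximizing expected future reward trades instantaneous rate against HO frequency; the myopic policy above ignores that lookahead and is used only to keep the sketch self-contained.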