Reliable communication in multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems requires accurate channel estimation. Existing literature largely focuses on denoising methods for channel estimation that depend on (i) channel analysis in the time domain and/or (ii) supervised learning techniques that require large pre-labeled datasets for training. To address these limitations, we present a frequency-domain denoising method based on a reinforcement learning framework that requires neither a priori channel knowledge nor pre-labeled data. Our methodology includes a new successive channel denoising process based on channel curvature computation, for which we obtain a channel curvature magnitude threshold to identify unreliable channel estimates. Based on this process, we formulate the denoising mechanism as a Markov decision process, where we define the actions through a geometry-based channel estimation update and the reward function based on a policy that reduces the mean square error (MSE). We then use Q-learning to update the channel estimates over time. Numerical results verify that our denoising algorithm successfully mitigates noise in channel estimates. In particular, our algorithm provides a significant improvement over the practical least squares (LS) channel estimation method and approaches the performance of the ideal linear minimum mean square error (LMMSE) estimator with perfect knowledge of channel statistics.
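The curvature-based flagging step and the Q-learning update mentioned above can be illustrated with a minimal sketch. The abstract does not give the curvature definition, the threshold, or the state, action, and reward design, so everything here is an assumption made for illustration: the second difference across subcarriers stands in for channel curvature, the neighbour midpoint stands in for the geometry-based update, the threshold is picked by hand, and the function names `curvature_magnitude`, `successive_denoise`, and `q_update` are hypothetical.

```python
import numpy as np


def curvature_magnitude(h):
    """Second-difference magnitude across subcarriers (assumed curvature proxy).

    Edge subcarriers are assigned 0 because they lack two neighbours.
    """
    h = np.asarray(h, dtype=complex)
    curv = np.zeros(h.shape[0])
    curv[1:-1] = np.abs(h[:-2] - 2.0 * h[1:-1] + h[2:])
    return curv


def successive_denoise(h_ls, threshold):
    """Sweep subcarriers in order and repair estimates flagged as unreliable.

    An estimate is flagged when its curvature proxy exceeds `threshold`; it is
    then replaced by the midpoint of its two neighbours (an assumed stand-in
    for a geometry-based update). Earlier repairs are reused by later steps.
    """
    h = np.array(h_ls, dtype=complex)
    for k in range(1, h.shape[0] - 1):
        if abs(h[k - 1] - 2.0 * h[k] + h[k + 1]) > threshold:
            h[k] = 0.5 * (h[k - 1] + h[k + 1])
    return h


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table


# Toy data: a channel that varies linearly over 8 subcarriers, with one
# corrupted LS estimate at subcarrier 3.
k = np.arange(8)
h_true = (1.0 + 0.1 * k) + 1j * (0.5 - 0.05 * k)
h_ls = h_true.copy()
h_ls[3] += 1.0

h_hat = successive_denoise(h_ls, threshold=1.5)
mse_before = np.mean(np.abs(h_ls - h_true) ** 2)
mse_after = np.mean(np.abs(h_hat - h_true) ** 2)
print(mse_before, mse_after)
```

In this toy case only subcarrier 3 exceeds the threshold, so a single repair restores the linear channel. In a reinforcement learning setting, a reward derived from the change in estimation error could be passed to `q_update`, but how the authors define states, actions, and rewards is not specified in the abstract.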