In this work we propose RELDEC, a novel approach for sequential decoding of moderate-length low-density parity-check (LDPC) codes. The main idea behind RELDEC is that an optimized decoding policy is obtained via reinforcement learning based on a Markov decision process (MDP). In contrast to our previous work, where an agent learns to schedule only a single check node (CN) within a group (cluster) of CNs per iteration, in this work we train the agent to schedule all CNs in a cluster, and all clusters in every iteration. That is, in each learning step of RELDEC the agent learns to schedule CN clusters sequentially depending on a reward associated with the outcome of scheduling a particular cluster. We also modify the state space representation of the MDP, making RELDEC suitable for LDPC codes with larger block lengths than those studied in our previous work. Furthermore, to address decoding under varying channel conditions, we propose two related schemes, namely, agile meta-RELDEC (AM-RELDEC) and meta-RELDEC (M-RELDEC), both of which employ meta-reinforcement learning. The proposed RELDEC scheme significantly outperforms standard flooding and random sequential decoding for a variety of LDPC codes, including codes designed for 5G new radio.
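To make the cluster-scheduling idea concrete, the following minimal Python sketch illustrates one way an agent could learn an order in which to update clusters of check nodes within a decoding iteration, with the reward tied to the outcome of each cluster update. The toy parity-check matrix, the placeholder cluster update, the syndrome-based reward, and the tabular epsilon-greedy Q-learning are all simplifying assumptions for exposition; this is not the RELDEC algorithm or its MDP as specified in the paper.

```python
# Illustrative sketch (assumptions throughout): an agent learns, per decoding
# iteration, the order in which to schedule clusters of check nodes (CNs),
# receiving a reward based on the change in unsatisfied parity checks.

import numpy as np

rng = np.random.default_rng(0)

# Toy parity-check matrix H (rows = CNs, columns = variable nodes),
# partitioned into two CN clusters. Real codes are far larger and structured.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]], dtype=int)
clusters = [np.array([0, 1]), np.array([2, 3])]

def syndrome_weight(hard_bits):
    """Number of unsatisfied parity checks for a hard-decision word."""
    return int(np.sum(H.dot(hard_bits) % 2))

def update_cluster(llr, cluster_idx):
    """Stand-in for a message-passing update restricted to one CN cluster
    (assumption): nudge the LLRs of variable nodes attached to the cluster."""
    rows = clusters[cluster_idx]
    attached = np.unique(np.nonzero(H[rows])[1])
    llr[attached] += 0.5 * np.sign(llr[attached] + 1e-9)
    return llr

# Tabular Q-values over (bitmask of unscheduled clusters, cluster) pairs.
Q = np.zeros((2 ** len(clusters), len(clusters)))
alpha, gamma, eps = 0.1, 0.9, 0.2

def run_iteration(llr, learn=True):
    """One decoding iteration: schedule every cluster once, in a learned order."""
    remaining = (1 << len(clusters)) - 1
    while remaining:
        candidates = [c for c in range(len(clusters)) if remaining & (1 << c)]
        if learn and rng.random() < eps:
            a = rng.choice(candidates)                           # explore
        else:
            a = max(candidates, key=lambda c: Q[remaining, c])   # exploit
        before = syndrome_weight((llr < 0).astype(int))
        llr = update_cluster(llr, a)
        after = syndrome_weight((llr < 0).astype(int))
        reward = before - after          # drop in unsatisfied checks (assumption)
        nxt = remaining & ~(1 << a)
        if learn:
            best_next = Q[nxt].max() if nxt else 0.0
            Q[remaining, a] += alpha * (reward + gamma * best_next - Q[remaining, a])
        remaining = nxt
    return llr

# Toy training loop over noisy all-zero-codeword observations (assumption).
for episode in range(200):
    llr = 2.0 + rng.normal(0.0, 1.0, size=H.shape[1])   # channel LLRs
    for _ in range(5):                                   # decoding iterations
        llr = run_iteration(llr, learn=True)

print("learned Q-table:\n", Q)
```

In this sketch the state is simply the set of clusters not yet scheduled within the current iteration; the paper's MDP uses a richer state representation, which is part of what distinguishes RELDEC from this toy example.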