Owing to its high parallelism, belief propagation (BP) decoding is highly amenable to high-throughput implementations and thus represents a promising solution for meeting the ultra-high peak data rate of future communication systems. However, for polar codes, the error-correcting performance of BP decoding is far inferior to that of the widely used CRC-aided successive cancellation list (SCL) decoding algorithm. To close the performance gap to SCL, BP list (BPL) decoding expands the exploration of candidate codewords through multiple permuted factor graphs (PFGs). From an implementation perspective, designing a unified and flexible hardware architecture for BPL decoding that supports various PFGs and code configurations presents a big challenge. In this paper, we propose the first hardware implementation of a BPL decoder for polar codes and overcome the implementation challenge by applying a hardware-friendly algorithm that generates flexible permutations on-the-fly. First, we derive the graph selection gain and provide a sequential generation (SG) algorithm to obtain a near-optimal PFG set. We further prove that any permutation can be decomposed into a combination of multiple fixed routings, and we design a low-complexity permutation network to satisfy the decoding schedule. Our BPL decoder not only has a low decoding latency by executing the decoding and permutation generation in parallel, but also supports an arbitrary list size without any area overhead. Experimental results show that, for length-1024 polar codes with a code rate of one-half, our BPL decoder with 32 PFGs has a similar error-correcting performance to SCL with a list size of 4 and achieves a throughput of 25.63 Gbps and an area efficiency of 29.46 Gbps/mm$^{2}$ at SNR=4.0dB, which is 1.82$\times$ and 4.33$\times$ faster than the state-of-the-art BP flip and SCL decoders,~respectively
翻译:由于其高度的并行性,置信传播(BP)译码非常适合高通量实现,因此对于满足未来通信系统超高峰值数据速率的需求而言,它是一种有前途的解决方案。然而,对于极化码而言,BP译码的纠错性能远远不及广泛采用的带有循环冗余校验码(CRC)的逐次取消列表(SCL)译码算法。为了接近SCL的性能,BP列表(BPL)译码通过多个置换因子图(PFG)扩展候选码字的探索。从实现角度来看,为BPL译码设计一个统一而灵活的硬件架构,以支持各种PFG和编码配置,是一个巨大的挑战。在本文中,我们提出了极化码的BPL译码器的第一个硬件实现,并通过应用一个硬件友好的算法,在线生成灵活的置换来克服实现的挑战。首先,我们推导了图选择增益,并提供了一种顺序生成(SG)算法来获取近似最优的PFG集合。我们进一步证明任何置换都可以分解成多个固定路由的组合,并设计了一个低复杂度的置换网络来满足解码日程安排。我们的BPL译码器不仅通过并行执行解码和置换生成来实现低解码延迟,而且能够支持任意列表大小,而不需要任何面积开销。实验结果表明,对于码长为1024,码率为一半的极化码,我们的具有32个PFG的BPL译码器在SNR=4.0dB时具有类似于列表大小为4的SCL的纠错性能,并实现了25.63 Gbps的吞吐量和29.46 Gbps/mm$^{2}$的面积效率,比BP翻转和SCL译码器的最新技术快1.82$\times$和4.33$\times$,分别。