Sorting is one of the fundamental problems in computer science. Central to many processes, comparison-based sorting has a lower complexity bound of $\Omega(n\log{n})$ when executed on a sequential machine. This limit can be brought down to sub-linear time through parallelization techniques that increase the number of comparisons performed simultaneously. This, however, increases the implementation cost, which limits the applicability of such techniques. Moreover, as array sizes grow, a bottleneck arises in moving the large quantities of data consumed at the input and produced at the output of such a sorter; this may impose time requirements far stricter than those of the sorting itself. In this paper, a novel parallel sorter is proposed for the specific case where the input is parallel but the output is serial. The design is then implemented and verified on an FPGA within the context of a quantum LDPC decoder. The first element is output after a latency of $\log{n}$, after which the remaining elements stream out for a total sorting time of $n+\log{n}$. Unlike other parallel sorting methods, the clock speed does not degrade with $n$, and resource usage scales linearly with the input size.
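The abstract does not detail the architecture, but one structure consistent with these bounds is a tournament (selection) tree: all $n$ inputs are loaded in parallel at the leaves, the minimum emerges at the root after $\lceil\log_2{n}\rceil$ comparator levels, and subsequent elements stream out one per step as winners are retired along a single root-to-leaf path. The following Python behavioral sketch is an illustrative assumption, not the paper's design; the function name `tournament_sort` and the power-of-two padding with sentinel values are choices made here for clarity.

```python
import math

INF = float("inf")  # sentinel for retired/padding slots

def tournament_sort(values):
    """Behavioral model of a parallel-in, serial-out tournament-tree
    sorter: inputs arrive in parallel, sorted outputs stream serially."""
    n = len(values)
    size = 1 << max(1, math.ceil(math.log2(n)))  # pad leaves to a power of two
    tree = [0] * size + list(values) + [INF] * (size - n)
    # Initial reduction: ceil(log2 n) comparator levels fill the tree,
    # so the first (smallest) element is ready after ~log n steps.
    for i in range(size - 1, 0, -1):
        tree[i] = min(tree[2 * i], tree[2 * i + 1])
    out = []
    for _ in range(n):
        out.append(tree[1])  # emit the current minimum at the root
        # Trace the winner down to its leaf, retire it, and refresh only
        # the root-to-leaf path; in hardware this path update can be
        # pipelined, sustaining one output per clock cycle.
        i = 1
        while i < size:
            i = 2 * i if tree[2 * i] == tree[i] else 2 * i + 1
        tree[i] = INF
        while i > 1:
            i //= 2
            tree[i] = min(tree[2 * i], tree[2 * i + 1])
    return out

print(tournament_sort([7, 3, 9, 1, 4, 8, 2, 6]))  # [1, 2, 3, 4, 6, 7, 8, 9]
```

In hardware, such a comparator tree uses $n-1$ two-input comparators, matching the linear resource scaling claimed above, and each comparator only ever compares its two local children, which is one way the clock speed can stay independent of $n$.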