Montgomery modular multiplication is widely used in public-key cryptosystems (PKC) and directly affects the efficiency of the systems built on it. However, moduli are growing larger to meet increasing security demands, which makes the computation costly. High-performance implementations of Montgomery modular multiplication are therefore urgently needed to keep PKC operations efficient. Existing high-speed implementations still require a large amount of redundant computation to simplify the intermediate result, and support for redundant representations in Montgomery modular multiplication is extremely limited. In this paper, we propose an efficient parallel variant of iterative Montgomery modular multiplication, called DRMMM, that allows the quotient to be computed over multiple iterations. In this variant, the terms of the intermediate result and the quotient in each iteration are computed in different radices, so that the computation of the quotient can be pipelined. Based on the proposed variant, we also design a high-performance hardware architecture for faster operation. In this architecture, the intermediate result in every iteration is represented as three parts to avoid redundant computations. Finally, to support FPGA-based systems, we design operators tailored to the underlying FPGA architecture for better area-time performance. Implementation and experimental results show that our method reduces output latency by 38.3\% compared with the fastest existing design on FPGA.
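For orientation, the following is a minimal sketch of the textbook radix-2^k iterative Montgomery multiplication that variants such as DRMMM build on. It is not the DRMMM algorithm itself. The function name, parameter names, and word-size choices are illustrative assumptions, not taken from the paper. Note that each iteration's quotient digit `q` depends on the intermediate result `s` from the previous iteration; this is the dependency that a pipelined quotient computation aims to relax.

```python
def montgomery_multiply(a, b, n, k, num_words):
    """Baseline (textbook) iterative Montgomery multiplication.

    Returns a * b * R^{-1} mod n, where R = 2^(k * num_words).
    Assumes n is odd, n < R, and 0 <= a, b < n.
    All names here are hypothetical, chosen for illustration only.
    """
    radix = 1 << k
    mask = radix - 1
    # n' = -n^{-1} mod 2^k (exists because n is odd); needs Python 3.8+
    n_prime = (-pow(n, -1, radix)) % radix

    s = 0  # intermediate result
    for i in range(num_words):
        a_i = (a >> (k * i)) & mask        # i-th radix-2^k digit of a
        s += a_i * b                       # accumulate partial product
        q = ((s & mask) * n_prime) & mask  # quotient digit for this iteration
        s = (s + q * n) >> k               # low k bits are zero, so the shift is exact
    # s < 2n at this point, so one conditional subtraction suffices
    if s >= n:
        s -= n
    return s
```

As a usage check under these assumptions, `montgomery_multiply(45, 76, 97, 4, 2)` should equal `(45 * 76 * pow(256, -1, 97)) % 97`, since R = 2^(4*2) = 256.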