Fast combinational multipliers with large bit widths can occupy significant silicon area, which also drives up power consumption. Area can be reduced through resource sharing (i.e., folding) at the cost of lower throughput, a trade-off that is acceptable for some applications. This work explores multiple architectures for Multi-Cycle folded Integer Multiplier (MCIM) designs based on the Schoolbook and Karatsuba approaches. Applications sometimes require a fractional number of multiplications per cycle. For example, an algorithm may require only 3.5 multiplications per cycle. In that case, 3 multipliers with a throughput of 1 plus one additional smaller multiplier with a throughput of $1/2$ are sufficient to sustain the algorithm's throughput. Our MCIM design generator can be customized in terms of throughput, latency, and clock frequency. MCIM designs were synthesized and verified for various parameter values using scripts. ASIC synthesis results show that, for bit widths of 8 to 128, MCIM designs with a throughput of $1/2$ offer area savings of up to 44% relative to directly synthesizing the * operator. Additionally, MCIM designs can offer up to 33% energy savings and 65% average peak power reduction.
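The folding idea and the fractional-throughput example can be illustrated with a small behavioral model. The sketch below is not the MCIM generator or any of its architectures: it is a minimal software analogue of a folded Schoolbook multiplier, assuming unsigned operands, a single shared narrow multiplier reused once per cycle, and a bit width divisible by the cycle count. The names `folded_multiply` and `multiplier_plan` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def folded_multiply(a: int, b: int, width: int, cycles: int = 2) -> int:
    """Behavioral model of a folded Schoolbook multiply (illustrative assumption).

    One (width x width/cycles) multiplier is reused for `cycles` iterations,
    so the modeled throughput is 1/cycles results per cycle.
    """
    if width % cycles != 0:
        raise ValueError("width must be divisible by cycles")
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must be unsigned and fit in width bits")

    chunk = width // cycles          # bits of b consumed per cycle
    mask = (1 << chunk) - 1
    acc = 0                          # models the accumulator register
    for cycle in range(cycles):
        b_slice = (b >> (cycle * chunk)) & mask
        partial = a * b_slice        # the single shared narrow multiplier
        acc += partial << (cycle * chunk)
    return acc


def multiplier_plan(required: Fraction) -> tuple[int, Fraction]:
    """Split a required multiplications-per-cycle rate into full-throughput
    multipliers plus the leftover fractional throughput."""
    full = required.numerator // required.denominator
    return full, required - full


# The abstract's example: 3.5 multiplications per cycle.
full, leftover = multiplier_plan(Fraction(7, 2))
print(full, leftover)
print(hex(folded_multiply(0xAB, 0xCD, width=8, cycles=2)))
```

With `cycles=2` the model corresponds to a throughput of $1/2$: the shared multiplier is half as wide in one operand, and each result takes two iterations. A Karatsuba-based fold would split both operands instead and is not modeled here.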