Fast combinational multipliers with large bit widths can occupy significant silicon area. If the application allows a multiplication to take two or more clock cycles, the area can be reduced through resource sharing (i.e., folding). This work introduces multiple architectures and parameterized Verilog circuit generators for Multi-Cycle folded Integer Multiplier (MCIM) designs, which are based on the Schoolbook and Karatsuba approaches. MCIM designs target a ThroughPut (TP) of $1/n$, where $n$ is an integer and $n \geq 2$, and can be customized in terms of TP, latency, and clock frequency. They are also useful when a hardware implementation performs a fractional number of multiplications per cycle on average, such as 3.5. In that case, we can use 3 single-cycle multipliers plus one additional, smaller multiplier with a TP of 0.5. All proposed designs were synthesized and verified for various bit widths using scripts. ASIC synthesis results show that MCIM designs with a TP of 1/2 offer area savings of 21% to 48% for bit widths of 8 to 128, relative to synthesizing the * operator. Additionally, MCIM designs can offer up to 33% energy savings and 84% average peak power reduction.
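To make the folding idea concrete, the following is a minimal behavioural sketch in Python, not the Verilog generators described above. It assumes a Schoolbook-style fold in which one operand is split into $n$ equal slices and a single smaller multiplier is reused once per cycle. The function names `folded_multiply` and `multipliers_for_rate` are hypothetical and chosen only for illustration.

```python
def folded_multiply(a, b, width, n=2):
    """Model an unsigned width-bit multiply folded over n cycles (TP = 1/n).

    Assumption for this sketch: b is split into n equal slices, and one
    shared (width x width/n)-bit multiplier is reused in every cycle.
    Returns the product and the number of cycles taken.
    """
    if width % n != 0:
        raise ValueError("this sketch requires width to be divisible by n")
    chunk = width // n
    mask = (1 << chunk) - 1
    acc = 0
    cycles = 0
    for i in range(n):
        b_slice = (b >> (i * chunk)) & mask  # slice of b consumed this cycle
        partial = a * b_slice                # the single shared multiplier
        acc += partial << (i * chunk)        # align and accumulate
        cycles += 1
    return acc, cycles


def multipliers_for_rate(rate):
    """Split an average multiplications-per-cycle rate into the number of
    single-cycle multipliers and the leftover TP for one folded multiplier."""
    full = int(rate)
    return full, rate - full


product, cycles = folded_multiply(0xAB, 0xCD, width=8, n=2)
print(hex(product), cycles)
print(multipliers_for_rate(3.5))
```

In this model the per-cycle multiplier is only `width x width/n` bits, which is where the area saving would come from in hardware; the accumulator and slice selection stand in for the registers and multiplexers that a folded design adds.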