Fast multipliers with large bit widths can occupy significant silicon area, which can be reduced by employing multi-cycle multipliers. This paper introduces architectures and parameterized Verilog circuit generators for 2-cycle integer multipliers. When an algorithm is implemented in hardware, it is common for fewer than one multiplication to be needed per clock cycle. The number of multiplications per cycle may also be fractional, e.g., 3.5. In such a case, we could use four multipliers, each with a throughput of one result per cycle. However, we can instead use three such multipliers plus one multiplier with a throughput of 1/2. Resource sharing allows a lower-throughput multiplier to be smaller, which yields area savings. These multipliers can be customized with respect to latency and clock frequency. All proposed designs were automatically synthesized and tested for various bit widths. Two main architectures are presented in this work, each with several variants. Our 2-cycle multipliers achieve area savings of up to 21%, 42%, 32%, 41%, and 48% for bit widths of 8, 16, 32, 64, and 128, respectively, compared with synthesizing the "*" operator with a throughput of 1. Furthermore, some of the proposed designs also offer power savings under certain conditions.
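The abstract does not describe the internals of the two proposed architectures, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the paper's design. The hypothetical function `plan_multipliers` applies a simple greedy rule to the 3.5-multiplications-per-cycle example (full-rate units for the integer part, one half-rate unit if the fraction is at most 1/2). The hypothetical function `two_cycle_multiply` models one common way to build a 2-cycle unsigned multiplier through resource sharing: a single width x (width/2) partial multiplier is reused in two consecutive cycles, once for each half of the second operand, and the shifted partial products are accumulated.

```python
import math


def plan_multipliers(mults_per_cycle: float) -> tuple[int, int]:
    """Hypothetical greedy allocation for a required throughput.

    Returns (full_rate, half_rate): full-rate multipliers accept one operand
    pair per cycle; half-rate (2-cycle) multipliers accept one every 2 cycles.
    """
    full = math.floor(mults_per_cycle)
    frac = mults_per_cycle - full
    if frac == 0:
        return full, 0
    if frac <= 0.5:
        return full, 1  # e.g. 3.5 -> three full-rate units + one half-rate unit
    return full + 1, 0  # a fraction above 1/2 gets another full-rate unit


def two_cycle_multiply(a: int, b: int, width: int) -> int:
    """Cycle-level model of one possible 2-cycle unsigned multiplier.

    A single width x (width/2) partial multiplier is shared across two
    cycles: cycle 0 multiplies by the low half of b, cycle 1 by the high
    half, and the second partial product is shifted and accumulated.
    """
    assert width % 2 == 0, "width must be even in this sketch"
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    half = width // 2
    mask = (1 << half) - 1

    def shared_partial_mult(x: int, y_half: int) -> int:
        # The only multiplier in the datapath: width-bit x half-bit.
        assert 0 <= y_half <= mask
        return x * y_half

    acc = shared_partial_mult(a, b & mask)            # cycle 0: low half of b
    acc += shared_partial_mult(a, b >> half) << half  # cycle 1: high half of b
    return acc  # full 2*width-bit product


if __name__ == "__main__":
    print(plan_multipliers(3.5))              # (3, 1)
    print(two_cycle_multiply(0xFF, 0xFF, 8))  # 65025
```

The model only checks that the two-cycle decomposition is arithmetically exact; actual area and power savings depend on how a synthesis tool maps the shared partial multiplier, the accumulator, and the control logic, which this sketch does not capture.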