In the literature on algorithms for performing the multi-term addition $s_n=\sum_{i=1}^n x_i$ in floating-point arithmetic, a hardware unit that performs a single normalization and rounding is often shown to improve precision, area, latency, and power consumption compared with standard add or fused multiply-add units. However, non-monotonicity can appear when computing sums with a subclass of multi-term addition units, and this has not yet been explored in the literature. We demonstrate that common techniques for performing multi-term addition with $n\geq 4$, without normalization of intermediate quantities, can result in non-monotonicity: increasing one of the addends $x_i$ decreases the sum $s_n$. Summation is required in dot product and matrix multiplication, operations that increasingly appear in the hardware of supercomputers, so knowing where monotonicity is preserved can be of interest to the users of these machines. Our results suggest that non-monotonicity of summation, in some commercial hardware devices that implement a specific class of multi-term adders, is a feature that may have appeared unintentionally as a consequence of design choices that reduce circuit area and other metrics. To demonstrate our findings, we use formal proofs as well as a numerical simulation of non-monotonic multi-term adders in MATLAB.
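The effect described above can be illustrated with a small toy model. The sketch below is not the MATLAB simulation mentioned in the abstract and does not model any particular device; it assumes a hypothetical adder that aligns all addends to the largest exponent, truncates each aligned addend to `p` bits, adds the results exactly, and normalizes and rounds only once at the end. The function names `round_to_precision` and `multi_term_add`, the precision `p = 4`, truncation during alignment, and round-to-nearest-even at the end are all assumptions chosen for illustration.

```python
import math


def round_to_precision(x, p):
    """Round x to p significant bits, ties to even (the single final rounding)."""
    if x == 0:
        return 0.0
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= |m| < 1
    scaled = m * 2 ** p           # |scaled| lies in [2**(p-1), 2**p)
    # Python's round() resolves ties to the nearest even integer.
    return math.ldexp(round(scaled), e - p)


def multi_term_add(xs, p=4):
    """Toy multi-term adder: align, truncate, add exactly, round once.

    Assumed model (hypothetical): no normalization of intermediate
    quantities and no extra guard bits during alignment.
    """
    nonzero = [x for x in xs if x != 0]
    if not nonzero:
        return 0.0
    # Largest exponent among the addends determines the alignment.
    emax = max(math.frexp(x)[1] for x in nonzero)
    quantum = 2.0 ** (emax - p)   # weight of the last bit kept after alignment
    # Truncate each aligned addend toward zero, then add the integers exactly.
    acc = sum(math.trunc(x / quantum) for x in xs)
    return round_to_precision(acc * quantum, p)


low = multi_term_add([15.0, 1.0, 1.0, 1.0])    # largest addend is 15
high = multi_term_add([16.0, 1.0, 1.0, 1.0])   # largest addend raised to 16
print(low, high)  # prints "18.0 16.0"
```

In this model, raising $x_1$ from 15 to 16 increases the largest exponent, so the three addends equal to 1 fall below the last retained bit and are truncated to zero during alignment. The computed sum drops from 18 to 16 even though one addend increased and the others stayed the same.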