The ever-increasing size and computational complexity of today's machine-learning algorithms place a growing strain on the underlying hardware. In this light, novel and dedicated architectural solutions are required to optimize energy efficiency by leveraging the opportunities exposed by these algorithms, such as intrinsic parallelism and robustness to quantization errors. We address this challenge by introducing a flexible two-stage computing pipeline. The pipeline supports fine-grained operand quantization through software-supported Single Instruction Multiple Data (SIMD) operations. Moreover, it can efficiently execute sequential multiplications over SIMD sub-words thanks to zero-skipping and Canonical Signed Digit (CSD) coding. Finally, a lightweight repacking unit allows the bitwidth of sub-words to be changed dynamically at run time. These features are implemented within a tight energy and area budget. Indeed, experimental results show that our approach greatly outperforms traditional hardware SIMD alternatives in both area and energy requirements. In particular, our pipeline occupies up to 53.1% less area than a hardware SIMD design supporting the same sub-word widths, while performing multiplications up to 88.8% more efficiently.
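To make the CSD-plus-zero-skipping idea concrete, the sketch below recodes a multiplier into Canonical Signed Digit form and performs a sequential shift-and-add multiplication that skips zero digits. This is an illustrative model only, not the paper's hardware implementation; the function names `to_csd` and `csd_multiply` are ours.

```python
def to_csd(n: int) -> list[int]:
    """Recode an integer into Canonical Signed Digit form (LSB first).

    CSD uses digits {-1, 0, 1} with no two adjacent non-zero digits,
    which minimizes the number of add/subtract steps a sequential
    multiplier must perform and maximizes zero-skipping opportunities.
    """
    digits = []
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
        else:
            # Pick +1 or -1 so that the remaining value becomes divisible
            # by 4, guaranteeing the next digit is zero (the CSD property).
            d = 2 - (n % 4)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            digits.append(d)
            n -= d
        n //= 2
    return digits


def csd_multiply(a: int, b: int) -> int:
    """Sequential multiplication of a*b over the CSD recoding of b."""
    acc = 0
    for i, d in enumerate(to_csd(b)):
        if d == 0:
            continue  # zero-skipping: spend no cycle on zero digits
        acc += d * (a << i)  # add or subtract the shifted multiplicand
    return acc
```

For example, 7 = 0b111 requires three additions in plain binary, but its CSD form 8 - 1 has only two non-zero digits, so the sequential multiplier skips the intermediate cycles.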