Transprecision computing (TC) is a promising approach for energy-efficient machine learning (ML) computation on resource-constrained platforms. This work presents a novel ASIC design of a Transprecision Arithmetic and Logic Unit (TALU) that supports multiple number formats: Posit, Floating Point (FP), and Integer (INT), with variable bitwidths of 8, 16, and 32 bits. Additionally, TALU can be reconfigured at runtime to support TC without overprovisioning the hardware. Posit is a new number format gaining traction for ML computation, as it achieves accuracy similar to FP representation at a lower bitwidth. This paper therefore proposes a novel algorithm for decoding Posit for energy-efficient computation. The TALU implementation achieves a 54.6x reduction in power consumption and a 19.8x reduction in area compared to a state-of-the-art unified MAC unit (UMAC) for Posit and FP computation. Experimental results on an ML compute kernel executed on a vector processor of TALUs integrated with a RISC-V processor show about a 2x improvement in energy efficiency and similar throughput compared to a state-of-the-art TC-based vector processor.
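For context on what Posit decoding involves, a Posit word consists of a sign bit, a variable-length run of "regime" bits, up to `es` exponent bits, and the remaining fraction bits. The sketch below follows the standard Posit decoding procedure (not the paper's novel energy-efficient algorithm, which is not detailed here); the function name and interface are illustrative assumptions.

```python
def decode_posit(bits: int, n: int = 8, es: int = 0) -> float:
    """Decode an n-bit Posit word (given as an unsigned integer) to a float.

    Standard Posit layout: sign | regime run | es exponent bits | fraction.
    This is a reference-style sketch, not the paper's decoder.
    """
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):          # Not-a-Real (NaR) encoding
        return float("nan")

    sign = (bits >> (n - 1)) & 1
    if sign:                           # negative Posits are two's-complemented
        bits = ((1 << n) - bits) & ((1 << n) - 1)

    # Regime: run length of identical bits after the sign bit.
    rbit = (bits >> (n - 2)) & 1
    k, i = 0, n - 2
    while i >= 0 and ((bits >> i) & 1) == rbit:
        k += 1
        i -= 1
    regime = (k - 1) if rbit == 1 else -k
    i -= 1                             # skip the terminating regime bit

    # Exponent: next es bits (zero-padded if the word runs out).
    exp = 0
    for _ in range(es):
        exp <<= 1
        if i >= 0:
            exp |= (bits >> i) & 1
            i -= 1

    # Fraction: remaining bits with an implicit leading 1.
    frac, scale = 1.0, 0.5
    while i >= 0:
        if (bits >> i) & 1:
            frac += scale
        scale /= 2
        i -= 1

    # useed = 2^(2^es); value = (-1)^sign * useed^regime * 2^exp * frac
    value = frac * 2.0 ** (regime * (1 << es) + exp)
    return -value if sign else value
```

For example, with `n=8, es=0`, the word `0b01000000` decodes to 1.0 and its two's complement `0b11000000` to -1.0. The variable-length regime field is what makes hardware Posit decoding costlier than FP, which motivates dedicated decoding logic.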