The single-precision floating-point (FP32) format, defined by the IEEE 754 standard, is widely employed in scientific computing, signal processing, and deep learning training, where precision is critical. However, FP32 multiplication is computationally expensive and requires complex hardware, particularly for the exact mantissa multiplication. In practical applications such as neural network inference, perfect accuracy is not always necessary; minor multiplication errors often have little impact on the final output quality. This allows precision to be traded for gains in area, power, and speed. This work focuses on CNN inference using approximate FP32 multipliers in which the mantissa multiplication is approximated with error-variant approximate compressors, significantly reducing hardware cost. Furthermore, it optimizes CNN performance by employing differently approximated FP32 multipliers and studying their impact when interleaved within the kernels across the convolutional layers. The placement and ordering of these approximate multipliers within each kernel are optimized using the Non-dominated Sorting Genetic Algorithm II (NSGA-II), balancing the trade-off between accuracy and hardware efficiency.
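To illustrate where the approximation enters, the following is a minimal, hypothetical C sketch of an approximate FP32 multiply. The paper's design approximates the mantissa multiplier internally with error-variant approximate compressors, which are not reproduced here; simple truncation of the low-order bits of the 48-bit significand product stands in for that approximation, and names such as `approx_fp32_mul` and `trunc_bits` are illustrative only. Special cases (zero, subnormals, infinity, NaN, overflow) are ignored for brevity.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sketch of an approximate FP32 multiply: sign and exponent are handled
 * exactly, while the 24x24-bit significand product is approximated by
 * zeroing its lowest trunc_bits bits (a stand-in for an approximate
 * compressor-based partial-product array). */
static float approx_fp32_mul(float a, float b, int trunc_bits)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);

    uint32_t sign = (ua ^ ub) & 0x80000000u;
    int32_t  e    = (int32_t)((ua >> 23) & 0xFF) - 127
                  + (int32_t)((ub >> 23) & 0xFF) - 127;

    /* 24-bit significands with the implicit leading 1 restored. */
    uint64_t ma = (ua & 0x007FFFFFu) | 0x00800000u;
    uint64_t mb = (ub & 0x007FFFFFu) | 0x00800000u;

    /* Exact product is 48 bits; discard the lowest trunc_bits bits. */
    uint64_t prod = (ma * mb) >> trunc_bits;
    prod <<= trunc_bits;

    /* Normalise: product of two [1,2) significands lies in [1,4). */
    if (prod & (1ULL << 47)) { prod >>= 1; e += 1; }

    uint32_t mant = (uint32_t)((prod >> 23) & 0x007FFFFFu); /* drop hidden 1 */
    uint32_t ur   = sign | ((uint32_t)(e + 127) << 23) | mant;

    float r;
    memcpy(&r, &ur, sizeof r);
    return r;
}

int main(void)
{
    printf("exact  : %.7f\n", 1.2345f * 6.7891f);
    printf("approx : %.7f\n", approx_fp32_mul(1.2345f, 6.7891f, 16));
    return 0;
}
```

In a CNN kernel, several such multipliers with different error profiles (here, different `trunc_bits` values) could be assigned to different weight positions, and a multi-objective search such as NSGA-II would then select the assignment that best balances classification accuracy against hardware cost.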