Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs offer higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has received limited research attention. TWNs on existing IMC devices are inefficient because their sparsity is not well utilized and their addition operations are not efficient. In this paper, we propose FAT, a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which exploits the sparsity of TWNs to skip null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier that avoids the time overhead of both carry propagation and writing the carry back to memory cells. Third, we propose a Combined-Stationary data mapping to reduce the data movement of activations and weights and to increase parallelism across memory columns. Simulation results show that for addition operations at the Sense Amplifier level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area efficiency compared with the State-Of-The-Art IMC accelerator ParaPIM. On networks with 80% average sparsity, FAT achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM.
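The core idea behind ternary-weight inference can be illustrated in a few lines. The sketch below (an illustrative software model, not the FAT hardware) shows how a dot product with weights restricted to {-1, 0, +1} needs no multiplications: +1 weights become additions, -1 weights become subtractions, and zero weights are skipped entirely, which is the sparsity the Sparse Addition Control Unit exploits.

```python
def ternary_dot(activations, weights):
    """Dot product with ternary weights {-1, 0, +1}, multiplication-free.

    Illustrative only: models the arithmetic a TWN accelerator performs,
    not the in-memory circuit itself.
    """
    acc = 0
    for a, w in zip(activations, weights):
        if w == 0:
            continue      # null operation skipped (sparsity)
        elif w == 1:
            acc += a      # multiplication replaced by addition
        else:             # w == -1
            acc -= a      # ...or by subtraction
    return acc

print(ternary_dot([3, 5, 2, 7], [1, 0, -1, 1]))  # 3 - 2 + 7 = 8
```

With 80% of weights at zero, four out of five iterations fall into the skip branch, which is why utilizing sparsity matters as much as fast addition.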