Convolutional Neural Networks (CNNs) demonstrate great performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs have higher accuracy and better sparsity, IMC acceleration for TWNs has received limited research attention. TWNs on existing IMC devices are inefficient because their sparsity is not well utilized and the addition operation is not efficient. In this paper, we propose FAT, a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to skip the null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier to avoid the time overhead of both carry propagation and writing back the carry to the memory cells. Third, we further propose a Combined-Stationary data mapping to reduce the data movement of both activations and weights and to increase the parallelism of memory columns. Simulation results show that, for addition operations at the Sense Amplifier level, FAT achieves 2.00X speedup, 1.22X power efficiency and 1.22X area efficiency compared with the state-of-the-art IMC accelerator ParaPIM. FAT achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on networks with 80% sparsity.
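To make the multiplication-free, sparsity-aware computation concrete, the sketch below shows a purely software analogue (not the FAT hardware or its Sparse Addition Control Unit) of a ternary dot product: weights restricted to {-1, 0, +1} turn multiplications into additions and subtractions, and zero weights are skipped outright. The function name and example data are illustrative assumptions, not part of the paper.

```python
# Illustrative sketch only: a ternary dot product where multiplications become
# additions/subtractions and zero-weight terms are skipped, mirroring the
# sparsity that an IMC accelerator for TWNs can exploit.
import numpy as np

def ternary_dot(activations: np.ndarray, weights: np.ndarray) -> int:
    """Dot product with weights restricted to {-1, 0, +1} (no multiplications)."""
    acc = 0
    for a, w in zip(activations, weights):
        if w == 0:
            continue                      # null operation on a zero weight: skipped
        acc += int(a) if w == 1 else -int(a)  # +1 -> add activation, -1 -> subtract
    return acc

# Example: 80% of the weights are zero, so 80% of the additions are skipped.
rng = np.random.default_rng(0)
acts = rng.integers(0, 256, size=10)
wts = np.array([1, 0, 0, -1, 0, 0, 1, 0, 0, 0])
assert ternary_dot(acts, wts) == int(acts[0] - acts[3] + acts[6])
```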