Ternary Neural Networks (TNNs) have received much attention because they are potentially orders of magnitude faster at inference, and far more power efficient, than their full-precision counterparts. However, 2 bits are required to encode the ternary representation even though only 3 quantization levels are actually used. As a result, conventional TNNs have memory consumption and speed similar to those of standard 2-bit models, yet weaker representational capability. Moreover, a significant accuracy gap remains between TNNs and full-precision networks, hampering their deployment in real applications. To tackle these two challenges, in this work, we first show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2. Second, to mitigate the performance gap, we elaborately design an implementation-dependent ternary quantization algorithm. The proposed framework is termed Fast and Accurate Ternary Neural Networks (FATNN). Experiments on image classification demonstrate that FATNN surpasses the state of the art in accuracy by a significant margin. More importantly, we analyze its speedup against various bit-widths on several platforms, which serves as a strong benchmark for further research.
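The factor-of-2 reduction mentioned above concerns the bitwise ternary inner product. For context, the following is a minimal sketch of the conventional two-bitmap scheme that such kernels typically start from, where each ternary vector is stored as a non-zero bitmap and a sign bitmap and the inner product is computed with bitwise logic plus popcounts. The encoding and function names here are illustrative assumptions for this sketch, not the FATNN kernel itself.

```python
# Illustrative baseline (not the FATNN scheme): a conventional bitwise ternary
# inner product. Each ternary vector is packed into two bitmaps: `nz` marks
# non-zero positions and `sg` marks negative positions.

def encode_ternary(values):
    """Pack a list of ternary values in {-1, 0, +1} into (nonzero, sign) bitmaps."""
    nz, sg = 0, 0
    for i, v in enumerate(values):
        if v != 0:
            nz |= 1 << i
        if v < 0:
            sg |= 1 << i
    return nz, sg

def ternary_dot(a, b):
    """Inner product of two ternary vectors given as (nonzero, sign) bitmaps."""
    a_nz, a_sg = a
    b_nz, b_sg = b
    both = a_nz & b_nz          # positions where both operands are non-zero
    neg = (a_sg ^ b_sg) & both  # signs differ -> each position contributes -1
    pos = both & ~neg           # signs agree  -> each position contributes +1
    return bin(pos).count("1") - bin(neg).count("1")

# Example: (+1, 0, -1, +1) . (-1, +1, -1, +1) = -1 + 0 + 1 + 1 = 1
x = encode_ternary([1, 0, -1, 1])
y = encode_ternary([-1, 1, -1, 1])
assert ternary_dot(x, y) == 1
```

Counting the operations per machine word in this baseline (several bitwise ops and two popcounts) makes it clear where a more frugal ternary encoding can save work relative to a standard 2-bit kernel.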