Efficient inference of Deep Neural Networks (DNNs) is essential to making AI ubiquitous. Two important algorithmic techniques have shown promise for enabling efficient inference: sparsity and binarization. At the hardware-software level, these techniques translate into weight sparsity and weight repetition, allowing DNNs to be deployed under critically low power and latency budgets. We propose a new method, called signed-binary networks, that further improves efficiency by exploiting both weight sparsity and weight repetition while maintaining similar accuracy. Our method achieves accuracy comparable to binary networks on the ImageNet and CIFAR10 datasets and can lead to $>69\%$ sparsity. We observe real speedups when deploying these models on general-purpose devices. We show that this high degree of unstructured sparsity can lead to a further ~2x reduction in energy consumption on ASICs relative to binary.
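To make the core idea concrete, the sketch below is a minimal, illustrative quantizer, not the paper's actual training procedure: it assumes weights are mapped per output channel to either {0, +1} or {0, -1}, with the channel's sign taken from its mean weight and small-magnitude weights zeroed by a hypothetical threshold. This shows how a signed-binary representation can yield both weight repetition (one nonzero value per region) and unstructured sparsity.

```python
import numpy as np

def signed_binary_quantize(w, threshold=0.05):
    """Toy per-channel signed-binary quantizer (illustrative assumption,
    not the method proposed in the paper).

    Each output channel is mapped to {0, +1} or {0, -1}: the channel's
    sign is chosen from the sign of its mean weight, and small-magnitude
    weights are zeroed, giving weight repetition and weight sparsity.
    """
    q = np.zeros_like(w)
    for c in range(w.shape[0]):                   # iterate over output channels
        sign = 1.0 if w[c].mean() >= 0 else -1.0  # channel-level sign choice
        keep = np.abs(w[c]) > threshold           # zero out small weights -> sparsity
        q[c][keep] = sign                         # single repeated nonzero value
    return q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=(8, 3, 3, 3))  # hypothetical conv weight tensor
    q = signed_binary_quantize(w)
    print("sparsity:", (q == 0).mean())           # fraction of zero weights
    print("distinct values:", np.unique(q))       # at most {-1, 0, +1} overall
```

Under this toy scheme, each channel stores only a sign bit plus a sparse mask, which is the combination of weight repetition and weight sparsity that the abstract argues hardware can exploit.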