In principle, sparse neural networks should be significantly more efficient than traditional dense networks. Neurons in the brain exhibit two types of sparsity: they are sparsely interconnected and sparsely active. These two types of sparsity, called weight sparsity and activation sparsity, when combined, offer the potential to reduce the computational cost of neural networks by two orders of magnitude. Despite this potential, today's neural networks exploit only weight sparsity and deliver only modest performance benefits, because traditional computing hardware cannot efficiently process sparse networks. In this article, we introduce Complementary Sparsity, a novel technique that significantly improves the performance of dual-sparse networks on existing hardware. We demonstrate that we can achieve high performance running weight-sparse networks, and we can multiply those speedups by incorporating activation sparsity. Using Complementary Sparsity, we show up to 100X improvement in throughput and energy efficiency when performing inference on FPGAs. We analyze scalability and resource tradeoffs for a variety of kernels typical of commercial convolutional networks such as ResNet-50 and MobileNetV2. Our results with Complementary Sparsity suggest that weight plus activation sparsity can be a potent combination for efficiently scaling future AI models.
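To make the "two orders of magnitude" claim concrete, the sketch below is a minimal, illustrative calculation (not the paper's Complementary Sparsity kernel) of how weight sparsity and activation sparsity multiply: only positions where both the weight and the activation are nonzero require a multiply-accumulate. The vector size and the 10% densities are hypothetical values chosen solely to show roughly a 100X reduction in useful work relative to a dense computation.

```python
import numpy as np

# Illustrative sketch: count the multiply-accumulates (MACs) needed for one
# sparse-sparse dot product. Only positions with BOTH a nonzero weight and a
# nonzero activation contribute. Parameters are hypothetical.

n = 4096                    # input size (hypothetical)
weight_density = 0.10       # 90% weight sparsity
activation_density = 0.10   # 90% activation sparsity

rng = np.random.default_rng(0)
weights = rng.standard_normal(n) * (rng.random(n) < weight_density)
activations = rng.standard_normal(n) * (rng.random(n) < activation_density)

# Dense hardware performs n MACs regardless of how many operands are zero.
dense_macs = n

# A sparse-sparse kernel only needs the overlap of the two nonzero sets,
# which is roughly n * weight_density * activation_density on average.
useful_macs = np.count_nonzero((weights != 0) & (activations != 0))

print(f"dense MACs:  {dense_macs}")
print(f"useful MACs: {useful_macs} (~{dense_macs / max(useful_macs, 1):.0f}x fewer)")
```

Running this sketch yields roughly a 100-fold reduction in useful MACs, which is the multiplicative effect of combining the two sparsities that the abstract refers to; realizing that reduction in practice is the role of hardware-friendly techniques such as Complementary Sparsity.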