Sparse Linear Networks with a Fixed Butterfly Structure: Theory and Practice

The fast Fourier transform (FFT), wavelets, and other well-known transforms in signal processing have a structured representation as a product of sparse matrices, referred to as butterfly structures. Recent research has used such structured linear networks, combined with randomness, as preconditioners to improve the computational performance of large-scale linear algebraic operations. Given the advent of deep learning and AI and the computational efficiency of such structured matrices, it is natural to study sparse linear deep networks in which the locations of the non-zero weights are predetermined by the butterfly structure. This work studies, both theoretically and empirically, the feasibility of training such networks in different scenarios. Convolutional neural networks are structured sparse networks designed to recognize local patterns in lattices that represent a spatial or temporal structure. The butterfly architecture used in this work is different: it can replace any dense linear operator with a gadget consisting of a sequence of logarithmically many (in the network width n) sparse layers, containing a near-linear total number of weights, O(n log n). This improves on the quadratic number of weights, n^2, required by a standard dense layer, with little compromise in the expressibility of the resulting operator. We show in a collection of empirical experiments that our proposed architecture not only produces results that match and often outperform existing known architectures, but also offers faster training and prediction in deployment. We observe this empirical phenomenon in a wide variety of reported experiments, including supervised prediction on NLP and vision data as well as unsupervised representation learning using autoencoders. Preliminary theoretical results presented in the paper explain why our proposed approach does not compromise training speed or outcome.
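
To make the weight count concrete, here is a minimal NumPy sketch of a generic radix-2 butterfly gadget. It is an illustration under our own assumptions, not the paper's implementation, and all helper names (`pair_indices`, `butterfly_layer`, `fft_weights`, and so on) are hypothetical. Each of the log2(n) sparse factors mixes the index pairs (i, i + 2^k) through one 2x2 weight block per pair. Each factor therefore has 2n non-zero weights, and the whole gadget has 2n log2(n). Fixing the blocks to FFT twiddle factors and bit-reversing the input reproduces `np.fft.fft`, which shows that the FFT is one instance of this sparsity pattern. In a trainable network, the same blocks would be learned instead.

```python
import numpy as np


def pair_indices(n, stride):
    """Pair each i whose `stride` bit is 0 with i + stride (n/2 pairs in total)."""
    lo = np.array([i for i in range(n) if i & stride == 0])
    return lo, lo + stride


def butterfly_layer(x, blocks, stride):
    """Apply one sparse butterfly factor; blocks has shape (n // 2, 2, 2),
    i.e. one 2x2 weight block per (lo, hi) pair, 2n non-zeros overall."""
    lo, hi = pair_indices(x.shape[0], stride)
    y = np.empty(x.shape[0], dtype=np.result_type(x, blocks))
    y[lo] = blocks[:, 0, 0] * x[lo] + blocks[:, 0, 1] * x[hi]
    y[hi] = blocks[:, 1, 0] * x[lo] + blocks[:, 1, 1] * x[hi]
    return y


def butterfly_network(x, weights):
    """Compose log2(n) butterfly factors with strides 1, 2, 4, ..., n/2."""
    for k, blocks in enumerate(weights):
        x = butterfly_layer(x, blocks, 2 ** k)
    return x


def fft_weights(n):
    """Fixed blocks that make the gadget a radix-2 decimation-in-time FFT
    (assumes n is a power of two and the input is bit-reversal permuted)."""
    weights = []
    for k in range(int(np.log2(n))):
        s = 2 ** k
        lo, _ = pair_indices(n, s)
        t = np.exp(-2j * np.pi * (lo % s) / (2 * s))  # twiddle factors
        blocks = np.empty((n // 2, 2, 2), dtype=complex)
        blocks[:, 0, 0], blocks[:, 0, 1] = 1, t
        blocks[:, 1, 0], blocks[:, 1, 1] = 1, -t
        weights.append(blocks)
    return weights


def bit_reverse_permutation(n):
    """Index permutation that reverses the log2(n)-bit binary form of each index."""
    bits = int(np.log2(n))
    return np.array([int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)])


def butterfly_param_count(n):
    """log2(n) factors x n/2 blocks x 4 weights = 2 n log2(n)."""
    return 4 * (n // 2) * int(np.log2(n))


n = 8
x = np.random.default_rng(0).standard_normal(n)
y = butterfly_network(x[bit_reverse_permutation(n)], fft_weights(n))
print(np.allclose(y, np.fft.fft(x)))  # True
print(butterfly_param_count(1024), 1024 ** 2)  # 20480 1048576
```

The design keeps the sparsity pattern fixed (which pairs interact at which layer) and leaves only the 2x2 block values free. This matches the abstract's point that the locations of the non-zero weights are predetermined by the butterfly structure.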