We present a novel global compression framework for deep neural networks that automatically analyzes each layer to identify the optimal per-layer compression ratio, while simultaneously achieving the desired overall compression. Our algorithm hinges on the idea of compressing each convolutional (or fully-connected) layer by slicing its channels into multiple groups and decomposing each group via low-rank decomposition. At the core of our algorithm is the derivation of layer-wise error bounds from the Eckart-Young-Mirsky theorem. We then leverage these bounds to frame the compression problem as an optimization problem where we wish to minimize the maximum compression error across layers, and we propose an efficient algorithm to solve it. Our experiments indicate that our method outperforms existing low-rank compression approaches across a wide range of networks and data sets. We believe that our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks. Our code is available at https://github.com/lucaslie/torchprune.
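To make the per-layer scheme concrete, the following is a minimal sketch of the grouped low-rank decomposition described above: a layer's weight tensor is sliced along its input channels into groups, and each group is approximated by a truncated SVD, whose relative Frobenius error is exactly the quantity bounded by the Eckart-Young-Mirsky theorem. The function name, the fixed `num_groups` and `rank` arguments, and the reshaping convention are illustrative assumptions; the actual per-layer choice of groups and ranks is made by the paper's min-max optimization, which this sketch does not reproduce.

```python
import torch

def group_lowrank_decompose(weight, num_groups, rank):
    """Sketch: slice a conv weight (C_out, C_in, kh, kw) along its input
    channels into `num_groups` groups and apply a rank-`rank` truncated SVD
    to each group's unfolded 2-D matrix. Hypothetical helper, not the
    paper's full algorithm.

    Returns the per-group low-rank factors and each group's relative
    Frobenius error, which by the Eckart-Young-Mirsky theorem is the
    minimum achievable at that rank.
    """
    c_out, c_in, kh, kw = weight.shape
    assert c_in % num_groups == 0, "groups must divide the input channels"
    factors, errors = [], []
    for g in weight.chunk(num_groups, dim=1):
        # Unfold the group into a (C_out, (C_in / num_groups) * kh * kw) matrix.
        mat = g.reshape(c_out, -1)
        U, S, Vh = torch.linalg.svd(mat, full_matrices=False)
        # Keep the top-`rank` singular triplets: the optimal rank-r
        # Frobenius-norm approximation (Eckart-Young-Mirsky).
        factors.append((U[:, :rank] * S[:rank], Vh[:rank]))
        # The squared approximation error equals the sum of the squared
        # discarded singular values; report it relative to the group norm.
        errors.append((S[rank:].square().sum() / S.square().sum()).sqrt())
    return factors, errors
```

Under this view, choosing a rank per group trades parameters against a known, computable error, which is what lets the global optimization balance compression across layers rather than fixing a uniform ratio.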