We perform an empirical study of the behaviour of deep networks when their activation functions are pushed to become fully linear in some of their feature channels through a sparsity prior on the overall number of nonlinear units in the network. To measure the depth of the resulting partially linearized network, we compute the average number of active nonlinearities encountered along a path in the network graph. In experiments on CNNs with sparsified PReLUs on typical image classification tasks, we make several observations: under sparsity pressure, the remaining nonlinear units organize into distinct structures, forming core networks of nearly constant effective depth and width, which in turn depend on task difficulty. We consistently observe a slow decay of performance with decreasing depth until the onset of a rapid collapse in accuracy, allowing for surprisingly shallow networks at moderate losses in accuracy that outperform baseline networks of similar depth, even after the baselines' width is increased to a comparable number of parameters. In terms of training, we observe a nonlinear advantage: reducing nonlinearity after training yields better performance than reducing it before training, in line with previous findings on linearized training, but with a gap that depends on task difficulty and vanishes for easy problems.
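The following is a minimal, hypothetical sketch (not the authors' code) of the two ingredients described above: a channel-wise PReLU whose slopes are pulled toward 1 (the identity) by an L1-style sparsity penalty, and a rough effective-depth estimate for a purely sequential CNN. The class names, the exact penalty form, and the linearity threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparsePReLU(nn.PReLU):
    """PReLU with one slope per channel; a slope of exactly 1 makes the unit linear."""
    def __init__(self, num_channels):
        super().__init__(num_parameters=num_channels, init=0.25)

    def nonlinearity_penalty(self):
        # Assumed penalty form: pull each slope toward 1, i.e. toward a fully linear channel.
        return torch.abs(1.0 - self.weight).sum()

def effective_depth(model, tol=1e-3):
    """Average number of active nonlinearities along a path through a sequential network:
    sum over layers of the fraction of channels whose slope still differs from 1."""
    depth = 0.0
    for m in model.modules():
        if isinstance(m, SparsePReLU):
            depth += (torch.abs(1.0 - m.weight) > tol).float().mean().item()
    return depth

# Training-time usage (assumed): add the penalty to the task loss.
# loss = criterion(model(x), y) + lam * sum(
#     m.nonlinearity_penalty() for m in model.modules() if isinstance(m, SparsePReLU))
```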