Deep neural networks have been successfully applied to a broad range of problems where overparametrization yields weight matrices which are partially random. A comparison of weight matrix singular vectors to the Porter-Thomas distribution suggests that there is a boundary between randomness and learned information in the singular value spectrum. Inspired by this finding, we introduce an algorithm for noise filtering, which both removes small singular values and reduces the magnitude of large singular values to counteract the effect of level repulsion between the noise and the information part of the spectrum. For networks trained in the presence of label noise, we indeed find that the generalization performance improves significantly due to noise filtering.
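The two filtering steps described above, discarding small singular values and shrinking the large ones, can be sketched with NumPy as follows. This is only a minimal illustration under stated assumptions, not the authors' algorithm: the function name `filter_weight_matrix`, the `cutoff` threshold, and the constant `shrink` factor are hypothetical placeholders. The abstract gives neither the rule that locates the noise/information boundary nor the exact correction for level repulsion.

```python
import numpy as np


def filter_weight_matrix(W, cutoff, shrink=0.9):
    """Illustrative noise filter for a weight matrix (hypothetical helper).

    cutoff: assumed boundary between noise and information in the
            singular value spectrum (a placeholder, chosen by hand here).
    shrink: assumed constant factor in (0, 1] that reduces the retained
            singular values, standing in for a level-repulsion correction.
    """
    # Thin SVD: W = U @ diag(s) @ Vt, with s sorted in descending order.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)

    # Step 1: remove singular values at or below the assumed boundary.
    # Step 2: reduce the magnitude of the singular values that remain.
    s_filtered = np.where(s > cutoff, shrink * s, 0.0)

    # Rebuild the matrix from the filtered spectrum.
    return (U * s_filtered) @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy matrix: one strong rank-one "information" component plus noise.
    signal = 5.0 * np.outer(rng.standard_normal(64), rng.standard_normal(32)) / np.sqrt(32)
    noise = rng.standard_normal((64, 32)) / np.sqrt(64)
    W_filtered = filter_weight_matrix(signal + noise, cutoff=2.5)
    print("rank after filtering:", np.linalg.matrix_rank(W_filtered))
```

In practice the cutoff would come from the spectrum itself rather than being set by hand, and the shrinkage would likely depend on each singular value's distance from the noise bulk; both are left as plain parameters here to keep the sketch short.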