Understanding why deep nets can classify data in large dimensions remains a challenge. It has been proposed that they do so by becoming stable to diffeomorphisms, yet existing empirical measurements suggest that this is often not the case. We revisit this question by defining a maximum-entropy distribution on diffeomorphisms, which allows us to study typical diffeomorphisms of a given norm. We confirm that stability toward diffeomorphisms does not correlate strongly with performance on benchmark image data sets. By contrast, we find that the stability toward diffeomorphisms relative to that of generic transformations, $R_f$, correlates remarkably with the test error $\epsilon_t$. It is of order unity at initialization but decreases by several decades during training for state-of-the-art architectures. For CIFAR10 and 15 known architectures, we find $\epsilon_t\approx 0.2\sqrt{R_f}$, suggesting that obtaining a small $R_f$ is important to achieve good performance. We study how $R_f$ depends on the size of the training set and compare it to a simple model of invariant learning.
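To make the relative-stability measurement concrete, below is a minimal sketch of how one might estimate $R_f$ for a trained network: it compares the network's sensitivity to a small smooth deformation of the input against its sensitivity to additive noise of matched pixel-space magnitude. This is an illustration under simplifying assumptions, not the authors' exact protocol; the low-frequency sine-mode deformation, the `cutoff` and `temperature` parameters, and the function names are illustrative stand-ins for the paper's maximum-entropy diffeomorphisms.

```python
import torch
import torch.nn.functional as F


def random_diffeo(x, cutoff=3, temperature=0.01):
    """Apply a small random smooth deformation to a batch of images x (B, C, H, W).

    The displacement field is a sum of low-frequency sine modes with random
    Gaussian coefficients; `cutoff` limits the spatial frequency and
    `temperature` sets the overall deformation magnitude (both illustrative).
    """
    B, C, H, W = x.shape
    # Base sampling grid with coordinates in [-1, 1], as expected by grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
    )
    grid = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2).clone()
    # Add a random low-frequency displacement field to the grid.
    for i in range(1, cutoff + 1):
        for j in range(1, cutoff + 1):
            amp = temperature / (i * i + j * j) ** 0.5
            mode = torch.sin(torch.pi * i * (xs + 1) / 2) * torch.sin(torch.pi * j * (ys + 1) / 2)
            grid[..., 0] += amp * torch.randn(B, 1, 1) * mode
            grid[..., 1] += amp * torch.randn(B, 1, 1) * mode
    return F.grid_sample(x, grid, align_corners=True)


def relative_stability(f, x):
    """Estimate R_f: sensitivity to smooth deformations divided by
    sensitivity to additive noise of the same pixel-space norm."""
    x_diffeo = random_diffeo(x)
    delta = x_diffeo - x  # pixel-space change induced by the deformation
    # Isotropic noise rescaled to match the deformation's norm, per sample.
    eta = torch.randn_like(x)
    scale = delta.flatten(1).norm(dim=1) / eta.flatten(1).norm(dim=1)
    eta = eta * scale.view(-1, 1, 1, 1)
    fx = f(x)
    D_f = (f(x_diffeo) - fx).flatten(1).norm(dim=1).pow(2).mean()  # deformation sensitivity
    G_f = (f(x + eta) - fx).flatten(1).norm(dim=1).pow(2).mean()   # generic-noise sensitivity
    return (D_f / G_f).item()
```

Under this sketch, a well-trained network should return a value of `relative_stability` well below one, whereas a network at initialization should return a value of order unity, consistent with the behavior described above.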