To achieve near-zero training error in a classification problem, the layers of a deep network must disentangle the manifolds of data points with different labels, so that the classes can be discriminated. However, excessive class separation can lead to overfitting, since good generalisation requires learning invariant features, which involve some level of entanglement. We report on numerical experiments showing how the optimisation dynamics finds representations that balance these opposing tendencies with a non-monotonic trend. After a fast segregation phase, a slower rearrangement (conserved across data sets and architectures) increases the class entanglement. The training error at the inversion is remarkably stable under subsampling, and across network initialisations and optimisers, which characterises it as a property solely of the data structure and (very weakly) of the architecture. The inversion is the manifestation of tradeoffs elicited by well-defined and maximally stable elements of the training set, coined "stragglers", which are particularly influential for generalisation.
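The non-monotonic entanglement trend described above could be tracked with a simple proxy metric. As a minimal sketch (not the paper's own measure), one can quantify class entanglement of a layer's representations as the fraction of each point's k nearest neighbours that carry a different label; the function name and the k-nearest-neighbour choice are illustrative assumptions:

```python
import numpy as np

def class_entanglement(X, y, k=5):
    """Fraction of each point's k nearest neighbours with a different
    label, averaged over the data set. 0 means perfectly segregated
    class manifolds; values near the label-mixing rate of a random
    assignment indicate fully entangled classes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between representations.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)        # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of k nearest neighbours
    return float((y[nn] != y[:, None]).mean())

# Two well-separated Gaussian blobs: entanglement should be near zero.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.1, size=(50, 2))
b = rng.normal(5.0, 0.1, size=(50, 2))
X = np.vstack([a, b])
y = np.array([0] * 50 + [1] * 50)
print(class_entanglement(X, y))
```

Evaluating this score on hidden-layer activations at each training epoch would, under the picture above, first decrease (segregation phase) and then rise again after the inversion.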