Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that interpolates its training data will typically improve its generalization performance. Explaining the mechanism behind this ``benign overfitting'' in deep networks remains an outstanding challenge. Here, we study the last hidden layer representations of various state-of-the-art convolutional neural networks and find evidence for an underlying mechanism that we call ``representation mitosis'': if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information and differ from each other only by statistically independent noise. As in a mitosis process, the number of such groups, or ``clones'', increases linearly with the width of the layer, but only if the width is above a critical value. We show that a key ingredient for activating mitosis is continuing the training process until the training error reaches zero.
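The clone structure described above can be illustrated with a small synthetic sketch. The code below is not the authors' analysis pipeline; it is a hypothetical example in which groups of "neurons" share an underlying signal and differ only by independent noise, and the groups are recovered by thresholding the neuron-neuron correlation matrix and taking connected components (the group count, signal dimensions, and noise level are all made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic last-layer activations: n_clones groups of neurons that share
# the same underlying signal and differ only by independent noise
# (a toy stand-in for the "clone" structure described in the abstract).
n_samples, n_clones, neurons_per_clone = 500, 4, 8
signals = rng.normal(size=(n_samples, n_clones))
activations = np.repeat(signals, neurons_per_clone, axis=1)
activations += 0.1 * rng.normal(size=activations.shape)  # independent noise

# Recover the clones: threshold the correlation matrix, then label the
# connected components of the resulting neuron-neuron adjacency graph.
corr = np.corrcoef(activations.T)
adjacency = np.abs(corr) > 0.5

n_neurons = adjacency.shape[0]
labels = -np.ones(n_neurons, dtype=int)
n_groups = 0
for start in range(n_neurons):
    if labels[start] == -1:
        stack = [start]
        while stack:  # depth-first flood fill over correlated neurons
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = n_groups
                stack.extend(np.flatnonzero(adjacency[j]))
        n_groups += 1

print(n_groups)  # number of clone groups recovered
```

With a noise amplitude well below the signal scale, within-group correlations stay near one while between-group correlations stay near zero, so the thresholded graph recovers exactly the planted groups.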