A remarkable characteristic of overparameterized deep neural networks (DNNs) is that their accuracy does not degrade when the network's width is increased. Recent evidence suggests that developing compressible representations is key for adjusting the complexity of large networks to the learning task at hand. However, these compressible representations are poorly understood. A promising strand of research inspired by biology is understanding representations at the unit level, as this offers a more granular and intuitive interpretation of the neural mechanisms. To better understand what facilitates increases in width without decreases in accuracy, we ask: Are there mechanisms at the unit level by which networks control their effective complexity as their width is increased? If so, how do these depend on the architecture, dataset, and training parameters? We identify two distinct types of "frivolous" units that proliferate when the network's width is increased: prunable units, which can be dropped out of the network without significant change to the output, and redundant units, whose activities can be expressed as a linear combination of others. These units imply complexity constraints, as the function the network represents could be expressed by a network without them. We also identify how the development of these units can be influenced by architecture and a number of training factors. Together, these results help to explain why the accuracy of DNNs does not degrade when width is increased, and they highlight the importance of frivolous units toward understanding implicit regularization in DNNs.
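As a rough illustration of the redundancy notion described above (a sketch, not the paper's actual procedure; the activation matrix and the `redundancy_residual` helper are hypothetical), a unit is redundant to the extent that its activations can be reconstructed as a linear combination of the other units' activations, which can be checked with an ordinary least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activation matrix: 1000 inputs x 8 units.
acts = rng.normal(size=(1000, 8))
# Make unit 7 redundant: exactly a linear combination of units 0 and 1.
acts[:, 7] = 2.0 * acts[:, 0] - 0.5 * acts[:, 1]

def redundancy_residual(acts, unit):
    """Fraction of a unit's activation variance NOT explained by a
    linear combination of the remaining units (near 0 => redundant)."""
    others = np.delete(acts, unit, axis=1)
    target = acts[:, unit]
    coef, *_ = np.linalg.lstsq(others, target, rcond=None)
    residual = target - others @ coef
    return residual.var() / target.var()

print(redundancy_residual(acts, 7))  # near 0: unit 7 is redundant
print(redundancy_residual(acts, 3))  # near 1: unit 3 is not redundant
```

A prunable unit would instead be diagnosed behaviorally, by zeroing its activations and checking that the network's output is essentially unchanged.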