Overfitting is one of the fundamental challenges in training convolutional neural networks and is usually identified by diverging training and test losses. The underlying dynamics of how the flow of activations induces overfitting are, however, poorly understood. In this study we introduce a perplexity-based sparsity definition to derive and visualise layer-wise activation measures. These novel explainable-AI strategies reveal a surprising relationship between activation sparsity and overfitting, namely an increase in sparsity in the feature-extraction layers shortly before the test loss starts rising. This tendency is preserved across network architectures and regularisation strategies, so our measures can serve as a reliable indicator of overfitting while decoupling the network's generalisation capabilities from the loss-based definition of overfitting. Moreover, our differentiable sparsity formulation can be used to explicitly penalise the emergence of sparsity during training, so that the impact of reduced sparsity on overfitting can be studied in real time. Applying this penalty and analysing activation sparsity for well-known regularisers and in common network architectures supports the hypothesis that reduced activation sparsity can effectively improve generalisation and classification performance. In line with other recent work on this topic, our methods reveal novel insights into the seemingly contradictory concepts of activation sparsity and network capacity by demonstrating that dense activations can enable discriminative feature learning while efficiently exploiting the capacity of deep models without suffering from overfitting, even when trained excessively.
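A perplexity-based sparsity measure of the kind described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes the activations of a layer are normalised into a probability distribution, the perplexity is taken as the exponential of its Shannon entropy, and sparsity is defined as one minus the perplexity divided by the number of units. The function names (`activation_perplexity`, `sparsity`) are hypothetical.

```python
import numpy as np

def activation_perplexity(a, eps=1e-12):
    """Perplexity of a non-negative activation vector `a`, treated as a
    probability distribution after normalisation. A hypothetical sketch;
    the paper's exact definition may differ."""
    a = np.abs(a) + eps          # ensure strictly positive values
    p = a / a.sum()              # normalise to a probability distribution
    entropy = -(p * np.log(p)).sum()
    return np.exp(entropy)       # perplexity = exp(entropy)

def sparsity(a):
    """Sparsity in [0, 1): uniform (dense) activations give ~0,
    a single active unit gives values approaching 1 - 1/n."""
    return 1.0 - activation_perplexity(a) / len(a)

dense = np.ones(8)                                  # uniform activations
sparse = np.array([1., 0., 0., 0., 0., 0., 0., 0.])  # one active unit
print(sparsity(dense))   # close to 0
print(sparsity(sparse))  # close to 1 - 1/8 = 0.875
```

Because the measure is built from differentiable operations (normalisation, logarithm, exponential), an analogous tensor implementation could be added to the training loss as a penalty term to discourage the emergence of sparsity, in the spirit of the abstract's real-time analysis.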