Aggregation is ubiquitous in almost all deep network models. It consolidates deep features into a more compact representation while increasing robustness to overfitting and providing spatial invariance. In particular, the proximity of global aggregation layers to the output layers of deep neural networks (DNNs) means that aggregated features have a direct influence on the performance of a deep network. This relationship can be better understood using information-theoretic methods, but doing so requires knowledge of the distributions of the activations of aggregation layers. To this end, we propose a novel mathematical formulation for analytically modelling the probability distributions of the output values of layers involved in deep feature aggregation. An important outcome is the ability to analytically predict the KL-divergence of output nodes in a DNN. We also experimentally verify our theoretical predictions against empirical observations across a range of classification tasks and datasets.