Explaining generalization and preventing over-confident predictions are central goals of studies on the loss landscape of neural networks. Flatness, defined as the invariance of the loss under perturbations of a pre-trained solution, is widely accepted as a predictor of generalization in this context. However, it has been pointed out that flatness and the associated generalization bounds can be changed arbitrarily by rescaling parameters, and previous studies resolved this issue only partially and under restrictions: counter-intuitively, their generalization bounds either remained variant under function-preserving parameter scaling transformations or applied only to impractical network structures. As a more fundamental solution, we propose new prior and posterior distributions that are invariant to scaling transformations by \textit{decomposing} the scale and connectivity of parameters, thereby allowing the resulting generalization bound to describe the generalizability of a broad class of networks under the more practical class of transformations, such as weight decay with batch normalization. We also show that the above issue adversely affects the uncertainty calibration of the Laplace approximation, and we propose a solution using our invariant posterior. We empirically demonstrate that our posterior provides effective flatness and calibration measures with low complexity under such practical parameter transformations, supporting its practical effectiveness in line with our rationale.
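A minimal NumPy sketch of the scale-sensitivity issue motivating the invariant distributions (the two-layer setup, variable names, and the norm-based flatness proxy are illustrative assumptions, not the paper's construction): with batch normalization, multiplying the pre-normalization weights by any positive factor leaves the network function unchanged, yet a norm-based measure of the solution changes arbitrarily.
\begin{verbatim}
import numpy as np

def batchnorm(z, eps=1e-5):
    # Normalize each feature over the batch (affine parameters omitted for brevity).
    mu = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 10))   # a batch of inputs (illustrative)
W = rng.normal(size=(10, 5))     # weights of a linear layer followed by batch norm

alpha = 100.0                    # arbitrary positive rescaling factor
out_original = batchnorm(X @ W)
out_rescaled = batchnorm(X @ (alpha * W))

# Function-preserving: the rescaled network computes (numerically) the same output.
print(np.allclose(out_original, out_rescaled, atol=1e-4))

# Yet a naive norm-based flatness/complexity proxy changes by an arbitrary factor.
print(np.linalg.norm(W) ** 2, np.linalg.norm(alpha * W) ** 2)
\end{verbatim}
Decomposing each weight vector into its scale and a scale-free connectivity direction, as the proposed prior and posterior do, removes exactly this spurious degree of freedom.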