Piecewise linear neural networks can be split into subfunctions, each with its own activation pattern, domain, and empirical error. The empirical error of the full network can then be written as an expectation over the empirical errors of its subfunctions. Constructing a generalization bound on subfunction empirical error indicates that the more densely a subfunction is surrounded by training samples in representation space, the more reliable its predictions are. Further, it suggests that, all else equal, models with fewer activation regions generalize better, as do models that abstract knowledge to a greater degree. We propose not only a theoretical framework for reasoning about subfunction error bounds but also a pragmatic way of evaluating them approximately, which we apply to predicting which samples the network will fail to generalize to. We test our method on detecting misclassified and out-of-distribution samples, finding that it performs competitively in both cases. In short, some network activation patterns are associated with higher reliability than others, and these can be identified using subfunction error bounds.
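To make the core intuition concrete, below is a minimal sketch (not the authors' exact method) of scoring prediction reliability by how densely a test sample's subfunction is surrounded by training samples. It identifies each sample's subfunction by its binary ReLU activation pattern, then counts training samples whose patterns fall within a small Hamming radius, a crude density proxy for the error bound. The `ReLUMLP` architecture, the `hamming_radius` threshold, and the `reliability_score` helper are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class ReLUMLP(nn.Module):
    """Small piecewise linear network; each ReLU sign pattern defines a subfunction."""
    def __init__(self, d_in=20, d_hidden=64, d_out=2):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_hidden)
        self.out = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        h1 = torch.relu(self.fc1(x))
        h2 = torch.relu(self.fc2(h1))
        return self.out(h2)

    def activation_pattern(self, x):
        # Binary code identifying which linear region (subfunction) x falls in.
        h1 = torch.relu(self.fc1(x))
        h2 = torch.relu(self.fc2(h1))
        return torch.cat([(h1 > 0), (h2 > 0)], dim=1).float()

@torch.no_grad()
def reliability_score(model, x_test, x_train, hamming_radius=8):
    # Fraction of training samples whose activation pattern lies within
    # `hamming_radius` bits of the test pattern: higher means the test
    # sample's subfunction is more densely supported by training data.
    p_test = model.activation_pattern(x_test)    # (n_test, n_units)
    p_train = model.activation_pattern(x_train)  # (n_train, n_units)
    # L1 distance between {0,1} vectors equals Hamming distance.
    dists = torch.cdist(p_test, p_train, p=1)    # (n_test, n_train)
    return (dists <= hamming_radius).float().mean(dim=1)

# Usage: lower scores flag samples the network may not generalize to,
# e.g. misclassified or out-of-distribution inputs.
model = ReLUMLP()
x_train = torch.randn(1000, 20)
x_test = torch.randn(5, 20)
print(reliability_score(model, x_test, x_train))
```

Under this reading, a hard activation-pattern match would be too strict in practice (most test samples land in regions containing no training sample), which is why the sketch relaxes the match to a Hamming ball; the paper's approximate bound evaluation plays an analogous role.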