A well-known failure mode of neural networks is that they may confidently return erroneous predictions. Such unsafe behaviour is particularly frequent when the use case differs slightly from the training context and/or in the presence of an adversary. This work presents a novel direction to address these issues in a broad, general manner: imposing class-aware constraints on a model's internal activation patterns. Specifically, we assign to each class a unique, fixed, randomly generated binary vector, hereafter called a class code, and train the model so that its cross-depth activation patterns predict the class code corresponding to the input sample's class. The resulting predictors are dubbed Total Activation Classifiers (TAC). TACs may either be trained from scratch or used, at negligible cost, as a thin add-on on top of a frozen, pre-trained neural network. The distance between a TAC's activation pattern and the closest valid code acts as an additional confidence score, alongside that of the default unTAC'ed prediction head. In the add-on case, the original neural network's inference head is completely unaffected (so its accuracy remains the same), but we now have the option to use TAC's own confidence and prediction when determining which course of action to take in a hypothetical production workflow. In particular, we show that TAC strictly improves the value derived from models allowed to reject/defer. We provide further empirical evidence that TAC works well on multiple types of architectures and data modalities and that it is at least as good as state-of-the-art alternative confidence scores derived from existing models.
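The code-matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the network's activations have already been reduced to a single pattern vector with entries in [0, 1] and the same length as a class code, and it assumes an L1 distance. The abstract fixes neither choice, and the names `make_class_codes`, `tac_predict`, `decide` and `max_distance` are hypothetical.

```python
import numpy as np


def make_class_codes(num_classes, code_len, seed=0):
    """Draw one fixed random binary code per class (one row per class)."""
    rng = np.random.default_rng(seed)
    while True:
        codes = rng.integers(0, 2, size=(num_classes, code_len))
        # Redraw in the unlikely event that two classes share a code.
        if len(np.unique(codes, axis=0)) == num_classes:
            return codes


def tac_predict(pattern, codes):
    """Return (predicted class, distance to the closest code).

    Assumption: L1 distance between the pattern and each binary code.
    A smaller distance is read as higher confidence.
    """
    distances = np.abs(codes - pattern).sum(axis=1)
    pred = int(np.argmin(distances))
    return pred, float(distances[pred])


def decide(pattern, codes, max_distance):
    """Return the predicted class, or None to reject/defer the sample."""
    pred, dist = tac_predict(pattern, codes)
    return pred if dist <= max_distance else None
```

In this sketch, a pattern equal to a class code yields distance 0 and is accepted, while a pattern that is equally far from every code falls above any small `max_distance` and is deferred. In the add-on setting, such a decision would sit alongside the unchanged prediction head of the frozen network rather than replace it.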