In practical scenarios where training data is limited, many predictive signals in the data may rather stem from biases in data acquisition (i.e., they are less generalizable), so one cannot prevent a model from co-adapting to such (so-called) "shortcut" signals: this makes the model fragile under various distribution shifts. To bypass such failure modes, we consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training. This motivates us to extend the standard information bottleneck to additionally model the nuisance information. We propose an autoencoder-based training scheme to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training for both convolutional and Transformer-based architectures. Our experimental results show that the proposed scheme improves the robustness of learned representations (remarkably, without using any domain-specific knowledge) with respect to multiple challenging reliability measures. For example, our model advances the state-of-the-art on the recent challenging OBJECTS benchmark in novelty detection by $78.4\% \rightarrow 87.2\%$ in AUROC, while simultaneously enjoying improved corruption, background, and (certified) adversarial robustness. Code is available at https://github.com/jh-jeong/nuisance_ib.
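To situate the objective, recall that the standard information bottleneck learns a representation $Z$ of an input $X$ that is maximally predictive of the label $Y$ while compressing away the rest of $X$. The display below gives the standard objective followed by a minimal sketch of one plausible form of the nuisance-modeling extension described above; the nuisance variable $Z_n$, the decoder $p_\phi$, and the weight $\lambda$ are illustrative assumptions on our part, not the paper's exact formulation:
$$
\max_{\theta}\; I(Z; Y) - \beta\, I(X; Z)
\quad \longrightarrow \quad
\max_{\theta,\phi}\; I(Z; Y) - \beta\, I(X; Z) + \lambda\, \mathbb{E}\big[\log p_\phi(X \mid Z, Z_n)\big],
$$
where the added reconstruction term corresponds to the autoencoder-based, hybrid discriminative-generative training: $Z$ retains label-relevant information, while $Z_n$ explicitly absorbs the nuisance information needed to reconstruct $X$.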