Deep neural networks generalize poorly to unseen environments when the underlying data distribution differs from that of the training set. By learning minimum sufficient representations from training data, the information bottleneck (IB) approach has demonstrated its effectiveness in improving generalization across different AI applications. In this work, we propose a new neural network-based IB approach, termed gated information bottleneck (GIB), that dynamically drops spurious correlations and progressively selects the most task-relevant features across different environments via a trainable soft mask (on raw features). GIB enjoys a simple and tractable objective, without any variational approximation or distributional assumption. We empirically demonstrate the superiority of GIB over other popular neural network-based IB approaches in adversarial robustness and out-of-distribution (OOD) detection. Meanwhile, we also establish the connection between IB theory and invariant causal representation learning, and observe that GIB demonstrates appealing performance when different environments arrive sequentially, a more practical scenario in which invariant risk minimization (IRM) fails. The code for GIB is available at https://github.com/falesiani/GIB
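To make the gating idea concrete, below is a minimal sketch of a trainable soft mask applied elementwise to raw input features, with a sparsity penalty that encourages dropping uninformative (potentially spurious) features. The class name `SoftFeatureGate`, the sigmoid parameterization, and the penalty weight are illustrative assumptions, not the authors' implementation; see the linked repository for the actual GIB objective.

```python
# Sketch of a trainable soft mask on raw features (illustrative, not the
# reference implementation; see https://github.com/falesiani/GIB).
import torch
import torch.nn as nn

class SoftFeatureGate(nn.Module):
    """Elementwise soft gate on raw features: x -> sigmoid(w) * x."""
    def __init__(self, num_features: int):
        super().__init__()
        # One trainable logit per input feature; sigmoid keeps gates in (0, 1).
        self.logits = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.logits) * x

    def sparsity_penalty(self) -> torch.Tensor:
        # Penalizing total gate mass pushes uninformative gates toward zero.
        return torch.sigmoid(self.logits).sum()

# Usage: prepend the gate to a task network; add the penalty to the task loss.
gate = SoftFeatureGate(num_features=20)
net = nn.Sequential(gate, nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(8, 20), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(net(x), y) + 1e-3 * gate.sparsity_penalty()
loss.backward()
```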