Deep learning has revolutionized human society, yet the black-box nature of deep neural networks hinders further application in reliability-demanding industries. In attempts to unpack these models, many works observe or manipulate internal variables to improve their comprehensibility and invertibility. However, existing methods rely on intuitive assumptions and lack mathematical guarantees. To bridge this gap, we introduce Bort, an optimizer that improves model explainability by imposing boundedness and orthogonality constraints on model parameters, derived from sufficient conditions for model comprehensibility and invertibility. We perform reconstruction and backtracking on the representations of models optimized by Bort and observe a clear improvement in explainability. Based on Bort, we are able to synthesize explainable adversarial samples without additional parameters or training. Surprisingly, we find that Bort consistently improves the classification accuracy of various architectures, including ResNet and DeiT, on MNIST, CIFAR-10, and ImageNet. Code: https://github.com/zbr17/Bort.
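To make the constraints concrete, below is a minimal PyTorch-style sketch of a training step that enforces the two properties the abstract names: an orthogonality penalty on weight matrices and a box projection for boundedness. This is an illustrative assumption, not the authors' implementation; the actual Bort optimizer (see the repository above) may realize the constraints differently, e.g. inside the optimizer update rule, and the hyper-parameters `ORTHO_WEIGHT` and `BOUND` here are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical hyper-parameters, not taken from the paper.
ORTHO_WEIGHT = 1e-2   # strength of the orthogonality penalty
BOUND = 1.0           # box constraint enforced after each update

def orthogonality_penalty(model: nn.Module) -> torch.Tensor:
    """Sum of ||W W^T - I||_F^2 over all weight tensors with >= 2 dims,
    flattening conv kernels to 2-D first."""
    penalty = torch.zeros((), dtype=torch.float32)
    for p in model.parameters():
        if p.dim() >= 2:
            w = p.flatten(1)
            gram = w @ w.t()
            eye = torch.eye(gram.size(0), device=w.device, dtype=w.dtype)
            penalty = penalty.to(w.device) + (gram - eye).pow(2).sum()
    return penalty

def train_step(model, optimizer, criterion, x, y):
    optimizer.zero_grad()
    # Task loss plus a soft orthogonality constraint on the parameters.
    loss = criterion(model(x), y) + ORTHO_WEIGHT * orthogonality_penalty(model)
    loss.backward()
    optimizer.step()
    # Boundedness: project every parameter back into [-BOUND, BOUND].
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(-BOUND, BOUND)
    return loss.item()
```

Under this reading, orthogonality keeps each layer close to an isometry (making representations easier to invert and backtrack), while the clamp keeps parameter magnitudes bounded; both are stated in the abstract as sufficient conditions for comprehensibility and invertibility.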