Having reliable specifications is an unavoidable challenge in achieving verifiable correctness, robustness, and interpretability of AI systems. Existing specifications for neural networks follow the paradigm of data as specification: the local neighborhood centered around a reference input is considered correct (or robust). While such specifications contribute to verifying adversarial robustness, a significant problem in many research domains, our empirical study shows that the verified regions are so tight that they fail to cover test set inputs, making them impractical for some real-world applications. To this end, we propose a new family of specifications called neural representation as specification, which uses the intrinsic information of neural networks, namely neural activation patterns (NAPs), rather than input data, to specify the correctness and/or robustness of neural network predictions. We present a simple statistical approach to mining neural activation patterns. To show the effectiveness of the discovered NAPs, we formally verify several important properties, such as that various types of misclassification will never occur for a given NAP, and that there is no ambiguity between different NAPs. We show that by using NAPs we can verify a significant region of the input space while still recalling 84% of the data on MNIST. Moreover, we can push the verifiable bound to be 10 times larger on the CIFAR10 benchmark. Thus, we argue that NAPs can potentially serve as a more reliable and extensible specification for neural network verification.
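To make the idea concrete, the "simple statistical approach" to mining a NAP can be sketched roughly as follows. This is an illustrative reconstruction under assumptions, not the paper's exact procedure: we binarize the post-ReLU activations of inputs from one class and keep only neurons whose on/off state is consistent in at least a fraction delta of the samples (the function name `mine_nap` and the threshold `delta` are hypothetical).

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mine_nap(hidden_acts, delta=0.95):
    """Mine a neural activation pattern from one class's activations.

    hidden_acts: (n_samples, n_neurons) post-ReLU activations for inputs
    belonging to a single class. Returns a dict mapping neuron index to its
    required state (1 = active, 0 = inactive) for neurons whose state is
    consistent in at least `delta` of the samples; neurons that are neither
    reliably active nor reliably inactive are left unconstrained (omitted).
    """
    active = (hidden_acts > 0).astype(float)  # binarize: neuron fired or not
    freq = active.mean(axis=0)                # per-neuron fraction of samples active
    nap = {}
    for i, f in enumerate(freq):
        if f >= delta:
            nap[i] = 1                        # reliably active neuron
        elif f <= 1.0 - delta:
            nap[i] = 0                        # reliably inactive neuron
    return nap

# Tiny demo with a random one-layer network and synthetic "class" inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 16))                 # hypothetical hidden-layer weights
X = rng.normal(loc=1.0, size=(200, 10))       # synthetic inputs of one class
acts = relu(X @ W)
nap = mine_nap(acts, delta=0.95)
print(f"NAP constrains {len(nap)} of {acts.shape[1]} neurons")
```

A verifier could then check, for a region of the input space, that every input satisfying the mined on/off constraints is classified into the associated class, which is the kind of property the abstract refers to.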