Having reliable specifications is an unavoidable challenge in achieving verifiable correctness, robustness, and interpretability of AI systems. Existing specifications for neural networks follow the paradigm of data as specification: the local neighborhood centered around a reference input is considered correct (or robust). While existing specifications contribute to verifying adversarial robustness, a problem of significance in many research domains, our empirical study shows that such verified regions are rather tight and thus fail to allow verification of test set inputs, making them impractical for some real-world applications. To this end, we propose a new family of specifications called neural representation as specification, which uses the intrinsic information of neural networks, namely neural activation patterns (NAPs), rather than input data to specify the correctness and/or robustness of neural network predictions. We present a simple statistical approach to mining neural activation patterns. To show the effectiveness of discovered NAPs, we formally verify several important properties, such as that various types of misclassifications will never happen for a given NAP and that there is no ambiguity between different NAPs. We show that by using NAPs, we can verify a significant region of the input space while still recalling 84% of the data on MNIST. Moreover, we can push the verifiable bound to be 10 times larger on the CIFAR10 benchmark. Thus, we argue that NAPs can potentially be used as a more reliable and extensible specification for neural network verification.
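To make the "simple statistical approach to mining neural activation patterns" concrete, below is a minimal illustrative sketch of frequency-based NAP mining; it is an assumption about one plausible realization, not the paper's exact procedure. The function names (`mine_nap`, `follows_nap`), the threshold parameter `delta`, and the synthetic activations are all hypothetical, introduced only for illustration.

```python
# Minimal sketch of frequency-based NAP mining (illustrative assumption,
# not necessarily the paper's exact algorithm). We assume access to the
# post-ReLU activations of one hidden layer for all training inputs of a
# single class.
import numpy as np

def mine_nap(activations: np.ndarray, delta: float = 0.99) -> dict:
    """Mine a neural activation pattern (NAP) for one class.

    activations: (n_samples, n_neurons) post-ReLU values for inputs
                 of the class of interest.
    delta:       hypothetical abstraction level -- a neuron's state is
                 included in the NAP only if it holds in at least
                 `delta` of the samples.

    Returns a dict mapping neuron index -> required state
    (1 = activated, 0 = deactivated); unlisted neurons are unconstrained.
    """
    on_freq = (activations > 0).mean(axis=0)  # fraction of samples where each neuron fires
    nap = {}
    for i, f in enumerate(on_freq):
        if f >= delta:            # almost always activated for this class
            nap[i] = 1
        elif 1.0 - f >= delta:    # almost always deactivated for this class
            nap[i] = 0
    return nap

def follows_nap(x_activations: np.ndarray, nap: dict) -> bool:
    """Check whether one input's activations satisfy the mined NAP."""
    return all((x_activations[i] > 0) == bool(s) for i, s in nap.items())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake post-ReLU activations standing in for a trained network's layer.
    acts = np.maximum(rng.normal(size=(500, 32)), 0.0)
    nap = mine_nap(acts, delta=0.95)
    print(f"NAP constrains {len(nap)} of 32 neurons")
    print("first sample follows NAP:", follows_nap(acts[0], nap))
```

In this reading, lowering `delta` constrains more neurons and yields a tighter (more specific) NAP, while raising it keeps the NAP coarser so that more test inputs fall inside the verified region; a verifier would then check that every input following a class's NAP is classified to that class.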