RoMA: a Method for Neural Network Robustness Measurement and Assessment

Neural network models have become the leading solution for a large variety of tasks, such as classification, language processing, protein folding, and others. However, their reliability is heavily plagued by adversarial inputs: small input perturbations that cause the model to produce erroneous outputs. Adversarial inputs can occur naturally when the system's environment behaves randomly, even in the absence of a malicious adversary, and are a severe cause for concern when attempting to deploy neural networks within critical systems. In this paper, we present a new statistical method, called Robustness Measurement and Assessment (RoMA), which can measure the expected robustness of a neural network model. Specifically, RoMA determines the probability that a random input perturbation might cause misclassification. The method allows us to provide formal guarantees regarding the expected frequency of errors that a trained model will encounter after deployment. Our approach can be applied to large-scale, black-box neural networks, which is a significant advantage compared to recently proposed verification methods. We apply our approach in two ways: comparing the robustness of different models, and measuring how a model's robustness is affected by the magnitude of input perturbation. One interesting insight obtained through this work is that, in a classification network, different output labels can exhibit very different robustness levels. We term this phenomenon categorial robustness. Our ability to perform risk and robustness assessments on a categorial basis opens the door to risk mitigation, which may prove to be a significant step towards neural network certification in safety-critical applications.
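The quantity the abstract describes, the probability that a random input perturbation causes misclassification, can be illustrated with a naive Monte Carlo estimate. The sketch below is not the RoMA procedure itself, whose statistical details the abstract does not give. It assumes perturbations drawn uniformly from an L-infinity ball of radius `epsilon`, and the names `estimate_misclassification_rate` and `toy_classifier` are hypothetical, introduced only for illustration. The model is queried purely as a black box.

```python
import random


def estimate_misclassification_rate(classify, x, epsilon, n_samples=10_000, seed=0):
    """Naive Monte Carlo estimate of P(classify(x + delta) != classify(x)).

    Assumption: delta is sampled uniformly from the L-infinity ball of
    radius epsilon. This is a plain sampling estimate, not RoMA's method.
    """
    rng = random.Random(seed)
    reference_label = classify(x)  # label of the unperturbed input
    errors = 0
    for _ in range(n_samples):
        # Perturb every feature independently within [-epsilon, +epsilon].
        perturbed = [xi + rng.uniform(-epsilon, epsilon) for xi in x]
        if classify(perturbed) != reference_label:
            errors += 1
    return errors / n_samples


def toy_classifier(x):
    """Hypothetical stand-in for a black-box model: label 1 if the features sum above 0."""
    return 1 if sum(x) > 0 else 0


# An input far from the decision boundary versus one close to it.
rate_far = estimate_misclassification_rate(toy_classifier, [1.0, 1.0], epsilon=0.5)
rate_near = estimate_misclassification_rate(toy_classifier, [0.05, 0.05], epsilon=0.5)
print(f"far from boundary:  {rate_far:.3f}")
print(f"near the boundary:  {rate_near:.3f}")
```

Repeating the estimate over several values of `epsilon` shows how the error rate grows with perturbation magnitude, and grouping inputs by their output label before averaging gives a per-label view in the spirit of categorial robustness.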
