Deep neural networks for natural language processing are fragile in the face of adversarial examples -- small input perturbations, such as synonym substitution or word duplication, that cause a neural network to change its prediction. We present an approach for certifying the robustness of LSTMs (and extensions of LSTMs) and for training models that can be efficiently certified. Our approach can certify robustness to intractably large perturbation spaces defined programmatically in a language of string transformations. Our evaluation shows that (1) our approach can train models that are more robust to combinations of string transformations than those produced using existing techniques, and (2) it achieves high certification accuracy on the resulting models.
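To make the notion of a programmatically defined perturbation space concrete, the sketch below enumerates the strings reachable from a sentence by one application of two of the transformations named above: synonym substitution and word duplication. This is an illustrative toy, not the paper's specification language; the `SYNONYMS` table and the `perturbation_space` function are hypothetical names introduced here. It also hints at why the space is called intractably large: composing such transformations makes the reachable set grow combinatorially, so certification cannot simply enumerate it.

```python
# Hypothetical toy synonym table; the actual transformations are
# defined programmatically, not by a fixed lookup table like this.
SYNONYMS = {"small": ["tiny"], "movie": ["film"]}

def perturbation_space(words, synonyms):
    """All sentences reachable by at most one synonym substitution
    or one word duplication (illustrative sketch, not the paper's DSL)."""
    space = {tuple(words)}  # include the unperturbed sentence
    for i, w in enumerate(words):
        # synonym substitution at position i
        for s in synonyms.get(w, []):
            space.add(tuple(words[:i] + [s] + words[i + 1:]))
        # duplication of the word at position i
        space.add(tuple(words[:i + 1] + [w] + words[i + 1:]))
    return space

sentence = "a small movie".split()
space = perturbation_space(sentence, SYNONYMS)
# 1 original + 2 substitutions + 3 duplications = 6 sentences
print(len(space))
```

Applying `perturbation_space` again to every element of `space` (i.e., composing transformations) multiplies the number of reachable sentences at each step, which is why a certification procedure must reason about the whole space symbolically rather than by enumeration.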