As NLP models achieve state-of-the-art performance on benchmarks and gain wide real-world adoption, it is increasingly important to ensure their safe deployment, e.g., making sure that models are robust against unseen or challenging scenarios. Although robustness is an increasingly studied topic, it has been explored separately in applications such as vision and NLP, with diverse definitions, evaluation methodologies, and mitigation strategies spread across multiple lines of research. In this paper, we aim to provide a unifying survey of how to define, measure, and improve robustness in NLP. We first connect the multiple definitions of robustness, then unify various lines of work on identifying robustness failures and evaluating models' robustness. Correspondingly, we present mitigation strategies that are data-driven, model-driven, and inductive-prior-based, offering a more systematic view of how to effectively improve robustness in NLP models. Finally, we conclude by outlining open challenges and future directions to motivate further research in this area.