The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated are not well understood, nor are there any evaluation standards for generalisation. In this paper, we lay the groundwork to address both of these issues. We present a taxonomy for characterising and understanding generalisation research in NLP. Our taxonomy is based on an extensive literature review of generalisation research, and contains five axes along which studies can differ: their main motivation, the type of generalisation they investigate, the type of data shift they consider, the source of this data shift, and the locus of the shift within the modelling pipeline. We use our taxonomy to classify over 400 papers that test generalisation, for a total of more than 600 individual experiments. Considering the results of this review, we present an in-depth analysis that maps out the current state of generalisation research in NLP, and we make recommendations for which areas might deserve attention in the future. Along with this paper, we release a webpage where the results of our review can be dynamically explored, and which we intend to update as new NLP generalisation studies are published. With this work, we aim to take steps towards making state-of-the-art generalisation testing the new status quo in NLP.