Natural language understanding (NLU) has made massive progress driven by large benchmarks, paired with research on transfer learning to broaden its impact. Benchmarks are dominated by a small set of frequent phenomena, leaving a long tail of infrequent phenomena underrepresented. In this work, we reflect on the question: have transfer learning methods sufficiently addressed the performance of benchmark-trained models on the long tail? Since benchmarks do not list the phenomena they include or exclude, we conceptualize the long tail using macro-level dimensions (e.g., underrepresented genres, topics). We assess trends in transfer learning research through a qualitative meta-analysis of 100 representative papers on transfer learning for NLU. Our analysis asks three questions: (i) Which long tail dimensions do transfer learning studies target? (ii) Which properties help adaptation methods improve performance on the long tail? (iii) Which methodological gaps have the greatest negative impact on long tail performance? Our answers to these questions highlight major avenues for future research in transfer learning for the long tail. Lastly, we present a case study comparing the performance of various adaptation methods on clinical narratives, illustrating how systematically conducted meta-experiments can provide insights that enable progress along these avenues.