The issue of shortcut learning is widely known in NLP and has been an important research focus in recent years. Unintended correlations in the data enable models to easily solve tasks that were meant to exhibit advanced language understanding and reasoning capabilities. In this survey paper, we focus on the field of machine reading comprehension (MRC), an important task for showcasing high-level language understanding that also suffers from a range of shortcuts. We summarize the available techniques for measuring and mitigating shortcuts and conclude with suggestions for further progress in shortcut research. Most importantly, we highlight two main concerns for shortcut mitigation in MRC: the lack of public challenge sets, a necessary component for effective and reusable evaluation, and the lack of certain mitigation techniques that are prominent in other areas.