Recent work has shown that deep learning models in NLP are highly sensitive to low-level correlations between simple features and specific output labels, leading to overfitting and a lack of generalization. To mitigate this problem, a common practice is to balance datasets by adding new instances or by filtering out "easy" instances (Sakaguchi et al., 2020), culminating in a recent proposal to eliminate single-word correlations altogether (Gardner et al., 2021). In this opinion paper, we argue that despite these efforts, increasingly powerful models keep exploiting ever-smaller spurious correlations, and as a result even balancing all single-word features is insufficient to mitigate all of these correlations. In parallel, a truly balanced dataset may be bound to "throw the baby out with the bathwater" and miss important signals encoding common sense and world knowledge. We highlight several alternatives to dataset balancing, focusing on enhancing datasets with richer contexts, allowing models to abstain and interact with users, and turning from large-scale fine-tuning to zero- or few-shot setups.