Benchmark datasets play an important role in evaluating Natural Language Understanding (NLU) models. However, shortcuts -- unwanted biases in the benchmark datasets -- can damage the effectiveness of benchmark datasets in revealing models' real capabilities. Since shortcuts vary in coverage, productivity, and semantic meaning, it is challenging for NLU experts to systematically understand and avoid them when creating benchmark datasets. In this paper, we develop a visual analytics system, ShortcutLens, to help NLU experts explore shortcuts in NLU benchmark datasets. The system allows users to conduct multi-level exploration of shortcuts. Specifically, Statistics View helps users grasp the statistics such as coverage and productivity of shortcuts in the benchmark dataset. Template View employs hierarchical and interpretable templates to summarize different types of shortcuts. Instance View allows users to check the corresponding instances covered by the shortcuts. We conduct case studies and expert interviews to evaluate the effectiveness and usability of the system. The results demonstrate that ShortcutLens supports users in gaining a better understanding of benchmark dataset issues through shortcuts, inspiring them to create challenging and pertinent benchmark datasets.
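The abstract names coverage and productivity as the statistics shown in Statistics View but does not define them. As a rough illustration only, the sketch below assumes definitions common in the shortcut literature: coverage is the fraction of instances that contain a candidate shortcut, and productivity is the fraction of those covered instances that share the most frequent label. The function `shortcut_stats`, the whitespace tokenization, and the toy sentiment data are hypothetical and are not taken from ShortcutLens itself.

```python
from collections import Counter


def shortcut_stats(instances, token):
    """Return (coverage, productivity, majority_label) for a single-token shortcut.

    `instances` is a list of (text, label) pairs. The definitions used here are
    assumptions for illustration, not the paper's confirmed formulas.
    """
    # Labels of the instances whose (lowercased, whitespace-split) text contains the token.
    covered_labels = [
        label for text, label in instances if token in text.lower().split()
    ]

    # Coverage: share of all instances that contain the token.
    coverage = len(covered_labels) / len(instances) if instances else 0.0
    if not covered_labels:
        return coverage, 0.0, None

    # Productivity: share of covered instances that carry the most frequent label.
    majority_label, count = Counter(covered_labels).most_common(1)[0]
    productivity = count / len(covered_labels)
    return coverage, productivity, majority_label


# Hypothetical toy dataset, invented for this example.
toy = [
    ("the movie was not good", "negative"),
    ("not a bad film", "positive"),
    ("i did not like it", "negative"),
    ("a wonderful story", "positive"),
]

coverage, productivity, label = shortcut_stats(toy, "not")
print(coverage, productivity, label)
```

Under these assumed definitions, a token with both high coverage and high productivity would be a strong shortcut candidate, since a model could predict the majority label for many instances from that token alone.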