Neural natural language models have recently attained state-of-the-art performance on a wide variety of tasks, but this high performance can result from superficial, surface-level cues (Bender and Koller, 2020; Niven and Kao, 2020). These surface cues, which act as ``shortcuts'' inherent in the datasets, do not contribute to the *task-specific information* (TSI) of the classification tasks. While it is essential to look at model performance, it is also important to understand the datasets. In this paper, we consider the following question: apart from the information introduced by the shortcut features, how much task-specific information is required to classify a dataset? We formulate this quantity in an information-theoretic framework. Although the quantity is hard to compute exactly, we approximate it with a fast and stable method. TSI quantifies the amount of linguistic knowledge, modulo a set of predefined shortcuts, that contributes to classifying a sample from each dataset. This framework allows us to compare across datasets: for example, apart from a set of ``shortcut features'', classifying each sample in the Multi-NLI task involves around 0.4 nats more TSI than in the Quora Question Pair task.