The task of abductive natural language inference (\alpha{}nli), deciding which hypothesis is the more likely explanation for a set of observations, is a particularly difficult type of NLI. Instead of merely determining a causal relationship, it requires common sense to also evaluate how plausible an explanation is. All recent competitive systems build on contextualized representations and use transformer architectures to learn an NLI model. Anyone faced with a particular NLI task needs to select the best available model, which is a time-consuming and resource-intensive endeavour. To address this practical problem, we propose a simple method for predicting a model's performance without actually fine-tuning it. We do this by comparing how well pre-trained models perform on the \alpha{}nli task when sentence embeddings are merely compared via cosine similarity against the performance achieved when a classifier is trained on top of these embeddings. We show that the accuracy of the cosine similarity approach correlates strongly with the accuracy of the classification approach, with a Pearson correlation coefficient of 0.65. Since the similarity computation is orders of magnitude faster on a given dataset (less than a minute vs. hours), our method can lead to significant time savings in the process of model selection.
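As a rough illustration of the similarity-based probe (a minimal sketch, not the paper's exact protocol; the encoder name and the way observations and hypotheses are combined are assumptions made for this example), one could score each \alpha{}nli instance by embedding the observations and both candidate hypotheses and selecting the hypothesis whose embedding is more cosine-similar to the observations:

\begin{verbatim}
from sentence_transformers import SentenceTransformer, util

# Any pre-trained sentence encoder can stand in here; the model
# name below is only an example choice, not the paper's.
model = SentenceTransformer("all-MiniLM-L6-v2")

def choose_hypothesis(obs1, obs2, hyp1, hyp2):
    # Embed the concatenated observations and both candidate
    # hypotheses, then pick the hypothesis whose embedding is
    # closer (by cosine similarity) to the observations.
    texts = [obs1 + " " + obs2, hyp1, hyp2]
    emb = model.encode(texts, convert_to_tensor=True)
    sim1 = util.cos_sim(emb[0], emb[1]).item()
    sim2 = util.cos_sim(emb[0], emb[2]).item()
    return 1 if sim1 >= sim2 else 2
\end{verbatim}

The accuracy of this zero-training decision rule is then compared against the accuracy of a classifier trained on top of the same embeddings.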