In this paper, we analyze several neural network designs (and their variations) for sentence pair modeling and compare their performance extensively across eight datasets, covering paraphrase identification, semantic textual similarity, natural language inference, and question answering tasks. Although most of these models have claimed state-of-the-art performance, the original papers often reported results on only one or two selected datasets. We provide a systematic study and show that (i) encoding contextual information with LSTMs and modeling inter-sentence interactions are critical, (ii) Tree-LSTM does not help as much as previously claimed but surprisingly improves performance on Twitter datasets, and (iii) the Enhanced Sequential Inference Model is the best so far for larger datasets, while the Pairwise Word Interaction Model achieves the best performance when less data is available. We release our implementations as an open-source toolkit.