Natural Language Inference (NLI) is an active research topic in natural language processing, and contradiction detection between sentences is a special case of NLI. It is considered a difficult NLP task, and it has a large impact when added as a component of many NLP applications, such as question answering systems and text summarization. Arabic is one of the most challenging low-resource languages for contradiction detection because of its rich lexical and semantic ambiguity. We have created a data set of more than 12k sentences, named ArNLI, which will be made publicly available. Moreover, we have applied a new model inspired by the contradiction detection solutions that Stanford proposed for English. We propose an approach to detect contradictions between pairs of Arabic sentences that uses a contradiction vector combined with a language model vector as input to a machine learning model. We analyzed the results of different traditional machine learning classifiers and compared them on our data set (ArNLI) and on automatic translations of the English PHEME and SICK data sets. The best results were achieved with a Random Forest classifier, with accuracies of 99%, 60%, and 75% on PHEME, SICK, and ArNLI, respectively.
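
The idea of combining a contradiction vector with a language model vector can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three hand-crafted features, the negation cue list, and the helper names `contradiction_features` and `build_input_vector` are hypothetical and are not taken from the paper, and the language model vector is a placeholder rather than a real embedding.

```python
from typing import List, Sequence

# Hypothetical negation cues, for illustration only (not the paper's feature set).
NEGATION_CUES = {"لا", "لم", "لن", "ليس"}


def contradiction_features(premise: Sequence[str], hypothesis: Sequence[str]) -> List[float]:
    """Build a toy contradiction vector from two tokenized sentences."""
    # 1.0 if exactly one of the two sentences contains a negation cue.
    neg_p = any(tok in NEGATION_CUES for tok in premise)
    neg_h = any(tok in NEGATION_CUES for tok in hypothesis)
    negation_mismatch = float(neg_p != neg_h)

    # 1.0 if both sentences mention numbers and the sets of numbers differ.
    nums_p = {tok for tok in premise if tok.isdigit()}
    nums_h = {tok for tok in hypothesis if tok.isdigit()}
    number_mismatch = float(bool(nums_p) and bool(nums_h) and nums_p != nums_h)

    # Jaccard overlap of the two token sets.
    set_p, set_h = set(premise), set(hypothesis)
    union = set_p | set_h
    overlap = len(set_p & set_h) / len(union) if union else 0.0

    return [negation_mismatch, number_mismatch, overlap]


def build_input_vector(contradiction_vec: Sequence[float],
                       language_model_vec: Sequence[float]) -> List[float]:
    """Concatenate the contradiction vector with a language model vector."""
    return list(contradiction_vec) + list(language_model_vec)


premise = "الطقس جميل اليوم".split()
hypothesis = "الطقس ليس جميل اليوم".split()

features = contradiction_features(premise, hypothesis)
print(features)  # [1.0, 0.0, 0.75]

# Placeholder values standing in for a sentence-pair embedding.
placeholder_embedding = [0.1, 0.2]
print(build_input_vector(features, placeholder_embedding))
```

A concatenated vector of this kind could then be passed to a traditional classifier, for example scikit-learn's `RandomForestClassifier`, together with a contradiction label for each sentence pair.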