Pronoun resolution is a challenging subtask of coreference resolution, an essential field in natural language processing. Coreference resolution aims to find all mentions in a text that refer to the same real-world entity. This paper presents a hybrid model that combines multiple rule-based sieves with a machine-learning sieve for pronouns. To this end, seven high-precision rule-based sieves are designed for the Persian language. A random forest classifier then links pronouns to the previously formed partial clusters. The presented method performs strongly by using a pipeline design that combines the advantages of machine-learning and rule-based methods, and it addresses some of the challenges faced by end-to-end models. The authors also develop a Persian coreference corpus, called Mehr, consisting of 400 documents. This corpus fixes some weaknesses of the previous Persian corpora. Finally, the proposed method is evaluated on the Mehr and Uppsala test sets, and its effectiveness relative to the earlier Persian model is reported.
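The pipeline described in the abstract (rule-based sieves that build partial clusters, followed by a learned step that attaches pronouns to them) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Mention` type, `exact_match_sieve`, `recency_score`, and the threshold are all hypothetical, the single sieve stands in for the paper's seven Persian-specific rules, and the scoring function is a simple placeholder for the random forest classifier. The toy example uses English mentions for readability.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Mention:
    idx: int                  # position of the mention in document order
    text: str
    is_pronoun: bool = False


Cluster = List[Mention]
Sieve = Callable[[Mention, Mention], bool]
Scorer = Callable[[Mention, Cluster], float]


def exact_match_sieve(a: Mention, b: Mention) -> bool:
    """Hypothetical high-precision rule: identical surface strings corefer."""
    return a.text.lower() == b.text.lower()


def apply_sieve(clusters: List[Cluster], sieve: Sieve) -> List[Cluster]:
    """Merge each cluster into the first earlier cluster the sieve links it to."""
    result: List[Cluster] = []
    for cluster in clusters:
        for target in result:
            if any(sieve(a, b) for a in target for b in cluster):
                target.extend(cluster)
                break
        else:
            result.append(list(cluster))
    return result


def recency_score(pronoun: Mention, cluster: Cluster) -> float:
    """Placeholder for a trained classifier: closer antecedents score higher."""
    gaps = [pronoun.idx - m.idx for m in cluster if m.idx < pronoun.idx]
    return 1.0 / min(gaps) if gaps else 0.0


def resolve(mentions: List[Mention], sieves: List[Sieve],
            score: Scorer, threshold: float = 0.5) -> List[Cluster]:
    # Stage 1: rule-based sieves build partial clusters of non-pronoun mentions.
    clusters: List[Cluster] = [[m] for m in mentions if not m.is_pronoun]
    for sieve in sieves:      # assumed to be ordered from most to least precise
        clusters = apply_sieve(clusters, sieve)

    # Stage 2: attach each pronoun to the best-scoring preceding cluster.
    for p in (m for m in mentions if m.is_pronoun):
        candidates = [c for c in clusters if any(m.idx < p.idx for m in c)]
        best = max(candidates, key=lambda c: score(p, c), default=None)
        if best is not None and score(p, best) >= threshold:
            best.append(p)
        else:
            clusters.append([p])   # no confident antecedent: keep as singleton
    return clusters


if __name__ == "__main__":
    doc = [Mention(0, "Sara"), Mention(1, "the book"),
           Mention(2, "Sara"), Mention(3, "she", is_pronoun=True)]
    for cluster in resolve(doc, [exact_match_sieve], recency_score):
        print([m.text for m in cluster])
```

In a fuller version, `score` would wrap a trained classifier's predicted probability that the pronoun belongs to the candidate cluster, computed from features of the pronoun and the cluster. Running the sieves first means the learned step sees clusters rather than isolated mentions, which is the advantage of the pipeline design that the abstract points to.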