Clustering-based unsupervised relation discovery has gradually become one of the important approaches to open relation extraction (OpenRE). However, high-dimensional vectors encode complex linguistic information, so the derived clusters do not explicitly align with relational semantic classes. In this work, we propose a relation-oriented clustering model and use it to identify novel relations in unlabeled data. Specifically, to enable the model to learn to cluster relational data, our method leverages the readily available labeled data of pre-defined relations to learn a relation-oriented representation. We minimize the distance between instances with the same relation by gathering them toward their corresponding relation centroids to form a cluster structure, so that the learned representation is cluster-friendly. To reduce the clustering bias toward pre-defined classes, we optimize the model by minimizing a joint objective on both labeled and unlabeled data. Experimental results show that our method reduces the error rate by 29.2% and 15.7% on two datasets, respectively, compared with current SOTA methods.
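To make the centroid-gathering idea concrete, below is a minimal sketch (not the authors' implementation) of a supervised centroid-pulling loss on labeled instances combined with an unsupervised term on unlabeled cluster assignments; the function names, the entropy-style unsupervised term, and the weighting factor `alpha` are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of a relation-oriented clustering objective:
# labeled instances are pulled toward the centroid of their relation class,
# and a joint loss adds an unsupervised term on unlabeled soft assignments
# to reduce bias toward the pre-defined relations. All names are hypothetical.
import torch
import torch.nn.functional as F


def centroid_pull_loss(reps, labels, num_relations):
    """Pull each labeled instance toward its relation centroid."""
    dim = reps.size(1)
    centroids = torch.zeros(num_relations, dim, device=reps.device)
    counts = torch.zeros(num_relations, device=reps.device)
    centroids.index_add_(0, labels, reps)
    counts.index_add_(0, labels, torch.ones_like(labels, dtype=reps.dtype))
    centroids = centroids / counts.clamp(min=1).unsqueeze(1)
    # squared Euclidean distance of each instance to its (detached) centroid
    return ((reps - centroids.detach()[labels]) ** 2).sum(dim=1).mean()


def joint_loss(labeled_reps, labels, soft_assign, num_relations, alpha=1.0):
    """Joint objective on labeled and unlabeled data (illustrative form).

    soft_assign: [batch, num_clusters] cluster distribution for unlabeled
    instances; the entropy term below encourages confident assignments.
    """
    sup = centroid_pull_loss(labeled_reps, labels, num_relations)
    unsup = -(soft_assign * soft_assign.clamp_min(1e-8).log()).sum(dim=1).mean()
    return sup + alpha * unsup


# toy usage with random representations
reps = torch.randn(8, 16)
labels = torch.randint(0, 4, (8,))
soft = F.softmax(torch.randn(6, 4), dim=1)
print(joint_loss(reps, labels, soft, num_relations=4).item())
```

In this sketch the centroids are recomputed from the current batch and detached, so gradients only move instances toward the cluster structure rather than collapsing the centroids themselves; the actual objective and optimization details follow the paper.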