Entity Alignment (EA), which aims to detect entity mappings (i.e., equivalent entity pairs) across different Knowledge Graphs (KGs), is critical for KG fusion. Neural EA methods dominate current EA research but still suffer from their reliance on labelled mappings. To alleviate this problem, a few works have explored boosting the training of EA models with self-training, which iteratively adds confidently predicted mappings to the training data. Although the effectiveness of self-training can be glimpsed in some specific settings, our knowledge of it remains very limited. One reason is that existing works concentrate on devising EA models and treat self-training only as an auxiliary tool. To fill this knowledge gap, we shift the focus of investigation to self-training itself. In addition, existing self-training strategies have limited impact because they introduce either considerable False Positive noise or only a small number of True Positive pseudo mappings. To improve self-training for EA, we propose exploiting the dependencies between entities, a particularity of EA, to suppress the noise without sacrificing the recall of True Positive mappings. Through extensive experiments, we show that introducing dependency raises the self-training strategy for EA to a new level: the value of self-training in alleviating the reliance on annotation is actually far greater than previously realised. Furthermore, we suggest future study on smart data annotation to break the ceiling of EA performance.
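To make the self-training loop described above concrete, the sketch below shows one round of pseudo-mapping selection. It is a minimal illustration, not the paper's implementation: the similarity matrix `sim`, the confidence `threshold`, and the use of a mutual-nearest-neighbour test as the dependency (one-to-one) filter are all assumptions introduced here; the abstract does not specify how entity dependencies are operationalised.

```python
import numpy as np

def self_training_round(sim, threshold=0.9):
    """One hypothetical self-training round for EA.

    sim: (n_src, n_tgt) similarity matrix produced by the current EA model.
    Returns pseudo mappings (i, j) whose similarity exceeds `threshold`
    AND that are mutual nearest neighbours -- a one-to-one dependency
    filter (an assumption; the paper's exact mechanism may differ).
    """
    best_tgt = sim.argmax(axis=1)  # best target for each source entity
    best_src = sim.argmax(axis=0)  # best source for each target entity
    pseudo = []
    for i, j in enumerate(best_tgt):
        # The mutual-nearest-neighbour check lets each source and each
        # target entity join at most one pseudo mapping, suppressing
        # False Positive noise from one-to-many matches while keeping
        # confident True Positives.
        if best_src[j] == i and sim[i, j] >= threshold:
            pseudo.append((i, j))
    return pseudo

# Toy similarity matrix between 3 source and 3 target entities.
sim = np.array([[0.95, 0.10, 0.20],
                [0.15, 0.92, 0.30],
                [0.88, 0.25, 0.40]])
print(self_training_round(sim))  # [(0, 0), (1, 1)]; source 2 loses target 0 to source 0
```

In an actual pipeline, the selected pseudo mappings would be appended to the training data and the EA model retrained, with the round repeated until the pseudo-mapping set stabilises.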