Incorporating tagging into neural machine translation (NMT) systems has shown promising results in helping to translate rare words such as named entities (NEs). However, translating NEs in low-resource settings remains a challenge. In this work, we investigate the effect of using tags and NE hypernyms from knowledge graphs (KGs) in parallel corpora under different resource conditions. We find that the tag-and-copy mechanism (tagging the NEs in the source sentence and copying them to the target sentence) improves translation only in high-resource settings. Introducing copying also has polarizing effects on the translation of different parts of speech (POS). Interestingly, we find that copy accuracy for hypernyms is consistently higher than that for entities. As a way of avoiding "hard" copying and of using hypernyms to bootstrap rare entities, we introduce a "soft" tagging mechanism and find consistent improvements in both high- and low-resource settings.
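To make the tag-and-copy idea concrete, here is a minimal Python sketch of how a parallel sentence pair might be preprocessed. It rests on several assumptions that the abstract does not specify. Entity spans are taken as given (for example, from an NER tool). `<ne>`/`</ne>` are hypothetical tag tokens. The functions `tag_source`, `copy_entities`, and `tag_target` are illustrative names. The sketch shows only the "hard" tag-and-copy preprocessing idea, not the authors' implementation. It does not model KG hypernyms or the "soft" tagging variant.

```python
from typing import Dict, List, Tuple

# Hypothetical tag tokens; the actual markup used in the paper is not specified.
OPEN_TAG = "<ne>"
CLOSE_TAG = "</ne>"

Span = Tuple[int, int]  # token-index range [start, end) of a named entity


def tag_source(tokens: List[str], spans: List[Span]) -> List[str]:
    """Wrap each named-entity span in the source sentence with tag tokens."""
    starts: Dict[int, int] = {s: e for s, e in spans}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in starts:
            end = starts[i]
            out.append(OPEN_TAG)
            out.extend(tokens[i:end])
            out.append(CLOSE_TAG)
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return out


def copy_entities(tagged_source: List[str]) -> List[List[str]]:
    """Extract the tagged entities verbatim; these are what should be copied."""
    entities: List[List[str]] = []
    current: List[str] = []
    inside = False
    for tok in tagged_source:
        if tok == OPEN_TAG:
            inside, current = True, []
        elif tok == CLOSE_TAG:
            inside = False
            entities.append(current)
        elif inside:
            current.append(tok)
    return entities


def tag_target(tokens: List[str], entities: List[List[str]]) -> List[str]:
    """Tag target-side occurrences of the copied entities (longest match first)."""
    ents = sorted(entities, key=len, reverse=True)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        match = next((e for e in ents if e and tokens[i:i + len(e)] == e), None)
        if match is not None:
            out.append(OPEN_TAG)
            out.extend(match)
            out.append(CLOSE_TAG)
            i += len(match)
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    src = "the mayor of Paris visited Berlin".split()
    tgt = "der Bürgermeister von Paris besuchte Berlin".split()
    tagged_src = tag_source(src, [(3, 4), (5, 6)])
    ents = copy_entities(tagged_src)
    print(tagged_src)
    print(tag_target(tgt, ents))
```

With identical tags on both sides, the model can learn to reproduce tagged spans unchanged. This is the "hard" copying behavior that the abstract reports helping only in high-resource settings.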