A good translation should not only convey the meaning of the original content, but also reflect the personal traits of the original text. In a real-world neural machine translation (NMT) system, these user traits (e.g., topic preference, stylistic characteristics, and expression habits) are preserved in user behavior (e.g., historical inputs). However, current NMT systems rarely take user behavior into account, for two reasons: 1) user portraits are difficult to model in zero-shot scenarios, and 2) parallel datasets annotated with user behavior are lacking. To fill this gap, we introduce a novel framework called user-driven NMT. Specifically, we propose a cache-based module and a user-driven contrastive learning method that enable NMT to capture potential user traits from historical inputs in a zero-shot fashion. Furthermore, we contribute UDT-Corpus, the first Chinese-English parallel corpus annotated with user behavior. Experimental results confirm that the proposed user-driven NMT can generate user-specific translations.
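To make the two ingredients named in the abstract more concrete, the following is a minimal, illustrative sketch of a per-user cache of historical inputs and a contrastive objective over trait vectors. It is not the authors' implementation: the abstract does not specify the cache contents, the encoder, or the loss, so the bag-of-words trait vector, the InfoNCE-style loss, and every name here (`UserCache`, `trait_vector`, `contrastive_loss`, `temperature`, the toy vocabulary) are assumptions chosen only for illustration.

```python
import math
from collections import Counter, deque


class UserCache:
    """Bounded cache of one user's most recent inputs (hypothetical helper)."""

    def __init__(self, max_items=3):
        # deque with maxlen evicts the oldest entry once the cache is full
        self.history = deque(maxlen=max_items)

    def add(self, sentence):
        self.history.append(sentence.lower().split())

    def trait_vector(self, vocab):
        """Normalized word-frequency vector over a fixed vocabulary.

        Assumption: a bag-of-words stand-in for a learned user representation.
        """
        counts = Counter(tok for sent in self.history for tok in sent)
        total = sum(counts[w] for w in vocab) or 1
        return [counts[w] / total for w in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form): small when the anchor is closer
    to the positive than to any of the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy usage: two users with different topic preferences.
vocab = ["gpu", "kernel", "recipe", "oven"]

dev = UserCache()
dev.add("GPU kernel launch")
dev.add("kernel tuning on GPU")

cook = UserCache()
cook.add("oven recipe")

new_input = UserCache()
new_input.add("gpu kernel")

anchor = new_input.trait_vector(vocab)
loss_match = contrastive_loss(anchor, dev.trait_vector(vocab), [cook.trait_vector(vocab)])
loss_mismatch = contrastive_loss(anchor, cook.trait_vector(vocab), [dev.trait_vector(vocab)])
```

In this toy setup, `loss_match` is much smaller than `loss_mismatch`, because the new input shares vocabulary with the first user's cached history. A real system would presumably replace the word-frequency vectors with learned representations and train the translation model jointly with such an objective.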