We introduce and formalize an under-studied linguistic phenomenon we call Natural Asemantic Variation (NAV) and investigate it in the context of Machine Translation (MT) robustness. We show that standard MT models are less robust to rarer, nuanced language forms, and that current robustness techniques do not account for this kind of perturbation despite its prevalence in "real world" data. Experimental results provide further insight into the nature of NAV, and we demonstrate strategies to improve performance on NAV. We also show that NAV robustness can be transferred across languages, and find that synthetic perturbations can achieve some, but not all, of the benefits of human-generated NAV data.