Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an 'open vocabulary.' This approach relies on consistent and correct underlying Unicode sequences, and makes models susceptible to degradation from common types of noise and variation. Motivated by the robustness of human language processing, we propose the use of visual text representations, which dispense with a finite set of text embeddings in favor of continuous vocabularies created by processing visually rendered text with sliding windows. We show that models using visual text representations approach or match the performance of traditional text models on small and larger datasets. More importantly, models with visual embeddings demonstrate significant robustness to varied types of noise, achieving, e.g., 25.9 BLEU on a character-permuted German-English task where subword models degrade to 1.9.
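To make the sliding-window idea concrete, the following is a minimal sketch of how visually rendered text can replace a discrete vocabulary, assuming Pillow and NumPy are available. The font path, image height, window width, and stride below are illustrative assumptions, not values from the paper; the windows stand in for tokens and their pixels form the continuous input to an encoder.

```python
# Minimal sketch: render a sentence to a grayscale image, then extract
# overlapping sliding windows that play the role of subword tokens.
# Font, height, window, and stride are illustrative assumptions.
from PIL import Image, ImageDraw, ImageFont
import numpy as np

def render_text(text: str, height: int = 32,
                font_path: str = "DejaVuSans.ttf") -> np.ndarray:
    """Render `text` as a fixed-height grayscale image in [0, 1]."""
    font = ImageFont.truetype(font_path, size=height - 8)
    bbox = font.getbbox(text)                      # measure rendered width
    width = max(bbox[2] - bbox[0], 1)
    img = Image.new("L", (width + 8, height), color=255)
    ImageDraw.Draw(img).text((4, 4), text, fill=0, font=font)
    return np.asarray(img, dtype=np.float32) / 255.0

def sliding_windows(img: np.ndarray, window: int = 24,
                    stride: int = 12) -> np.ndarray:
    """Slice the rendered image into overlapping windows along the width.
    Each window is one continuous 'token' (e.g., input to a small
    convolutional encoder), so no discrete vocabulary is needed."""
    h, w = img.shape
    if w < window:  # pad short sentences to one full window
        img = np.pad(img, ((0, 0), (0, window - w)), constant_values=1.0)
        w = window
    starts = range(0, w - window + 1, stride)
    return np.stack([img[:, s:s + window] for s in starts])

windows = sliding_windows(render_text("ein Beispielsatz"))
print(windows.shape)  # (n_windows, 32, 24)
```

Because the model only ever sees pixels, character permutations or confusable Unicode code points that look alike produce nearly identical windows, which is the intuition behind the robustness results reported above.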