Machine translation has seen rapid progress with the advent of Transformer-based models. These models have no explicit linguistic structure built into them, yet they may still implicitly learn structured relationships by attending to relevant tokens. We hypothesize that this structural learning could be made more robust by explicitly endowing Transformers with a structural bias, and we investigate two methods for building in such a bias. One method, the TP-Transformer, augments the traditional Transformer architecture with an additional component that represents structure. The second method imbues structure at the data level by segmenting the data with morphological tokenization. We test these methods on translation from English into two morphologically rich languages, Turkish and Inuktitut, and consider both automatic metrics and human evaluations. We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset. In sum, structural encoding methods make Transformers more sample-efficient, enabling them to perform better from smaller amounts of data.