Syntactic knowledge plays a powerful role in neural machine translation (NMT) tasks. Early NMT work assumed that syntactic details could be learned automatically from large amounts of text via attention networks. However, subsequent studies pointed out that, limited by the unconstrained nature of attention computation, NMT models require external syntax to capture deep syntactic awareness. Although existing syntax-aware NMT methods have achieved notable success in incorporating syntax, the additional components they introduce make the models heavy and slow. Moreover, these efforts rarely target Transformer-based NMT or modify its core self-attention network (SAN). To this end, we propose a parameter-free, Dependency-scaled Self-Attention Network (Deps-SAN) for syntax-aware Transformer-based NMT. A quantified matrix of dependency closeness between tokens is constructed to impose explicit syntactic constraints on the SAN, so that it learns syntactic details and avoids overly dispersed attention distributions. Two knowledge-sparsing techniques are further integrated to prevent the model from overfitting to the dependency noise introduced by the external parser. Experiments and analyses on the IWSLT14 German-to-English and WMT16 German-to-English benchmark NMT tasks verify the effectiveness of our approach.
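To make the core idea concrete, below is a minimal sketch (not the authors' released implementation) of how a pairwise dependency-distance matrix could be turned into a bias on self-attention logits. The function names, the Gaussian form of the bias, and the `sigma` width are illustrative assumptions rather than the exact Deps-SAN formulation.

```python
import numpy as np

def dependency_distances(heads):
    """Pairwise tree distances between tokens, given each token's head index
    (-1 marks the root). Computed by BFS over the undirected dependency tree."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].append(h)
            adj[h].append(i)
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        queue = [s]
        while queue:
            u = queue.pop(0)
            for v in adj[u]:
                if dist[s, v] == np.inf:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist

def deps_scaled_attention(Q, K, V, heads, sigma=1.0):
    """Single-head self-attention whose logits are biased by a Gaussian of the
    pairwise dependency distance, concentrating probability mass on
    syntactically close tokens (hypothetical formulation; sigma is a width
    hyper-parameter, no learned parameters are added)."""
    d_k = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d_k)
    dep = dependency_distances(heads)
    # Bias is 0 for a token and itself and grows more negative with tree distance.
    bias = -(dep ** 2) / (2.0 * sigma ** 2)
    logits = logits + bias
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage with a hypothetical 4-token parse: tokens 0 and 2 attach to the
# root (token 1), and token 3 attaches to token 2.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
heads = [1, -1, 1, 2]
out = deps_scaled_attention(Q, K, V, heads)
print(out.shape)  # (4, 8)
```

Because the bias is a fixed function of the parse, this kind of constraint adds no trainable parameters to the SAN, which matches the parameter-free design goal stated above.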