Syntactic knowledge plays a powerful role in neural machine translation (NMT). Early NMT models assumed that syntactic details could be learned automatically from large amounts of text by attention networks. However, subsequent studies pointed out that, limited by the uncontrolled nature of attention computation, the model requires external syntax to capture deep syntactic awareness. Although recent syntax-aware NMT methods have borne great fruit in incorporating syntax, the additional workload they introduce renders the model heavy and slow. In particular, these efforts scarcely involve Transformer-based NMT or modify its core self-attention network (SAN). To this end, we propose a parameter-free, dependency-scaled self-attention network (Deps-SAN) for syntax-aware Transformer-based NMT. It integrates a quantified matrix of syntactic dependencies into the SAN to impose explicit syntactic constraints, so that the model learns syntactic details and avoids the dispersion of attention distributions. Two knowledge sparsing techniques are further proposed to prevent the model from overfitting to dependency noise. Extensive experiments and analyses on two benchmark NMT tasks verify the effectiveness of our approach.
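To make the mechanism concrete, the following is a minimal NumPy sketch of how a quantified dependency matrix could bias self-attention logits. It is only illustrative: the Gaussian kernel over dependency-tree distances, the additive log-space biasing, and the names dependency_matrix and deps_scaled_attention are assumptions, not the paper's exact formulation.

```python
# Minimal sketch of dependency-scaled self-attention (NumPy).
# Assumptions (not from the abstract): the quantified dependency matrix is
# built from pairwise dependency-tree distances via a Gaussian kernel and
# biases the attention logits additively in log-space.
import numpy as np

def dependency_matrix(tree_dist: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Map pairwise dependency-tree distances to closeness scores in (0, 1]."""
    return np.exp(-(tree_dist ** 2) / (2.0 * sigma ** 2))

def deps_scaled_attention(Q, K, V, tree_dist, sigma=1.0):
    """Self-attention whose logits are sharpened by the dependency prior."""
    d_k = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d_k)              # standard scaled dot-product
    dep = dependency_matrix(tree_dist, sigma)    # parameter-free syntactic prior
    logits = logits + np.log(dep + 1e-9)         # focus attention on close dependencies
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage: 4 tokens, hidden size 8, hand-made dependency-tree distances.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
tree_dist = np.array([[0, 1, 2, 3],
                      [1, 0, 1, 2],
                      [2, 1, 0, 1],
                      [3, 2, 1, 0]], dtype=float)
out = deps_scaled_attention(Q, K, V, tree_dist)
print(out.shape)  # (4, 8)
```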