Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attentional networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.