In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (the tokens translated at previous decoding steps). However, previous work on interpretability in NMT has focused solely on source sentence token attributions. Therefore, we lack a full understanding of the influence of every input token (source sentence and target prefix) on the model's predictions. In this work, we propose an interpretability method that tracks complete input token attributions. Our method, which can be extended to any encoder-decoder Transformer-based model, allows us to better comprehend the inner workings of current NMT models. We apply the proposed method to both bilingual and multilingual Transformers and present insights into their behaviour.
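To make the notion of complete input token attributions concrete, the sketch below inspects last-layer attention weights over both the source sentence (via cross-attention) and the target prefix (via decoder self-attention) in an off-the-shelf encoder-decoder Transformer. This is a minimal illustration under stated assumptions, not the method proposed in this work: the model choice (Helsinki-NLP/opus-mt-en-de), the averaging over heads, and the use of raw attention as an attribution proxy are all assumptions for demonstration.

```python
# Minimal sketch: inspect how a prediction attends to BOTH input streams
# (source tokens and target-prefix tokens) in an encoder-decoder Transformer.
# Assumptions: HuggingFace `transformers`, a Marian en-de model, and raw
# last-layer attention averaged over heads as a crude attribution proxy.
# This is NOT the attribution method proposed in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "Helsinki-NLP/opus-mt-en-de"  # assumed bilingual NMT model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name).eval()

src = tok("The cat sat on the mat.", return_tensors="pt")

# Build a target prefix: decoder start token (pad, for Marian models)
# followed by the already-translated tokens, without the final </s>.
prefix_ids = tok(text_target="Die Katze", return_tensors="pt").input_ids[:, :-1]
start = torch.full((1, 1), model.config.decoder_start_token_id)
decoder_input_ids = torch.cat([start, prefix_ids], dim=1)

with torch.no_grad():
    out = model(
        input_ids=src.input_ids,
        attention_mask=src.attention_mask,
        decoder_input_ids=decoder_input_ids,
        output_attentions=True,
    )

# For the final prefix position (the one producing the next token):
# cross-attention  -> weight on each source token,
# self-attention   -> weight on each target-prefix token.
cross = out.cross_attentions[-1].mean(dim=1)[0, -1]       # shape: (src_len,)
self_attn = out.decoder_attentions[-1].mean(dim=1)[0, -1]  # shape: (prefix_len,)

print("source token weights:", cross.tolist())
print("target prefix weights:", self_attn.tolist())
```

A faithful attribution method would go beyond these raw attention weights (e.g., by accounting for the value vectors and residual stream), but the sketch shows the two input streams, source and target prefix, that a complete attribution must cover.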