The study of the attention mechanism has sparked interest in many fields, such as language modeling and machine translation. Although its patterns have been exploited to perform different tasks, from neural network understanding to textual alignment, no previous work has analyzed the encoder-decoder attention behavior in speech translation (ST) nor used it to improve ST on a specific task. In this paper, we fill this gap by proposing an attention-based policy (EDAtt) for simultaneous ST (SimulST) that is motivated by an analysis of the existing attention relations between audio input and textual output. Its goal is to leverage the encoder-decoder attention scores to guide inference in real time. Results on en->{de, es} show that the EDAtt policy achieves overall better results than the SimulST state of the art, especially in terms of computationally aware latency.
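To make the idea of attention-guided inference concrete, the sketch below shows one plausible way such a decision rule could look: the candidate token is emitted only if little cross-attention mass falls on the most recent audio frames, otherwise the system waits for more speech. This is a minimal illustration, not the paper's implementation; the function name `edatt_should_emit` and the hyperparameter values (`last_frames`, `alpha`) are assumptions introduced here for clarity.

```python
import torch


def edatt_should_emit(cross_attn: torch.Tensor, last_frames: int = 2, alpha: float = 0.1) -> bool:
    """Illustrative attention-based emission rule (hypothetical sketch).

    cross_attn: 1-D tensor of encoder-decoder attention weights for the
        candidate target token over the encoder (audio) frames received so
        far, assumed already averaged over heads/layers and summing to 1.
    last_frames: how many of the most recent frames to inspect (illustrative value).
    alpha: attention-mass threshold (illustrative value).
    """
    # If the model is still concentrating its attention on the very end of
    # the partial input, the received audio is likely insufficient to commit
    # to this token, so the policy should wait for more speech.
    recent_mass = cross_attn[-last_frames:].sum().item()
    return recent_mass < alpha


# Toy usage: most of the mass lies on earlier frames, so the token is emitted.
attn = torch.tensor([0.05, 0.10, 0.60, 0.20, 0.05])
print(edatt_should_emit(attn))  # True
```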