Simultaneous translation systems start producing output while still processing a partial source sentence from the incoming input stream. These systems need to decide when to read more input and when to write output. These decisions depend on the structure of the source and target languages and on the information contained in the partial input sequence. Hence, the read/write decision policy remains the same across input modalities, i.e., speech and text. This motivates us to leverage the text transcripts corresponding to the speech input to improve simultaneous speech-to-text translation (SimulST). We propose Decision Attentive Regularization (DAR) to improve the decision policy of SimulST systems by using the simultaneous text-to-text translation (SimulMT) task. We also extend several techniques from the offline speech translation domain to explore the role of the SimulMT task in improving SimulST performance. Overall, we achieve an improvement of 34.66% / 4.5 BLEU over the baseline model across different latency regimes on the MuST-C English-German (EnDe) SimulST task.
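The read/write decision described above can be illustrated with a minimal sketch. It uses a fixed wait-k schedule as a stand-in policy; this is an assumption chosen for simplicity and is not the DAR method proposed here. The names `wait_k_policy`, `simultaneous_decode`, and `toy_step` are hypothetical, and `toy_step` merely uppercases source tokens in place of a real translation model.

```python
from typing import Callable, List, Optional, Tuple


def wait_k_policy(num_read: int, num_written: int, k: int) -> str:
    """Illustrative fixed policy (an assumption, not DAR):
    READ until the reader is k tokens ahead of the writer, then WRITE."""
    return "READ" if num_read - num_written < k else "WRITE"


def simultaneous_decode(
    source: List[str],
    translate_step: Callable[[List[str], List[str], bool], Optional[str]],
    k: int = 2,
    max_len: int = 50,
) -> Tuple[List[str], List[str]]:
    """Interleave READ and WRITE actions over a streaming source.

    Returns the written target tokens and the action trace ("R"/"W").
    """
    read: List[str] = []
    written: List[str] = []
    actions: List[str] = []
    while len(written) < max_len:
        source_exhausted = len(read) == len(source)
        # Only consult the policy while there is still input left to read.
        if not source_exhausted and wait_k_policy(len(read), len(written), k) == "READ":
            read.append(source[len(read)])
            actions.append("R")
            continue
        token = translate_step(read, written, source_exhausted)
        if token is None:  # the step function signals end of output
            break
        written.append(token)
        actions.append("W")
    return written, actions


def toy_step(read: List[str], written: List[str], exhausted: bool) -> Optional[str]:
    """Hypothetical stand-in for a translation model: emits the uppercased
    source token aligned with the next target position."""
    i = len(written)
    return read[i].upper() if i < len(read) else None


if __name__ == "__main__":
    tokens, trace = simultaneous_decode(["a", "b", "c", "d"], toy_step, k=2)
    print(tokens, "".join(trace))
```

Because the policy looks only at the number of tokens read and written, the same loop applies whether the source units are text tokens or speech segments, which mirrors the modality-independence argument made above.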