Simultaneous speech translation (SimulST) is a challenging task that aims to translate streaming speech before the complete input has been observed. A SimulST system generally includes two components: the pre-decision, which aggregates speech information, and the policy, which decides whether to read more input or write output. While recent work has proposed various strategies to improve the pre-decision, it mainly adopts the fixed wait-k policy, leaving adaptive policies largely unexplored. This paper proposes to model the adaptive policy by adapting Continuous Integrate-and-Fire (CIF). Compared with monotonic multihead attention (MMA), our method offers simpler computation, superior quality at low latency, and better generalization to long utterances. Experiments on the MuST-C V2 dataset demonstrate the effectiveness of our approach.
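To make the contrast between a fixed and an adaptive policy concrete, the Python sketch below illustrates both. It is not the authors' implementation. The wait-k schedule follows its usual definition (read k source units, then alternate writing and reading). The CIF part shows only the standard integrate-and-fire accumulation with a firing threshold of 1.0, under the assumption that each firing triggers one WRITE action. The function names `wait_k_actions` and `cif_policy`, the per-frame weights `alphas`, and the list-based frame vectors are hypothetical. End-of-stream tail handling is omitted.

```python
from typing import List, Tuple


def wait_k_actions(num_src: int, num_tgt: int, k: int) -> List[str]:
    """Fixed wait-k policy (sketch): read k source units first, then
    alternate WRITE/READ; once the source is exhausted, keep writing."""
    actions: List[str] = []
    read, written = 0, 0
    while written < num_tgt:
        # Target token number `written` may be emitted once
        # min(k + written, num_src) source units have been read.
        if read < min(k + written, num_src):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions


def cif_policy(
    alphas: List[float], states: List[List[float]], threshold: float = 1.0
) -> Tuple[List[str], List[List[float]]]:
    """Adaptive policy driven by Continuous Integrate-and-Fire (sketch).

    alphas[t] is a hypothetical non-negative weight for speech frame t and
    states[t] its encoder vector. Weights are accumulated frame by frame;
    whenever the accumulator reaches `threshold`, CIF fires: the weighted
    sum of frames since the last firing is emitted and one WRITE is taken.
    """
    if not states:
        return [], []
    dim = len(states[0])
    acc = 0.0                    # weight accumulated since the last firing
    integrated = [0.0] * dim     # weighted sum of frames since the last firing
    actions: List[str] = []
    fired: List[List[float]] = []
    for alpha, h in zip(alphas, states):
        actions.append("READ")   # consume one more speech frame
        remaining = alpha
        while acc + remaining >= threshold:
            # Split the boundary frame: this part completes the current unit.
            used = threshold - acc
            fired.append([c + used * x for c, x in zip(integrated, h)])
            actions.append("WRITE")
            remaining -= used
            acc, integrated = 0.0, [0.0] * dim
        # The leftover weight starts the next integration.
        acc += remaining
        integrated = [c + remaining * x for c, x in zip(integrated, h)]
    return actions, fired


if __name__ == "__main__":
    print(wait_k_actions(num_src=5, num_tgt=4, k=2))
    acts, emb = cif_policy([0.5, 0.25, 0.5, 0.75, 0.5],
                           [[1.0], [2.0], [3.0], [4.0], [5.0]])
    print(acts, emb)
```

With `alphas = [0.5, 0.25, 0.5, 0.75, 0.5]`, the accumulator reaches 1.0 at the third and fourth frames, so this CIF sketch writes right after those frames. Wait-k, by contrast, writes on a fixed schedule regardless of the speech content, which is the limitation an adaptive policy is meant to address.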