Pre-trained speech Transformers in speech translation (ST) have facilitated state-of-the-art (SotA) results; yet, using such encoders is computationally expensive. To improve this, we present a novel Reducer Adaptor block, RedApt, that can be seamlessly integrated within any Transformer-based speech encoding architecture. Integrating the pre-trained wav2vec 2 speech encoder with RedApt brings a 41% speedup, a 33% memory reduction, and 24% fewer FLOPs at inference. To our positive surprise, our ST model with RedApt outperforms the SotA architecture by an average of 0.68 BLEU on 8 language pairs from MuST-C.
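For intuition only, the sketch below shows one generic way a "reducer adaptor" style block could be slotted into a Transformer speech encoder: a small bottleneck adapter that also shortens the frame sequence, so every subsequent layer processes fewer positions and therefore needs less compute and memory. This is a hypothetical illustration under assumed design choices (strided 1-D convolution, pooled residual), not the paper's actual RedApt block.

```python
# Minimal sketch of a reducer-adapter style block (hypothetical; not the
# paper's RedApt design). It downsamples the time axis inside the encoder
# so later Transformer layers see a shorter sequence.
import torch
import torch.nn as nn


class ReducerAdapter(nn.Module):
    """Bottleneck adapter that halves the frame sequence (illustrative only)."""

    def __init__(self, d_model: int, bottleneck: int = 256, stride: int = 2):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        # Strided convolution reduces the number of time steps.
        self.reduce = nn.Conv1d(bottleneck, bottleneck, kernel_size=3,
                                stride=stride, padding=1)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()
        self.stride = stride

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model), e.g. wav2vec 2 hidden states
        h = self.act(self.down(self.norm(x)))
        h = self.reduce(h.transpose(1, 2)).transpose(1, 2)  # (batch, time/stride, bottleneck)
        h = self.up(self.act(h))
        # Residual path: average-pool the input to the reduced length.
        res = nn.functional.avg_pool1d(
            x.transpose(1, 2), kernel_size=self.stride, stride=self.stride
        ).transpose(1, 2)
        t = min(res.size(1), h.size(1))  # guard against off-by-one on odd lengths
        return res[:, :t] + h[:, :t]


if __name__ == "__main__":
    frames = torch.randn(4, 100, 768)   # a batch of encoder frame features
    block = ReducerAdapter(d_model=768)
    print(block(frames).shape)          # torch.Size([4, 50, 768])
```

Because the sequence is shortened early, the quadratic self-attention cost of every layer after the adapter drops accordingly, which is the general mechanism behind the reported speed, memory, and FLOP savings.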