Direct speech-to-speech translation (S2ST) systems leverage recent progress in speech representation learning: a sequence of discrete representations (units), derived in a self-supervised manner, is predicted by the model and passed to a vocoder for speech synthesis. These systems still face the following challenges: 1) acoustic multimodality: the discrete units derived from speech with the same content can be non-deterministic because of acoustic properties (e.g., rhythm, pitch, and energy), which degrades translation accuracy; 2) high latency: current S2ST systems use autoregressive models that predict each unit conditioned on the previously generated sequence, failing to take full advantage of parallelism. In this work, we propose TranSpeech, a speech-to-speech translation model with bilateral perturbation. To alleviate the acoustic multimodality problem, we propose bilateral perturbation, which consists of style normalization and information enhancement stages, to learn only the linguistic information from speech samples and generate more deterministic representations. With reduced multimodality, we go a step further and are the first to establish a non-autoregressive S2ST technique, which repeatedly masks and predicts unit choices and produces high-accuracy results in just a few cycles. Experimental results on three language pairs show state-of-the-art performance, with improvements of up to 2.5 BLEU points over the best publicly available textless S2ST baseline. Moreover, TranSpeech significantly improves inference latency, achieving a speedup of up to 21.4x over the autoregressive technique. Audio samples are available at \url{https://TranSpeech.github.io/}
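The non-autoregressive decoding described in the abstract, which repeatedly masks and predicts unit choices over a few cycles, can be illustrated with a minimal sketch. This is not TranSpeech's implementation: the linear re-masking schedule, the `MASK` sentinel, the `TARGET` unit sequence, and the `toy_predict` stand-in for a trained model are all assumptions made for illustration, and target-length prediction is omitted.

```python
MASK = -1  # hypothetical sentinel for a masked unit position

# Hypothetical "ground-truth" unit sequence used only by the toy predictor.
TARGET = [5, 9, 2, 7, 7, 1, 3, 8]


def toy_predict(units):
    """Stand-in for a trained model: returns one (unit, confidence) per position.

    Even positions are always predicted correctly. Odd positions are predicted
    correctly only when a neighbouring unit is already unmasked, which mimics a
    model that benefits from target-side context.
    """
    out = []
    for i, tgt in enumerate(TARGET):
        left = i > 0 and units[i - 1] != MASK
        right = i + 1 < len(units) and units[i + 1] != MASK
        if i % 2 == 0:
            out.append((tgt, 0.9))
        elif left or right:
            out.append((tgt, 0.8))
        else:
            out.append((0, 0.3))  # wrong guess with low confidence
    return out


def mask_predict(predict_fn, length, iterations):
    """Iteratively fill masked positions in parallel, then re-mask the least
    confident ones, with the number of re-masked positions decaying linearly."""
    units = [MASK] * length
    probs = [0.0] * length
    for t in range(iterations):
        # Predict every position at once (no left-to-right dependency).
        preds = predict_fn(units)
        for i, (unit, prob) in enumerate(preds):
            if units[i] == MASK:  # only masked positions are overwritten
                units[i], probs[i] = unit, prob
        # Number of positions to re-mask before the next cycle.
        n_mask = length * (iterations - 1 - t) // iterations
        if n_mask == 0:
            break
        # Re-mask the n_mask positions with the lowest confidence.
        worst = sorted(range(length), key=lambda i: probs[i])[:n_mask]
        for i in worst:
            units[i] = MASK
            probs[i] = 0.0
    return units


if __name__ == "__main__":
    print(mask_predict(toy_predict, len(TARGET), iterations=1))
    print(mask_predict(toy_predict, len(TARGET), iterations=3))
```

With this toy predictor, a single parallel pass leaves the context-dependent positions wrong, while a few mask-and-predict cycles recover the full sequence. The number of cycles stays fixed regardless of sequence length, which is where the latency advantage over unit-by-unit autoregressive decoding comes from.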