Speech translation (ST) systems translate speech in one language into text in another language. End-to-end ST (e2e-ST) systems have gained popularity over cascade systems because their reduced latency and computational cost improve performance. Though resource intensive, e2e-ST systems can, unlike cascade systems, retain the paralinguistic and non-linguistic characteristics of the speech. In this paper, we propose an e2e architecture for English-Hindi (en-hi) ST. We use two imperfect machine translation (MT) services to translate the Libri-trans English (en) text into Hindi (hi) text. While the output of each service can be used on its own to generate parallel ST data, we propose a data augmentation strategy over the noisy MT data to aid robust ST. This strategy is the main contribution of the paper. We show that it yields better ST, measured by BLEU score, than brute-force augmentation of the MT data. We observed an absolute improvement of 1.59 BLEU points with our approach.