Most speech-to-text (S2T) translation studies use English speech as the source, which makes it difficult for non-English speakers to benefit from S2T technologies. For some languages, this problem has been addressed through corpus construction, but the more linguistically distant a language is from English, or the more under-resourced it is, the more pronounced this shortage and under-representation become. In this paper, we introduce kosp2e (read as `kospi'), a corpus that allows Korean speech to be translated into English text in an end-to-end manner. We adopt open-license speech recognition, translation, and spoken language corpora so that our dataset is freely available to the public, and we evaluate it with both pipeline and training-based approaches. Using the pipeline and various end-to-end schemes, we obtain best BLEU scores of 21.3 and 18.0, respectively, based on the English hypotheses, which validates the feasibility of our data. We plan to supplement annotations for other target languages through community contributions in the future.