Interpreters facilitate multilingual meetings, but the set of languages that can be affordably covered is often smaller than what is needed. Automatic simultaneous speech translation can extend the set of provided languages. We investigate whether such an automatic system should follow the original speaker or an interpreter in order to achieve better translation quality at the cost of increased delay. To answer this question, we release the Europarl Simultaneous Interpreting Corpus (ESIC), 10 hours of recordings and transcripts of European Parliament speeches in English, with simultaneous interpreting into Czech and German. We evaluate the quality and latency of speaker-based and interpreter-based spoken translation systems from English to Czech. We study the differences between the implicit simplification and summarization performed by the human interpreter and the output of a machine translation system trained to shorten its output to some extent. Finally, we perform a human evaluation to measure the information loss of each of these approaches.