Simultaneous translation is a task in which translation begins before the speaker has finished speaking. Its evaluation must therefore consider latency in addition to translation quality. Latency should be as small as possible so that users can comprehend what the speaker says with minimal delay. Existing latency metrics focus on when the translation starts but do not adequately consider when it ends. As a result, they do not penalize the latency caused by a long translation output, which in practice delays users' comprehension. In this work, we propose a novel latency evaluation metric called Average Token Delay (ATD) that focuses on the end timings of partial translations in simultaneous translation. We discuss the advantages of ATD using simulated examples and investigate the differences between ATD and Average Lagging through simultaneous translation experiments.
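For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of Average Lagging as it is commonly defined (Ma et al., 2019): the mean of g(t) - (t-1)/γ over target positions up to the first one emitted after the whole source has been read, where g(t) is the number of source tokens read before target token t and γ is the target-to-source length ratio. The function name `average_lagging` and its arguments are hypothetical and chosen for illustration; this is not the authors' evaluation code, and the definition of ATD is not stated in the abstract, so it is not sketched here.

```python
from typing import Sequence


def average_lagging(delays: Sequence[int], source_length: int) -> float:
    """Sentence-level Average Lagging (sketch, assuming token-level delays).

    delays[i] is the number of source tokens that had been read when the
    (i+1)-th target token was emitted. Names here are illustrative only.
    """
    if not delays or source_length <= 0:
        raise ValueError("need at least one target token and a non-empty source")

    # gamma: target-to-source length ratio
    gamma = len(delays) / source_length

    total = 0.0
    count = 0
    for i, read_so_far in enumerate(delays):
        # i is the zero-based target index, i.e. (t - 1) in the formula
        total += read_so_far - i / gamma
        count += 1
        # Stop at the first target token emitted after the full source was
        # read; later target tokens contribute no terms to the average.
        if read_so_far >= source_length:
            break
    return total / count


if __name__ == "__main__":
    # A wait-2 policy on a 4-token source with a 4-token output.
    print(average_lagging([2, 3, 4, 4], 4))
```

Note the early `break`: once the source has been fully read, any further output tokens add no terms to the sum, so the output length enters only through γ. This is one way to see the abstract's point that a start-oriented metric does not directly account for when the translation ends.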