Efficient machine translation models are commercially important because they increase inference speed and reduce costs and carbon emissions. Recently, there has been much interest in non-autoregressive (NAR) models, which promise faster translation. In parallel to the research on NAR models, there have been successful attempts to create optimized autoregressive (AR) models as part of the WMT shared task on efficient translation. In this paper, we point out flaws in the evaluation methodology present in the literature on NAR models, and we provide a fair comparison between a state-of-the-art NAR model and the autoregressive submissions to the shared task. We make the case for consistent evaluation of NAR models and for comparing NAR models with other widely used methods for improving efficiency. We run experiments with a NAR model based on connectionist temporal classification (CTC), implemented in C++, and compare it with AR models using wall-clock times. Our results show that, although NAR models are faster on GPUs with small batch sizes, they are almost always slower under more realistic usage conditions. We call for more realistic and extensive evaluation of NAR models in future work.