The majority of language domains require prudent use of terminology to ensure the clarity and adequacy of the information conveyed. For some languages and domains, correct terminology use can be achieved by adapting general-purpose MT systems on large volumes of in-domain parallel data. However, such quantities of domain-specific data are seldom available for less-resourced languages and niche domains. Furthermore, as the recent COVID-19 outbreak showed, no domain-specific parallel data is readily available for emerging domains. Yet the gravity of this calamity created a high demand for reliable translation of critical information about the pandemic and infection prevention. This work is part of the WMT2021 Shared Task: Machine Translation using Terminologies. In it, we describe Tilde MT systems that can integrate terminology dynamically at translation time. Our systems achieve up to 94% COVID-19 term use accuracy on the EN-FR test set without access to any form of in-domain information during system training. We conclude with a broader discussion of the Shared Task itself and of terminology translation in MT.
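The abstract reports "term use accuracy" without defining it. One plausible reading is sketched below. It is a simplified, exact-match version that assumes the metric is the share of annotated target-language terms that appear verbatim in the MT output. Matching is case-insensitive and respects word boundaries. This is not the shared task's official scorer, and the function name `term_use_accuracy` and the sample sentences are hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def term_use_accuracy(examples: Iterable[Tuple[str, List[str]]]) -> float:
    """Hypothetical simplified metric: fraction of required target terms found in the hypotheses.

    Each example pairs an MT hypothesis with the target-language terms it is
    expected to contain. A term counts as used if it occurs as a whole word
    (case-insensitive); e.g. "virus" does not match inside "coronavirus".
    """
    total = 0
    matched = 0
    for hypothesis, target_terms in examples:
        hyp = hypothesis.casefold()
        for term in target_terms:
            total += 1
            # Lookarounds stop the term from matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(term.casefold()) + r"(?!\w)"
            if re.search(pattern, hyp):
                matched += 1
    # With no annotated terms there is nothing to score.
    return matched / total if total else 0.0


# Example: 3 of the 4 required terms appear in the output.
examples = [
    ("Les symptômes de la COVID-19 incluent la fièvre.", ["COVID-19", "fièvre"]),
    ("Portez un masque pour limiter la contagion.", ["masque", "transmission"]),
]
print(term_use_accuracy(examples))
```

Exact matching is strict: an inflected form of a required term such as a plural counts as a miss. Lemma-based matching would be more lenient.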