Generative Pre-trained Transformer (GPT) models have shown remarkable capabilities for natural language generation, but their performance for machine translation has not been thoroughly investigated. In this paper, we present a comprehensive evaluation of GPT models for machine translation, covering the quality of different GPT models in comparison with state-of-the-art research and commercial systems, the effect of prompting strategies, robustness to domain shifts, and document-level translation. We experiment with eighteen translation directions involving high- and low-resource languages, as well as non-English-centric translations, and evaluate the performance of three GPT models: ChatGPT, GPT3.5 (text-davinci-003), and text-davinci-002. Our results show that GPT models achieve very competitive translation quality for high-resource languages but have limited capabilities for low-resource languages. We also show that hybrid approaches, which combine GPT models with other translation systems, can further enhance translation quality. We perform comprehensive analysis and human evaluation to further understand the characteristics of GPT translations. We hope that our paper provides valuable insights for researchers and practitioners in the field and helps them better understand the potential and limitations of GPT models for translation.