This report provides a preliminary evaluation of ChatGPT for machine translation, covering translation prompts, multilingual translation, and translation robustness. We adopt the prompts suggested by ChatGPT itself to trigger its translation ability and find that the candidate prompts generally work well, with only minor performance differences among them. Evaluating on a number of benchmark test sets, we find that ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages but lags significantly behind on low-resource or distant languages. For distant languages, we explore a strategy named $\mathbf{pivot~prompting}$, which asks ChatGPT to translate the source sentence into a high-resource pivot language before translating it into the target language, and which improves translation performance significantly. As for translation robustness, ChatGPT does not perform as well as the commercial systems on biomedical abstracts or Reddit comments but achieves good results on spoken language. With the launch of the GPT-4 engine, the translation performance of ChatGPT improves significantly and becomes comparable to that of commercial translation products, even for distant languages. In other words, $\mathbf{ChatGPT~has~already~become~a~good~translator!}$ Scripts and data: https://github.com/wxjiao/Is-ChatGPT-A-Good-Translator
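To make the pivot prompting idea concrete, the following is a minimal sketch of how a direct prompt and a pivot prompt might be built as plain strings. The function names and the exact prompt wording are assumptions made for illustration; they are not taken from the report and may differ from the prompts the authors actually used. No model is called here; the strings would be sent to ChatGPT by whatever client is available.

```python
def direct_prompt(target_lang: str, sentence: str) -> str:
    """Build a prompt asking for a direct translation (hypothetical wording)."""
    return (
        f"Please provide the {target_lang} translation for this sentence:\n"
        f"{sentence}"
    )


def pivot_prompt(pivot_lang: str, target_lang: str, sentence: str) -> str:
    """Build a prompt asking for a translation via a high-resource pivot language.

    The model is asked to translate into the pivot language first and only
    then into the target language (hypothetical wording).
    """
    return (
        f"Please provide the {pivot_lang} translation first and then the "
        f"{target_lang} translation for this sentence:\n"
        f"{sentence}"
    )


if __name__ == "__main__":
    source = "Guten Morgen, wie geht es Ihnen?"
    print(direct_prompt("Romanian", source))
    print()
    print(pivot_prompt("English", "Romanian", source))
```

The only difference between the two prompts is the added instruction to produce the pivot-language translation first, which matches the strategy as described: the source sentence goes through a high-resource language before reaching the target language.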