The Annals of Joseon Dynasty (AJD) contain the daily records of the Kings of Joseon, the 500-year kingdom preceding the modern nation of Korea. The Annals were originally written in an archaic Korean writing system, `Hanja', and were translated into Korean from 1968 to 1993. The resulting translation was, however, too literal and contained many archaic Korean words; thus, a new expert translation effort began in 2012. In the decade since, the records of only one king have been completed. In parallel, expert translators are working on an English translation, also at a slow pace, and have produced only one king's records in English so far. Thus, we propose H2KE, a neural machine translation model that translates historical documents in Hanja into more easily understandable Korean and into English. Built on top of multilingual neural machine translation, H2KE learns to translate a historical document written in Hanja from both a full dataset of outdated Korean translations and a small dataset of more recently translated contemporary Korean and English. We compare our method against two baselines: a recent model that simultaneously learns to restore and translate Hanja historical documents, and a Transformer-based model trained only on the newly translated corpora. The experiments reveal that our method significantly outperforms the baselines in terms of BLEU scores for both contemporary Korean and English translations. We further conduct an extensive human evaluation, which shows that our translation is preferred over the original expert translations by both experts and non-expert Korean speakers.