Understanding voluminous historical records provides clues about the past in various respects, such as social and political issues and even natural science facts. However, it is generally difficult to fully utilize historical records, since most of the documents are not written in a modern language and parts of their contents have been damaged over time. As a result, restoring the damaged or unrecognizable parts and translating the records into modern languages are crucial tasks. In response, we present a multi-task learning approach to restore and translate historical documents based on a self-attention mechanism, specifically utilizing two Korean historical records that are among the most voluminous historical records in the world. Experimental results show that our approach significantly improves the accuracy of the translation task over baselines without multi-task learning. In addition, we present an in-depth exploratory analysis of our translated results via topic modeling, uncovering several significant historical events.