We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language. This is a fundamentally important routine for historians and digital humanities researchers, but it has never been automated. We compile a high-quality gold-standard text summarisation dataset, which consists of historical German and Chinese news from hundreds of years ago summarised in modern German or Chinese. Based on cross-lingual transfer learning techniques, we propose a summarisation model that can be trained even with no cross-lingual (historical-to-modern) parallel data, and we benchmark it against state-of-the-art algorithms. We report automatic and human evaluations that distinguish the historical-to-modern language summarisation task from standard cross-lingual summarisation (i.e., modern-to-modern language), highlight the distinctness and value of our dataset, and demonstrate that our transfer learning approach outperforms standard cross-lingual benchmarks on this task.