现代语言总结历史文字 (Summarising Historical Text in Modern Languages) - 专知论文

会员服务 ·

0

迁移学习 · state-of-the-art · 学成 · 数据集 · 编译器 ·

2021 年 1 月 27 日

Summarising Historical Text in Modern Languages

翻译：现代语言总结历史文字

Xutan Peng,Yi Zheng,Chenghua Lin,Advaith Siddharthan

from arxiv, To appear at EACL 2021

We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language. This is a fundamentally important routine to historians and digital humanities researchers but has never been automated. We compile a high-quality gold-standard text summarisation dataset, which consists of historical German and Chinese news from hundreds of years ago summarised in modern German or Chinese. Based on cross-lingual transfer learning techniques, we propose a summarisation model that can be trained even with no cross-lingual (historical to modern) parallel data, and further benchmark it against state-of-the-art algorithms. We report automatic and human evaluations that distinguish the historic to modern language summarisation task from standard cross-lingual summarisation (i.e., modern to modern language), highlight the distinctness and value of our dataset, and demonstrate that our transfer learning approach outperforms standard cross-lingual benchmarks on this task.

翻译：我们引入历史文本总结任务, 将语言历史形式的文件以相应的现代语言进行总结, 这是历史学家和数字人文研究者最重要的例行工作, 但从未实现自动化。我们汇编了一个高质量的黄金标准文本汇总数据集, 由数百年前的德国和中国历史新闻组成, 以现代德文或中文进行总结。根据跨语言传输学习技术, 我们提出了一个汇总模型, 即使没有跨语言( 历史到现代) 的平行数据, 也可以对其进行培训, 并进一步根据最新算法进行基准。我们报告自动和人文评估, 将历史到现代语言的汇总任务与标准的跨语言汇总( 现代到现代语言) 任务区分开来, 突出我们数据集的独特性和价值, 并展示我们的传输学习方法比标准跨语言基准更符合这项任务。

0

相关内容

迁移学习

迁移学习（Transfer Learning）是一种机器学习方法，是把一个领域（即源领域）的知识，迁移到另外一个领域（即目标领域），使得目标领域能够取得更好的学习效果。迁移学习（TL）是机器学习（ML）中的一个研究问题，着重于存储在解决一个问题时获得的知识并将其应用于另一个但相关的问题。例如，在学习识别汽车时获得的知识可以在尝试识别卡车时应用。尽管这两个领域之间的正式联系是有限的，但这一领域的研究与心理学文献关于学习转移的悠久历史有关。从实践的角度来看，为学习新任务而重用或转移先前学习的任务中的信息可能会显着提高强化学习代理的样本效率。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

【2020新书】Python文本分析，104页pdf

【2020新书】Python文本分析，104页pdf

专知会员服务

100+阅读 · 2020年12月23日

【文本生成现代方法】Modern Methods for Text Generation

【文本生成现代方法】Modern Methods for Text Generation

专知会员服务

44+阅读 · 2020年9月11日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

迁移学习简明教程，11页ppt

迁移学习简明教程，11页ppt

专知会员服务

108+阅读 · 2020年8月4日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

专知会员服务

140+阅读 · 2020年7月10日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

现代深度学习技术在自然语言处理的应用（Modern Deep Learning Techniques Applied to Natural Language Processing）

现代深度学习技术在自然语言处理的应用（Modern Deep Learning Techniques Applied to Natural Language Processing）

专知会员服务

53+阅读 · 2020年4月7日

简明扼要！Python教程手册，206页pdf

简明扼要！Python教程手册，206页pdf

专知会员服务

48+阅读 · 2020年3月24日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

计算机类 | ISCC 2019等国际会议信息9条

计算机类 | ISCC 2019等国际会议信息9条

Call4Papers

5+阅读 · 2018年12月25日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Andrew NG的新书《Machine Learning Yearning》

Andrew NG的新书《Machine Learning Yearning》

我爱机器学习

11+阅读 · 2016年12月7日

VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images

Arxiv

0+阅读 · 2021年3月22日

Monolingual and Parallel Corpora for Kangri Low Resource Language

Arxiv

0+阅读 · 2021年3月22日

On Learning Universal Representations Across Languages

Arxiv

0+阅读 · 2021年3月22日

The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation

Arxiv

0+阅读 · 2021年3月20日

MuRIL: Multilingual Representations for Indian Languages

Arxiv

0+阅读 · 2021年3月19日

Text Summarization with Pretrained Encoders

Arxiv

5+阅读 · 2019年8月22日

Evolutionary Data Measures: Understanding the Difficulty of Text Classification Tasks

Evolutionary Data Measures: Understanding the Difficulty of Text Classification Tasks

Arxiv

4+阅读 · 2018年11月5日

Joint Training for Neural Machine Translation Models with Monolingual Data

Arxiv

4+阅读 · 2018年3月1日

Word Translation Without Parallel Data

Arxiv

8+阅读 · 2018年1月30日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

【2020新书】Python文本分析，104页pdf

【2020新书】Python文本分析，104页pdf

专知会员服务

100+阅读 · 2020年12月23日

【文本生成现代方法】Modern Methods for Text Generation

【文本生成现代方法】Modern Methods for Text Generation

专知会员服务

44+阅读 · 2020年9月11日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

迁移学习简明教程，11页ppt

迁移学习简明教程，11页ppt

专知会员服务

108+阅读 · 2020年8月4日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

最新《自然语言处理迁移学习》综述论文，A Survey on Transfer Learning in Natural Language Processing

专知会员服务

140+阅读 · 2020年7月10日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

现代深度学习技术在自然语言处理的应用（Modern Deep Learning Techniques Applied to Natural Language Processing）

现代深度学习技术在自然语言处理的应用（Modern Deep Learning Techniques Applied to Natural Language Processing）

专知会员服务

53+阅读 · 2020年4月7日

简明扼要！Python教程手册，206页pdf

简明扼要！Python教程手册，206页pdf

专知会员服务

48+阅读 · 2020年3月24日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

热门VIP内容

开通专知VIP会员享更多权益服务

【NTU博士论文】反事实推理在多模态对话生成中的应用

基于强化学习的智能体化搜索全面综述：基础、角色、优化、评估与应用

ICCV最佳论文出炉，朱俊彦团队用砖块积木摘得桂冠

面向具身操作的高效视觉–语言–动作模型：系统综述

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

计算机类 | ISCC 2019等国际会议信息9条

计算机类 | ISCC 2019等国际会议信息9条

Call4Papers

5+阅读 · 2018年12月25日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Andrew NG的新书《Machine Learning Yearning》

Andrew NG的新书《Machine Learning Yearning》

我爱机器学习

11+阅读 · 2016年12月7日

相关论文

VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images

Arxiv

0+阅读 · 2021年3月22日

Monolingual and Parallel Corpora for Kangri Low Resource Language

Arxiv

0+阅读 · 2021年3月22日

On Learning Universal Representations Across Languages

Arxiv

0+阅读 · 2021年3月22日

The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation

Arxiv

0+阅读 · 2021年3月20日

MuRIL: Multilingual Representations for Indian Languages

Arxiv

0+阅读 · 2021年3月19日

Text Summarization with Pretrained Encoders

Arxiv

5+阅读 · 2019年8月22日

Evolutionary Data Measures: Understanding the Difficulty of Text Classification Tasks

Evolutionary Data Measures: Understanding the Difficulty of Text Classification Tasks

Arxiv

4+阅读 · 2018年11月5日

Joint Training for Neural Machine Translation Models with Monolingual Data

Arxiv

4+阅读 · 2018年3月1日

Word Translation Without Parallel Data

Arxiv

8+阅读 · 2018年1月30日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

微信扫码咨询专知VIP会员