X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents


May 30, 2022


Sotaro Takeshita,Tommaso Green,Niklas Friedrich,Kai Eckert,Simone Paolo Ponzetto

From arXiv; JCDL 2022

The number of scientific publications nowadays is rapidly increasing, causing information overload for researchers and making it hard for scholars to keep up to date with current trends and lines of work. Consequently, recent work on applying text mining technologies for scholarly publications has investigated the application of automatic text summarization technologies, including extreme summarization, for this domain. However, previous work has concentrated only on monolingual settings, primarily in English. In this paper, we fill this research gap and present an abstractive cross-lingual summarization dataset for four different languages in the scholarly domain, which enables us to train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese. We present our new X-SCITLDR dataset for multilingual summarization and thoroughly benchmark different models based on a state-of-the-art multilingual pre-trained model, including a two-stage 'summarize and translate' approach and a direct cross-lingual model. We additionally explore the benefits of intermediate-stage training using English monolingual summarization and machine translation as intermediate tasks and analyze performance in zero- and few-shot scenarios.
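The difference between the two benchmarked setups can be pictured with a minimal Python sketch. It is not the authors' implementation: the function names, the language codes, and the toy stand-in models below are all hypothetical and chosen only for illustration. The two-stage pipeline first produces an English summary and then translates it, while the direct model maps the English paper straight to a summary in the target language.

```python
from typing import Callable

# Hypothetical interfaces (assumptions, not from the paper):
# a summarizer maps an English document to an English summary,
# a translator maps (English text, target language) to translated text,
# a cross-lingual model maps (English document, target language) to a summary.
Summarizer = Callable[[str], str]
Translator = Callable[[str, str], str]
CrossLingualModel = Callable[[str, str], str]

# Assumed language codes for German, Italian, Chinese and Japanese.
TARGET_LANGUAGES = ("de", "it", "zh", "ja")


def _check_language(target_lang: str) -> None:
    """Reject target languages outside the four covered by the dataset."""
    if target_lang not in TARGET_LANGUAGES:
        raise ValueError(f"unsupported target language: {target_lang}")


def summarize_then_translate(
    document: str,
    target_lang: str,
    summarize: Summarizer,
    translate: Translator,
) -> str:
    """Two-stage pipeline: English summary first, then translation."""
    _check_language(target_lang)
    english_summary = summarize(document)
    return translate(english_summary, target_lang)


def direct_cross_lingual(
    document: str,
    target_lang: str,
    model: CrossLingualModel,
) -> str:
    """Direct setup: one model produces the target-language summary."""
    _check_language(target_lang)
    return model(document, target_lang)


# Toy stand-ins so the sketch runs without any trained model.
def toy_summarize(document: str) -> str:
    """Keep only the first sentence as a stand-in 'summary'."""
    return document.split(".")[0] + "."


def toy_translate(text: str, target_lang: str) -> str:
    """Tag the text with the language code instead of translating it."""
    return f"[{target_lang}] {text}"


def toy_direct_model(document: str, target_lang: str) -> str:
    """Stand-in for a single cross-lingual model."""
    return toy_translate(toy_summarize(document), target_lang)
```

In practice the callables would wrap fine-tuned models. Keeping them as plain function arguments makes it easy to evaluate both setups on the same documents.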

