Dataset for Automatic Summarization of Russian News

Automatic Summarization · Dataset · MoDELS · Natural Language Processing

October 5, 2021

arXiv, Version 4 (October 2021): corrected BLEU scores

Automatic text summarization has been studied in a variety of domains and languages, but Russian has received little attention. To address this gap, we present Gazeta, the first dataset for summarization of Russian news. We describe the properties of this dataset and benchmark several extractive and abstractive models. We demonstrate that the dataset is a valid benchmark for Russian text summarization methods. Additionally, we show that the pretrained mBART model is useful for Russian text summarization.
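
To illustrate what an extractive baseline and its evaluation can look like, the sketch below implements Lead-k (keep the first k sentences), a simple baseline commonly used for news summarization, and scores it against a reference with a unigram-overlap ROUGE-1 F1. This is an illustrative assumption, not the paper's code: the regex sentence splitter, the lowercased `\w+` tokenization, and the names `lead_k` and `rouge1_f1` are hypothetical, and published ROUGE scores come from dedicated tooling with its own tokenization and stemming.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def lead_k(text: str, k: int = 3) -> str:
    # Lead-k extractive baseline: the summary is the first k sentences.
    return " ".join(split_sentences(text)[:k])


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; \w also matches Cyrillic letters in Python 3.
    return re.findall(r"\w+", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    # ROUGE-1 F1: harmonic mean of unigram precision and recall,
    # with overlap counts clipped by the multiset intersection.
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    article = ("Первое предложение. Второе предложение! "
               "Третье предложение? Четвёртое предложение.")
    summary = lead_k(article, k=2)
    print(summary)  # Первое предложение. Второе предложение!
    print(round(rouge1_f1(summary, "Первое предложение."), 3))  # 0.667
```

Lead-k works surprisingly well on news because journalists tend to put the key facts first, which is why it is a useful floor for comparing learned extractive and abstractive models.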

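Related topic: Automatic summarization

Automatic summarization uses a computer program to condense the main content of a document while preserving its original meaning. It has many applications, such as news headline generation, abstract generation for scientific papers, search-result snippet generation, and product-review summarization.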
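
To make this definition concrete, here is a minimal frequency-based extractive summarizer in the spirit of Luhn's classic approach: score each sentence by the average document frequency of its content words, then keep the top-scoring sentences in their original order. The tiny stop-word list, the regex tokenization, and the function name `summarize` are illustrative assumptions; modern extractive and abstractive models are considerably more sophisticated.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); real systems use a full list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"}


def content_words(text: str) -> list[str]:
    # Lowercased word tokens with stop words removed.
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


def summarize(text: str, n_sentences: int = 2) -> str:
    # Split into sentences with a naive punctuation-based rule.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        # Average document frequency of the sentence's content words.
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score (ties keep original order), then
    # restore document order for the selected sentences.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    text = ("The dataset contains news articles. Each article has a summary. "
            "The weather was sunny. News summaries help readers scan articles quickly.")
    print(summarize(text, 2))
    # The dataset contains news articles. News summaries help readers scan articles quickly.
```

Restoring the original sentence order after ranking keeps the extracted summary readable, since news sentences often depend on the ones before them.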
