HowSumm: 从 WikiHow 文章中提取的多文档汇总数据集 (HowSumm: A Multi-Document Summarization Dataset Derived from WikiHow Articles) - 专知论文

会员服务 ·

0

WikiHow · 数据集 · MoDELS · 统计量 · 情景 ·

2021 年 10 月 7 日

HowSumm: A Multi-Document Summarization Dataset Derived from WikiHow Articles

翻译：HowSumm: 从 WikiHow 文章中提取的多文档汇总数据集

Odellia Boni,Guy Feigenblat,Guy Lev,Michal Shmueli-Scheuer,Benjamin Sznajder,David Konopnicki

from arxiv, 8 pages, 4 figures, 5 tables. HowSumm dataset is publicly available at \url{https://ibm.biz/BdfhzH}

We present \textsc{HowSumm}, a novel large-scale dataset for the task of query-focused multi-document summarization (qMDS), which targets the use-case of generating actionable instructions from a set of sources. This use-case is different from the use-cases covered in existing multi-document summarization (MDS) datasets and is applicable to educational and industrial scenarios. We employed automatic methods, and leveraged statistics from existing human-crafted qMDS datasets, to create \textsc{HowSumm} from wikiHow website articles and the sources they cite. We describe the creation of the dataset and discuss the unique features that distinguish it from other summarization corpora. Automatic and human evaluations of both extractive and abstractive summarization models on the dataset reveal that there is room for improvement. % in existing summarization models We propose that \textsc{HowSumm} can be leveraged to advance summarization research.

翻译：我们为以查询为焦点的多文档汇总任务(qMDS)展示了一个新的大型数据集。该数据集是针对从一组来源生成可操作指令的实用案例的。该使用案例不同于现有多文档汇总数据集所涵盖的使用案例, 并适用于教育和工业情景。我们采用了自动方法, 以及利用现有人造的 qMDS 数据集的杠杆统计数据, 从wikishow网站文章及其引用的来源中创建了\ textsc{How Summ} 。我们描述数据集的创建, 并讨论将其与其他汇总组合区分的独特特征。对数据集上的采掘和抽象合成模型的自动和人性评估显示有改进的余地。在现有的汇总模型中,% 我们建议可以利用\ textsc{HowSumm} 来推进合成研究。

0

相关内容

WikiHow

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

【2020关键词提取】使用多个本地功能从单个文档中提取关键字，YAKE! Keyword extraction from single documents using multiple local features

【2020关键词提取】使用多个本地功能从单个文档中提取关键字，YAKE! Keyword extraction from single documents using multiple local features

专知会员服务

26+阅读 · 2020年5月2日

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

专知会员服务

33+阅读 · 2020年4月24日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【AAAI2020论文】使用GANs生成科学文章的关键短语（Keyphrase Generation for Scientific Articles using GANs）

【AAAI2020论文】使用GANs生成科学文章的关键短语（Keyphrase Generation for Scientific Articles using GANs）

专知会员服务

22+阅读 · 2019年11月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

学术报告|港科大助理教授宋阳秋博士

学术报告|港科大助理教授宋阳秋博士

科技创新与创业

7+阅读 · 2019年7月19日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

2018年中科院JCR分区发布！

2018年中科院JCR分区发布！

材料科学与工程

3+阅读 · 2018年12月11日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

论文报告 | Graph-based Neural Multi-Document Summarization

论文报告 | Graph-based Neural Multi-Document Summarization

科技创新与创业

15+阅读 · 2017年12月15日

An analysis of document graph construction methods for AMR summarization

Arxiv

0+阅读 · 2021年11月27日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

Arxiv

8+阅读 · 2020年3月23日

Keyphrase Generation for Scientific Articles using GANs

Keyphrase Generation for Scientific Articles using GANs

Arxiv

8+阅读 · 2019年9月24日

Text Summarization with Pretrained Encoders

Arxiv

5+阅读 · 2019年8月22日

Automatic Summarization of Natural Language

Arxiv

3+阅读 · 2018年12月18日

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Arxiv

5+阅读 · 2018年7月16日

Deep Communicating Agents for Abstractive Summarization

Arxiv

5+阅读 · 2018年3月27日

Generating Wikipedia by Summarizing Long Sequences

Arxiv

7+阅读 · 2018年1月30日

Graph Summarization: A Survey

Arxiv

5+阅读 · 2017年4月12日

VIP会员

文章信息

相关主题

相关VIP内容

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

【2020关键词提取】使用多个本地功能从单个文档中提取关键字，YAKE! Keyword extraction from single documents using multiple local features

【2020关键词提取】使用多个本地功能从单个文档中提取关键字，YAKE! Keyword extraction from single documents using multiple local features

专知会员服务

26+阅读 · 2020年5月2日

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

专知会员服务

33+阅读 · 2020年4月24日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【AAAI2020论文】使用GANs生成科学文章的关键短语（Keyphrase Generation for Scientific Articles using GANs）

【AAAI2020论文】使用GANs生成科学文章的关键短语（Keyphrase Generation for Scientific Articles using GANs）

专知会员服务

22+阅读 · 2019年11月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

最新《扩散模型原理》新书，470页pdf

无人机作战：演进、创新与未来战场

AI 智能体简史

多模态空间推理在大模型时代：综述与基准测试

相关资讯

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

学术报告|港科大助理教授宋阳秋博士

学术报告|港科大助理教授宋阳秋博士

科技创新与创业

7+阅读 · 2019年7月19日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

2018年中科院JCR分区发布！

2018年中科院JCR分区发布！

材料科学与工程

3+阅读 · 2018年12月11日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

论文报告 | Graph-based Neural Multi-Document Summarization

论文报告 | Graph-based Neural Multi-Document Summarization

科技创新与创业

15+阅读 · 2017年12月15日

相关论文

An analysis of document graph construction methods for AMR summarization

Arxiv

0+阅读 · 2021年11月27日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

Arxiv

8+阅读 · 2020年3月23日

Keyphrase Generation for Scientific Articles using GANs

Keyphrase Generation for Scientific Articles using GANs

Arxiv

8+阅读 · 2019年9月24日

Text Summarization with Pretrained Encoders

Arxiv

5+阅读 · 2019年8月22日

Automatic Summarization of Natural Language

Arxiv

3+阅读 · 2018年12月18日

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Arxiv

5+阅读 · 2018年7月16日

Deep Communicating Agents for Abstractive Summarization

Arxiv

5+阅读 · 2018年3月27日

Generating Wikipedia by Summarizing Long Sequences

Arxiv

7+阅读 · 2018年1月30日

Graph Summarization: A Survey

Arxiv

5+阅读 · 2017年4月12日

微信扫码咨询专知VIP会员