We present \textsc{HowSumm}, a novel large-scale dataset for query-focused multi-document summarization (qMDS) that targets the use case of generating actionable instructions from a set of sources. This use case differs from those covered by existing multi-document summarization (MDS) datasets and is applicable to educational and industrial scenarios. We created \textsc{HowSumm} from wikiHow articles and the sources they cite, employing automatic methods and leveraging statistics from existing human-crafted qMDS datasets. We describe the creation of the dataset and discuss the features that distinguish it from other summarization corpora. Automatic and human evaluations of both extractive and abstractive summarization models on the dataset reveal that there is room for improvement. We propose that \textsc{HowSumm} can be leveraged to advance summarization research.