Controllable summarization allows users to generate customized summaries with specified attributes. However, due to the lack of dedicated annotations for controlled summaries, existing work has had to construct pseudo-datasets by adapting generic summarization benchmarks. Furthermore, most research focuses on controlling individual attributes in isolation (e.g., a short summary or a highly abstractive summary) rather than controlling a mix of attributes together (e.g., a short and highly abstractive summary). In this paper, we propose MACSum, the first human-annotated summarization dataset for controlling mixed attributes. It contains source texts from two domains, news articles and dialogues, with human-annotated summaries controlled by five designed attributes (Length, Extractiveness, Specificity, Topic, and Speaker). We propose two simple and effective parameter-efficient approaches for the new task of mixed controllable summarization, based on hard prompt tuning and soft prefix tuning. Results and analysis demonstrate that hard prompt models yield the best performance on all metrics and in human evaluations. However, mixed-attribute control remains challenging for summarization tasks. Our dataset and code are available at https://github.com/psunlpgroup/MACSum.
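To make the idea of a mixed-attribute hard prompt concrete, here is a minimal Python sketch. It assumes that all five control attributes are serialized as plain text and prepended to the source document. The attribute value sets (`short`/`normal`/`long` and so on), the `ControlAttributes` class, the `build_hard_prompt` function, and the `; ` and ` | ` separators are hypothetical illustrations. They are not the exact format used in the MACSum repository.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical value sets for the three categorical attributes;
# the dataset's actual labels may differ.
ALLOWED = {
    "length": {"short", "normal", "long"},
    "extractiveness": {"normal", "high", "fully"},
    "specificity": {"normal", "high"},
}


@dataclass
class ControlAttributes:
    """One mixed-attribute control request (hypothetical schema)."""
    length: str = "normal"
    extractiveness: str = "normal"
    specificity: str = "normal"
    topic: List[str] = field(default_factory=list)    # empty = no topic constraint
    speaker: List[str] = field(default_factory=list)  # only meaningful for dialogues

    def __post_init__(self) -> None:
        # Reject values outside the assumed label sets.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")


def build_hard_prompt(source: str, attrs: ControlAttributes) -> str:
    """Prepend every control attribute as plain text to the source (hypothetical format)."""
    parts = [
        f"Length: {attrs.length}",
        f"Extractiveness: {attrs.extractiveness}",
        f"Specificity: {attrs.specificity}",
        f"Topic: {', '.join(attrs.topic) if attrs.topic else 'none'}",
        f"Speaker: {', '.join(attrs.speaker) if attrs.speaker else 'none'}",
    ]
    return "; ".join(parts) + " | " + source


if __name__ == "__main__":
    # A "short and highly extractive" request: two attributes set at once.
    attrs = ControlAttributes(length="short", extractiveness="high")
    print(build_hard_prompt("The council met on Monday.", attrs))
```

Because the control signal here is ordinary text, any sequence-to-sequence summarizer could consume it without architectural changes. Soft prefix tuning would instead replace these text tokens with learned continuous vectors.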