Summarization systems make numerous "decisions" about summary properties during inference, e.g., the degree of copying, specificity, and length of outputs. However, these decisions are implicitly encoded within model parameters, and specific styles cannot be enforced. To address this, we introduce HydraSum, a new summarization architecture that extends the single-decoder framework of current models to a mixture-of-experts version with multiple decoders. We show that HydraSum's multiple decoders automatically learn contrasting summary styles when trained under the standard training objective without any extra supervision. Through experiments on three summarization datasets (CNN, Newsroom, and XSum), we show that HydraSum provides a simple mechanism to obtain stylistically diverse summaries by sampling from either individual decoders or their mixtures, outperforming baseline models. Finally, we demonstrate that a small modification to the gating strategy during training can enforce an even stricter style partitioning, e.g. high- vs. low-abstractiveness or high- vs. low-specificity, allowing users to sample from a larger area in the generation space and vary summary styles along multiple dimensions.
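To make the mixture-of-experts decoding concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how per-step next-token distributions from multiple decoders could be combined with gating weights; all names and values (vocab_size, num_decoders, the gate vector g) are hypothetical assumptions.

```python
import torch

vocab_size, num_decoders = 100, 3

# Hypothetical per-decoder next-token logits for a single decoding step.
logits = torch.randn(num_decoders, vocab_size)

# Assumed gating weights over the decoder experts; they sum to 1.
# A one-hot g corresponds to sampling from a single decoder (one style),
# while intermediate values interpolate between the learned styles.
g = torch.tensor([0.7, 0.3, 0.0])

# Mixture-of-experts output: convex combination of the experts' distributions.
probs = (g.unsqueeze(-1) * logits.softmax(dim=-1)).sum(dim=0)

# Sample the next token from the mixed distribution.
next_token = torch.multinomial(probs, num_samples=1)
print(next_token.item())
```

Under this reading, adjusting g at inference time is what lets a user move through the space of summary styles without retraining.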