Existing abstractive summarization models lack explicit control mechanisms that would allow users to influence the stylistic features of the model outputs. This results in generic summaries that do not cater to users' needs or preferences. To address this issue, we introduce HydraSum, a new summarization architecture that extends the single-decoder framework of current models, e.g. BART, to a mixture-of-experts version consisting of multiple decoders. Our proposed model encourages each expert, i.e. decoder, to learn and generate stylistically distinct summaries along dimensions such as abstractiveness, length, and specificity. At each time step, HydraSum employs a gating mechanism that decides the contribution of each individual decoder to the next token's output probability distribution. Through experiments on three summarization datasets (CNN, Newsroom, and XSum), we demonstrate that this gating mechanism automatically learns to assign contrasting summary styles to different HydraSum decoders under the standard training objective, without the need for additional supervision. We further show that a guided version of the training process can explicitly govern which summary-style dimension is partitioned between decoders, e.g. high vs. low abstractiveness or high vs. low specificity, and can also increase the stylistic difference between individual decoders. Finally, our experiments demonstrate that our decoder framework is highly flexible: during inference, we can sample from individual decoders or from mixtures of different subsets of decoders to yield a diverse set of summaries and to enforce single- and multi-style control over summary generation.
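The per-time-step mixing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the two-decoder setup, and the toy four-word vocabulary are all assumptions for the sake of the example; only the idea (the gate weights blend each decoder's next-token distribution into one mixture) comes from the abstract.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a vocabulary-sized logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def mix_next_token_probs(decoder_logits, gate_weights):
    """Blend per-decoder next-token distributions using gate weights.

    decoder_logits: list of 1-D arrays, one vocab-sized logit vector per decoder.
    gate_weights:   1-D array of mixture weights summing to 1 (the gate's output
                    at the current time step).
    """
    per_decoder = [softmax(l) for l in decoder_logits]
    mixed = np.zeros_like(per_decoder[0])
    for w, p in zip(gate_weights, per_decoder):
        mixed += w * p
    return mixed

# Two hypothetical experts over a toy 4-word vocabulary.
logits_a = np.array([2.0, 0.5, 0.1, -1.0])   # e.g. an abstractive-leaning decoder
logits_b = np.array([-1.0, 0.1, 0.5, 2.0])   # e.g. an extractive-leaning decoder
gate = np.array([0.7, 0.3])                   # gate weights at this time step

probs = mix_next_token_probs([logits_a, logits_b], gate)
print(probs.sum())  # the mixture is itself a valid probability distribution
```

Setting the gate to a one-hot vector (e.g. `[1.0, 0.0]`) recovers sampling from a single decoder, which is how single-style control at inference time works; intermediate gate values interpolate between the experts' styles.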