Zero-Shot Controlled Generation with Encoder-Decoder Transformers

Controlling neural network-based models for natural language generation (NLG) has broad applications in areas such as machine translation, document summarization, and dialog systems. Approaches that enable such control in a zero-shot manner would be of great importance because, among other reasons, they remove the need for additional annotated data and training. In this work, we propose novel approaches for controlling encoder-decoder transformer-based NLG models in a zero-shot manner. This is done by introducing three control knobs, namely attention biasing, decoder mixing, and context augmentation, which are applied to these models at generation time. These knobs control the generation process by directly manipulating trained NLG models (e.g., biasing cross-attention layers) to realize the desired attributes in the generated outputs. We show that these NLG models are not only robust to such manipulations, but that their behavior can also be controlled without affecting their generation performance. These results, to the best of our knowledge, are the first of their kind. Through these control knobs, we also investigate the role of the transformer decoder's self-attention module and show strong evidence that its primary role is maintaining the fluency of sentences generated by these models. Based on this hypothesis, we show that alternative architectures for transformer decoders could be viable options. We also study how this hypothesis could lead to more efficient ways of training encoder-decoder transformer models.
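To make the idea of biasing a cross-attention layer at generation time more concrete, the following is a minimal sketch of one plausible reading. The abstract does not specify the form of the bias, so the multiplicative reweighting with renormalization used here is an assumption, and the names `biased_cross_attention`, `attend`, and `bias` are hypothetical illustrations rather than the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def biased_cross_attention(scores, bias):
    """Reweight one cross-attention distribution and renormalize it.

    scores: raw attention logits of a single decoder query over the
            encoder (source) positions.
    bias:   hypothetical non-negative weight per source position (an
            assumption, not taken from the paper); values above 1
            emphasize a position, values below 1 de-emphasize it.
    """
    if len(scores) != len(bias):
        raise ValueError("scores and bias must have the same length")
    if any(b < 0 for b in bias):
        raise ValueError("bias weights must be non-negative")
    probs = softmax(scores)
    weighted = [p * b for p, b in zip(probs, bias)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("bias removes all attention mass")
    return [w / total for w in weighted]


def attend(weights, values):
    """Weighted sum of encoder value vectors (the context vector)."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Three source positions with equal logits; the last one is emphasized 2x.
weights = biased_cross_attention([0.0, 0.0, 0.0], [1.0, 1.0, 2.0])
context = attend(weights, [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
```

Because the trained weights are left untouched and only the attention distribution is reweighted at inference, a manipulation of this kind needs no additional training, which is the sense in which such a knob is zero-shot.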
