Zero-Shot Controlled Generation with Encoder-Decoder Transformers

Controlling neural network-based models for natural language generation (NLG) has broad applications in areas such as machine translation, document summarization, and dialog systems. Approaches that enable such control in a zero-shot manner are of great importance because, among other reasons, they remove the need for additional annotated data and training. In this work, we propose novel approaches for controlling encoder-decoder transformer-based NLG models in a zero-shot manner. We introduce three control knobs, namely attention biasing, decoder mixing, and context augmentation, that are applied to these models at generation time. These knobs control the generation process by directly manipulating trained NLG models (e.g., biasing cross-attention layers) to realize the desired attributes in the generated outputs. We show that these NLG models are not only robust to such manipulations, but that their behavior can also be controlled without an impact on their generation performance. These results, to the best of our knowledge, are the first of their kind. Through these control knobs, we also investigate the role of the transformer decoder's self-attention module and show strong evidence that its primary role is maintaining the fluency of sentences generated by these models. Based on this hypothesis, we show that alternative architectures for transformer decoders could be viable options. We also study how this hypothesis could lead to more efficient ways of training encoder-decoder transformer models.
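
The abstract names biasing of cross-attention layers as one example of manipulating a trained model, but it does not say how the bias is computed or applied. The following is a minimal sketch of one plausible reading, assuming an additive bias on the pre-softmax cross-attention scores of a single head for a single decoder query. The function names, the additive form, and the bias values are illustrative assumptions, not the method of the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def biased_cross_attention(query, keys, values, bias):
    """Single-head cross-attention for one decoder query.

    `bias` holds one additive term per source position and is added to the
    scaled dot-product scores before the softmax (an assumed form of
    "attention biasing"; the source does not specify it).
    Returns (attention weights, context vector).
    """
    scale = math.sqrt(len(query))
    logits = [
        sum(q * k for q, k in zip(query, key)) / scale + b
        for key, b in zip(keys, bias)
    ]
    weights = softmax(logits)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy encoder states: three source positions with identical keys, so the
# unbiased attention is uniform and any shift comes from the bias alone.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

plain_weights, plain_context = biased_cross_attention(query, keys, values, [0.0, 0.0, 0.0])
biased_weights, biased_context = biased_cross_attention(query, keys, values, [0.0, 2.0, 0.0])
```

In this toy setup the model parameters (here, the keys and values) are untouched; only the attention distribution is shifted toward the second source position at generation time, which is the sense in which such a knob requires no additional training.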
