Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks

Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions. However, existing neural models have been shown to lack this basic ability in learning symbolic structures. Motivated by the failure of a Transformer model on the SCAN compositionality challenge (Lake and Baroni, 2018), which requires parsing a command into actions, we propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics, as additional training supervision. These automatically generated sequences are more representative of the underlying compositional symbolic structures of the input data. During inference, the model jointly predicts the next action and the next tokens in the auxiliary sequences at each step. Experiments on the SCAN dataset show that our method encourages the Transformer to understand compositional structures of the command, improving its accuracy on multiple challenging splits from at most 10% to 100%. With only 418 (5%) training instances, our approach still achieves 97.8% accuracy on the MCD1 split. Therefore, we argue that compositionality can be induced in Transformers given minimal but proper guidance. We also show that a better result is achieved using less contextualized vectors as the attention query, providing insights into architecture choices in achieving systematic compositionality. Finally, we show positive generalization results on the groundedSCAN task (Ruis et al., 2020). Our code is publicly available at: https://github.com/jiangycTarheel/compositional-auxseq
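
To make concrete what "parsing a command into actions" means in SCAN, the following is a minimal sketch of a rule-based interpreter for SCAN-style commands. It is not the paper's model and does not reproduce its auxiliary sequences; the function names (`interpret_verb_phrase`, `interpret_sentence`, `interpret_command`) are hypothetical, and the semantics and action token names are written from our understanding of the SCAN grammar in Lake and Baroni (2018), so they should be read as assumptions.

```python
# Illustrative SCAN-style interpreter (assumed semantics, hypothetical names).
PRIMITIVES = {"walk": "I_WALK", "look": "I_LOOK", "run": "I_RUN", "jump": "I_JUMP"}
TURNS = {"left": "I_TURN_LEFT", "right": "I_TURN_RIGHT"}
REPEATS = {"twice": 2, "thrice": 3}


def interpret_verb_phrase(words):
    """Interpret e.g. ['jump'], ['run', 'left'], ['turn', 'opposite', 'right']."""
    verb, modifiers = words[0], words[1:]
    # "turn" contributes no action of its own; only the turn itself is emitted.
    action = [] if verb == "turn" else [PRIMITIVES[verb]]
    if not modifiers:
        if verb == "turn":
            raise ValueError("'turn' needs a direction")
        return action
    turn = [TURNS[modifiers[-1]]]
    if modifiers[0] == "opposite":
        return turn * 2 + action      # turn twice, then act once
    if modifiers[0] == "around":
        return (turn + action) * 4    # turn and act, four times
    return turn + action              # plain "verb left" / "verb right"


def interpret_sentence(words):
    """Apply an optional trailing 'twice' / 'thrice' to a verb phrase."""
    if words[-1] in REPEATS:
        return interpret_verb_phrase(words[:-1]) * REPEATS[words[-1]]
    return interpret_verb_phrase(words)


def interpret_command(command):
    """Handle at most one 'and' / 'after' conjunction, as in SCAN."""
    words = command.split()
    for conj in ("and", "after"):
        if conj in words:
            i = words.index(conj)
            first = interpret_sentence(words[:i])
            second = interpret_sentence(words[i + 1:])
            # "x after y" executes y first, then x.
            return first + second if conj == "and" else second + first
    return interpret_sentence(words)


if __name__ == "__main__":
    for cmd in ["jump twice", "walk after run left", "jump around right"]:
        print(cmd, "->", " ".join(interpret_command(cmd)))
```

The function modifiers ("twice", "around", "after") and their arguments ("jump", "left") are handled by separate branches here. That separation is the kind of structure a sequence-to-sequence model must learn implicitly, and the kind the abstract's function and argument tracking sequences are meant to expose as supervision.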
