Despite success in many domains, neural models struggle in settings where training and test examples are drawn from different distributions. In particular, in contrast to humans, conventional sequence-to-sequence (seq2seq) models fail to generalize systematically, i.e., to interpret sentences representing novel combinations of concepts (e.g., text segments) seen in training. Traditional grammar formalisms excel in such settings by implicitly encoding alignments between input and output segments, but they are hard to scale and maintain. Instead of engineering a grammar, we directly model segment-to-segment alignments as discrete structured latent variables within a neural seq2seq model. To efficiently explore the large space of alignments, we introduce a reorder-first align-later framework whose central component is a neural reordering module producing {\it separable} permutations. We present an efficient dynamic programming algorithm that performs exact marginal inference over separable permutations, thus enabling end-to-end differentiable training of our model. The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks (i.e., semantic parsing and machine translation).
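To make the notion of separable permutations concrete, the sketch below (illustrative only, not the paper's implementation; the function name and brute-force recursion are our own) checks separability via the standard recursive characterization: a permutation of length at least two is separable iff it splits at some point into a direct sum (every left-block value below every right-block value) or a skew sum (every left-block value above) of two separable blocks, which is equivalent to avoiding the patterns 2413 and 3142. The paper's dynamic program marginalizes over such recursive splits efficiently rather than enumerating them as done here.

\begin{verbatim}
def is_separable(perm):
    """Return True iff perm (a list containing 0..n-1) is a separable
    permutation, i.e., it avoids the patterns 2413 and 3142.

    Illustrative brute-force recursion, not the paper's DP: a block of
    length >= 2 is separable iff some split point yields a direct sum
    (all left values < all right values) or a skew sum (all left
    values > all right values) with both halves separable."""
    def sep(lo, hi):
        if hi - lo <= 1:          # singletons are separable
            return True
        for k in range(lo + 1, hi):
            left, right = perm[lo:k], perm[k:hi]
            if max(left) < min(right) or min(left) > max(right):
                if sep(lo, k) and sep(k, hi):
                    return True
        return False
    return sep(0, len(perm))

# The pattern 2413 itself (0-indexed: [1, 3, 0, 2]) is a smallest
# non-separable case; simple block-swapping reorderings are separable.
assert not is_separable([1, 3, 0, 2])
assert is_separable([2, 0, 1, 3])
\end{verbatim}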