There is mounting evidence that existing neural network models, in particular the very popular sequence-to-sequence architecture, struggle to systematically generalize to unseen compositions of seen components. We demonstrate that one of the factors hindering compositional generalization is representation entanglement. We propose an extension to sequence-to-sequence models that encourages disentanglement by adaptively re-encoding the source input at each time step. Specifically, we condition the source representations on the newly decoded target context, which makes it easier for the encoder to exploit specialized information for each prediction rather than capturing it all in a single forward pass. Experimental results on semantic parsing and machine translation show that our proposal yields more disentangled representations and better generalization.