Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax." In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can make successful zero-shot generalizations when the differences between training and test commands are small, so that they can apply "mix-and-match" strategies to solve the task. However, when generalization requires systematic compositional skills (as in the "dax" example above), RNNs fail spectacularly. We conclude with a proof-of-concept experiment in neural machine translation, suggesting that lack of systematicity might be partially responsible for neural networks' notorious training data thirst.
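To make the "dax" example concrete, the following is a minimal sketch of what systematic composition means for commands of this kind. It is not the SCAN grammar or the authors' code: the primitive verbs, the upper-case action names, the `interpret` function, and the restriction to the modifier "twice" and the conjunction "and" are all illustrative assumptions. The point is only that a rule-based interpreter handles a newly added verb in every composition at once, which is the behavior the RNNs are tested for.

```python
# Hypothetical toy interpreter. The verbs, action names, and grammar here are
# illustrative assumptions, not the actual SCAN specification.
PRIMITIVES = {"walk": "WALK", "jump": "JUMP", "sing": "SING"}
REPEATS = {"twice": 2}


def interpret(command, primitives=PRIMITIVES):
    """Map a command string to a list of action tokens."""
    actions = []
    # "and" joins sub-commands that are executed in order.
    for clause in command.split(" and "):
        words = clause.split()
        verb = words[0]
        # An optional modifier such as "twice" repeats the verb's action.
        count = REPEATS[words[1]] if len(words) > 1 else 1
        actions.extend([primitives[verb]] * count)
    return actions


# Adding a single new verb is enough to interpret every composition with it.
extended = {**PRIMITIVES, "dax": "DAX"}
print(interpret("dax twice", extended))     # ['DAX', 'DAX']
print(interpret("sing and dax", extended))  # ['SING', 'DAX']
```

In this sketch, generalizing to "dax twice" requires no new examples because the repetition rule is independent of the verb it applies to. A sequence-to-sequence model has no such built-in separation and must induce it from the training data.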