Moving towards human-like linguistic performance is often argued to require compositional generalisation. Whether neural networks exhibit this ability is typically studied using artificial languages, for which the compositionality of input fragments can be guaranteed and their meanings algebraically composed. However, compositionality in natural language is vastly more complex than this rigid, arithmetic-like version of compositionality, and artificial compositionality tests therefore do not allow us to draw conclusions about how neural models deal with compositionality in more realistic scenarios. In this work, we re-instantiate three compositionality tests from the literature and reformulate them for neural machine translation (NMT). The results highlight two main issues: the inconsistent behaviour of NMT models and their inability to (correctly) modulate between local and global processing. Aside from an empirical study, our work is a call to action: we should rethink the evaluation of compositionality in neural models of natural language, where composing meaning is not as straightforward as doing the math.