Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (2200 models, 16 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even extensive amounts of data and training time never led to any non-trivial generalization, despite models having sufficient capacity to perfectly fit the training data. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks.
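To make the evaluation setup concrete, the following is a minimal illustrative sketch (not the paper's code, and not necessarily its exact task set) of how tasks at different levels of the Chomsky hierarchy can be posed as sequence-to-sequence problems, and how out-of-distribution generalization is typically probed by training on short inputs and testing on strictly longer ones. The task functions and length cutoffs here are hypothetical examples chosen for illustration.

```python
"""Illustrative sketch: sequence tasks at different Chomsky-hierarchy levels,
evaluated for length generalization (train on short inputs, test on longer ones).
Task choices and length cutoffs are illustrative assumptions, not the paper's setup."""
import random

def parity(bits):            # regular: computable by a finite-state machine
    return [sum(bits) % 2]

def reverse_string(bits):    # deterministic context-free: solvable with a stack
    return bits[::-1]

def duplicate_string(bits):  # context-sensitive: needs more than a single stack
    return bits + bits

def make_split(task, train_max_len=40, test_min_len=41, test_max_len=500, n=5):
    """Build a train/test split where every test input is longer than any
    training input, so high test accuracy requires out-of-distribution
    (length) generalization rather than memorization."""
    def sample(lo, hi):
        length = random.randint(lo, hi)
        x = [random.randint(0, 1) for _ in range(length)]
        return x, task(x)
    train = [sample(1, train_max_len) for _ in range(n)]
    test = [sample(test_min_len, test_max_len) for _ in range(n)]
    return train, test

if __name__ == "__main__":
    for task in (parity, reverse_string, duplicate_string):
        train, test = make_split(task)
        print(task.__name__,
              "train lengths:", sorted(len(x) for x, _ in train),
              "test lengths:", sorted(len(x) for x, _ in test))
```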