Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20'910 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even extensive amounts of data and training time never lead to any non-trivial generalization, despite models having sufficient capacity to fit the training data perfectly. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks.