Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (10,250 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even extensive amounts of data and training time never lead to any non-trivial generalization, despite models having sufficient capacity to fit the training data perfectly. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks.
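As a rough illustration of the evaluation setup described above (not the study's actual code), the minimal Python sketch below groups a few example tasks by Chomsky level and measures out-of-distribution accuracy by testing on sequence lengths longer than any seen during training. The specific task functions (parity, string reversal, string duplication) and the evaluate helper are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: grouping tasks by Chomsky level and probing length generalization.
# All names here are hypothetical stand-ins, not the study's codebase.
import random

def parity(bits):            # regular: is the number of 1s odd?
    return sum(bits) % 2

def reverse_string(bits):    # deterministic context-free: reverse the input
    return bits[::-1]

def duplicate_string(bits):  # context-sensitive: output the input twice
    return bits + bits

TASKS = {
    "regular": parity,
    "context-free": reverse_string,
    "context-sensitive": duplicate_string,
}

def sample(length):
    """Draw a random binary sequence of the given length."""
    return [random.randint(0, 1) for _ in range(length)]

def evaluate(model_fn, task_fn, lengths, n=100):
    """Fraction of sequences (at unseen lengths) on which model_fn matches the ground truth."""
    correct = 0
    for _ in range(n):
        x = sample(random.choice(lengths))
        correct += int(model_fn(x) == task_fn(x))
    return correct / n

if __name__ == "__main__":
    # Train on lengths 1..40, test on lengths 41..100 to probe out-of-distribution
    # (length) generalization; here the ground-truth function stands in for a trained model.
    test_lengths = list(range(41, 101))
    for name, task in TASKS.items():
        acc = evaluate(task, task, test_lengths)
        print(f"{name}: out-of-distribution accuracy = {acc:.2f}")
```

In the study's setting, `model_fn` would be a trained network (RNN, LSTM, Transformer, or memory-augmented model), and the comparison across task groups is what reveals the hierarchy-dependent generalization failures summarized in the abstract.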