Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20'910 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even extensive amounts of data and training time never lead to any non-trivial generalization, despite models having sufficient capacity to fit the training data perfectly. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks.