Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that exactly interpolates its training data will typically improve its generalisation performance. Explaining the mechanism behind the benefit of such over-parameterisation is an outstanding challenge for deep learning theory. Here, we study the last-layer representation of various deep architectures, such as Wide-ResNets for image classification, and find evidence for an underlying mechanism that we call *representation mitosis*: if the last hidden representation is wide enough, its neurons tend to split into groups which carry identical information and differ from each other only by statistically independent noise. As in mitosis, the number of such groups, or ``clones'', increases linearly with the width of the layer, but only if the width is above a critical value. We show that a key ingredient for activating mitosis is continuing the training process until the training error reaches zero. Finally, we show that in one of the learning tasks we considered, a wide model with several automatically developed clones performs significantly better than a deep ensemble based on architectures whose last layer has the same size as the clones.
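To make the notion of ``clones'' concrete, the following is a minimal sketch, not the paper's actual analysis protocol, of how one could probe a trained network's last-layer activations for clone-like redundancy: neurons are grouped by the correlation of their activation patterns across inputs, and the number of resulting groups is counted. The correlation threshold and the clustering method are illustrative assumptions.

```python
# Hypothetical sketch: group last-hidden-layer neurons whose activation
# patterns agree up to (assumed) independent noise, and count the groups.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def count_clone_groups(activations, corr_threshold=0.9):
    """Count correlation-based neuron groups in a last-layer representation.

    activations: array of shape (n_samples, width), the last hidden layer's
                 outputs on a set of inputs.
    corr_threshold: assumed cutoff above which two neurons are treated as
                    carrying the same signal (illustrative choice).
    """
    # Pairwise Pearson correlation between neurons (columns).
    corr = np.corrcoef(activations, rowvar=False)
    # Turn similarity into a distance and clean up numerical noise.
    dist = np.clip(1.0 - corr, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    # Average-linkage clustering, cut at the distance implied by the threshold.
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=1.0 - corr_threshold, criterion="distance")
    return len(np.unique(labels))


# Random activations have no clone structure, so the group count stays
# close to the layer width; clone-structured layers would give far fewer groups.
rng = np.random.default_rng(0)
acts = rng.standard_normal((1024, 64))
print(count_clone_groups(acts))
```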