The excellent real-world performance of deep neural networks has received increasing attention. Despite having the capacity to overfit significantly, such large models work better than smaller ones. This phenomenon is often referred to by practitioners as the scaling law. It is of fundamental interest to understand why the scaling law holds and how overfitting is avoided or controlled. One approach has been to study infinite-width limits of neural networks (e.g., Neural Tangent Kernels, Gaussian Processes); however, in practice these do not fully explain finite networks, as their infinite counterparts do not learn features. Furthermore, the empirical kernel of finite networks (i.e., the inner product of feature vectors) changes significantly during training, in contrast to infinite-width networks. In this work we derive an iterative linearised training method. We justify iterative linearisation as an interpolation between a finite analogue of the infinite-width regime, which does not learn features, and standard gradient descent training, which does. We present preliminary results in which iterative linearised training performs well, noting in particular how much feature learning is required to achieve comparable performance. We also provide novel insights into the training behaviour of neural networks.
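The sketch below illustrates the general idea of iteratively linearised training described above: the network is replaced by its first-order Taylor expansion around a reference point, trained by gradient descent on that linearised model, and periodically re-linearised around the current parameters. This is only a minimal illustration under assumed names (`model_fn`, `linearised_apply`, `relin_every`, a squared-error loss), not the paper's actual implementation; the re-linearisation frequency is what interpolates between the two regimes mentioned in the abstract.

```python
# Minimal sketch of iterative linearised training in JAX.
# Assumptions: `model_fn(params, x)` is a generic network apply function with
# pytree parameters; loss, learning rate, and schedule are illustrative only.
import jax
import jax.numpy as jnp

def linearised_apply(model_fn, params0, params, x):
    # First-order Taylor expansion of the network around params0:
    # f_lin(params) = f(params0) + J_f(params0) (params - params0)
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    f0, jvp_out = jax.jvp(lambda p: model_fn(p, x), (params0,), (delta,))
    return f0 + jvp_out

def loss_fn(params, params0, model_fn, x, y):
    preds = linearised_apply(model_fn, params0, params, x)
    return jnp.mean((preds - y) ** 2)

def iterative_linearised_train(model_fn, params, data_iter,
                               lr=1e-2, relin_every=100, num_steps=1000):
    # Gradient descent on the linearised model, re-linearising (moving the
    # expansion point params0) every `relin_every` steps.
    # relin_every >= num_steps: fully linearised ("lazy") training, no feature learning.
    # relin_every == 1: recovers standard gradient descent on the original network.
    params0 = params
    grad_fn = jax.grad(loss_fn)
    for step in range(num_steps):
        x, y = next(data_iter)
        grads = grad_fn(params, params0, model_fn, x, y)
        params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
        if (step + 1) % relin_every == 0:
            params0 = params  # re-linearise around the current parameters
    return params
```

In this reading, the empirical kernel of the linearised model stays fixed between re-linearisations, so varying `relin_every` gives a direct handle on how much the kernel (and hence the learned features) is allowed to change during training.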