Few-Shot Learning by Dimensionality Reduction in Gradient Space

Martin Gauch, Maximilian Beck, Thomas Adler, Dmytro Kotsur, Stefan Fiel, Hamid Eghbal-zadeh, Johannes Brandstetter, Johannes Kofler, Markus Holzleitner, Werner Zellinger, Daniel Klotz, Sepp Hochreiter, Sebastian Lehner

Source: arXiv. Accepted at the Conference on Lifelong Learning Agents (CoLLAs) 2022. Code: https://github.com/ml-jku/subgd Blog post: https://ml-jku.github.io/subgd

We introduce SubGD, a novel few-shot learning method that builds on the recent finding that stochastic gradient descent updates tend to live in a low-dimensional parameter subspace. In experimental and theoretical analyses, we show that models confined to a suitable predefined subspace generalize well for few-shot learning. A suitable subspace fulfills three criteria across the given tasks: it (a) allows the training error to be reduced by gradient flow, (b) leads to models that generalize well, and (c) can be identified by stochastic gradient descent. SubGD identifies these subspaces from an eigendecomposition of the auto-correlation matrix of update directions across different tasks. We demonstrate that low-dimensional suitable subspaces can be identified for few-shot learning of dynamical systems whose varying properties are described by one or a few parameters of the analytical system description. Such systems are ubiquitous among real-world applications in science and engineering. We experimentally corroborate the advantages of SubGD on three distinct dynamical systems problem settings, where it significantly outperforms popular few-shot learning methods in terms of both sample efficiency and performance.
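
The subspace identification described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The function names `fit_subspace` and `project_gradient` are hypothetical, the update directions are synthetic, and the plain orthogonal projection of the gradient is an assumption, since the abstract does not spell out the exact update rule.

```python
import numpy as np


def fit_subspace(update_directions, rank):
    """Return the top-`rank` eigenpairs of the auto-correlation matrix.

    `update_directions` has shape (n_tasks, n_params); each row is one
    task's update direction (hypothetical input format).
    """
    D = np.asarray(update_directions, dtype=float)
    C = D.T @ D / D.shape[0]              # uncentered auto-correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:rank]
    return eigvals[order], eigvecs[:, order]


def project_gradient(grad, eigvecs):
    """Orthogonally project a gradient onto the span of `eigvecs` (assumed rule)."""
    return eigvecs @ (eigvecs.T @ grad)


# Synthetic stand-in for per-task update directions: 20 tasks, 10 parameters,
# all lying in a hidden 2-dimensional subspace.
rng = np.random.default_rng(0)
true_basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # orthonormal columns
directions = rng.normal(size=(20, 2)) @ true_basis.T    # shape (20, 10)

eigvals, V = fit_subspace(directions, rank=2)

# Toy few-shot adaptation on the quadratic loss 0.5 * ||theta - target||^2,
# with the target placed inside the hidden subspace.
target = true_basis @ np.array([1.0, -2.0])
theta = np.zeros(10)
for _ in range(100):
    grad = theta - target
    theta -= 0.5 * project_gradient(grad, V)

print(np.linalg.norm(theta - target))  # distance to the target after adaptation
```

In this toy setting the optimum lies inside the recovered subspace, so restricting the updates loses nothing. On real tasks the choice of `rank` trades expressiveness against the number of directions that must be fit from a few samples.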
