Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterned sparsity emerges from this process, with the pattern of sparsity varying on a problem-by-problem basis. This selective sparsity results in better generalization and less interference in a range of few-shot and continual learning problems. Moreover, we find that sparse learning also emerges in a more expressive model where learning rates are meta-learned. Our results shed light on an ongoing debate on whether meta-learning can discover adaptable features and suggest that learning by sparse gradient descent is a powerful inductive bias for meta-learning systems.
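To make the mechanism concrete, below is a minimal sketch (not the authors' code) of the more expressive variant mentioned above: a MAML-style inner update in which per-parameter learning rates are meta-learned alongside the initialization. Names such as `inner_update`, the toy `loss_fn`, and the hyperparameters are illustrative assumptions; entries of `lrs` driven toward zero correspond to "learning where to learn", i.e., sparse inner-loop updates.

```python
# Sketch: meta-learning an initialization plus per-parameter inner-loop
# learning rates, so the outer loop can decide which weights get updated.
import torch

def inner_update(params, lrs, loss_fn, x, y):
    """One inner-loop step: theta' = theta - alpha * grad, with a
    meta-learned, per-parameter alpha (sparse adaptation as alpha -> 0)."""
    loss = loss_fn(params, x, y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return [p - a * g for p, a, g in zip(params, lrs, grads)]

def loss_fn(params, x, y):
    # Toy linear-regression task used only to exercise the update.
    w, b = params
    return ((x @ w + b - y) ** 2).mean()

torch.manual_seed(0)
w = torch.randn(3, 1, requires_grad=True)      # meta-learned initialization
b = torch.zeros(1, requires_grad=True)
lrs = [torch.full_like(w, 0.1, requires_grad=True),   # meta-learned per-weight
       torch.full_like(b, 0.1, requires_grad=True)]   # learning rates

meta_opt = torch.optim.Adam([w, b] + lrs, lr=1e-2)
for step in range(100):
    # Sample a task; here only the first input dimension is relevant,
    # so unneeded learning rates can shrink toward zero.
    x = torch.randn(8, 3)
    y = x @ torch.tensor([[1.0], [0.0], [0.0]])
    adapted = inner_update([w, b], lrs, loss_fn, x, y)
    meta_loss = loss_fn(adapted, x, y)          # outer objective on adapted weights
    meta_opt.zero_grad()
    meta_loss.backward()                        # second-order gradients via create_graph
    meta_opt.step()
```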