Modern machine learning requires system designers to specify aspects of the learning pipeline, such as losses, architectures, and optimizers. Meta-learning, or learning-to-learn, instead aims to learn those aspects, and promises to unlock greater capabilities with less manual effort. One particularly ambitious goal of meta-learning is to train general-purpose in-context learning algorithms from scratch, using only black-box models with minimal inductive bias. Such a model takes in training data, and produces test-set predictions across a wide range of problems, without any explicit definition of an inference model, training loss, or optimization algorithm. In this paper we show that Transformers and other black-box models can be meta-trained to act as general-purpose in-context learners. We characterize phase transitions between algorithms that generalize, algorithms that memorize, and algorithms that fail to meta-train at all, induced by changes in model size, number of tasks, and meta-optimization. We further show that the capabilities of meta-trained algorithms are bottlenecked by the accessible state size (memory) determining the next prediction, unlike standard models which are thought to be bottlenecked by parameter count. Finally, we propose practical interventions such as biasing the training distribution that improve the meta-training and meta-generalization of general-purpose learning algorithms.
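To make the in-context learning interface concrete, here is a minimal sketch of how a meta-training example might be constructed. The task family (noiseless linear regression), the sequence layout, and all function names are illustrative assumptions, not the paper's actual setup; the least-squares solve merely stands in for a meta-learned black-box predictor to show the interface end to end.

```python
import numpy as np

def sample_task(rng, dim=3, n_train=8):
    """Sample one task: a random linear map defines the ground truth.
    (Assumed task family for illustration only.)"""
    w = rng.normal(size=dim)
    x_train = rng.normal(size=(n_train, dim))
    y_train = x_train @ w
    x_query = rng.normal(size=dim)
    y_query = x_query @ w
    return x_train, y_train, x_query, y_query

def build_context(x_train, y_train, x_query):
    """Flatten training pairs plus a query into one input sequence.
    This sequence is all a black-box learner sees: no inference model,
    loss, or optimizer is specified for the inner learning problem."""
    pairs = np.concatenate([x_train, y_train[:, None]], axis=1)  # (n, dim+1)
    query = np.concatenate([x_query, [0.0]])  # y slot zeroed for the query
    return np.concatenate([pairs, query[None, :]], axis=0)

rng = np.random.default_rng(0)
xt, yt, xq, yq = sample_task(rng)
seq = build_context(xt, yt, xq)          # shape (n_train + 1, dim + 1)
# A closed-form least-squares solve plays the role of the predictor here;
# in the paper this role is filled by a meta-trained sequence model.
w_hat, *_ = np.linalg.lstsq(xt, yt, rcond=None)
pred = xq @ w_hat
```

A meta-trained Transformer would be optimized so that, across many such sampled tasks, its output at the query position matches `y_query` directly from the context sequence.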