We consider a new problem: few-shot learning of compact models. Meta-learning is a popular approach for few-shot learning. Previous work in meta-learning typically assumes that the model architecture used during meta-training is the same as the architecture used for final deployment. In this paper, we challenge this basic assumption. For final deployment, we often need the model to be small, but small models usually do not have enough capacity to effectively adapt to new tasks. Meanwhile, we often have access to large datasets and extensive computing power during meta-training, since meta-training is typically performed on a server. We therefore propose task-specific meta distillation, which simultaneously learns two models during meta-training: a large teacher model and a small student model. Given a new task during meta-testing, the teacher model is first adapted to this task; the adapted teacher model is then used to guide the adaptation of the student model. The adapted student model is used for final deployment. We demonstrate the effectiveness of our approach on few-shot image classification using model-agnostic meta-learning (MAML). Our proposed method outperforms alternative approaches on several benchmark datasets.
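To make the meta-testing procedure concrete, below is a minimal, illustrative sketch in PyTorch (using `torch.func.functional_call`) of a MAML-style inner loop in which the large teacher is adapted first and the adapted teacher then guides the student's adaptation. This is not the authors' implementation: the helper names (`inner_adapt`, `adapt_student_with_teacher`), the hyperparameters (`inner_lr`, `distill_weight`, `steps`), and the choice of a KL-divergence distillation term on softened logits are all assumptions for illustration only.

```python
# Illustrative sketch of teacher-guided student adaptation at meta-testing
# time (hypothetical names and hyperparameters; not the authors' code).
import torch
import torch.nn.functional as F
from torch.func import functional_call


def inner_adapt(model, params, x_support, y_support, inner_lr=0.01, steps=5):
    """Take a few gradient steps on the support set, returning adapted params."""
    for _ in range(steps):
        logits = functional_call(model, params, (x_support,))
        loss = F.cross_entropy(logits, y_support)
        grads = torch.autograd.grad(loss, tuple(params.values()))
        params = {k: p - inner_lr * g
                  for (k, p), g in zip(params.items(), grads)}
    return params


def adapt_student_with_teacher(teacher, student, x_support, y_support,
                               inner_lr=0.01, distill_weight=1.0, steps=5):
    """Adapt the teacher to the new task, then adapt the student under the
    adapted teacher's guidance (soft targets) plus the support labels."""
    t_params = dict(teacher.named_parameters())
    s_params = dict(student.named_parameters())

    # 1) Adapt the large teacher to the new task on the support set.
    t_params = inner_adapt(teacher, t_params, x_support, y_support,
                           inner_lr, steps)
    with torch.no_grad():
        teacher_logits = functional_call(teacher, t_params, (x_support,))

    # 2) Adapt the small student, distilling from the adapted teacher.
    for _ in range(steps):
        s_logits = functional_call(student, s_params, (x_support,))
        ce = F.cross_entropy(s_logits, y_support)
        kd = F.kl_div(F.log_softmax(s_logits, dim=-1),
                      F.softmax(teacher_logits, dim=-1),
                      reduction="batchmean")
        loss = ce + distill_weight * kd
        grads = torch.autograd.grad(loss, tuple(s_params.values()))
        s_params = {k: p - inner_lr * g
                    for (k, p), g in zip(s_params.items(), grads)}
    return s_params  # the adapted student parameters are used for deployment
```

At deployment, only the compact student is needed on device: the adapted parameters returned above can be copied back into the student module (or applied with `functional_call`), while the teacher is discarded after guiding the adaptation.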