## Unsupervised Learning via Meta-Learning

2019 年 1 月 3 日 CreateAMind

Unsupervised Learning via Meta-Learning

Kyle Hsu, Sergey Levine, Chelsea Finn

arXiv preprint

Abstract

A central goal of unsupervised learning is to acquire representations from unlabeled data or experience that can be used for more effective learning of downstream tasks from modest amounts of labeled data. Many prior unsupervised learning works aim to do so by developing proxy objectives based on reconstruction, disentanglement, prediction, and other metrics. Instead, we develop an unsupervised learning method that explicitly optimizes for the ability to learn a variety of tasks from small amounts of data. To do so, we construct tasks from unlabeled data in an automatic way and run meta-learning over the constructed tasks. Surprisingly, we find that, when integrated with meta-learning, relatively simple mechanisms for task design, such as clustering unsupervised representations, lead to good performance on a variety of downstream tasks. Our experiments across four image datasets indicate that our unsupervised meta-learning approach acquires a learning algorithm without any labeled data that is applicable to a wide range of downstream classification tasks, improving upon the representation learned by four prior unsupervised learning methods.

Algorithm

Input: a dataset consisting only of unlabeled images. Output: a learning procedure.

1. We run an out-of-the-box unsupervised learning algorithm to learn an embedding function. We pass each image through this function to obtain its embedding, which can be thought of as a summary of the image.

2. We consider image classification tasks. Each such task consists of some classes, and some examples for each class.  We first create multiple sets of pseudo-labels for the images via clustering their embeddings. To construct a task, we sample a set of labels, sample a few clusters, and sample a few embedding points from each cluster. We select the images whose embeddings were sampled. Some of a task's image examples are designated as training examples (colored border), and the rest are treated as testing examples (gray border).

3. The automatically generated tasks are fed into a meta-learning algorithm. For each task, the meta-learning algorithm applies the current form of its eventual output, a learning procedure, to the task's training examples. This results in a classifier that is tailored to the current task. The meta-learning algorithm then assesses its learning procedure by testing how well the classifier does on the task's testing examples. Based on this, it updates its learning procedure.

We call our method Clustering to Automatically Construct Tasks for Unsupervised meta-learning (CACTUs). The key insight which CACTUs relies on is that while the embeddings may not be directly suitable for learning downstream tasks, they can still be leveraged to create structured yet diverse training tasks. We assess performance by deploying the meta-learned learning procedure to new human-specified tasks based on unseen images and classes.Code

For CACTUs code that uses model-agnostic meta-learning (MAML), please see this GitHub repo.

For CACTUs code that uses prototypical networks (ProtoNets), please see this GitHub repo.

Credits

The results in this project leveraged six open-source codebases from six prior works.

We used four unsupervised learning methods for step 1:

• "Adversarial Feature Learning": paper, code

• "Deep Clustering for Unsupervised Learning of Visual Features": paper, code

• "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets": paper, code

• "Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer": paper, code

We used two meta-learning methods for step 3:

• "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks": paper, code

• "Prototypical Networks for Few-Shot Learning": paper, code

### 相关内容

Pre-training text representations has recently been shown to significantly improve the state-of-the-art in many natural language processing tasks. The central goal of pre-training is to learn text representations that are useful for subsequent tasks. However, existing approaches are optimized by minimizing a proxy objective, such as the negative log likelihood of language modeling. In this work, we introduce a learning algorithm which directly optimizes model's ability to learn text representations for effective learning of downstream tasks. We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps. The standard multi-task learning objective adopted in BERT is a special case of our learning algorithm where the depth of meta-train is zero. We study the problem in two settings: unsupervised pre-training and supervised pre-training with different pre-training objects to verify the generality of our approach.Experimental results show that our algorithm brings improvements and learns better initializations for a variety of downstream tasks.

We present a new method to learn video representations from large-scale unlabeled video data. Ideally, this representation will be generic and transferable, directly usable for new tasks such as action recognition and zero or few-shot learning. We formulate unsupervised representation learning as a multi-modal, multi-task learning problem, where the representations are shared across different modalities via distillation. Further, we introduce the concept of loss function evolution by using an evolutionary search algorithm to automatically find optimal combination of loss functions capturing many (self-supervised) tasks and modalities. Thirdly, we propose an unsupervised representation evaluation metric using distribution matching to a large unlabeled dataset as a prior constraint, based on Zipf's law. This unsupervised constraint, which is not guided by any labeling, produces similar results to weakly-supervised, task-specific ones. The proposed unsupervised representation learning results in a single RGB network and outperforms previous methods. Notably, it is also more effective than several label-based methods (e.g., ImageNet), with the exception of large, fully labeled video datasets.

Few-shot image classification aims to classify unseen classes with limited labeled samples. Recent works benefit from the meta-learning process with episodic tasks and can fast adapt to class from training to testing. Due to the limited number of samples for each task, the initial embedding network for meta learning becomes an essential component and can largely affects the performance in practice. To this end, many pre-trained methods have been proposed, and most of them are trained in supervised way with limited transfer ability for unseen classes. In this paper, we proposed to train a more generalized embedding network with self-supervised learning (SSL) which can provide slow and robust representation for downstream tasks by learning from the data itself. We evaluate our work by extensive comparisons with previous baseline methods on two few-shot classification datasets ({\em i.e.,} MiniImageNet and CUB). Based on the evaluation results, the proposed method achieves significantly better performance, i.e., improve 1-shot and 5-shot tasks by nearly \textbf{3\%} and \textbf{4\%} on MiniImageNet, by nearly \textbf{9\%} and \textbf{3\%} on CUB. Moreover, the proposed method can gain the improvement of (\textbf{15\%}, \textbf{13\%}) on MiniImageNet and (\textbf{15\%}, \textbf{8\%}) on CUB by pretraining using more unlabeled data. Our code will be available at \hyperref[https://github.com/phecy/SSL-FEW-SHOT.]{https://github.com/phecy/ssl-few-shot.}

Continual learning aims to improve the ability of modern learning systems to deal with non-stationary distributions, typically by attempting to learn a series of tasks sequentially. Prior art in the field has largely considered supervised or reinforcement learning tasks, and often assumes full knowledge of task labels and boundaries. In this work, we propose an approach (CURL) to tackle a more general problem that we will refer to as unsupervised continual learning. The focus is on learning representations without any knowledge about task identity, and we explore scenarios when there are abrupt changes between tasks, smooth transitions from one task to another, or even when the data is shuffled. The proposed approach performs task inference directly within the model, is able to dynamically expand to capture new concepts over its lifetime, and incorporates additional rehearsal-based techniques to deal with catastrophic forgetting. We demonstrate the efficacy of CURL in an unsupervised learning setting with MNIST and Omniglot, where the lack of labels ensures no information is leaked about the task. Further, we demonstrate strong performance compared to prior art in an i.i.d setting, or when adapting the technique to supervised tasks such as incremental class learning.

This work tackles the problem of semi-supervised learning of image classifiers. Our main insight is that the field of semi-supervised learning can benefit from the quickly advancing field of self-supervised visual representation learning. Unifying these two approaches, we propose the framework of self-supervised semi-supervised learning ($S^4L$) and use it to derive two novel semi-supervised image classification methods. We demonstrate the effectiveness of these methods in comparison to both carefully tuned baselines, and existing semi-supervised learning methods. We then show that $S^4L$ and existing semi-supervised methods can be jointly trained, yielding a new state-of-the-art result on semi-supervised ILSVRC-2012 with 10% of labels.

Meta-learning, or learning to learn, is the science of systematically observing how different machine learning approaches perform on a wide range of learning tasks, and then learning from this experience, or meta-data, to learn new tasks much faster than otherwise possible. Not only does this dramatically speed up and improve the design of machine learning pipelines or neural architectures, it also allows us to replace hand-engineered algorithms with novel approaches learned in a data-driven way. In this chapter, we provide an overview of the state of the art in this fascinating and continuously evolving field.

Meta-learning is a powerful tool that builds on multi-task learning to learn how to quickly adapt a model to new tasks. In the context of reinforcement learning, meta-learning algorithms can acquire reinforcement learning procedures to solve new problems more efficiently by meta-learning prior tasks. The performance of meta-learning algorithms critically depends on the tasks available for meta-training: in the same way that supervised learning algorithms generalize best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning offloads the design burden from algorithm design to task design. If we can automate the process of task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of unsupervised meta-learning algorithms for reinforcement learning. We describe a general recipe for unsupervised meta-reinforcement learning, and describe an effective instantiation of this approach based on a recently proposed unsupervised exploration technique and model-agnostic meta-learning. We also discuss practical and conceptual considerations for developing unsupervised meta-learning methods. Our experimental results demonstrate that unsupervised meta-reinforcement learning effectively acquires accelerated reinforcement learning procedures without the need for manual task design, significantly exceeds the performance of learning from scratch, and even matches performance of meta-learning methods that use hand-specified task distributions.

A major goal of unsupervised learning is to discover data representations that are useful for subsequent tasks, without access to supervised labels during training. Typically, this goal is approached by minimizing a surrogate objective, such as the negative log likelihood of a generative model, with the hope that representations useful for subsequent tasks will arise incidentally. In this work, we propose instead to directly target a later desired task by meta-learning an unsupervised learning rule, which leads to representations useful for that task. Here, our desired task (meta-objective) is the performance of the representation on semi-supervised classification, and we meta-learn an algorithm -- an unsupervised weight update rule -- that produces representations that perform well under this meta-objective. Additionally, we constrain our unsupervised update rule to a be a biologically-motivated, neuron-local function, which enables it to generalize to novel neural network architectures. We show that the meta-learned update rule produces useful features and sometimes outperforms existing unsupervised learning techniques. We further show that the meta-learned unsupervised update rule generalizes to train networks with different widths, depths, and nonlinearities. It also generalizes to train on data with randomly permuted input dimensions and even generalizes from image datasets to a text task.

Clustering and classification critically rely on distance metrics that provide meaningful comparisons between data points. We present mixed-integer optimization approaches to find optimal distance metrics that generalize the Mahalanobis metric extensively studied in the literature. Additionally, we generalize and improve upon leading methods by removing reliance on pre-designated "target neighbors," "triplets," and "similarity pairs." Another salient feature of our method is its ability to enable active learning by recommending precise regions to sample after an optimal metric is computed to improve classification performance. This targeted acquisition can significantly reduce computational burden by ensuring training data completeness, representativeness, and economy. We demonstrate classification and computational performance of the algorithms through several simple and intuitive examples, followed by results on real image and medical datasets.

CreateAMind
12+阅读 · 2019年5月22日
CreateAMind
8+阅读 · 2019年5月18日
CreateAMind
7+阅读 · 2019年1月7日
CreateAMind
20+阅读 · 2019年1月4日
CreateAMind
9+阅读 · 2019年1月2日

10+阅读 · 2018年12月24日
CreateAMind
23+阅读 · 2018年9月12日
CreateAMind
16+阅读 · 2018年5月25日
CreateAMind
5+阅读 · 2017年8月4日

Shangwen Lv,Yuechen Wang,Daya Guo,Duyu Tang,Nan Duan,Fuqing Zhu,Ming Gong,Linjun Shou,Ryan Ma,Daxin Jiang,Guihong Cao,Ming Zhou,Songlin Hu
10+阅读 · 2020年4月12日
AJ Piergiovanni,Anelia Angelova,Michael S. Ryoo
20+阅读 · 2020年2月26日
Da Chen,Yuefeng Chen,Yuhong Li,Feng Mao,Yuan He,Hui Xue
13+阅读 · 2019年11月14日
Dushyant Rao,Francesco Visin,Andrei A. Rusu,Yee Whye Teh,Razvan Pascanu,Raia Hadsell
5+阅读 · 2019年10月31日
Xiaohua Zhai,Avital Oliver,Alexander Kolesnikov,Lucas Beyer
4+阅读 · 2019年5月9日
Joaquin Vanschoren
115+阅读 · 2018年10月8日
Abhishek Gupta,Benjamin Eysenbach,Chelsea Finn,Sergey Levine
6+阅读 · 2018年6月12日
Chengxiang Yin,Jian Tang,Zhiyuan Xu,Yanzhi Wang
6+阅读 · 2018年6月8日
Luke Metz,Niru Maheswaranathan,Brian Cheung,Jascha Sohl-Dickstein
6+阅读 · 2018年5月23日
Krishnan Kumaran,Dimitri Papageorgiou,Yutong Chang,Minhan Li,Martin Takáč
8+阅读 · 2018年3月28日
Top