Latent variable models like the Variational Auto-Encoder (VAE) are commonly used to learn representations of images. However, for downstream tasks such as semantic classification, the representations learned by VAEs are less competitive than those of other, non-latent-variable models. This has led to speculation that latent variable models may be fundamentally unsuitable for representation learning. In this work, we study which properties a good representation requires and how different VAE structural choices affect the properties that are learned. We show that with a decoder biased toward learning local features, the remaining global features are well captured by the latent, which significantly improves performance on downstream classification tasks. We further apply the proposed model to semi-supervised learning and demonstrate improved data efficiency.
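To make the architectural idea concrete, below is a minimal PyTorch sketch of one way such a model could look; it is an illustration under stated assumptions, not the authors' exact architecture. The decoder is a single masked 3×3 autoregressive convolution conditioned on a broadcast latent z, so each pixel prediction sees only a tiny local neighborhood, and any global structure must be routed through z. The 28×28 single-channel input, layer sizes, and all names (`Encoder`, `LocalARDecoder`, `elbo`) are assumptions made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Causal ('type A') masked convolution: each pixel's prediction may only
    depend on pixels above it and to its left, never on itself."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2:] = 0   # block the centre pixel and everything to its right
        mask[kH // 2 + 1:, :] = 0     # block all rows below the centre
        self.register_buffer("mask", mask[None, None])

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding)

class Encoder(nn.Module):
    """Maps an image to the mean and log-variance of q(z|x)."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
        )
        self.fc_mu = nn.Linear(64 * 7 * 7, latent_dim)
        self.fc_logvar = nn.Linear(64 * 7 * 7, latent_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return self.fc_mu(h), self.fc_logvar(h)

class LocalARDecoder(nn.Module):
    """Autoregressive decoder with a deliberately small receptive field (one
    3x3 masked conv followed by 1x1 convs), so it can only model local
    texture; global content must come from the latent z."""
    def __init__(self, latent_dim=32, hidden=64):
        super().__init__()
        self.masked = MaskedConv2d(1, hidden, kernel_size=3, padding=1)
        self.cond = nn.Linear(latent_dim, hidden)  # broadcast z over the image
        self.out = nn.Sequential(
            nn.ReLU(), nn.Conv2d(hidden, hidden, 1),
            nn.ReLU(), nn.Conv2d(hidden, 1, 1),    # Bernoulli logits per pixel
        )

    def forward(self, x, z):
        h = self.masked(x) + self.cond(z)[:, :, None, None]
        return self.out(h)

def elbo(x, encoder, decoder):
    """Negative ELBO per example: reconstruction term plus KL(q(z|x) || N(0, I))."""
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
    logits = decoder(x, z)                                # teacher-forced AR decoding
    rec = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (rec + kl) / x.size(0)

# Toy usage on a batch of binarized 28x28 images:
enc, dec = Encoder(), LocalARDecoder()
x = torch.rand(8, 1, 28, 28).bernoulli()
loss = elbo(x, enc, dec)
loss.backward()
```

The masked convolution's kernel size is the knob here: shrinking its receptive field limits how much structure the decoder can absorb on its own, and hence how much information the latent must carry.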