When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes using their attributes. To learn the class-conditional distribution of CNN features, models that follow this approach rely on pairs of image features and class attributes. Hence, they cannot make use of the abundance of unlabeled data samples. In this paper, we tackle any-shot learning problems, i.e. zero-shot and few-shot learning, in a unified feature generating framework that operates in both inductive and transductive learning settings. We develop a conditional generative model that combines the strengths of VAEs and GANs and, in addition, learns the marginal feature distribution of unlabeled images via an unconditional discriminator. We empirically show that our model learns highly discriminative CNN features for five datasets, including CUB, SUN, AWA and ImageNet, and establish a new state-of-the-art in any-shot learning, i.e. inductive and transductive (generalized) zero- and few-shot learning settings. We also demonstrate that our learned features are interpretable: we visualize them by inverting them back to the pixel space and we explain them by generating textual arguments of why they are associated with a certain label.
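As a rough sketch of how such a model could be trained, the objective below combines a conditional VAE term, a conditional critic on labeled features, and an unconditional critic on unlabeled features. The notation (encoder $E$, generator/decoder $G$, critics $D_1$ and $D_2$, labeled feature $x$ with class attributes $c$, unlabeled feature $x_u$, attributes $c_u$ of a class the unlabeled data may belong to, weight $\gamma$) and the choice of Wasserstein-style critic losses are assumptions made for illustration. The abstract does not specify them, and regularizers such as a gradient penalty are omitted.

```latex
\begin{align}
\mathcal{L}_{\mathrm{VAE}} &= \mathrm{KL}\big(q_E(z \mid x, c)\,\|\,p(z)\big)
  - \mathbb{E}_{q_E(z \mid x, c)}\big[\log p_G(x \mid z, c)\big] \\
\mathcal{L}_{\mathrm{GAN}}^{s} &= \mathbb{E}\big[D_1(x, c)\big]
  - \mathbb{E}\big[D_1(\tilde{x}, c)\big], \qquad \tilde{x} = G(z, c) \\
\mathcal{L}_{\mathrm{GAN}}^{u} &= \mathbb{E}\big[D_2(x_u)\big]
  - \mathbb{E}\big[D_2(\tilde{x}_u)\big], \qquad \tilde{x}_u = G(z, c_u) \\
\min_{G,\,E}\ \max_{D_1,\,D_2}\ \ & \mathcal{L}_{\mathrm{VAE}}
  + \gamma\,\mathcal{L}_{\mathrm{GAN}}^{s} + \mathcal{L}_{\mathrm{GAN}}^{u}
\end{align}
```

In this sketch $D_2$ never receives attributes, so it can only compare the marginal distribution of generated features with that of real unlabeled features; this is the role the abstract assigns to the unconditional discriminator in the transductive setting.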