Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize samples from both seen and unseen classes, where the unseen classes are not observable during training. It is natural to train generative models that hallucinate training samples for unseen classes based on the knowledge learned from the seen samples. However, most of these models suffer from "generation shifts", where the synthesized samples drift from the real distribution of the unseen data. In this paper, we conduct an in-depth analysis of this issue and propose a novel Generation Shifts Mitigating Flow (GSMFlow) framework, which comprises multiple conditional affine coupling layers for learning unseen data synthesis efficiently and effectively. In particular, we identify three potential problems that trigger the generation shifts, namely semantic inconsistency, variance decay, and structural permutation, and address each of them in turn. First, to reinforce the correlations between the generated samples and their respective attributes, we explicitly embed the semantic information into the transformation in each coupling layer. Second, to recover the intrinsic variance of the synthesized unseen features, we introduce a visual perturbation strategy that diversifies the intra-class variance of the generated data and thereby helps adjust the decision boundary of the classifier. Third, to avoid structural permutation in the semantic space, we propose a relative positioning strategy that manipulates the attribute embeddings, guiding them to fully preserve the inter-class geometric structure. Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings. Our code is available at: https://github.com/uqzhichen/GSMFlow
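To make the idea of a conditional affine coupling layer concrete, the following is a minimal NumPy sketch, not the authors' implementation (for that, see the repository linked above). It assumes a standard affine coupling design in which half of the visual feature passes through unchanged, while the other half is scaled and shifted by a small network that also receives the class attribute vector. The class name `ConditionalAffineCoupling`, the one-hidden-layer network, the dimensions, and the random untrained weights are all hypothetical choices made for illustration.

```python
import numpy as np


class ConditionalAffineCoupling:
    """One conditional affine coupling layer (illustrative, untrained)."""

    def __init__(self, feat_dim, attr_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.d = feat_dim // 2           # size of the half that passes through
        in_dim = self.d + attr_dim       # pass-through half + attributes
        out_dim = feat_dim - self.d      # size of the transformed half
        # Hypothetical one-hidden-layer network giving log-scale and shift.
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 2 * out_dim))
        self.b2 = np.zeros(2 * out_dim)

    def _scale_shift(self, x1, attr):
        # The attribute vector conditions the transformation of the other half.
        h = np.tanh(np.concatenate([x1, attr], axis=-1) @ self.W1 + self.b1)
        log_s, t = np.split(h @ self.W2 + self.b2, 2, axis=-1)
        return np.tanh(log_s), t         # bounded log-scale for stability

    def forward(self, x, attr):
        x1, x2 = x[..., :self.d], x[..., self.d:]
        log_s, t = self._scale_shift(x1, attr)
        z2 = x2 * np.exp(log_s) + t
        log_det = log_s.sum(axis=-1)     # log |det Jacobian| of the layer
        return np.concatenate([x1, z2], axis=-1), log_det

    def inverse(self, z, attr):
        z1, z2 = z[..., :self.d], z[..., self.d:]
        log_s, t = self._scale_shift(z1, attr)
        x2 = (z2 - t) * np.exp(-log_s)
        return np.concatenate([z1, x2], axis=-1)


# Usage: map features to latents and back, conditioned on attributes.
layer = ConditionalAffineCoupling(feat_dim=8, attr_dim=4)
rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))      # stand-in visual features
attr = rng.normal(size=(5, 4))   # stand-in class attribute vectors
z, log_det = layer.forward(x, attr)
x_rec = layer.inverse(z, attr)
```

Because the scale and shift depend only on the untouched half and the attributes, the layer is exactly invertible and its log-determinant is the sum of the log-scales. In a flow of this kind, sampling a latent vector and running the inverse pass with an unseen class's attributes would yield a synthesized feature for that class.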