Using generative models to synthesize visual features from semantic distributions is one of the most popular approaches to ZSL image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching for discriminative representations. However, the traditional TL cannot reliably search for disentangled representations of unseen classes, because unseen classes are unavailable during ZSL training. To alleviate this drawback, we propose a multi-modal triplet loss (MMTL) that exploits multi-modal information to search for a disentangled representation space. As such, all classes can interplay, which benefits the learning of disentangled class representations in the searched space. Furthermore, we develop a novel model, the Disentangling Class Representation Generative Adversarial Network (DCR-GAN), which exploits the disentangled representations at the training, feature-synthesis, and final recognition stages. Benefiting from the disentangled representations, DCR-GAN can fit a more realistic distribution over both seen and unseen features. Extensive experiments show that our proposed model achieves superior performance to the state of the art on four benchmark datasets. Our code is available at https://github.com/FouriYe/DCRGAN-TMM.
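To make the cross-modal idea behind MMTL concrete, the following is a minimal sketch, assuming the anchor is a class semantic embedding while the positive and negative are visual features projected into a shared space; because semantic embeddings exist for both seen and unseen classes, triplets of this form let all classes shape the searched representation space. The exact MMTL formulation in the paper may differ; the function name, arguments, and margin below are illustrative, not the authors' implementation.

```python
# Hypothetical cross-modal triplet loss sketch (not the paper's exact MMTL).
import torch
import torch.nn.functional as F

def multimodal_triplet_loss(sem_anchor, vis_pos, vis_neg, margin=1.0):
    """Triplet loss with a semantic anchor and visual positive/negative.

    sem_anchor: (B, D) semantic embeddings of the anchor classes
    vis_pos:    (B, D) visual features from the same classes as the anchors
    vis_neg:    (B, D) visual features from different classes
    """
    d_pos = F.pairwise_distance(sem_anchor, vis_pos)  # pull same-class cross-modal pairs together
    d_neg = F.pairwise_distance(sem_anchor, vis_neg)  # push different-class pairs apart
    return F.relu(d_pos - d_neg + margin).mean()

# Usage with random tensors standing in for projected features:
B, D = 32, 256
sem = torch.randn(B, D, requires_grad=True)
loss = multimodal_triplet_loss(sem, torch.randn(B, D), torch.randn(B, D))
loss.backward()
```

In a setup like this, the semantic anchors would typically be encoded class attributes or word vectors, so the loss can be evaluated for unseen classes as well, unlike a purely visual triplet loss that requires images from every class.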