Generalized zero-shot learning (GZSL) aims to classify samples under the assumption that some classes are not observable during training. To bridge the gap between seen and unseen classes, most GZSL methods attempt to associate the visual features of seen classes with attributes or to generate unseen samples directly. Nevertheless, the visual features used in prior approaches do not necessarily encode the semantically related information that the shared attributes refer to, which degrades model generalization to unseen classes. To address this issue, we propose a novel semantics-disentangling framework for generalized zero-shot learning (SDGZSL), in which the visual features of unseen classes are first estimated by a conditional VAE and then factorized into semantic-consistent and semantic-unrelated latent vectors. In particular, a total correlation penalty is applied to guarantee the independence of the two factorized representations, while the semantic consistency of the former is measured by a derived relation network. Extensive experiments on four GZSL benchmark datasets demonstrate that the semantic-consistent features disentangled by the proposed SDGZSL generalize better in both conventional and generalized zero-shot learning. Our source code is available at https://github.com/uqzhichen/SDGZSL.
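The components named in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all module names, layer sizes, and the discriminator-based approximation of the total correlation penalty (the density-ratio trick, where a classifier separates joint latent samples from samples with one factor shuffled across the batch) are assumptions made for illustration.

```python
# Hedged sketch of the SDGZSL components described in the abstract.
# Dimensions, architectures, and names are illustrative assumptions.
import torch
import torch.nn as nn

class DisentangleEncoder(nn.Module):
    """Factorizes a visual feature into a semantic-consistent part hs
    and a semantic-unrelated part hn."""
    def __init__(self, feat_dim=2048, hs_dim=64, hn_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, hs_dim + hn_dim))
        self.hs_dim = hs_dim

    def forward(self, x):
        h = self.net(x)
        return h[:, :self.hs_dim], h[:, self.hs_dim:]

class RelationNet(nn.Module):
    """Scores the compatibility between hs and a class-attribute vector;
    used to enforce the semantic consistency of hs."""
    def __init__(self, hs_dim=64, attr_dim=85):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hs_dim + attr_dim, 128),
                                 nn.ReLU(), nn.Linear(128, 1))

    def forward(self, hs, attr):
        return self.net(torch.cat([hs, attr], dim=1))

class TCDiscriminator(nn.Module):
    """Density-ratio discriminator: a total-correlation penalty can be
    approximated by how well this network separates joint samples
    (hs, hn) from samples in which hn is shuffled across the batch."""
    def __init__(self, hs_dim=64, hn_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hs_dim + hn_dim, 128),
                                 nn.ReLU(), nn.Linear(128, 1))

    def forward(self, hs, hn):
        return self.net(torch.cat([hs, hn], dim=1))

# Forward-pass shape check on random visual features and attributes.
enc, rel, disc = DisentangleEncoder(), RelationNet(), TCDiscriminator()
x = torch.randn(8, 2048)        # batch of visual features
attr = torch.randn(8, 85)       # matching class-attribute vectors
hs, hn = enc(x)
score = rel(hs, attr)           # semantic-consistency score
tc_logit = disc(hs, hn)         # logit feeding the TC penalty
```

In a full training loop, the encoder would sit inside the conditional VAE, the relation network would be trained to rank matching (hs, attribute) pairs above mismatched ones, and the TC discriminator's logits would be added to the loss to push hs and hn toward independence.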