In conventional zero-shot learning (ZSL), recognising unseen classes is the primary or only aim; generalized zero-shot learning (GZSL), by contrast, aims to recognise both seen and unseen classes. Most GZSL methods learn to synthesise visual representations of the unseen classes from semantic information. However, such models are prone to overfitting the seen classes, which causes the distributions of the generated seen-class and unseen-class features to overlap. The overlapping region is a source of uncertainty, because the model struggles to determine whether a test sample falling within it is seen or unseen. Furthermore, these generative methods suffer when training samples are sparse: they struggle to learn the distribution of high-dimensional visual features and therefore fail to capture the most discriminative inter-class features. To address these issues, we propose a novel framework that leverages dual variational autoencoders with a triplet loss to learn discriminative latent features and applies entropy-based calibration to minimise the uncertainty in the overlapping region between the seen and unseen classes. Specifically, the dual generative model with the triplet loss synthesises inter-class discriminative latent features that can be mapped from either the visual or the semantic space. To calibrate the uncertainty for seen classes, we calculate the entropy over the softmax probability distribution from a general classifier. With this approach, recognising seen samples as belonging to the seen classes is relatively straightforward, and there is less risk that a seen sample in the overlapping region will be misclassified into an unseen class. Extensive experiments on six benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches.
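As a rough illustration of the entropy computation mentioned in the abstract, the sketch below computes the Shannon entropy of a softmax distribution and uses it to decide whether a sample is treated as seen or unseen. The abstract states only that entropy is calculated over the softmax output of a general classifier; the thresholding rule, the function names (`softmax`, `entropy`, `route_sample`) and the threshold value are assumptions made for this example, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_sample(logits, threshold):
    """Hypothetical routing rule (an assumption, not taken from the paper):
    a confident, low-entropy prediction is treated as a seen-class sample;
    anything else is passed on as potentially unseen."""
    h = entropy(softmax(logits))
    return "seen" if h < threshold else "unseen"


if __name__ == "__main__":
    confident = [10.0, 0.0, 0.0, 0.0]   # one class clearly dominates
    uncertain = [0.0, 0.0, 0.0, 0.0]    # uniform scores, maximal entropy
    print(route_sample(confident, threshold=0.5))
    print(route_sample(uncertain, threshold=0.5))
```

A peaked distribution has entropy near zero, whereas a uniform distribution over K classes has entropy log K, so the entropy gives a simple scalar measure of how uncertain the classifier is about a sample in the overlapping region. In practice the threshold would have to be chosen on held-out data.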