Zero-shot learning (ZSL) aims to recognize unseen classes by exploiting semantic descriptions shared between seen and unseen classes. Current methods show that it is effective to learn visual-semantic alignment by projecting semantic embeddings into the visual space as class prototypes. However, such a projection function is trained only on seen classes. When applied to unseen classes, the resulting prototypes often perform suboptimally due to domain shift. In this paper, we propose to learn prototypes via placeholders, termed LPL, to eliminate the domain shift between seen and unseen classes. Specifically, we combine seen classes to hallucinate new classes that act as placeholders for the unseen classes in both the visual and semantic spaces. Positioned between seen classes, the placeholders encourage the prototypes of seen classes to be highly dispersed, sparing more space for the insertion of well-separated unseen prototypes. Empirically, well-separated prototypes help counteract the visual-semantic misalignment caused by domain shift. Furthermore, we exploit a novel semantic-oriented fine-tuning to guarantee the semantic reliability of the placeholders. Extensive experiments on five benchmark datasets demonstrate the significant performance gain of LPL over state-of-the-art methods. Code is available at https://github.com/zaiquanyang/LPL.
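The hallucination step described above can be sketched as a convex mixing of seen-class semantic embeddings. This is a minimal illustrative sketch, not the paper's exact construction: the function name, the pairwise Beta-distributed mixing, and the dimensions are assumptions chosen for clarity.

```python
import numpy as np

def hallucinate_placeholders(seen_semantics, num_placeholders, alpha=0.5, seed=0):
    """Hallucinate placeholder classes as convex combinations of pairs of
    seen-class semantic embeddings. Illustrative sketch only; the mixing
    scheme used in LPL may differ."""
    rng = np.random.default_rng(seed)
    n, d = seen_semantics.shape
    placeholders = np.empty((num_placeholders, d))
    for k in range(num_placeholders):
        # Pick two distinct seen classes and interpolate between them,
        # so the placeholder lies "between" seen classes in semantic space.
        i, j = rng.choice(n, size=2, replace=False)
        lam = rng.beta(alpha, alpha)  # mixing coefficient in (0, 1)
        placeholders[k] = lam * seen_semantics[i] + (1 - lam) * seen_semantics[j]
    return placeholders

# Example: 5 seen classes with 312-dim attribute vectors (e.g., CUB attributes)
seen = np.random.default_rng(1).random((5, 312))
ph = hallucinate_placeholders(seen, num_placeholders=3)
```

Because each placeholder is a convex combination of two seen-class embeddings, it stays inside the region spanned by the seen classes, which is what lets it occupy the space "between" them.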