Compositional Zero-Shot Learning (CZSL) requires recognizing state-object compositions unseen during training. In this work, instead of assuming prior knowledge about the unseen compositions, we operate in the open-world setting, where the search space includes a large number of unseen compositions, some of which might be infeasible. In this setting, we start from the cosine similarity between visual features and compositional embeddings. After estimating the feasibility score of each composition, we use these scores either to directly mask the output space or as a margin for the cosine similarity between visual features and compositional embeddings during training. Our experiments on two standard CZSL benchmarks show that all methods suffer severe performance degradation when applied in the open-world setting. While our simple CZSL model achieves state-of-the-art performance in the closed-world scenario, our feasibility scores boost the performance of our approach in the open-world setting, clearly outperforming the previous state of the art.
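The two uses of the feasibility scores described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the threshold, the scaling factor `alpha`, the sign of the margin, and the choice to apply the margin only to unseen compositions are all assumptions made for illustration, and the feasibility scores are taken as given rather than estimated.

```python
import numpy as np


def cosine_similarity(visual, comp):
    """Cosine similarity between each visual feature and each compositional embedding.

    visual: (n_images, d) array, comp: (n_compositions, d) array.
    Returns an (n_images, n_compositions) array.
    """
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    c = comp / np.linalg.norm(comp, axis=1, keepdims=True)
    return v @ c.T


def mask_scores(scores, feasibility, threshold):
    """Hard masking (assumed form): drop compositions whose feasibility is below threshold."""
    masked = scores.copy()
    masked[:, feasibility < threshold] = -np.inf
    return masked


def margin_scores(scores, feasibility, unseen, alpha):
    """Training-time margin (assumed form): shift the similarity of unseen
    compositions by -alpha * feasibility, leaving seen compositions untouched."""
    margin = np.where(unseen, -alpha * feasibility, 0.0)
    return scores + margin[None, :]


# Toy example: 2 images, 3 candidate compositions, 2-dimensional embeddings.
visual = np.array([[1.0, 0.0], [0.0, 2.0]])
comp = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
feasibility = np.array([1.0, 0.2, 0.8])   # hypothetical scores
unseen = np.array([False, True, True])    # hypothetical seen/unseen split

scores = cosine_similarity(visual, comp)
predictions = mask_scores(scores, feasibility, threshold=0.5).argmax(axis=1)
train_logits = margin_scores(scores, feasibility, unseen, alpha=0.5)
```

In this sketch the mask acts only on the output space at prediction time, while the margin changes the logits that a training loss would see; how the two are combined in practice is not specified in the abstract.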