The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of states and objects in images, given only a subset of them during training and no prior on the unseen compositions. In this setting, models operate on a huge output space containing all possible state-object compositions. While previous works tackle the problem by learning embeddings for the compositions jointly, here we revisit a simple CZSL baseline and predict the primitives, i.e., states and objects, independently. To ensure that the model develops primitive-specific features, we equip the state and object classifiers with separate, non-linear feature extractors. Moreover, we estimate the feasibility of each composition through external knowledge and use this prior to remove infeasible compositions from the output space. Finally, we propose a new setting, CZSL under partial supervision (pCZSL), where only object labels or only state labels are available during training, and we use our prior to estimate the missing labels. Our model, Knowledge-Guided Simple Primitives (KG-SP), achieves state of the art in both OW-CZSL and pCZSL, surpassing most recent competitors even when coupled with semi-supervised learning techniques. Code available at: https://github.com/ExplainableML/KG-SP.
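The inference idea described above (independent state and object predictions, with infeasible compositions removed from the output space) can be illustrated with a minimal sketch. It is not the released KG-SP code: scoring a composition as the product of the two primitive probabilities, the hard threshold, the function name `predict_composition`, and all labels, logits, and feasibility scores below are assumptions made for illustration only.

```python
from itertools import product
from math import exp


def softmax(logits):
    """Turn a list of raw scores into probabilities."""
    m = max(logits)
    exps = [exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_composition(state_logits, object_logits, states, objects,
                        feasibility, threshold=0.0):
    """Pick the best (state, object) pair among feasible compositions.

    `feasibility` maps (state, object) -> score (hypothetical values here);
    pairs missing from the map or scoring below `threshold` are removed
    from the output space before the best pair is selected.
    """
    p_state = dict(zip(states, softmax(state_logits)))
    p_object = dict(zip(objects, softmax(object_logits)))
    best_pair, best_score = None, float("-inf")
    for s, o in product(states, objects):
        if feasibility.get((s, o), float("-inf")) < threshold:
            continue  # infeasible composition: masked out
        # Assumed scoring rule: product of independent primitive probabilities.
        score = p_state[s] * p_object[o]
        if score > best_score:
            best_pair, best_score = (s, o), score
    return best_pair, best_score


# Toy example with made-up labels, logits, and feasibility scores.
states = ["ripe", "rusty"]
objects = ["apple", "car"]
feasibility = {
    ("ripe", "apple"): 0.9,
    ("rusty", "car"): 0.8,
    ("rusty", "apple"): -0.3,
    ("ripe", "car"): -0.5,
}
pair, score = predict_composition(
    state_logits=[0.2, 1.5],   # state classifier favors "rusty"
    object_logits=[2.0, 0.1],  # object classifier favors "apple"
    states=states, objects=objects, feasibility=feasibility,
)
print(pair)
```

In this toy case the unmasked argmax would be the implausible pair ("rusty", "apple"); once compositions with negative feasibility are masked, the prediction falls back to the best feasible pair. Lowering `threshold` enough to admit every pair recovers the unmasked behavior.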