Compositional zero-shot learning aims to recognize unseen compositions of seen visual primitives, i.e., object classes and their states. Although all primitives (states and objects) are observed in some combination during training, their complex interactions make this task especially hard. For example, wet changes the visual appearance of a dog very differently from that of a bicycle. Furthermore, we argue that relationships between compositions go beyond shared states or objects: a cluttered office can contain a busy table, and even though these compositions share neither a state nor an object, the presence of a busy table can hint at the presence of a cluttered office. To address this, we propose a novel method called Compositional Attention Propagated Embedding (CAPE). The key intuition behind our method is that a rich dependency structure exists between compositions, arising both from the complex interactions of their primitives and from further dependencies that go beyond shared primitives. CAPE learns to identify this structure and propagates knowledge between compositions to learn class embeddings for all seen and unseen compositions. In the challenging generalized compositional zero-shot setting, we show that our method outperforms previous baselines and sets a new state of the art on three publicly available benchmarks.
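A minimal sketch of the core idea, assuming a PyTorch-style implementation: composition embeddings are built from state and object primitives and then refined by self-attention so that knowledge can propagate between related compositions. The class name CompositionPropagation, the embedding dimensions, and the fusion layer below are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch (not the paper's code): self-attention over composition
# embeddings, so that related compositions (e.g. "cluttered office" and
# "busy table") can exchange information before classification.
import torch
import torch.nn as nn

class CompositionPropagation(nn.Module):
    def __init__(self, num_states, num_objects, dim=300, heads=4):
        super().__init__()
        # Independent embeddings for the state and object primitives.
        self.state_emb = nn.Embedding(num_states, dim)
        self.object_emb = nn.Embedding(num_objects, dim)
        # Attention layer that propagates knowledge across all compositions,
        # including unseen ones, whose embeddings come from seen primitives.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, state_ids, object_ids):
        # Compose each (state, object) pair into a single embedding.
        pairs = torch.cat([self.state_emb(state_ids),
                           self.object_emb(object_ids)], dim=-1)
        comp = self.fuse(pairs).unsqueeze(0)        # (1, num_comps, dim)
        # Every composition attends to every other composition, so
        # dependencies beyond shared primitives can shape the embeddings.
        propagated, _ = self.attn(comp, comp, comp)
        return propagated.squeeze(0)                # (num_comps, dim)


# Usage: build class embeddings for all state-object pairs; an image feature
# would then be scored against them (e.g. by cosine similarity).
states = torch.arange(3).repeat_interleave(4)   # 3 states x 4 objects
objects = torch.arange(4).repeat(3)
model = CompositionPropagation(num_states=3, num_objects=4)
class_embeddings = model(states, objects)       # shape (12, 300)
```

In this sketch, a composition such as cluttered office can be influenced by related compositions such as busy table even when they share no primitive; the propagated vectors would serve as class embeddings against which visual features are matched.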