Zero-shot learning is an essential part of computer vision. As a classical downstream task, zero-shot semantic segmentation has been studied because of its application value. One popular family of zero-shot semantic segmentation methods is based on a generative model, and most newly proposed works add structures to the same architecture to enhance it. However, we find that, from the viewpoint of causal inference, the output of the original model is influenced by spurious statistical relationships, so its predictions show severe bias. In this work, we consider counterfactual methods to avoid the confounder in the original model. Based on this method, we propose a new framework for zero-shot semantic segmentation. Our model is compared with baseline models on two real-world datasets, Pascal-VOC and Pascal-Context. The experimental results show that the proposed models surpass previous confounded models and can still make use of additional structures to improve performance. We also design a simple structure based on Graph Convolutional Networks (GCN) in this work.