Zero-shot learning relies on semantic class representations such as hand-engineered attributes or learned embeddings to predict classes without any labeled examples. We propose to learn class representations by embedding nodes from common sense knowledge graphs in a vector space. Common sense knowledge graphs are an untapped source of explicit high-level knowledge that requires little human effort to apply to a range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a general-purpose framework with a novel transformer graph convolutional network (TrGCN) for generating class representations. Our proposed TrGCN architecture computes non-linear combinations of node neighbourhoods. Our results show that ZSL-KG improves over existing WordNet-based methods on five out of six zero-shot benchmark datasets in language and vision.
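The abstract says TrGCN computes non-linear combinations of node neighbourhoods rather than a fixed linear aggregation. As a rough illustration of that idea (not the paper's actual implementation), the minimal NumPy sketch below applies scaled dot-product self-attention over each node's neighbourhood and pools the result; all weight names (`Wq`, `Wk`, `Wv`, `Wo`) and the single-head, single-layer structure are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def trgcn_layer(node_feats, adj, Wq, Wk, Wv, Wo):
    """One simplified TrGCN-style layer (illustrative, not the paper's code).

    For each node, run self-attention over its neighbourhood (including the
    node itself) -- a non-linear combination of neighbour features -- then
    mean-pool the attended features and apply a projection with a ReLU.
    """
    n = node_feats.shape[0]
    out = np.zeros((n, Wo.shape[1]))
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])
        H = node_feats[np.append(nbrs, i)]           # neighbours + self
        Q, K, V = H @ Wq, H @ Wk, H @ Wv
        A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # attention weights
        Z = A @ V                                    # non-linear combination
        out[i] = np.maximum(Z.mean(axis=0) @ Wo, 0)  # pool + project + ReLU
    return out
```

Stacking such layers over a common sense knowledge graph would yield the class representations used for zero-shot prediction; the real architecture also uses learnable pooling and multiple layers.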