Open vocabulary models (e.g. CLIP) have shown strong performance on zero-shot classification through their ability to generate embeddings for each class based on its (natural language) name. Prior work has focused on improving the accuracy of these models through prompt engineering or by incorporating a small amount of labeled downstream data (via finetuning). However, there has been little focus on improving the richness of the class names themselves, which can pose issues when class labels are coarsely defined and uninformative. We propose Classification with Hierarchical Label Sets (or CHiLS), an alternative strategy for zero-shot classification specifically designed for datasets with implicit semantic hierarchies. CHiLS proceeds in three steps: (i) for each class, produce a set of subclasses, using either existing label hierarchies or by querying GPT-3; (ii) perform the standard zero-shot CLIP procedure as though these subclasses were the labels of interest; (iii) map the predicted subclass back to its parent to produce the final prediction. Across numerous datasets with underlying hierarchical structure, CHiLS leads to improved accuracy both with and without ground-truth hierarchical information. CHiLS is simple to implement within existing CLIP pipelines and requires no additional training cost. Code is available at: https://github.com/acmi-lab/CHILS.
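As a rough illustration of how the three steps compose, the sketch below runs CHiLS-style inference with PyTorch and the openai/CLIP package. The subclass lists and the prompt template are placeholder assumptions (in the paper they come from existing label hierarchies or GPT-3 queries); this is not the reference implementation from the repository above.

```python
# Minimal sketch of CHiLS zero-shot inference with CLIP.
# Assumes: pip install torch git+https://github.com/openai/CLIP.git
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# (i) For each coarse class, a set of subclasses (hypothetical example;
#     in practice these come from a label hierarchy or a GPT-3 query).
label_sets = {
    "dog": ["labrador retriever", "poodle", "german shepherd"],
    "cat": ["siamese cat", "persian cat", "tabby cat"],
}

# Flatten the subclasses and remember which parent each one maps to.
subclasses = [s for subs in label_sets.values() for s in subs]
parents = [p for p, subs in label_sets.items() for _ in subs]

# (ii) Standard zero-shot CLIP text embeddings over the subclass names.
with torch.no_grad():
    text = clip.tokenize([f"a photo of a {s}" for s in subclasses]).to(device)
    text_features = model.encode_text(text)
    text_features /= text_features.norm(dim=-1, keepdim=True)

def predict(image):
    """Predict a coarse class for a PIL image."""
    with torch.no_grad():
        image_input = preprocess(image).unsqueeze(0).to(device)
        image_features = model.encode_image(image_input)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        sims = (image_features @ text_features.T).squeeze(0)
        # (iii) Map the best-scoring subclass back to its parent class.
        return parents[sims.argmax().item()]
```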