Open vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open vocabulary models classify among any arbitrary set of categories specified with natural language during inference. This natural language, called "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}") which are completed with each of the category names. This work introduces a simple method to generate higher-accuracy prompts, without relying on explicit knowledge of the image domain and with far fewer hand-constructed sentences. To achieve this, we combine open vocabulary models with large language models (LLMs) to create Customized Prompts via Language models (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs to generate many descriptive sentences that are customized for each object category. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including a gain of over one percentage point on ImageNet. Finally, this method requires no additional training and remains completely zero-shot. Code is available at https://github.com/sarahpratt/CuPL.
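To make the idea concrete, below is a minimal sketch of prompt-based zero-shot classification with OpenAI's CLIP package, contrasting hand-written templates with CuPL-style per-class descriptions. The descriptive sentences shown are hypothetical placeholders (they are not outputs of the actual CuPL pipeline, which queries an LLM such as GPT-3 for each category), the image path is illustrative, and the prompt-embedding averaging follows standard zero-shot CLIP practice rather than anything specific to this paper.

```python
# Minimal sketch: zero-shot classification with CLIP using either
# hand-written templates or CuPL-style LLM-generated descriptions.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Baseline: hand-written templates completed with each category name.
templates = ["a photo of a {}.", "a blurry photo of a {}."]
classnames = ["platypus", "goldfish"]
template_prompts = {c: [t.format(c) for t in templates] for c in classnames}

# CuPL instead asks an LLM questions like "What does a platypus look like?"
# and uses the generated sentences as prompts. Hypothetical example outputs:
cupl_prompts = {
    "platypus": [
        "A platypus is a small, duck-billed mammal with dense brown fur.",
        "A platypus has webbed feet and a broad, flat tail like a beaver.",
    ],
    "goldfish": [
        "A goldfish is a small orange fish with shiny scales.",
        "Goldfish are often kept in round glass bowls or aquariums.",
    ],
}

def class_embeddings(prompts_per_class):
    """Encode all prompts for each class and average them into one normalized
    class embedding (standard zero-shot CLIP practice)."""
    weights = []
    for classname, prompts in prompts_per_class.items():
        tokens = clip.tokenize(prompts).to(device)
        with torch.no_grad():
            emb = model.encode_text(tokens).float()
        emb = emb / emb.norm(dim=-1, keepdim=True)
        mean_emb = emb.mean(dim=0)
        weights.append(mean_emb / mean_emb.norm())
    return torch.stack(weights)  # (num_classes, embed_dim)

text_weights = class_embeddings(cupl_prompts)  # or template_prompts

# Classify an image by cosine similarity to each class embedding.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    image_feat = model.encode_image(image).float()
image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
probs = (100.0 * image_feat @ text_weights.T).softmax(dim=-1)
print(dict(zip(cupl_prompts.keys(), probs[0].tolist())))
```

The only change between the baseline and the CuPL setting in this sketch is which dictionary of prompts feeds `class_embeddings`; the image encoder and similarity-based classification are untouched, which is why the method needs no additional training.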