CLIP (Contrastive Language-Image Pre-training) models developed by OpenAI have achieved outstanding results on various image recognition and retrieval tasks, displaying strong zero-shot performance: they perform effectively on tasks for which they were never explicitly trained. Inspired by the success of OpenAI CLIP, a new publicly available dataset, LAION-5B, was collected, leading to open ViT-H/14 and ViT-G/14 models that outperform the OpenAI ViT-L/14 model. The LAION-5B release also includes an approximate nearest neighbor index, with a web interface for search and subset creation. In this paper, we evaluate the performance of various CLIP models as zero-shot face recognizers. Our findings show that CLIP models perform well on face recognition tasks, but that increasing the size of the CLIP model does not necessarily lead to improved accuracy. Additionally, we investigate the robustness of CLIP models against data poisoning attacks by testing their performance on poisoned data. Through this analysis, we aim to understand the potential consequences and misuse of search engines built with CLIP models, which could inadvertently function as face recognition engines.
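To make the zero-shot face recognition setting concrete, the following is a minimal sketch of one plausible protocol: embedding face crops with a LAION-trained open CLIP model and identifying a probe image by nearest-neighbor cosine similarity against a gallery of reference identities. The nearest-neighbor protocol, the file paths, and the identity names are illustrative assumptions, not the paper's exact evaluation pipeline.

```python
# Sketch: an open CLIP model (here ViT-H/14 trained on LAION-2B via the
# open_clip library) used as a zero-shot face identifier. Gallery and probe
# paths below are hypothetical placeholders.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
model.eval()

def embed(path: str) -> torch.Tensor:
    """Encode one face crop into a unit-norm CLIP image embedding."""
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feats = model.encode_image(image)
    return feats / feats.norm(dim=-1, keepdim=True)

# Gallery: one reference embedding per known identity (hypothetical files).
gallery = {name: embed(f"gallery/{name}.jpg") for name in ["alice", "bob"]}

# Probe: rank identities by cosine similarity of embeddings. The top match
# is the zero-shot prediction; CLIP was never trained for this task.
probe = embed("probe.jpg")
scores = {name: (probe @ ref.T).item() for name, ref in gallery.items()}
print(max(scores, key=scores.get), scores)
```

Because CLIP embeddings are already L2-normalized here, the dot product equals cosine similarity; the same gallery-versus-probe comparison underlies how an image search index over CLIP embeddings could double as an unintentional face recognition engine.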