The Google Universal Image Embedding (GUIE) Challenge is one of the first competitions on multi-domain image representations in the wild, covering a wide distribution of objects: landmarks, artwork, food, etc. This is a fundamental computer vision problem with notable applications in image retrieval, search engines, and e-commerce. In this work, we explain our 4th-place solution to the GUIE Challenge and our "bag of tricks" for fine-tuning zero-shot Vision Transformers (ViT) pre-trained using CLIP.