The 2021 Image Similarity Dataset and Challenge

Matthijs Douze, Giorgos Tolias, Ed Pizzi, Zoë Papakipos, Lowik Chanussot, Filip Radenovic, Tomas Jenicek, Maxim Maximov, Laura Leal-Taixé, Ismail Elezi, Ondřej Chum, Cristian Canton Ferrer

This paper introduces a new benchmark for large-scale image similarity detection. This benchmark is used for the Image Similarity Challenge at NeurIPS'21 (ISC2021). The goal is to determine whether a query image is a modified copy of any image in a reference corpus of size 1 million. The benchmark features a variety of image transformations such as automated transformations, hand-crafted image edits and machine-learning-based manipulations. This mimics real-life cases appearing in social media, for example for integrity-related problems dealing with misinformation and objectionable content. The strength of the image manipulations, and therefore the difficulty of the benchmark, is calibrated according to the performance of a set of baseline approaches. Both the query and reference set contain a majority of "distractor" images that do not match, which corresponds to a real-life needle-in-haystack setting, and the evaluation metric reflects that. We expect the DISC21 benchmark to promote image copy detection as an important and challenging computer vision task and refresh the state of the art. Code and data are available at https://github.com/facebookresearch/isc2021
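
To make the task setup concrete, the following is a minimal sketch of copy detection as thresholded nearest-neighbor search. It is an illustration only, not the benchmark's baseline or evaluation code: the function name `detect_copies`, the random vectors standing in for image descriptors, the descriptor dimension, and the threshold value are all assumptions made for this example. The point it shows is that most queries are distractors, so the detector must be able to return no match at all.

```python
import numpy as np

def detect_copies(query_desc, ref_desc, threshold):
    """Return (query_idx, ref_idx, score) for queries whose best match passes the threshold.

    Hypothetical helper: descriptors are plain float arrays of shape (n, d).
    """
    # L2-normalize so that the dot product equals cosine similarity.
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    r = ref_desc / np.linalg.norm(ref_desc, axis=1, keepdims=True)
    sims = q @ r.T                      # (n_queries, n_refs) similarity matrix
    best = sims.argmax(axis=1)          # closest reference for each query
    scores = sims[np.arange(len(q)), best]
    # Queries below the threshold are treated as distractors and dropped.
    return [
        (int(i), int(best[i]), float(scores[i]))
        for i in range(len(q))
        if scores[i] >= threshold
    ]

# Toy data (assumption): random vectors stand in for real image descriptors.
rng = np.random.default_rng(0)
ref = rng.normal(size=(1000, 64))                        # reference corpus
copies = ref[[3, 42]] + 0.1 * rng.normal(size=(2, 64))   # lightly "edited" copies
distractors = rng.normal(size=(8, 64))                   # queries with no match
queries = np.vstack([copies, distractors])

matches = detect_copies(queries, ref, threshold=0.8)
for q_idx, r_idx, score in matches:
    print(f"query {q_idx} matches reference {r_idx} (cosine {score:.3f})")
```

In this toy setting only the two perturbed copies pass the threshold, while the eight distractor queries produce no prediction. A brute-force similarity matrix is used for clarity; at the 1 million image scale described above, an approximate nearest-neighbor index would typically replace it.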
