Images tell powerful stories but cannot always be trusted. Matching images back to trusted sources (attribution) enables users to make a more informed judgment of the images they encounter online. We propose a robust image hashing algorithm to perform such matching. Our hash is sensitive to manipulation of subtle, salient visual details that can substantially change the story told by an image, yet it is invariant to the benign transformations (changes in quality, codec, size, shape, etc.) that images undergo during online redistribution. Our key contribution is OSCAR-Net (Object-centric Scene Graph Attention for Image Attribution Network): a robust image hashing model inspired by recent successes of Transformers in the visual domain. OSCAR-Net constructs a scene graph representation that attends to fine-grained changes in every object's visual appearance and in the objects' spatial relationships. The network is trained via contrastive learning on a dataset of original and manipulated images, yielding a state-of-the-art image hash for content fingerprinting that scales to millions of images.
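To make the training objective concrete, the sketch below shows one way a contrastive hashing loss of this kind can be set up. This is a minimal, hypothetical PyTorch example, not the authors' implementation: the random embeddings stand in for OSCAR-Net's scene-graph features, and the names (HashHead, contrastive_hash_loss), the 64-bit code size, and the triplet-margin formulation are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of contrastive training for a
# robust image hash. Embeddings here stand in for scene-graph features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    """Maps an embedding to a continuous code in [-1, 1]; sign() gives bits."""
    def __init__(self, dim_in: int = 512, bits: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim_in, bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # tanh is a differentiable relaxation of the binary hash at train time.
        return torch.tanh(self.proj(x))

def contrastive_hash_loss(anchor, positive, negative, margin: float = 0.5):
    """Pull codes of benign-transformed copies toward the original; push
    manipulated copies at least `margin` farther away (triplet-style)."""
    d_pos = 1 - F.cosine_similarity(anchor, positive)  # benign copy: same hash
    d_neg = 1 - F.cosine_similarity(anchor, negative)  # manipulated: far hash
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random features; in practice these would come from the
# scene-graph attention encoder applied to image triplets.
head = HashHead(dim_in=512, bits=64)
emb = torch.randn(8, 512)                            # original images
emb_benign = emb + 0.01 * torch.randn_like(emb)      # e.g. recompressed copies
emb_manip = torch.randn(8, 512)                      # e.g. edited copies
loss = contrastive_hash_loss(head(emb), head(emb_benign), head(emb_manip))
loss.backward()
bits = head(emb).sign()                              # binarize for indexing
```

Binarizing the codes with sign() at inference is one standard way such a hash scales to large collections, since the resulting bit strings can be compared with fast Hamming-distance search.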