Instance-level image retrieval aims to find images containing the same object as a given query, despite variations in size, position, or appearance. To address this challenging task, we propose Patchify, a simple yet effective patch-wise retrieval framework that offers high performance, scalability, and interpretability without requiring fine-tuning. Patchify divides each database image into a small number of structured patches and performs retrieval by comparing these local features with a global query descriptor, enabling accurate and spatially grounded matching. To assess not just retrieval accuracy but also spatial correctness, we introduce LocScore, a localization-aware metric that quantifies whether the retrieved region aligns with the target object. This makes LocScore a valuable diagnostic tool for understanding and improving retrieval behavior. We conduct extensive experiments across multiple benchmarks, backbones, and region selection strategies, showing that Patchify outperforms global methods and complements state-of-the-art reranking pipelines. Furthermore, we apply Product Quantization for efficient large-scale retrieval and show that using informative features during compression significantly boosts performance. Project website: https://wons20k.github.io/PatchwiseRetrieval/
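The core matching idea (structured patches in the database image, one global descriptor for the query) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that backbone feature maps of shape `(H, W, D)` are already computed, that the "structured patches" form a simple `grid x grid` layout, that each patch and the query are mean-pooled and L2-normalized, and that an image's score is its best patch's cosine similarity to the query. The function names `patchify_descriptors`, `global_descriptor`, and `patchwise_search` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def patchify_descriptors(feat_map, grid=2):
    """Hypothetical patch extraction: split an (H, W, D) feature map into a
    grid x grid layout (an assumed layout), mean-pool each cell, normalize.
    Returns a (grid*grid, D) array and the patch boxes in feature-map cells."""
    H, W, D = feat_map.shape
    ys = np.linspace(0, H, grid + 1).astype(int)
    xs = np.linspace(0, W, grid + 1).astype(int)
    descs, boxes = [], []
    for i in range(grid):
        for j in range(grid):
            region = feat_map[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            descs.append(region.reshape(-1, D).mean(axis=0))
            # Box as (x0, y0, x1, y1) in feature-map coordinates.
            boxes.append((int(xs[j]), int(ys[i]), int(xs[j + 1]), int(ys[i + 1])))
    return l2_normalize(np.stack(descs)), boxes


def global_descriptor(feat_map):
    # Assumed global descriptor: mean-pool every spatial cell, then normalize.
    D = feat_map.shape[-1]
    return l2_normalize(feat_map.reshape(-1, D).mean(axis=0))


def patchwise_search(query_desc, db_patch_descs):
    """Score each database image by the max cosine similarity between the
    global query descriptor and any of its patch descriptors. Returns the
    ranking (best first), per-image scores, and the winning patch indices."""
    scores, best = [], []
    for patches in db_patch_descs:
        sims = patches @ query_desc
        k = int(np.argmax(sims))
        scores.append(float(sims[k]))
        best.append(k)
    order = np.argsort(scores)[::-1]
    return order, np.array(scores), best


# Toy usage: the "object" is direction e0 and the background is e1.
D = 4
e0, e1 = np.eye(D)[0], np.eye(D)[1]
img0 = np.tile(e1, (8, 8, 1))              # background only
img1 = np.tile(e1, (8, 8, 1))
img1[:4, :4] = e0                          # object fills the top-left quadrant
q = global_descriptor(np.tile(e0, (4, 4, 1)))

db = [patchify_descriptors(img0)[0], patchify_descriptors(img1)[0]]
order, scores, best = patchwise_search(q, db)
print(order, scores.round(3), best)
```

In the toy example, the object covers only a quarter of `img1`. A single global descriptor of `img1` therefore mixes object and background, and its cosine similarity to the query is only about 0.32. The top-left patch isolates the object and scores about 1.0. The index of the winning patch (`best`) is what gives each match a spatial location that can then be checked against the object's true position.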