We tackle the problem of visual search under resource constraints. Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images. Such systems inherently face a hard accuracy-efficiency trade-off: the embedding model needs to be large enough to ensure high accuracy, yet small enough to enable query-embedding computation on resource-constrained platforms. This trade-off could be mitigated if gallery embeddings are generated from a large model and query embeddings are extracted using a compact model. The key to building such a system is to ensure representation compatibility between the query and gallery models. In this paper, we address two forms of compatibility: one enforced by modifying the parameters of each model that computes the embeddings, the other by modifying the architectures that compute the embeddings, leading to compatibility-aware neural architecture search (CMP-NAS). We test CMP-NAS on challenging retrieval tasks for fashion images (DeepFashion2) and face images (IJB-C). Compared to ordinary (homogeneous) visual search using the largest embedding model (paragon), CMP-NAS achieves 80-fold and 23-fold cost reductions while maintaining accuracy within 0.3% and 1.6% of the paragon on DeepFashion2 and IJB-C, respectively.
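To make the heterogeneous setup concrete, the sketch below illustrates the retrieval flow described above: gallery embeddings are precomputed once with a large model, while query embeddings are computed on-device with a compact model, and retrieval is done by cosine similarity in the shared embedding space. This is a minimal illustration under assumed placeholder models and dimensions, not the paper's CMP-NAS implementation; the compatibility training itself is omitted.

```python
# Minimal sketch of heterogeneous visual search. The backbones named in the
# usage comments are hypothetical placeholders; the only requirement assumed
# here is that the two models were trained to produce compatible embeddings
# of the same dimensionality.
import torch
import torch.nn.functional as F

def embed(model, images):
    """Return L2-normalized embeddings, as is typical for cosine-similarity retrieval."""
    with torch.no_grad():
        return F.normalize(model(images), dim=-1)

def search(query_images, gallery_embeddings, query_model, top_k=5):
    """Query embeddings come from the compact on-device model; gallery
    embeddings were precomputed offline with the large model."""
    q = embed(query_model, query_images)       # (B, D)
    sims = q @ gallery_embeddings.T            # cosine similarities, shape (B, N)
    return sims.topk(top_k, dim=-1).indices    # indices of the nearest gallery items

# Hypothetical usage:
#   gallery_model = large_backbone()                       # server-side, run once offline
#   query_model   = compact_backbone()                     # resource-constrained device
#   gallery_embeddings = embed(gallery_model, gallery_images)
#   top_indices = search(query_images, gallery_embeddings, query_model)
```

The design point this highlights is that the expensive model never runs at query time; accuracy then hinges entirely on how compatible the two models' embedding spaces are, which is what the parameter- and architecture-level methods in the paper address.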