We present an algorithm for searching image collections using free-hand sketches that describe the appearance and relative positions of multiple objects. Sketch-based image retrieval (SBIR) methods predominantly match queries containing a single, dominant object, invariant to its position within the image. Our work instead exploits drawings as a concise and intuitive representation for specifying entire scene compositions. We train a convolutional neural network (CNN) to encode masked visual features from sketched objects, pooling these into a spatial descriptor that captures both the appearances of the objects and their relative positions in the composition. Training the CNN backbone as a Siamese network under triplet loss yields a metric search embedding for measuring compositional similarity, which may be efficiently leveraged for visual search by applying product quantization.
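To make the described pipeline concrete, the following is a minimal PyTorch sketch of the training objective, not the authors' implementation: a small stand-in CNN backbone encodes the input, features are masked per sketched object and pooled into a single compositional descriptor, and the shared (Siamese) encoder is trained with a triplet margin loss. The backbone layers, the 256-d embedding size, and names such as `CompositionalEncoder` are illustrative assumptions.

```python
# A minimal sketch of the masked-feature encoding and triplet training
# described above; architecture details are assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompositionalEncoder(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        # Assumed toy conv backbone; the paper specifies a CNN but not
        # this exact architecture.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, image, masks):
        # image: (B, 3, H, W); masks: (B, K, H, W), one mask per object.
        feat = self.backbone(image)                     # (B, C, h, w)
        m = F.interpolate(masks, size=feat.shape[-2:])  # align mask resolution
        # Mask the feature map per object, then pool over space and over
        # the K objects into one descriptor for the whole composition.
        masked = feat.unsqueeze(1) * m.unsqueeze(2)     # (B, K, C, h, w)
        pooled = masked.mean(dim=(1, 3, 4))             # (B, C)
        return F.normalize(self.proj(pooled), dim=-1)   # metric embedding

# Triplet loss over (anchor sketch, matching image, non-matching image);
# weight sharing across the branches gives the Siamese training setup.
encoder = CompositionalEncoder()
criterion = nn.TripletMarginLoss(margin=0.2)
a = encoder(torch.randn(4, 3, 128, 128), torch.rand(4, 2, 128, 128))
p = encoder(torch.randn(4, 3, 128, 128), torch.rand(4, 2, 128, 128))
n = encoder(torch.randn(4, 3, 128, 128), torch.rand(4, 2, 128, 128))
loss = criterion(a, p, n)
loss.backward()
```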
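Similarly, the product-quantization step could be realized with an off-the-shelf library such as FAISS; the sketch below is one plausible setup, with the database size, sub-quantizer count, and bit width chosen arbitrarily for illustration (the paper states that PQ is applied but does not prescribe a specific library or configuration).

```python
# A minimal sketch of PQ-based approximate search over the learned
# embeddings, using FAISS as one possible realization.
import faiss
import numpy as np

d = 256                          # embedding dimensionality (assumed above)
index = faiss.IndexPQ(d, 32, 8)  # 32 sub-quantizers, 8 bits each (assumed)
db = np.random.rand(10000, d).astype('float32')  # stand-in image embeddings
index.train(db)                  # learn the PQ codebooks
index.add(db)
query = np.random.rand(1, d).astype('float32')   # a sketch-query embedding
dists, ids = index.search(query, 10)             # top-10 compositional matches
```

Because the embeddings are L2-normalized, the index's default L2 distance ranks results consistently with cosine similarity.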