The goal of this work is to efficiently identify visually similar patterns in a pair of images, e.g. identifying an artwork detail copied between an engraving and an oil painting, or matching a night-time photograph with its daytime counterpart. Lack of training data is a key challenge for this co-segmentation task. We present a simple yet surprisingly effective approach to overcome this difficulty: we generate synthetic training pairs by selecting object segments in one image and copy-pasting them into another image. We then learn to predict the repeated object masks. We find that it is crucial to predict the correspondences as an auxiliary task and to apply Poisson blending and style transfer to the training pairs in order to generalize to real data. We analyse results with two deep architectures relevant to our joint image analysis task: a transformer-based architecture and Sparse Nc-Net, a recent network designed to predict coarse correspondences using 4D convolutions. We show that our approach provides clear improvements for artwork detail retrieval on the Brueghel dataset and achieves competitive performance on two place recognition benchmarks, Tokyo247 and Pitts30K. We then demonstrate the potential of our approach by performing object discovery on the Internet object discovery dataset and the Brueghel dataset. Our code and data are available at http://imagine.enpc.fr/~shenx/SegSwap/.
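To make the segment-swapping idea concrete, below is a minimal sketch of how one synthetic training pair could be generated using OpenCV's Poisson blending (cv2.seamlessClone). The function name make_synthetic_pair, the fixed paste location, and the single-scale, same-size-image setup are illustrative assumptions, not the authors' exact pipeline; the style-transfer augmentation mentioned in the abstract is omitted.

```python
# Illustrative sketch only: build a synthetic co-segmentation pair by
# copy-pasting a masked object segment from one image into another with
# Poisson blending. Names and the fixed paste location are hypothetical.
import cv2
import numpy as np

def make_synthetic_pair(src_img, src_mask, dst_img):
    """Paste the object selected by src_mask from src_img into dst_img.

    src_img, dst_img : HxWx3 uint8 images of the same size (for simplicity).
    src_mask         : HxW uint8 binary mask, 255 inside the object.
    Returns the blended target image and the ground-truth mask of the
    repeated object, valid for both images since the paste keeps the
    object at its original coordinates.
    """
    ys, xs = np.nonzero(src_mask)
    # seamlessClone places the masked region so its bounding box is centred
    # at `center`; using the mask's own bounding-box centre keeps the object
    # at the same position in the target image.
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    center = (int((x0 + x1) / 2), int((y0 + y1) / 2))
    # Poisson blending adapts the pasted segment to the target image's local
    # illumination and colour statistics, which helps the learned model
    # generalize from synthetic pairs to real ones.
    blended = cv2.seamlessClone(src_img, dst_img, src_mask, center,
                                cv2.NORMAL_CLONE)
    return blended, src_mask.copy()

# The pair (src_img, blended) now shares exactly one repeated object, and
# src_mask serves as the repeated-object mask supervision for both images.
```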