While deep embedding learning approaches have achieved widespread success in many computer vision tasks, state-of-the-art methods for representing natural images do not necessarily perform well on images from other domains, such as paintings, cartoons, and sketches. The reason is the large shift in data distribution between these domains and natural images; sketches, for example, often contain only sparse informative pixels. Recognizing objects in such domains is nonetheless crucial, because several relevant applications depend on this kind of data, for instance sketch-to-image retrieval. Building an embedding learning model that performs well across multiple domains is therefore both challenging and of pivotal importance in computer vision. To this end, we propose a novel embedding learning approach with the goal of generalizing across different domains. During training, given a query image from one domain, we employ gated fusion and attention to generate a positive example that carries a broad notion of the semantics of the query's object category, drawn from across multiple domains. Through contrastive learning, we pull the embeddings of the query and the positive together in order to learn a representation that is robust across domains. At the same time, to teach the model to discriminate against examples from different semantic categories (across domains), we maintain a pool of negative embeddings from other categories. We evaluate our method using the DomainBed framework on the popular PACS (Photo, Art painting, Cartoon, and Sketch) dataset.
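The training signal described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the fusion architecture or the exact loss, so this example assumes a plain softmax attention pooling in place of the gated fusion module (the gate is omitted) and an InfoNCE-style contrastive loss with a temperature of 0.1. The function names `attention_pool` and `info_nce_loss`, and all vector values, are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(query, candidates):
    """Hypothetical positive generator (assumption, not the paper's module).

    Applies softmax attention of the query over same-category embeddings
    taken from several domains and returns their weighted average.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, c) / scale for c in candidates]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w / total * c[i] for w, c in zip(weights, candidates))
        for i in range(len(query))
    ]


def info_nce_loss(query, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form): low when the query is close to the
    positive and far from every embedding in the negative pool."""
    q, p = normalize(query), normalize(positive)
    logits = [dot(q, p) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negatives]
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[0]  # negative log-softmax of the positive logit


# Illustrative values only: a query, two same-category embeddings from
# other domains, and a pool of two negatives from other categories.
query = [1.0, 0.0, 0.5]
same_class = [[0.9, 0.1, 0.4], [0.8, -0.1, 0.6]]
negative_pool = [[-1.0, 0.2, 0.0], [0.0, 1.0, -0.5]]

positive = attention_pool(query, same_class)
loss = info_nce_loss(query, positive, negative_pool)
```

Minimizing this loss raises the similarity between the query and the fused positive relative to the negatives, which is the pull-and-discriminate behaviour the abstract describes. In practice the embeddings would come from a trained network and the loss would be computed on batched tensors.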