Food image-to-recipe retrieval aims to learn an embedding space that links the rich semantics in recipes with the visual content in food images for cross-modal retrieval. Existing works learn such a space by assuming that all image-recipe training pairs belong to the same cuisine. As a result, despite the excellent performance reported in the literature, the learnt space does not transfer to retrieving recipes of a different cuisine. In this paper, we address this issue with cross-domain food image-to-recipe retrieval: by leveraging abundant image-recipe pairs in a source domain (one cuisine), the embedding space generalizes to a target domain (another cuisine) that has no images paired with recipes for training. With the intuition that different source samples should matter to different degrees, this paper proposes two novel mechanisms for cross-domain food image-to-recipe retrieval: a source data selector and weighted cross-modal adversarial learning. The former selects source samples similar to the target data and filters out dissimilar ones from training. The latter assigns higher weights to source samples more similar to the target data and lower weights to suppress dissimilar ones, for both cross-modal and adversarial learning. The weights are computed from recipe features extracted by a pre-trained source model. Experiments on three different cuisines (Chuan, Yue and Washoku) demonstrate that the proposed method achieves state-of-the-art performance on all transfers.
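To make the two mechanisms concrete, below is a minimal PyTorch sketch of how source-sample weights could be derived from recipe features and plugged into weighted cross-modal and adversarial losses. This is an illustrative reconstruction, not the paper's implementation: the function names, the max-cosine-similarity weighting with a softmax temperature, the top-k selection rule, the triplet margin, and the choice to weight target samples by 1 in the adversarial loss are all assumptions; the abstract only specifies that weights come from recipe features of a pre-trained source model.

```python
import torch
import torch.nn.functional as F

def sample_weights(src_recipe_feats, tgt_recipe_feats, temperature=0.1):
    """Weight each source sample by its closeness to the target domain.

    Closeness is taken (as an assumption) to be the maximum cosine
    similarity to any target recipe feature; a softmax rescaling keeps
    weights positive with mean roughly 1.
    """
    src = F.normalize(src_recipe_feats, dim=1)
    tgt = F.normalize(tgt_recipe_feats, dim=1)
    sim = src @ tgt.t()                       # (n_src, n_tgt) cosine similarities
    closeness = sim.max(dim=1).values         # best match per source sample
    w = torch.softmax(closeness / temperature, dim=0) * len(closeness)
    return w

def select_source(weights, keep_ratio=0.8):
    """Source data selector: keep the source samples most similar to the
    target data and filter out the rest (keep_ratio is a placeholder)."""
    k = max(1, int(len(weights) * keep_ratio))
    return weights.topk(k).indices

def weighted_crossmodal_loss(img_emb, rec_emb, weights, margin=0.3):
    """Per-sample-weighted bidirectional triplet loss over in-batch
    hardest negatives (one plausible instantiation of weighted
    cross-modal learning)."""
    img = F.normalize(img_emb, dim=1)
    rec = F.normalize(rec_emb, dim=1)
    sim = img @ rec.t()
    pos = sim.diag()
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    neg_i2r = sim.masked_fill(mask, float('-inf')).max(dim=1).values
    neg_r2i = sim.masked_fill(mask, float('-inf')).max(dim=0).values
    loss = F.relu(margin + neg_i2r - pos) + F.relu(margin + neg_r2i - pos)
    return (weights * loss).mean()

def weighted_adversarial_loss(domain_logits, domain_labels, weights):
    """Per-sample-weighted domain-classification loss for the
    discriminator side of adversarial learning."""
    per_sample = F.binary_cross_entropy_with_logits(
        domain_logits.squeeze(1), domain_labels.float(), reduction='none')
    return (weights * per_sample).mean()
```

An illustrative forward pass with random features (all dimensions are placeholders): recipe features from the pre-trained source model drive both selection and weighting, while target samples receive unit weight in the adversarial term.

```python
src_rec = torch.randn(64, 1024)   # recipe features, pre-trained source model
tgt_rec = torch.randn(32, 1024)
w = sample_weights(src_rec, tgt_rec)
keep = select_source(w, keep_ratio=0.8)            # source data selector

img_emb, rec_emb = torch.randn(64, 512), torch.randn(64, 512)
retrieval_loss = weighted_crossmodal_loss(img_emb[keep], rec_emb[keep], w[keep])

dom_logits = torch.randn(96, 1)                    # 64 source + 32 target samples
dom_labels = torch.cat([torch.zeros(64), torch.ones(32)])
dom_w = torch.cat([w, torch.ones(32)])             # target weight 1 (assumption)
adv_loss = weighted_adversarial_loss(dom_logits, dom_labels, dom_w)
```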