In this paper we address the problem of automatically recognizing photographed cooking dishes and generating the corresponding food recipes. Current image-to-recipe models are computationally expensive and require powerful GPUs for model training and inference. This high computational cost prevents existing models from being deployed on portable devices such as smartphones. To solve this issue we introduce a lightweight image-to-recipe prediction model, RecipeSnap, that reduces memory and computational cost by more than 90% while still achieving 2.0 MedR, in line with the state-of-the-art model. A pre-trained recipe encoder is used to compute recipe embeddings. Recipes from the Recipe1M dataset and their corresponding embeddings are collected into a recipe library, which is later used for image encoder training and image queries. We use MobileNet-V2 as the image encoder backbone, which makes our model suitable for portable devices. With little additional effort, this model can be developed into a smartphone application. A comparison of the performance of this lightweight model against heavier models is presented in this paper. Code, data, and models are publicly available on GitHub.
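The retrieval step described above can be sketched as a nearest-neighbor search of an image embedding against the precomputed recipe-embedding library. The sketch below is illustrative only and assumes embeddings are already computed (the encoders themselves, e.g. a MobileNet-V2 image backbone and the pre-trained recipe encoder, are omitted); the function name and dimensions are hypothetical.

```python
import numpy as np

def retrieve_recipes(image_emb, recipe_embs, k=5):
    """Return indices of the k recipes whose embeddings are closest
    (by cosine similarity) to the query image embedding."""
    # Normalize so that dot products equal cosine similarities.
    q = image_emb / np.linalg.norm(image_emb)
    lib = recipe_embs / np.linalg.norm(recipe_embs, axis=1, keepdims=True)
    sims = lib @ q
    # Indices sorted by similarity, highest first.
    return np.argsort(-sims)[:k]

# Toy library: 100 recipe embeddings in a 16-dimensional space.
rng = np.random.default_rng(0)
library = rng.normal(size=(100, 16))
query = library[42] + 0.01 * rng.normal(size=16)  # a query near recipe 42
print(retrieve_recipes(query, library, k=3)[0])
```

In the full system, the image embedding would come from the MobileNet-V2 encoder and the library rows from the pre-trained recipe encoder applied to Recipe1M.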