Food recognition plays an important role in food choice and intake, which are essential to human health and well-being. It is therefore important to the computer vision community and can further support many food-oriented vision and multimodal tasks. Although generic visual recognition has advanced remarkably thanks to the release of large-scale datasets, progress in the food domain still lags far behind. In this paper, we introduce Food2K, the largest food recognition dataset, with 2,000 categories and over 1 million images. Food2K surpasses existing food recognition datasets by one order of magnitude in both categories and images, and thus establishes a new, challenging benchmark for developing advanced models for food visual representation learning. Furthermore, we propose a deep progressive region enhancement network for food recognition, which consists of two main components: progressive local feature learning and region feature enhancement. The former adopts improved progressive training to learn diverse and complementary local features, while the latter uses self-attention to incorporate richer multi-scale context into the local features for further enhancement. Extensive experiments on Food2K demonstrate the effectiveness of the proposed method. More importantly, we verify that Food2K offers better generalization across various tasks, including food recognition, food image retrieval, cross-modal recipe retrieval, food detection and segmentation. Food2K can be further explored to benefit more food-relevant tasks, including emerging and more complex ones (e.g., nutritional understanding of food), and models trained on Food2K are expected to serve as backbones that improve performance on such tasks. We also hope Food2K can serve as a large-scale fine-grained visual recognition benchmark.
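The region feature enhancement idea (local features enriched with context via self-attention) can be illustrated with a generic scaled dot-product self-attention sketch. This is a minimal pure-Python illustration under stated assumptions, not the authors' network: it uses identity query/key/value projections where a trained model would learn them, and the helper `gather_multiscale` and the toy feature values are hypothetical, showing one simple way to let each local feature attend to regions drawn from several scales. The residual addition keeps each original local feature while mixing in its attended context.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_enhance(regions: List[Vector]) -> List[Vector]:
    """Enhance each region feature with context gathered from all regions.

    Assumption: identity query/key/value projections; a real module
    would learn a separate linear projection for each.
    """
    d = len(regions[0])
    scale = 1.0 / math.sqrt(d)
    enhanced = []
    for q in regions:
        # Scaled dot-product score between this region and every region.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in regions]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of all region features.
        context = [sum(w * v[j] for w, v in zip(weights, regions)) for j in range(d)]
        # Residual connection: original local feature plus its context.
        enhanced.append([qj + cj for qj, cj in zip(q, context)])
    return enhanced


def gather_multiscale(*scales: List[Vector]) -> List[Vector]:
    """Hypothetical helper: pool region features from several scales into one token set."""
    tokens: List[Vector] = []
    for s in scales:
        tokens.extend(s)
    return tokens


# Usage with hypothetical toy features: two fine-scale regions, one coarse-scale region.
fine = [[1.0, 0.0], [0.0, 1.0]]
coarse = [[0.5, 0.5]]
out = self_attention_enhance(gather_multiscale(fine, coarse))
print(len(out), len(out[0]))  # 3 2
```

Pooling tokens from all scales into a single attention set is only one design choice; an alternative is to let fine-scale queries attend only to coarse-scale keys, which reduces cost when the fine-scale map is large.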