Image-based dietary assessment refers to the process of determining, from visual data, what someone eats and how much energy and nutrients they consume. Food classification is the first and most crucial step. Existing methods focus on improving accuracy, measured as the rate of correct classification based on visual information alone, which is very challenging due to the high complexity and inter-class similarity of foods. Furthermore, accuracy in food classification is conceptual, as the description of a food can always be improved. In this work, we introduce a new food classification framework that improves the quality of predictions by integrating information from multiple domains while maintaining classification accuracy. We apply a multi-task network based on a hierarchical structure that uses both visual and nutrition domain-specific information to cluster similar foods. Our method is validated on the VIPER-FoodNet (VFN) food image dataset, modified to include the associated energy and nutrient information. We achieve classification accuracy comparable to existing methods that use visual information only, but with lower error in energy and nutrient values for the wrong predictions.
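The evaluation idea in the last sentence, that two classifiers with the same accuracy can differ in how far off their wrong predictions are nutritionally, can be illustrated with a minimal sketch. The function names, the `energy_kcal` field, and the per-class values below are hypothetical and chosen only for illustration; this is not the authors' evaluation code, and the paper may aggregate the errors differently (for example, per nutrient or with a different averaging).

```python
from statistics import mean

# Hypothetical per-class nutrient table (illustrative values, not from the VFN dataset).
NUTRIENTS = {
    "apple": {"energy_kcal": 52},
    "pear": {"energy_kcal": 57},
    "pizza": {"energy_kcal": 266},
}


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def nutrient_error_on_mistakes(y_true, y_pred, table, key="energy_kcal"):
    """Mean absolute difference in one nutrient value between the predicted and
    true class, computed only over misclassified samples (0.0 if none)."""
    errors = [
        abs(table[p][key] - table[t][key])
        for t, p in zip(y_true, y_pred)
        if t != p
    ]
    return float(mean(errors)) if errors else 0.0


y_true = ["apple", "pear", "pizza", "pizza"]
pred_a = ["pear", "pear", "pizza", "pizza"]   # mistakes apple for a nutritionally similar food
pred_b = ["pizza", "pear", "pizza", "pizza"]  # mistakes apple for a nutritionally distant food

print(accuracy(y_true, pred_a), nutrient_error_on_mistakes(y_true, pred_a, NUTRIENTS))
print(accuracy(y_true, pred_b), nutrient_error_on_mistakes(y_true, pred_b, NUTRIENTS))
```

Both hypothetical models reach the same accuracy (0.75), but the first one's mistake is off by 5 kcal while the second one's is off by 214 kcal, which is the kind of difference that accuracy alone does not capture.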