Food classification is the basic step of image-based dietary assessment: it predicts the types of food in each input image. However, food images in real-world scenarios are usually long-tail distributed across food classes, which causes severe class imbalance and restricts performance. In addition, none of the existing long-tailed classification methods focuses on food data, which can be more challenging due to the lower inter-class and higher intra-class variation among foods. In this work, we first introduce two new benchmark datasets for long-tailed food classification, Food101-LT and VFN-LT, where the number of samples per class in VFN-LT follows the real-world long-tailed food distribution. We then propose a novel 2-Phase framework that addresses class imbalance by (1) undersampling the head classes to remove redundant samples while retaining the learned information through knowledge distillation, and (2) oversampling the tail classes through visual-aware data augmentation. We compare our method with existing state-of-the-art long-tailed classification methods and show improved performance on both the Food101-LT and VFN-LT benchmarks. The results demonstrate the potential of applying our method to related real-life applications.
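The rebalancing idea behind the two phases can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rebalance_indices` and the parameters `head_cap` and `tail_floor` are hypothetical, random undersampling stands in for the redundancy-based sample removal, and plain index repetition stands in for visual-aware augmentation (each repeated sample would be augmented at load time). Knowledge distillation is not shown.

```python
import random
from collections import defaultdict


def rebalance_indices(labels, head_cap, tail_floor, seed=0):
    """Return sample indices after capping head classes and padding tail classes.

    Hypothetical helper for illustration only; assumes head_cap >= tail_floor.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    selected = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) > head_cap:
            # Head class: keep a random subset (placeholder for removing
            # redundant samples).
            indices = rng.sample(indices, head_cap)
        elif len(indices) < tail_floor:
            # Tail class: repeat existing samples (placeholder for
            # augmentation-based oversampling).
            extra = rng.choices(indices, k=tail_floor - len(indices))
            indices = indices + extra
        selected.extend(indices)
    return selected


# Toy long-tailed label list: one head class, one medium class, one tail class.
labels = ["rice"] * 100 + ["soup"] * 20 + ["kale"] * 3
selected = rebalance_indices(labels, head_cap=30, tail_floor=10)
picked = [labels[i] for i in selected]
```

In this toy setting the head class is cut to `head_cap` samples, the tail class is padded to `tail_floor`, and classes between the two thresholds are left unchanged.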