New methods for Few-Shot Image Classification are published every day, each reporting better performance on academic benchmarks. Nevertheless, we observe that these benchmarks do not accurately represent the real industrial use cases that we have encountered. In this work, through both qualitative and quantitative studies, we show that the widely used benchmark tieredImageNet is strongly biased towards tasks composed of very semantically dissimilar classes, e.g. bathtub, cabbage, pizza, schipperke, and cardoon. This makes tieredImageNet (and similar benchmarks) ill-suited to evaluating a model's ability to solve real-life use cases, which usually involve more fine-grained classification. We mitigate this bias using semantic information about the classes of tieredImageNet and generate an improved, balanced benchmark. Going further, we also introduce a new benchmark for Few-Shot Image Classification using the Danish Fungi 2020 dataset. This benchmark offers a wide variety of evaluation tasks with varying degrees of fine-grainedness. Moreover, it includes many-way tasks (e.g. composed of 100 classes), a challenging setting that is nonetheless very common in industrial applications. Our experiments bring out the correlation between the difficulty of a task and the semantic similarity between its classes, as well as a heavy performance drop of state-of-the-art methods on many-way few-shot classification, raising questions about the scaling abilities of these methods. We hope that our work will encourage the community to further question the quality of standard evaluation processes and their relevance to real-life applications.
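To make the notion of semantically dissimilar classes concrete, the following is a minimal sketch that scores a task by the mean pairwise distance between its classes in a class hierarchy. The toy taxonomy, the function names (`tree_distance`, `task_coarseness`) and the use of plain tree path length are all illustrative assumptions; they are not the hierarchy or the semantic measure used in this work.

```python
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent); NOT the tieredImageNet hierarchy.
PARENT = {
    "entity": None,
    "animal": "entity",
    "food": "entity",
    "dog": "animal",
    "cat": "animal",
    "schipperke": "dog",
    "poodle": "dog",
    "tabby": "cat",
    "pizza": "food",
    "cabbage": "food",
}


def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    index_in_a = {n: i for i, n in enumerate(ancestors(a))}
    for j, n in enumerate(ancestors(b)):
        if n in index_in_a:
            return index_in_a[n] + j
    raise ValueError("nodes do not share a root")


def task_coarseness(classes):
    """Mean pairwise tree distance between the classes of one task (assumed score)."""
    pairs = list(combinations(classes, 2))
    return sum(tree_distance(a, b) for a, b in pairs) / len(pairs)


# A fine-grained task (all animals) versus a coarse one (mixed animals and food).
fine_task = ["schipperke", "poodle", "tabby"]
coarse_task = ["schipperke", "pizza", "cabbage"]

print(task_coarseness(fine_task))
print(task_coarseness(coarse_task))
```

Under this assumed score, a task whose classes sit close together in the hierarchy gets a lower value than one that mixes distant branches. A benchmark that samples classes uniformly at random from a broad hierarchy would tend to produce mostly high-scoring (coarse) tasks.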