Fine-Grained Visual Recognition (FGVR) tackles the problem of distinguishing highly similar categories. One of the main approaches to FGVR, subset learning, leverages information from existing class taxonomies to improve the performance of deep neural networks. However, these methods rely on handcrafted hierarchies that are not necessarily optimal for the models. In this paper, we propose ELFIS, an expert learning framework for FGVR that clusters the categories of a dataset into meta-categories using both dataset-inherent lexical information and model-specific information. A set of neural-network-based experts is trained on the meta-categories and integrated into a multi-task framework. Extensive experiments show accuracy improvements of up to +1.3% on state-of-the-art (SoTA) FGVR benchmarks, using both CNNs and transformer-based networks. Overall, the results indicate that ELFIS can be applied on top of any classification model, enabling it to reach SoTA results. The source code will be made public soon.
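To make the meta-category idea concrete, the following is a minimal illustrative sketch, not the ELFIS implementation. The abstract does not specify how lexical and model-specific information are measured or combined, so everything here is an assumption: lexical similarity is taken as the Jaccard overlap of class-name words, model-specific similarity as the symmetrised confusion rate of a hypothetical base classifier, the two are blended with a hypothetical weight `alpha`, and classes are grouped by greedy average-linkage merging. The class names and confusion counts are toy values made up for illustration.

```python
from itertools import combinations


def lexical_similarity(a, b):
    """Jaccard overlap between the word sets of two class names (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


def affinity_matrix(names, confusion, alpha=0.5):
    """Blend lexical similarity with symmetrised, row-normalised confusion rates.

    `alpha` is a hypothetical mixing weight: 1.0 uses only class names,
    0.0 uses only the model's confusions.
    """
    n = len(names)
    rates = [[c / max(sum(row), 1) for c in row] for row in confusion]
    aff = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        model_sim = 0.5 * (rates[i][j] + rates[j][i])
        lex_sim = lexical_similarity(names[i], names[j])
        aff[i][j] = aff[j][i] = alpha * lex_sim + (1 - alpha) * model_sim
    return aff


def cluster_meta_categories(aff, k):
    """Greedy average-linkage merging of classes until k clusters remain."""
    clusters = [[i] for i in range(len(aff))]
    while len(clusters) > k:
        def linkage(pair):
            a, b = clusters[pair[0]], clusters[pair[1]]
            return sum(aff[i][j] for i in a for j in b) / (len(a) * len(b))

        # combinations() yields x < y, so deleting y leaves index x valid.
        x, y = max(combinations(range(len(clusters)), 2), key=linkage)
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


# Toy data (made up): rows are true classes, columns are predicted classes.
names = ["herring gull", "california gull", "song sparrow", "field sparrow"]
confusion = [
    [80, 18, 1, 1],
    [15, 83, 1, 1],
    [1, 1, 78, 20],
    [2, 0, 22, 76],
]
meta = cluster_meta_categories(affinity_matrix(names, confusion), k=2)
print(meta)
```

In this toy setting the two gull classes and the two sparrow classes end up in separate meta-categories. Under the framework described above, each such group would then be assigned its own expert, with the experts trained jointly in a multi-task setup.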