Your "Labrador" is My "Dog": Fine-Grained, or Not

Is what you see in Figure 1 a "labrador" or a "dog"? That is the question we ask in this paper. While fine-grained visual classification (FGVC) strives to arrive at the former, for the majority of us non-experts just "dog" would probably suffice. The real question is therefore how to tailor predictions to the different definitions of "fine-grained" that come with different levels of expertise. To that end, we re-envisage the traditional setting of FGVC, moving from single-label classification to a top-down traversal of a pre-defined coarse-to-fine label hierarchy, so that our answer becomes "dog" --> "gun dog" --> "retriever" --> "labrador". To approach this new problem, we first conduct a comprehensive human study, which confirms that most participants prefer multi-granularity labels, regardless of whether they consider themselves experts. We then discover the key intuition that coarse-level label prediction harms fine-grained feature learning, whereas fine-level features improve the learning of the coarse-level classifier. This discovery enables us to design a very simple yet surprisingly effective solution to our new problem, in which we (i) use level-specific classification heads to disentangle coarse-level features from fine-grained ones, and (ii) allow finer-grained features to participate in coarser-grained label predictions, which in turn leads to better disentanglement. Experiments show that our method achieves superior performance in the new FGVC setting, and also outperforms the state of the art on the traditional single-label FGVC problem. Thanks to its simplicity, our method can be easily implemented on top of any existing FGVC framework and is parameter-free.
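To make the setting more concrete, here is a minimal Python sketch of the two ideas described above: a top-down walk through a coarse-to-fine label hierarchy, and the routing of finer-grained features into coarser-level heads. It is an illustration under stated assumptions, not the authors' implementation. The sibling labels, the score values, and the names `HIERARCHY`, `traverse`, and `head_inputs` are all hypothetical; only the "dog" --> "gun dog" --> "retriever" --> "labrador" path comes from the abstract. The sketch shows inference-time routing only and says nothing about how the heads are trained.

```python
from typing import Dict, List

# Hypothetical toy hierarchy: parent label -> child labels.
# Only the dog -> gun dog -> retriever -> labrador path is from the abstract;
# the sibling labels are made up for illustration.
HIERARCHY: Dict[str, List[str]] = {
    "root": ["dog", "cat"],
    "dog": ["gun dog", "terrier"],
    "gun dog": ["retriever", "spaniel"],
    "retriever": ["labrador", "golden retriever"],
}


def traverse(scores: Dict[str, float], max_depth: int) -> List[str]:
    """Walk the hierarchy top-down, picking the best-scoring child at each level.

    `scores` maps each label to a classifier score (assumed here to come from
    level-specific heads); `max_depth` caps how fine the answer gets, which is
    one way to model different levels of expertise.
    """
    path: List[str] = []
    node = "root"
    while node in HIERARCHY and len(path) < max_depth:
        # Restrict the choice to children of the label chosen one level up.
        node = max(HIERARCHY[node], key=lambda label: scores.get(label, 0.0))
        path.append(node)
    return path


def head_inputs(chunks: List[List[float]]) -> List[List[float]]:
    """Build the input of each level's head from level-specific feature chunks.

    chunks[0] holds the coarsest level's features, chunks[-1] the finest.
    The head at level k sees its own chunk plus all finer chunks, so finer
    features take part in coarser predictions; the finest head sees only
    its own chunk.
    """
    return [
        [value for chunk in chunks[level:] for value in chunk]
        for level in range(len(chunks))
    ]


# Hypothetical scores for a single image.
scores = {
    "dog": 0.9, "cat": 0.1,
    "gun dog": 0.7, "terrier": 0.3,
    "retriever": 0.8, "spaniel": 0.2,
    "labrador": 0.6, "golden retriever": 0.4,
}

expert_answer = traverse(scores, max_depth=4)
novice_answer = traverse(scores, max_depth=1)
routed = head_inputs([[1.0], [2.0], [3.0]])
```

With these made-up scores, a depth cap of 4 walks the full path down to "labrador", while a cap of 1 stops at "dog". Capping the depth is just one simple way to serve different granularities from the same set of predictions.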
