We propose a new method for fine-grained few-shot recognition via deep object parsing. In our framework, an object is made up of K distinct parts, and for each part we learn a dictionary of templates that is shared across all instances and categories. An object is parsed by estimating the locations of these K parts and a set of active templates that can reconstruct the part features. We recognize a test instance by comparing its active templates and the relative geometry of its part locations against those of the presented few-shot instances. Our method is end-to-end trainable and learns part templates on top of a convolutional backbone. To combat visual variations in orientation, pose, and size, we learn templates at multiple scales, and at test time we parse and match instances across these scales. We show that our method is competitive with the state of the art and, by virtue of parsing, is interpretable as well.
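The parse-then-match pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it works at a single scale, takes the backbone feature map and the part dictionaries as given arrays, and substitutes ridge-regularized least squares for the selection of active templates. The function names (`parse_object`, `parse_distance`, `classify`) and the parameters `ridge` and `geometry_weight` are hypothetical and chosen only for illustration.

```python
import numpy as np

def parse_object(features, dictionaries, ridge=0.1):
    """Parse one object (illustrative sketch, single scale).

    features:     (C, H, W) feature map, assumed to come from a conv backbone.
    dictionaries: (K, C, M) array, one dictionary of M templates per part.
    Returns part locations (K, 2) as (row, col) and template codes (K, M).
    """
    C, H, W = features.shape
    flat = features.reshape(C, H * W)  # each column is one location's feature
    locations, codes = [], []
    for D in dictionaries:
        M = D.shape[1]
        # Assumption: ridge least squares stands in for active-template selection.
        Z = np.linalg.solve(D.T @ D + ridge * np.eye(M), D.T @ flat)  # (M, H*W)
        err = np.sum((flat - D @ Z) ** 2, axis=0)  # reconstruction error per location
        best = int(np.argmin(err))                 # best-reconstructed location
        locations.append(divmod(best, W))          # flat index -> (row, col)
        codes.append(Z[:, best])
    return np.array(locations, dtype=float), np.array(codes)

def parse_distance(parse_a, parse_b, geometry_weight=1.0):
    """Compare two parses by template codes and relative part geometry."""
    loc_a, code_a = parse_a
    loc_b, code_b = parse_b
    # Centre part locations on their mean so only relative geometry matters.
    geo_a = loc_a - loc_a.mean(axis=0)
    geo_b = loc_b - loc_b.mean(axis=0)
    code_term = np.sum((code_a - code_b) ** 2)
    geo_term = np.sum((geo_a - geo_b) ** 2)
    return float(code_term + geometry_weight * geo_term)

def classify(query, support, labels, dictionaries):
    """Assign the label of the nearest few-shot (support) instance."""
    q = parse_object(query, dictionaries)
    dists = [parse_distance(q, parse_object(s, dictionaries)) for s in support]
    return labels[int(np.argmin(dists))]
```

A multi-scale variant would, under the same assumptions, repeat `parse_object` with one set of dictionaries per scale and combine the resulting distances.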