In our framework, an object is made up of K distinct parts or units. We parse a test instance by inferring its K parts: each part occupies a distinct location in the feature space, and the instance features at that location manifest as an active subset of part templates shared across all instances. We recognize a test instance by comparing its active templates and the relative geometry of its part locations against those of the presented few-shot instances. We propose an end-to-end training method to learn part templates on top of a convolutional backbone. To combat visual distortions such as changes in orientation, pose and size, we learn multi-scale templates and, at test time, parse and match instances across these scales. We show that our method is competitive with the state of the art and, by virtue of parsing, is also interpretable.
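The parse-then-match pipeline described above can be illustrated with a minimal NumPy sketch. It is not the method's actual implementation. It assumes the part templates have already been learned and are given as a hypothetical array of shape (K, M, D), meaning M templates per part. Every name and scoring rule is a hypothetical stand-in chosen for illustration: `parse_instance`, `match_score`, `classify`, `tau`, `lam`, the cosine-similarity activation, the greedy distinct-location assignment, the Jaccard template overlap and the L1 pairwise-offset geometry term. Multi-scale matching is approximated by parsing the query at each available scale and keeping the best match.

```python
import numpy as np

def parse_instance(feats, templates, tau=0.5):
    """Parse one instance into K parts (hypothetical greedy scheme, not the paper's).

    feats:     (H, W, D) feature map from a convolutional backbone
    templates: (K, M, D) M templates per part, shared across all instances
    Returns (locs, active): locs is (K, 2) part positions normalized to [0, 1),
    active is a (K, M) boolean mask of templates active at each part location.
    Assumes K <= H * W so every part can take a distinct location.
    """
    H, W, D = feats.shape
    K, M, _ = templates.shape
    f = feats.reshape(H * W, D)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
    t = templates / (np.linalg.norm(templates, axis=2, keepdims=True) + 1e-8)
    # sim[k, m, p]: cosine similarity of template m of part k with location p
    sim = np.einsum("kmd,pd->kmp", t, f)
    part_score = sim.max(axis=1)  # (K, H*W): best template response per location
    taken = np.zeros(H * W, dtype=bool)
    locs = np.zeros((K, 2))
    active = np.zeros((K, M), dtype=bool)
    for k in range(K):
        # Mask locations already claimed so each part gets a distinct location
        scores = np.where(taken, -np.inf, part_score[k])
        p = int(np.argmax(scores))
        taken[p] = True
        locs[k] = divmod(p, W)
        active[k] = sim[k, :, p] > tau  # the active template subset for part k
    # Normalize by map size so geometry is comparable across scales
    return locs / np.array([H, W], dtype=float), active

def match_score(parse_a, parse_b, lam=1.0):
    """Template agreement minus a relative-geometry penalty (hypothetical)."""
    locs_a, act_a = parse_a
    locs_b, act_b = parse_b
    # Per-part Jaccard overlap of active templates (1.0 if both sets are empty)
    inter = (act_a & act_b).sum(axis=1)
    union = (act_a | act_b).sum(axis=1)
    jac = np.where(union > 0, inter / np.maximum(union, 1), 1.0)
    # Relative geometry: pairwise offsets between part locations (translation-invariant)
    off_a = locs_a[:, None, :] - locs_a[None, :, :]
    off_b = locs_b[:, None, :] - locs_b[None, :, :]
    geo = np.abs(off_a - off_b).mean()
    return jac.mean() - lam * geo

def classify(query_scales, support, templates, tau=0.5, lam=1.0):
    """Label a query by its best match against the few-shot support set.

    query_scales: list of (H, W, D) feature maps of the query at several scales
    support:      dict mapping label -> list of (H, W, D) feature maps, one per shot
    """
    sup_parses = {c: [parse_instance(f, templates, tau) for f in shots]
                  for c, shots in support.items()}
    best_label, best = None, -np.inf
    for q in query_scales:
        pq = parse_instance(q, templates, tau)
        for c, parses in sup_parses.items():
            s = max(match_score(pq, ps, lam) for ps in parses)
            if s > best:
                best_label, best = c, s
    return best_label
```

In this sketch, a perfect self-match scores exactly 1.0, because the Jaccard term is 1 and the geometry penalty is 0. Comparing relative offsets rather than absolute positions makes the geometry term insensitive to where the object sits in the image.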