Template matching is a fundamental task in computer vision and has been studied for decades. It plays an essential role in the manufacturing industry for estimating the poses of different parts, facilitating downstream tasks such as robotic grasping. Existing methods fail when the template and source images have different modalities, cluttered backgrounds, or weak textures. They also rarely consider geometric transformations via homographies, which commonly exist even for planar industrial parts. To tackle these challenges, we propose an accurate template matching method based on differentiable coarse-to-fine correspondence refinement. We use an edge-aware module to overcome the domain gap between the mask template and the grayscale image, allowing robust matching. An initial warp is estimated using coarse correspondences based on novel structure-aware information provided by transformers. This initial alignment is passed to a refinement network that uses the reference and aligned images to obtain sub-pixel correspondences, which yield the final geometric transformation. Extensive evaluation shows that our method is significantly better than state-of-the-art methods and baselines, providing good generalization ability and visually plausible results even on unseen real data.
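To make the coarse-to-fine flow concrete, the following minimal sketch outlines the pipeline described above. The learned components (edge_module, coarse_matcher, refine_net) are hypothetical placeholders standing in for the edge-aware module, transformer-based coarse matcher, and refinement network; only the homography fitting and warping steps use standard OpenCV calls. This is an illustration under those assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the coarse-to-fine template matching pipeline.
# edge_module, coarse_matcher, and refine_net are placeholder callables for the
# paper's learned components; cv2.findHomography / cv2.warpPerspective are real.
import cv2
import numpy as np

def match_template(template_mask, gray_image, edge_module, coarse_matcher, refine_net):
    # 1. Bridge the modality gap: map both inputs into a shared edge-aware feature space.
    feat_t = edge_module(template_mask)
    feat_i = edge_module(gray_image)

    # 2. Coarse stage: transformer-based matcher returns patch-level correspondences
    #    (N x 2 arrays of points in template and image coordinates, respectively).
    pts_t, pts_i = coarse_matcher(feat_t, feat_i)

    # 3. Fit an initial homography (template -> image) from the coarse correspondences.
    H_init, _ = cv2.findHomography(pts_t, pts_i, cv2.RANSAC, 3.0)

    # 4. Warp the source image toward the template using the initial estimate.
    h, w = template_mask.shape[:2]
    aligned = cv2.warpPerspective(gray_image, np.linalg.inv(H_init), (w, h))

    # 5. Fine stage: refinement network predicts sub-pixel correspondences
    #    between the template and the pre-aligned image.
    pts_t_fine, pts_aligned_fine = refine_net(template_mask, aligned)

    # 6. Compose the residual homography with the initial one for the final transform.
    H_resid, _ = cv2.findHomography(pts_t_fine, pts_aligned_fine, cv2.RANSAC, 1.0)
    return H_init @ H_resid
```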