We study the problem of learning to assign a characteristic pose, i.e., scale and orientation, to an image region of interest. Despite its apparent simplicity, the problem is non-trivial: it is hard to obtain a large-scale set of image regions with explicit pose annotations from which a model can directly learn. To tackle the issue, we propose a self-supervised learning framework with a histogram alignment technique. It generates pairs of image patches by random rescaling/rotation and then trains an estimator to predict their scale/orientation values so that their relative difference is consistent with the rescaling/rotation applied. The estimator learns to predict a non-parametric histogram distribution of scale/orientation without any supervision. Experiments show that it significantly outperforms previous methods in scale/orientation estimation and also improves image matching and 6 DoF camera pose estimation by incorporating our patch poses into the matching process.
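The core self-supervision signal can be sketched as follows: if two patches differ by a known rotation, the orientation histogram predicted for the first patch, circularly shifted by that rotation, should match the histogram predicted for the second. A minimal NumPy illustration of such an alignment loss is shown below; the function name, the use of KL divergence, and the bin-shift formulation are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def histogram_alignment_loss(h1, h2, shift_bins):
    """Self-supervised alignment loss between two orientation histograms.

    h1, h2     : normalized orientation histograms, shape (B,), summing to 1
    shift_bins : known relative rotation between the patches, in histogram bins
                 (e.g., a 30-degree rotation with B=36 bins gives shift_bins=3)
    """
    # Circularly shift h1 by the rotation that was applied to produce patch 2;
    # after the shift, it should align with h2 if the estimator is consistent.
    target = np.roll(h1, shift_bins)
    eps = 1e-8  # numerical floor to avoid log(0)
    # KL divergence D(target || h2) as the alignment loss (assumed choice).
    return float(np.sum(target * (np.log(target + eps) - np.log(h2 + eps))))
```

For perfectly consistent predictions the loss is zero: if `h2` equals `h1` rolled by `shift_bins`, the shifted target coincides with `h2` and the KL term vanishes. The scale histogram can be handled analogously, with a (non-circular) shift corresponding to the log of the applied rescaling factor.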