Robust and accurate planar tracking over a whole video sequence is vitally important for many vision applications. The key to planar object tracking is to find object correspondences, modeled by a homography, between the reference image and the tracked image. Existing methods tend to produce wrong correspondences under appearance variations, camera-object relative motion, and occlusions. To alleviate this problem, we present a unified convolutional neural network (CNN) model that jointly considers homography, visibility, and confidence. First, as the base of our model, we introduce correlation blocks that explicitly account for local appearance changes and camera-object relative motion. Second, we jointly learn homography and visibility, which links camera-object relative motion with occlusions. Third, we propose a confidence module that actively monitors the estimation quality using the pixel correlation distributions obtained in the correlation blocks. All these modules are plugged into a Lucas-Kanade (LK) tracking pipeline to obtain both accurate and robust planar object tracking. Our approach outperforms state-of-the-art methods on the public POT and TMT datasets. Its superior performance is also verified on a real-world application: synthesizing high-quality in-video advertisements.
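To make the notion of a homography-modeled correspondence concrete, the following is a minimal sketch of how a 3x3 homography maps points on the planar object from the reference image into the tracked image. It illustrates only the geometric model, not the proposed CNN. The function name `apply_homography`, the matrix `H`, and the corner coordinates are hypothetical values chosen for illustration.

```python
import numpy as np

def apply_homography(H, points):
    """Map N 2-D points through a 3x3 homography H (illustrative helper)."""
    pts = np.asarray(points, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones])          # lift to homogeneous coordinates, shape (N, 3)
    mapped = homog @ H.T                    # apply H to every point
    return mapped[:, :2] / mapped[:, 2:3]   # divide by the third coordinate

# Hypothetical homography: a pure translation by (10, 5) pixels.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 5.0],
              [0.0, 0.0, 1.0]])

# Hypothetical corners of the planar object in the reference image.
corners = [(0, 0), (100, 0), (100, 50), (0, 50)]
tracked = apply_homography(H, corners)
```

In a tracker, `H` would be estimated per frame rather than fixed, and the mapped corners give the object's location in the tracked image.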