In many computer vision applications it is important to accurately estimate the trajectory of an object over time by fusing data from a number of sources, of which 2D and 3D imagery is only one. In this paper, we show how to use a deep feature encoding in conjunction with generative densities over the features in a factor-graph based, probabilistic tracking framework. We present a likelihood model that combines a learned feature encoder with generative densities over the resulting features, both trained in a supervised manner. We also experiment with directly inferring probability using image classification models that feed into the likelihood formulation. These models are used to implement deep factors that are added to the factor graph to complement other factors representing domain-specific knowledge such as motion models and/or other prior information. The factors are then optimized jointly in a non-linear least-squares tracking framework that takes the form of an Extended Kalman Smoother with a Gaussian prior. A key feature of our likelihood model is that it leverages the Lie group properties of the tracked target's pose to apply the feature encoding to an image patch, extracted through a differentiable warp function inspired by spatial transformer networks. To illustrate the proposed approach we evaluate it on a challenging social insect behavior dataset, and show that using deep features outperforms the linear appearance models previously used in this setting.
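As a rough illustration of the kind of likelihood described above, the following minimal sketch (not the authors' code; all module and parameter names are hypothetical) shows an SE(2) pose parameterizing a differentiable, spatial-transformer-style warp that extracts an image patch, a small CNN encoding that patch, and a Gaussian density over the feature vector serving as the measurement term. Because the whole pipeline is differentiable in the pose, its negative log-likelihood could act as a factor in a non-linear least-squares tracker.

```python
# Hedged sketch: differentiable patch extraction + learned encoder + Gaussian
# feature density, assuming a grayscale image and an SE(2) pose (x, y, theta).
import torch
import torch.nn as nn
import torch.nn.functional as F

def se2_to_affine(pose, patch_size, image_size):
    """Map an SE(2) pose (x, y, theta) in pixel coordinates to the 2x3 affine
    matrix expected by F.affine_grid (normalized [-1, 1] coordinates)."""
    x, y, th = pose
    H, W = image_size
    ph, pw = patch_size
    c, s = torch.cos(th), torch.sin(th)
    sx, sy = pw / W, ph / H  # scale so the grid covers a (ph x pw) patch
    theta = torch.stack([
        torch.stack([c * sx, -s * sy, 2 * x / W - 1]),
        torch.stack([s * sx,  c * sy, 2 * y / H - 1]),
    ])
    return theta.unsqueeze(0)  # shape (1, 2, 3)

class PatchEncoder(nn.Module):
    """Tiny CNN feature encoder; a stand-in for the learned encoder."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )

    def forward(self, patch):
        return self.net(patch)

def deep_factor_nll(image, pose, encoder, mu, sigma, patch_size=(32, 32)):
    """Negative log-likelihood (up to a constant) of the patch at `pose` under
    a Gaussian N(mu, diag(sigma^2)) over encoded features; differentiable in
    the pose, so it can be used inside least-squares optimization."""
    _, _, H, W = image.shape
    theta = se2_to_affine(pose, patch_size, (H, W))
    grid = F.affine_grid(theta, [1, 1, *patch_size], align_corners=False)
    patch = F.grid_sample(image, grid, align_corners=False)
    f = encoder(patch)
    return 0.5 * (((f - mu) / sigma) ** 2).sum()

# Toy usage: gradients of the deep factor flow back to the pose estimate.
image = torch.rand(1, 1, 128, 128)            # single grayscale frame
pose = torch.tensor([64.0, 64.0, 0.1], requires_grad=True)
encoder = PatchEncoder()
mu, sigma = torch.zeros(16), torch.ones(16)   # toy feature density parameters
nll = deep_factor_nll(image, pose, encoder, mu, sigma)
nll.backward()                                 # d(nll)/d(pose) via the warp
```

In the paper's setting, such a factor would be combined with motion-model and prior factors and optimized jointly; the sketch only illustrates the measurement side.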