Tracking pixels in videos is typically studied as an optical flow estimation problem, where every pixel is described with a displacement vector that locates it in the next frame. Even though wider temporal context is freely available, prior efforts to take this into account have yielded only small gains over 2-frame methods. In this paper, we revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames. We re-build this classic approach using components that drive the current state-of-the-art in flow and object tracking, such as dense cost maps, iterative optimization, and learned appearance updates. We train our models using long-range amodal point trajectories mined from existing optical flow data that we synthetically augment with multi-frame occlusions. We test our approach in trajectory estimation benchmarks and in keypoint label propagation tasks, and compare favorably against state-of-the-art optical flow and feature tracking methods.
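The sketch below (not the authors' code, and all tensor contents are placeholder values) contrasts the two output representations the abstract describes: a 2-frame flow field that must be chained frame-to-frame, versus a per-query trajectory that locates the point in multiple future frames at once. Shapes and variable names are illustrative assumptions.

```python
# Minimal sketch (assumed shapes, placeholder values) of the two
# representations: chained 2-frame flow vs. a long-range point trajectory.
import numpy as np

T, H, W = 8, 64, 64
query = np.array([32.0, 20.0])  # (x, y) of the pixel to track in frame 0

# 2-frame optical flow: one displacement vector per pixel for each frame pair.
# Chaining these steps accumulates error and loses the point under occlusion,
# because every step only sees two frames.
flow = np.zeros((T - 1, H, W, 2), dtype=np.float32)  # placeholder flow fields
pos = query.copy()
chained = [pos.copy()]
for t in range(T - 1):
    x, y = int(round(pos[0])), int(round(pos[1]))
    pos = pos + flow[t, y, x]        # follow the local displacement one step
    chained.append(pos.copy())

# Long-range "particle video" style output: a full trajectory giving the
# query's (x, y) location in every frame, estimated jointly so the wider
# temporal context can carry the point through multi-frame occlusions.
trajectory = np.zeros((T, 2), dtype=np.float32)  # placeholder trajectory
trajectory[0] = query
```

A keypoint label can then be propagated by reading its position directly from `trajectory[t]`, rather than by composing per-frame displacements.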