We present a monocular object parsing framework that achieves consistent keypoint localization by capturing temporal correlation in sequential data. We propose a novel recurrent-network-based architecture that models long-range dependencies between intermediate features, which are highly useful in tasks such as keypoint localization and tracking. We leverage the expressiveness of the popular stacked hourglass architecture and augment it by placing memory units between intermediate layers of the network, with weights shared across stages for video frames. We observe that this weight-sharing scheme not only lets us frame the hourglass architecture as a recurrent network but also proves highly effective in producing increasingly refined estimates for sequential tasks. Furthermore, we propose a new memory cell, which we call CoordConvGRU, that learns to selectively preserve spatio-temporal correlation, and we showcase our results on the keypoint localization task. The experiments show that our approach is able to model the motion dynamics between frames and significantly outperforms the baseline hourglass network. Even though our network is trained on a synthetically rendered dataset, we observe that with minimal fine-tuning on 300 real images we achieve performance on par with various state-of-the-art methods trained with the same level of supervisory input. Using a simpler architecture than other methods enables the network to run in real time on a standard GPU, which is desirable for such applications. Finally, we make our architectures and 524 annotated sequences of cars from the KITTI dataset publicly available.
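To make the idea of a coordinate-aware convolutional memory cell with weights shared across frames more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the internals of CoordConvGRU, so the sketch assumes a standard convolutional GRU whose gate inputs are augmented with two normalized coordinate channels. All names (`add_coords`, `conv2d_same`, `CoordConvGRUCell`), the kernel size, the channel counts, and the random initialization are hypothetical choices made for illustration.

```python
import numpy as np


def add_coords(x):
    """Append two channels with normalized row/column coordinates in [-1, 1].

    x has shape (C, H, W); the result has shape (C + 2, H, W).
    """
    _, h, w = x.shape
    ys = np.linspace(-1.0, 1.0, h).reshape(h, 1).repeat(w, axis=1)
    xs = np.linspace(-1.0, 1.0, w).reshape(1, w).repeat(h, axis=0)
    return np.concatenate([x, ys[None], xs[None]], axis=0)


def conv2d_same(x, weight):
    """Zero-padded 2D convolution (cross-correlation) that keeps H and W.

    x: (C_in, H, W); weight: (C_out, C_in, k, k) with odd k.
    """
    c_out, _, k, _ = weight.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, w = x.shape
    out = np.zeros((c_out, h, w))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class CoordConvGRUCell:
    """Assumed design: a ConvGRU whose gates also see coordinate channels."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # Gate inputs: features + hidden state + 2 coordinate channels.
        shape = (hid_ch, in_ch + hid_ch + 2, k, k)
        self.w_z = rng.normal(0.0, 0.1, shape)  # update gate
        self.w_r = rng.normal(0.0, 0.1, shape)  # reset gate
        self.w_n = rng.normal(0.0, 0.1, shape)  # candidate state

    def step(self, x, h):
        xh = add_coords(np.concatenate([x, h], axis=0))
        z = sigmoid(conv2d_same(xh, self.w_z))
        r = sigmoid(conv2d_same(xh, self.w_r))
        xrh = add_coords(np.concatenate([x, r * h], axis=0))
        n = np.tanh(conv2d_same(xrh, self.w_n))
        # Blend the previous memory with the new candidate, per pixel.
        return (1.0 - z) * h + z * n


# The same cell (same weights) is applied to every frame in the sequence.
cell = CoordConvGRUCell(in_ch=4, hid_ch=8)
h = np.zeros((8, 16, 16))
frames = np.random.default_rng(1).normal(size=(5, 4, 16, 16))
for frame_features in frames:
    h = cell.step(frame_features, h)
```

Reusing one cell object across the loop is what stands in for weight sharing here; in the described architecture such a cell would sit between intermediate hourglass layers rather than operate on random features.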