We present a model for the joint estimation of disparity and motion. The model is based on learning the interrelations between images from multiple cameras, multiple frames in a video, or a combination of both. We show that depth and motion cues, as well as their combinations, can be learned from data within a single type of architecture and with a single type of learning algorithm, by using biologically inspired "complex cell"-like units that encode correlations between pixels across image pairs. Our experimental results show that learning depth and motion makes it possible to achieve state-of-the-art performance in 3-D activity analysis and to outperform existing hand-engineered 3-D motion features by a very large margin.
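The "complex cell"-like units mentioned above respond to correlations between two images, such as a stereo pair or two consecutive video frames. A common way to model this is with multiplicative (energy-model) interactions: each unit multiplies the responses of a pair of linear filters, one applied to each image. The following is a minimal sketch of that idea; the patch size, number of filter pairs, and random filters are illustrative stand-ins, not the paper's actual learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: flattened patch size D, number of filter pairs F.
D, F = 64, 16

# Two image patches, e.g. left/right views or frames t and t+1.
x = rng.standard_normal(D)
y = rng.standard_normal(D)

# One linear filter per image for each unit (random stand-ins for
# filters that would normally be learned from data).
Wx = rng.standard_normal((F, D))
Wy = rng.standard_normal((F, D))

# Multiplicative "complex cell"-like responses: the product of the two
# filter outputs is large when the patches are correlated under a given
# shift, which is what encodes disparity or motion.
responses = (Wx @ x) * (Wy @ y)
print(responses.shape)
```

Because the product term depends on how image content in one patch lines up with content in the other, banks of such units trained on stereo pairs pick up disparity, while the same architecture trained on frame pairs picks up motion.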