We present a new learning-based framework to recover vehicle pose in SO(3) from a single RGB image. In contrast to previous works that map local appearance to observation angles, we explore a progressive approach that extracts meaningful Intermediate Geometrical Representations (IGRs) to estimate egocentric vehicle orientation. This approach features a deep model that transforms perceived intensities to IGRs, which are then mapped to a 3D representation encoding object orientation in the camera coordinate system. The core problems are which IGRs to use and how to learn them more effectively. We answer the former by designing IGRs based on an interpolated cuboid that derives readily from primitive 3D annotations. The latter motivates us to incorporate geometric knowledge through a new loss function based on a projective invariant. This loss function allows unlabeled data to be used during training to improve representation learning. Without additional labels, our system outperforms previous monocular RGB-based methods for joint vehicle detection and pose estimation on the KITTI benchmark, achieving performance comparable even to stereo-based methods. Code and pre-trained models are available at this https URL.
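For concreteness, the sketch below shows one way an interpolated cuboid could be built from a KITTI-style 3D box annotation (dimensions, yaw, and center) by placing equally spaced points along each box edge. This is a minimal illustration, not the paper's exact parameterization: the function name `interpolated_cuboid`, the interior-point count `n`, and the axis conventions are assumptions.

```python
import numpy as np

def interpolated_cuboid(dims, yaw, center, n=2):
    """Hypothetical sketch: densify a KITTI-style 3D box annotation into
    an interpolated cuboid with n equally spaced interior points on each
    of the 12 edges. Returns an (8 + 12*n, 3) array in camera coordinates.
    """
    h, w, l = dims
    # Corners in the object frame (KITTI convention: x right, y down,
    # z forward; the box bottom sits at y = 0, the top at y = -h).
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    y = np.array([ 0,  0,  0,  0, -h, -h, -h, -h], dtype=float)
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    corners = np.stack([x, y, z], axis=1)

    edges = [(0, 1), (1, 2), (2, 3), (3, 0),   # bottom face
             (4, 5), (5, 6), (6, 7), (7, 4),   # top face
             (0, 4), (1, 5), (2, 6), (3, 7)]   # vertical edges
    ts = np.linspace(0.0, 1.0, n + 2)[1:-1]    # interior interpolation ratios
    mids = [corners[i] + t * (corners[j] - corners[i])
            for i, j in edges for t in ts]
    pts = np.concatenate([corners, np.stack(mids)], axis=0)

    # Rotate about the vertical axis by yaw, then translate to the center.
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return pts @ R.T + np.asarray(center)
```

Because the interpolated points come directly from linear interpolation between annotated corners, this representation requires no labeling effort beyond the primitive 3D box itself.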
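The abstract does not spell out which projective invariant the loss uses; a classic choice is the cross-ratio of four collinear points, which is preserved under perspective projection. Assuming the loss compares the cross-ratio of four predicted 2D keypoints against its known value for equally spaced collinear 3D points (points at parameters 0, 1, 2, 3 give a cross-ratio of (2·2)/(1·3) = 4/3), a minimal PyTorch sketch might look as follows. The name `cross_ratio_loss` and the squared-error form are illustrative assumptions.

```python
import torch

def cross_ratio_loss(p, eps=1e-8):
    """Hypothetical sketch of a projective-invariant consistency loss.

    p: (N, 4, 2) tensor of predicted 2D keypoints p1..p4 projected from
       four equally spaced collinear 3D points, e.g. two cuboid corners
       and two interpolated points on the edge between them.
    """
    d13 = (p[:, 0] - p[:, 2]).norm(dim=-1)  # |p1 p3|
    d24 = (p[:, 1] - p[:, 3]).norm(dim=-1)  # |p2 p4|
    d23 = (p[:, 1] - p[:, 2]).norm(dim=-1)  # |p2 p3|
    d14 = (p[:, 0] - p[:, 3]).norm(dim=-1)  # |p1 p4|
    cr = (d13 * d24) / (d23 * d14 + eps)
    # Equally spaced 3D points have cross-ratio 4/3, and projection
    # preserves it, so no 2D ground truth is needed to evaluate this term.
    return ((cr - 4.0 / 3.0) ** 2).mean()
```

Since the target value is a geometric constant rather than an annotation, a term of this kind can be evaluated on unlabeled images, which is what allows unlabeled data to participate in training.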