We introduce the problem of predicting, from a single video frame, a low-dimensional subspace of optical flow that contains the actual instantaneous optical flow. We show how several natural scene assumptions allow us to identify an appropriate flow subspace via a set of basis flow fields parameterized by disparity and a representation of object instances. The flow subspace, together with a novel loss function, can be used for the tasks of predicting monocular depth or predicting depth plus an object instance embedding. This provides a new approach to learning these tasks in an unsupervised fashion using monocular input video, without requiring camera intrinsics or poses.