Current deep-learning-based computer vision tasks require a huge amount of annotated data for model training and testing, especially dense estimation tasks such as optical flow segmentation and depth estimation. In practice, manual labeling for dense estimation tasks is very difficult or even impossible, and the scenes in a dataset are often restricted to a narrow range, which dramatically limits progress in the community. To overcome this deficiency, we propose a synthetic dataset generation method that yields an expandable dataset without burdensome manual labor. Using this method, we construct a dataset called MineNavi, which contains video footage from the first-person view of an aircraft, matched with accurate ground truth for depth estimation in aircraft navigation applications. We also provide quantitative experiments showing that pre-training on our MineNavi dataset can improve the performance of a depth estimation model and speed up its convergence on real-scene data. Because the synthetic dataset has an effect similar to that of a real-world dataset when training deep models, we also provide additional experiments with a monocular depth estimation method to demonstrate the impact of various factors in our dataset, such as lighting conditions and motion mode.