Accurate three-dimensional perception is fundamental to many computer vision applications. Commercial RGB-depth (RGB-D) cameras have recently been widely adopted as single-view depth-sensing devices owing to their efficiency. However, the depth quality of most RGB-D sensors remains insufficient because of the noise inherent in a single-view environment, and several studies have therefore focused on single-view depth enhancement for RGB-D cameras. Recent work has proposed deep-learning-based approaches that typically train networks on high-quality supervised depth datasets, which makes the quality of the ground-truth (GT) depth dataset the most important factor for an accurate system; however, such high-quality GT datasets are difficult to obtain. In this study, we developed a novel method for generating high-quality GT depth from an RGB-D stream dataset. First, we defined consecutive depth frames in a local spatial region as a local frame set. The depth frames were then aligned to a reference frame within the local frame set using an unsupervised point cloud registration scheme. The registration parameters were trained with an overfitting scheme, which was used to construct a single GT depth image for each frame set. The final GT depth dataset was built from several local frame sets, each trained independently. The primary advantage of our approach is that a high-quality GT depth dataset can be constructed under various scanning environments using only an RGB-D stream dataset. Moreover, the generated dataset can serve as a new benchmark for accurate performance evaluation. We evaluated our GT dataset against previously benchmarked GT depth datasets and demonstrated that our method outperforms state-of-the-art depth enhancement frameworks.
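The pipeline above can be illustrated with a minimal sketch: partition a depth stream into local frame sets, rigidly align every frame in a set to a reference frame, and fuse the aligned clouds into a single pseudo-GT cloud. This is an illustrative simplification, not the paper's method: `local_frame_sets`, `kabsch_align`, and `fuse_frame_set` are hypothetical names, the window size is assumed, and the classical Kabsch algorithm with known correspondences stands in for the paper's learned, unsupervised registration.

```python
import numpy as np

def local_frame_sets(num_frames, set_size):
    """Partition a depth stream's frame indices into consecutive
    local frame sets (hypothetical grouping; the paper does not
    specify the window size)."""
    return [list(range(i, min(i + set_size, num_frames)))
            for i in range(0, num_frames, set_size)]

def kabsch_align(src, dst):
    """Rigidly align point cloud `src` (N x 3) to `dst` with the
    classical Kabsch algorithm, assuming known point-to-point
    correspondences -- a stand-in for the paper's learned,
    unsupervised registration scheme."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # reflection guard
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def fuse_frame_set(clouds, ref_idx=0):
    """Align every cloud in a local frame set to the reference frame
    and average them into one denoised pseudo-GT cloud (simple mean
    fusion; the paper's actual fusion rule may differ)."""
    ref = clouds[ref_idx]
    aligned = [c @ kabsch_align(c, ref)[0].T + kabsch_align(c, ref)[1]
               for c in clouds]
    return np.mean(aligned, axis=0)
```

With exact correspondences the alignment is exact, so fusing a frame set whose frames are rigid transforms of the same scene recovers the reference cloud; in practice, averaging many noisy aligned frames is what suppresses the single-view sensor noise.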