Recovering the missing dimension in acoustic images from 2D forward-looking sonar is a well-known problem in underwater robotics. Some works attempt to recover 3D information from a single image, which allows the robot to generate 3D maps with fly-through motion. However, owing to the unique image formation principle of sonar, estimating 3D information from a single image suffers from severe ambiguity. Classical multi-view stereo methods can avoid this ambiguity but may require a large number of viewpoints to produce an accurate model. In this work, we propose a novel learning-based multi-view stereo method to estimate 3D information. To better exploit the information in multiple frames, we propose an elevation plane sweeping method that generates a depth-azimuth-elevation cost volume. After regularization, the volume can be regarded as a probabilistic volumetric representation of the target. Instead of regressing the elevation angles, we use the pseudo front depth obtained from the cost volume to represent the 3D information, which avoids the 2D-3D problem in acoustic imaging. High-accuracy results can be generated with only two or three images. We generated synthetic datasets to simulate various underwater targets and also built the first real dataset with accurate ground truth in a large-scale water tank. Experimental results demonstrate that our method outperforms other state-of-the-art methods.
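The geometry behind elevation plane sweeping can be sketched as follows. This is a minimal illustration, assuming the commonly used forward-looking sonar model in which a 3D point is observed only through its range and azimuth, so its elevation angle is lost. The function names, the pose convention `(R, t)`, and the elevation hypotheses are hypothetical choices made for this example, not the implementation described in this work; learned feature extraction and cost-volume regularization are omitted.

```python
import numpy as np

def backproject(r, theta, phi):
    """(range, azimuth, elevation) -> Cartesian point(s) in the sonar frame."""
    return np.stack([r * np.cos(phi) * np.cos(theta),
                     r * np.cos(phi) * np.sin(theta),
                     r * np.sin(phi)], axis=-1)

def project(p):
    """Cartesian point(s) -> (range, azimuth); the elevation angle is discarded."""
    r = np.linalg.norm(p, axis=-1)
    theta = np.arctan2(p[..., 1], p[..., 0])
    return r, theta

def sweep_elevation(r, theta, phis, R, t):
    """For one reference pixel (r, theta), try every elevation hypothesis in
    `phis` and return where each hypothesis would land in a source view.

    Assumed convention: (R, t) maps reference-frame points to the source frame.
    """
    pts_ref = backproject(r, theta, phis)   # shape (K, 3), one point per hypothesis
    pts_src = pts_ref @ R.T + t             # rigid transform into the source view
    return project(pts_src)                 # (range, azimuth) per hypothesis

# Hypothetical usage: five elevation hypotheses for a single reference pixel.
phis = np.linspace(-0.1, 0.1, 5)            # radians, illustrative aperture
same_view = sweep_elevation(2.0, 0.1, phis, np.eye(3), np.zeros(3))
moved_view = sweep_elevation(2.0, 0.1, phis, np.eye(3), np.array([0.0, 0.0, 0.5]))
```

With an identity pose, every hypothesis projects back to the same pixel, which is the single-image ambiguity. Once the sonar moves, the hypotheses separate in the source image, so comparing features sampled at those locations can indicate which elevation is most consistent. Stacking such comparisons over all pixels and hypotheses would give a volume indexed by depth, azimuth, and elevation.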