GoToNet: Fast Monocular Scene Exposure and Exploration

Autonomous scene exposure and exploration, which is useful for finding targets in unknown scenes, remains a challenging problem in computer navigation, especially in localization-denied or communication-denied areas. In this work, we present a novel method for real-time environment exploration whose only requirements are a visually similar dataset for pre-training, sufficient lighting in the scene, and an on-board forward-looking RGB camera for environmental sensing. In contrast to existing methods, our method needs only one look (a single image) to make a good tactical decision, and therefore runs in constant, non-growing time. Two direction predictions, characterized by pixels dubbed the Goto and Lookat pixels, form the core of our method. These pixels encode the recommended flight instructions as follows: the Goto pixel defines the direction in which the agent should move by one distance unit, and the Lookat pixel defines the direction in which the camera should point in the next step. These flying-instruction pixels are optimized to expose the largest amount of currently unexplored area. Our method presents a novel deep learning-based navigation approach that is able to solve this problem and demonstrates its ability in an even more complicated setup, i.e., when computational power is limited. In addition, we propose a way to generate a navigation-oriented dataset, enabling efficient training of our method using RGB and depth images. Tests conducted in a simulator, evaluating both the inference of the sparse pixels' coordinates and 2D and 3D test flights aimed at unveiling areas and decreasing distances to targets, achieve promising results. A comparison against a state-of-the-art algorithm shows that our method outperforms it on the following metrics: new voxels per camera pose, minimum distance to target, percentage of surface voxels seen, and compute time.
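
To make the Goto/Lookat encoding concrete, the following is a minimal sketch of how two predicted pixels could be turned into a one-unit move and a new camera heading. It is an illustration under stated assumptions and not the authors' implementation: the pinhole camera model, the horizontal field of view, the level (yaw-only) camera, the world frame (X forward at zero yaw, Y left, Z up), and all function names are hypothetical choices made for this example.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def pixel_to_ray(u: float, v: float, width: int, height: int, hfov_deg: float) -> Vec3:
    """Unit ray through pixel (u, v) in the camera frame (x right, y down, z forward).

    Assumes an ideal pinhole camera with square pixels and the principal
    point at the image center (an assumption, not stated in the abstract).
    """
    focal_px = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    x = u - width / 2.0
    y = v - height / 2.0
    norm = math.sqrt(x * x + y * y + focal_px * focal_px)
    return (x / norm, y / norm, focal_px / norm)


def camera_to_world(ray: Vec3, yaw: float) -> Vec3:
    """Rotate a camera-frame ray into the world frame for a level camera.

    Assumed world frame: X forward at yaw = 0, Y left, Z up; yaw is
    measured counter-clockwise about Z, in radians.
    """
    rx, ry, rz = ray
    return (
        rx * math.sin(yaw) + rz * math.cos(yaw),
        -rx * math.cos(yaw) + rz * math.sin(yaw),
        -ry,
    )


def apply_instruction(
    position: Vec3,
    yaw: float,
    goto_px: Pixel,
    lookat_px: Pixel,
    width: int,
    height: int,
    hfov_deg: float,
    step: float = 1.0,
) -> Tuple[Vec3, float]:
    """Move `step` units along the Goto ray and turn to face the Lookat ray."""
    move = camera_to_world(pixel_to_ray(*goto_px, width, height, hfov_deg), yaw)
    new_position = tuple(p + step * m for p, m in zip(position, move))
    look = camera_to_world(pixel_to_ray(*lookat_px, width, height, hfov_deg), yaw)
    new_yaw = math.atan2(look[1], look[0])  # keep only the heading component
    return new_position, new_yaw
```

Under these assumptions, with a 640x480 image and a 90-degree horizontal field of view, a Goto pixel at the image center moves the agent one unit straight ahead, and a Lookat pixel at the right edge of the center row turns the camera 45 degrees to the right. Because each step depends only on the two predicted pixels and the current pose, the per-step cost stays constant, which matches the single-image decision described above.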
