Obstacle detection is a safety-critical problem in robot navigation, where stereo matching is a popular vision-based approach. While deep neural networks have shown impressive results in computer vision, most previous obstacle detection work relies only on traditional stereo matching techniques in order to meet the computational constraints of real-time feedback. This paper proposes a computationally efficient method that uses a deep neural network to detect occupancy directly from stereo images. Instead of learning point cloud correspondence from the stereo data, our approach extracts a compact obstacle distribution based on volumetric representations. In addition, we prune the computation of safety-irrelevant spaces in a coarse-to-fine manner based on octrees generated by the decoder. As a result, we achieve real-time performance on an onboard computer (NVIDIA Jetson TX2). Our approach detects obstacles accurately within a range of 32 meters and achieves better Intersection over Union (IoU) and Chamfer Distance (CD) scores with only 2% of the computation cost of the state-of-the-art stereo model. Furthermore, we validate our method's robustness and real-world feasibility through autonomous navigation experiments with a real robot. Hence, our work contributes toward closing the gap between stereo-based systems in robot perception and state-of-the-art stereo models in computer vision. To counter the scarcity of high-quality real-world indoor stereo datasets, we collect a 1.36-hour stereo dataset with a Jackal robot, which is used to fine-tune our model. The dataset, the code, and more visualizations are available at https://lhy.xyz/stereovoxelnet/
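For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how IoU and Chamfer Distance can be computed on voxel occupancy data. It is an illustration only, not the evaluation code used in this work. It assumes that occupancy is given as a set of integer voxel indices, that Chamfer Distance is the sum of the two mean nearest-neighbor Euclidean distances (other variants use squared distances or a different normalization), and that distances are measured in voxel units. The function names `voxel_iou` and `chamfer_distance` are hypothetical.

```python
import math


def voxel_iou(pred, gt):
    """Intersection over Union between two sets of occupied voxel indices."""
    pred, gt = set(pred), set(gt)
    union = pred | gt
    if not union:
        # Assumption: two empty grids count as perfect agreement.
        return 1.0
    return len(pred & gt) / len(union)


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two non-empty point collections.

    Assumed variant: mean nearest-neighbor Euclidean distance from a to b,
    plus the same quantity from b to a. Brute force, O(len(a) * len(b)).
    """
    a, b = list(a), list(b)
    if not a or not b:
        raise ValueError("both point collections must be non-empty")

    def one_way(src, dst):
        # Average distance from each source point to its nearest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Toy example: two rows of three voxels, offset by one voxel along x.
pred = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
gt = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
print(voxel_iou(pred, gt))
print(chamfer_distance(pred, gt))
```

In the toy example the two grids share two of four distinct voxels, and one voxel on each side lies one voxel away from its nearest neighbor in the other grid. Higher IoU and lower Chamfer Distance both indicate closer agreement with the ground truth.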