Blind and visually impaired people face multiple challenges in navigating the world independently, including finding the shortest path to a destination and detecting obstacles from a distance. To tackle these issues, this paper proposes ViT Cane, which leverages a vision transformer model to detect obstacles in real time. The system consists of a Pi Camera Module v2, a Raspberry Pi 4B with 8 GB of RAM, and four vibration motors. By conveying tactile feedback through the four motors, the obstacle detection model is highly effective in helping visually impaired users navigate unknown terrain, and the system is designed to be easily reproduced. The paper discusses the utility of a vision transformer model in comparison to CNN-based models for this specific application. Through rigorous testing, the proposed obstacle detection model achieved higher performance on the Common Objects in Context (COCO) dataset than its CNN counterparts. Comprehensive field tests were conducted to verify the effectiveness of our system for holistic indoor understanding and obstacle avoidance.
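To illustrate how detector output might drive the four-motor tactile feedback described above, here is a minimal sketch. The quadrant-to-motor mapping, function names, and confidence threshold are our assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: mapping obstacle detections to haptic feedback.
# Assumes the ViT detector yields obstacle centers in normalized image
# coordinates; the quadrant mapping and threshold are illustrative only.

def motor_for_obstacle(cx, cy):
    """Map a normalized obstacle center (cx, cy in [0, 1]) to one of
    four motors, one per image quadrant (0: top-left .. 3: bottom-right)."""
    col = 0 if cx < 0.5 else 1
    row = 0 if cy < 0.5 else 1
    return row * 2 + col

def haptic_pattern(detections, min_conf=0.5):
    """Return the set of motor ids to activate for a list of
    (cx, cy, confidence) detections, ignoring low-confidence ones."""
    return {motor_for_obstacle(cx, cy)
            for cx, cy, conf in detections if conf >= min_conf}

if __name__ == "__main__":
    # Two confident detections in opposite quadrants; one below threshold.
    dets = [(0.2, 0.3, 0.9), (0.8, 0.7, 0.6), (0.5, 0.5, 0.3)]
    print(sorted(haptic_pattern(dets)))  # → [0, 3]
```

On real hardware, each motor id would be wired to a Raspberry Pi GPIO pin and pulsed whenever its quadrant contains a detected obstacle.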