Accurate smartphone-based outdoor localization systems for deep urban canyons are increasingly needed for various IoT applications such as augmented reality and intelligent transportation. The recently developed feature-based visual positioning system (VPS) by Google detects edges in smartphone images and matches them with pre-surveyed edges in its map database. As smart cities develop, building information modeling (BIM) is becoming widely available, which provides an opportunity for a new semantic-based VPS. This article proposes a novel 3D city model and semantic-based VPS for accurate and robust pose estimation in urban canyons, where global navigation satellite systems (GNSS) tend to fail. In the offline stage, a material-segmented city model is used to generate segmented images. In the online stage, an image is taken with a smartphone camera that provides textual information about the surrounding environment. The approach utilizes computer vision algorithms to rectify the image and hand-segment the different types of material identified in the smartphone image. A semantic-based VPS method is then proposed to match the segmented generated images with the segmented smartphone image. Each generated image holds a pose that contains the latitude, longitude, altitude, yaw, pitch, and roll. The candidate with the maximum likelihood is regarded as the precise pose of the user. The positioning results achieve 2.0 m-level accuracy on common streets lined with high-rises, 5.5 m in foliage-dense environments, and 15.7 m in alleyways, a 45% positioning improvement over the current state-of-the-art method. The yaw estimate achieves 2.3°-level accuracy, an 8-fold improvement over the smartphone IMU.
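The matching step described above can be illustrated with a minimal sketch. It assumes each segmented image is a 2D array of integer material labels and uses the fraction of agreeing pixels as a stand-in likelihood; the abstract does not specify the actual likelihood function, so `pixel_agreement`, `best_pose`, the label codes, and the pose values below are all hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Pose:
    """Six-degree-of-freedom pose attached to each generated image."""
    lat: float
    lon: float
    alt: float
    yaw: float
    pitch: float
    roll: float


def pixel_agreement(query: np.ndarray, candidate: np.ndarray) -> float:
    """Fraction of pixels with the same material label.

    Assumption: this simple score stands in for the likelihood; the
    source does not state which measure is actually used.
    """
    if query.shape != candidate.shape:
        raise ValueError("label maps must have the same shape")
    return float(np.mean(query == candidate))


def best_pose(query: np.ndarray, candidates: list[tuple[Pose, np.ndarray]]):
    """Return the pose and score of the highest-scoring candidate."""
    scores = [pixel_agreement(query, image) for _, image in candidates]
    best = int(np.argmax(scores))
    return candidates[best][0], scores[best]


# Toy example with hypothetical labels: 0 = sky, 1 = glass, 2 = concrete.
query = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 2],
])

near = query.copy()
near[0, 0] = 1                 # differs from the query in one pixel
far = np.zeros_like(query)     # all sky, a poor match

# Hypothetical pose values, for illustration only.
pose_near = Pose(22.3000, 114.1700, 5.0, 90.0, 0.0, 0.0)
pose_far = Pose(22.3005, 114.1705, 5.0, 180.0, 0.0, 0.0)

pose, score = best_pose(query, [(pose_far, far), (pose_near, near)])
print(pose, score)
```

In this toy case the candidate that differs in a single pixel wins. In practice the candidate set would be the segmented images generated offline from the city model around an initial position estimate, which is also an assumption here rather than a detail given in the abstract.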