We present an approach for estimating a mobile robot's pose with respect to the allocentric coordinates of a network of static cameras using multi-view RGB images. The images are processed online, locally on smart edge sensors, by deep neural networks that detect the robot and estimate 2D keypoints defined at distinctive positions of the 3D robot model. Robot keypoint detections are synchronized and fused on a central backend, where the robot's pose is estimated via multi-view minimization of reprojection errors. Through the pose estimation from external cameras, the robot's localization can be initialized in an allocentric map from a completely unknown state (kidnapped robot problem) and robustly tracked over time. We conduct a series of experiments evaluating the accuracy and robustness of the camera-based pose estimation compared to the robot's internal navigation stack, showing that our camera-based method achieves pose errors below 3 cm and $1^\circ$ and does not drift over time, as the robot is localized allocentrically. With the robot's pose precisely estimated, its observations can be fused into the allocentric scene model. We show a real-world application where observations from the mobile robot and static smart edge sensors are fused to collaboratively build a 3D semantic map of a $\sim$240 m$^2$ indoor environment.
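The multi-view pose estimate described above can be viewed as a nonlinear least-squares problem over keypoint reprojection errors; the following is a minimal sketch of such an objective, with all notation (robot pose $\mathbf{T}$, model keypoints $\mathbf{p}_k$, camera projections $\pi_c$, 2D detections $\mathbf{z}_{c,k}$, and confidence weights $w_{c,k}$) assumed for illustration rather than taken from the paper:
$$\hat{\mathbf{T}} = \operatorname*{arg\,min}_{\mathbf{T} \in SE(3)} \sum_{c} \sum_{k} w_{c,k} \left\lVert \pi_c\!\left(\mathbf{T}\,\mathbf{p}_k\right) - \mathbf{z}_{c,k} \right\rVert^2,$$
where $c$ indexes the calibrated static cameras and $k$ the keypoints of the 3D robot model detected in the respective view.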