Vision-based perception tasks play a paramount role in robotics, enabling solutions to many challenging scenarios, such as acrobatic maneuvers of autonomous unmanned aerial vehicles (UAVs) and robot-assisted high-precision surgery. Most control-oriented and egocentric perception problems are commonly solved by taking advantage of the robot's state estimation as an auxiliary input, particularly when artificial intelligence comes into the picture. In this work, we propose to apply a similar approach, for the first time to the best of our knowledge, to allocentric perception tasks, where the target variables refer to an external subject. We demonstrate how our general and intuitive methodology improves the regression performance of deep convolutional neural networks (CNNs) on ambiguous problems such as allocentric 3D pose estimation. By analyzing three highly different use cases, spanning from grasping with a robotic arm to following a human subject with a pocket-sized UAV, our approach consistently improves the R² metric, by up to +0.514, compared to stateless baselines. Finally, we validate the in-field performance of a closed-loop autonomous pocket-sized UAV on the human pose estimation task. Our results show a significant reduction, i.e., 24% on average, in the mean absolute error of our stateful CNN.
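
To make the idea concrete, below is a minimal PyTorch sketch of a stateful CNN in the spirit described above: the robot's state estimate is fed as an auxiliary input and concatenated with the visual features before the regression head that predicts the allocentric 3D pose. The class name, layer sizes, the 6-element state layout, and the 4-element pose output (e.g., position plus heading) are illustrative assumptions, not the architecture used in this work.

```python
# Minimal sketch of a "stateful" CNN for allocentric pose regression.
# Assumption: late fusion of image features with the robot's state estimate;
# all dimensions and layer choices here are illustrative.
import torch
import torch.nn as nn


class StatefulPoseRegressor(nn.Module):
    def __init__(self, state_dim: int = 6, pose_dim: int = 4):
        super().__init__()
        # Small convolutional backbone for the camera frame.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Regression head fed with image features + robot state estimate.
        self.head = nn.Sequential(
            nn.Linear(32 + state_dim, 64), nn.ReLU(),
            nn.Linear(64, pose_dim),
        )

    def forward(self, image: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        features = self.backbone(image)              # (B, 32)
        fused = torch.cat([features, state], dim=1)  # (B, 32 + state_dim)
        return self.head(fused)                      # (B, pose_dim)


# Usage: one RGB frame plus the robot's odometry-based state estimate
# (e.g., attitude and velocity; the 6-element layout is hypothetical).
model = StatefulPoseRegressor()
pose = model(torch.randn(1, 3, 96, 96), torch.randn(1, 6))
```

A stateless baseline would simply drop the `state` input and regress the pose from image features alone; the comparison reported above measures exactly this difference.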