Learning-based navigation systems are widely used in autonomous applications such as robotics, unmanned vehicles, and drones. Specialized hardware accelerators have been proposed to deliver high performance and energy efficiency for these navigation tasks. However, transient and permanent faults are increasingly common in hardware systems and can catastrophically violate task safety. Meanwhile, traditional redundancy-based protection methods are difficult to deploy in resource-constrained edge applications. In this paper, we experimentally evaluate the resilience of navigation systems with respect to algorithms, fault models, and data types, in both reinforcement learning (RL) training and inference. We further propose two efficient fault mitigation techniques that improve success rate by 2x and quality of flight by 39% in learning-based navigation systems.