Reliability and safety are critical in autonomous machine services such as autonomous vehicles and aerial drones. In this paper, we first present MAVFI, an open-source reliability analysis framework for Micro Aerial Vehicles (MAVs), to characterize the impact of transient faults on end-to-end flight metrics such as flight time and success rate. Using this framework, we observe that end-to-end fault tolerance analysis is essential for characterizing system reliability. We demonstrate that, in the common "Perception-Planning-Control (PPC)" compute pipeline, the planning and control stages are more vulnerable to transient faults than the visual perception stage. Furthermore, to improve the reliability of the MAV system, we propose two low-overhead anomaly-based transient fault detection and recovery schemes, one based on Gaussian statistical models and the other on autoencoder neural networks. We validate these schemes in a variety of simulated photo-realistic environments on both an Intel i9 CPU and the ARM Cortex-A57 on the Nvidia TX2 platform. The autoencoder-based scheme can improve system reliability by recovering 100% of failure cases with less than 0.0062% computational overhead in best-case scenarios. In addition, the MAVFI framework can be used for other ROS-based cyber-physical applications and is open-sourced at https://github.com/harvard-edge/MAVBench/tree/mavfi
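To make the idea of Gaussian anomaly-based detection and recovery concrete, the following is a minimal illustrative sketch, not the MAVFI implementation. It assumes a single scalar signal passed between pipeline stages (for example a velocity command), a running Gaussian model of accepted samples, a hypothetical 3-sigma threshold, and a hypothetical recovery policy that reuses the last accepted value. The class name, parameters, and sample data are all assumptions made for illustration.

```python
import math


class GaussianAnomalyDetector:
    """Hypothetical detector: flags samples far from the running mean."""

    def __init__(self, n_sigma=3.0, warmup=5):
        self.n_sigma = n_sigma    # assumed threshold, in standard deviations
        self.warmup = warmup      # samples accepted before detection starts
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0             # running sum of squared deviations
        self.last_good = None     # last accepted sample, used for recovery

    def _update(self, x):
        # Welford's online update of the mean and squared deviations.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation of the accepted samples.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))

    def is_anomaly(self, x):
        # No detection until enough samples have been observed.
        if self.count < self.warmup:
            return False
        return abs(x - self.mean) > self.n_sigma * max(self.std(), 1e-9)

    def filter(self, x):
        """Return (value_to_use, was_anomalous)."""
        if self.is_anomaly(x):
            # Assumed recovery policy: fall back to the last accepted value
            # and keep the outlier out of the statistics.
            return self.last_good, True
        self._update(x)
        self.last_good = x
        return x, False


# Made-up command stream with one corrupted sample (1.0e6), standing in for
# a value hit by a transient fault such as a high-order bit flip.
detector = GaussianAnomalyDetector(n_sigma=3.0, warmup=5)
commands = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.0e6, 1.02]
results = [detector.filter(v) for v in commands]
```

In this sketch the corrupted sample is replaced by the previous accepted value and excluded from the running statistics, so a single outlier does not widen the model. A real system would need to choose which inter-stage signals to monitor and how to recover from a detected fault, which this sketch does not address.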