As Deep Learning (DL) systems are widely deployed in mission-critical applications, debugging them becomes essential. Most existing works identify and repair suspicious neurons in the trained Deep Neural Network (DNN), which, unfortunately, might be a detour. Specifically, several existing studies have reported that many unsatisfactory behaviors actually originate from faults residing in DL programs. Moreover, locating faulty neurons is not actionable for developers, whereas locating the faulty statements in DL programs gives developers more useful information for debugging. Although a few recent studies have proposed techniques to pinpoint faulty statements in DL programs or faulty training settings (e.g., an excessively large learning rate), these techniques are mainly built on predefined rules, leading to many false alarms or false negatives, especially when the faults lie beyond the rules' coverage. In view of these limitations, we propose DeepFD, a learning-based fault diagnosis and localization framework that maps the fault localization task to a learning problem. In particular, it infers the suspicious fault types by monitoring runtime features extracted during DNN model training and then locates the diagnosed faults in DL programs. It overcomes the above limitations by identifying the root causes of faults in DL programs instead of in neurons, and by diagnosing faults with a learning approach instead of a set of hard-coded rules. The evaluation demonstrates the potential of DeepFD. It correctly diagnoses 52% of faulty DL programs, whereas the best state-of-the-art works achieve about half of that (27%). For fault localization, DeepFD also outperforms existing works, correctly locating the faults in 42% of faulty programs, almost double the best result (23%) achieved by existing works.
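To make the idea of mapping diagnosis to a learning problem concrete, the following is a minimal illustrative sketch, not DeepFD's actual implementation. It assumes that the only runtime signal is a per-epoch loss curve, that three hand-picked features summarize it, and that a nearest-centroid classifier stands in for the learned diagnosis model. The function names, the feature set, the fault labels other than an overly large learning rate, and the toy training runs are all hypothetical.

```python
import math
from collections import defaultdict


def extract_features(losses):
    """Summarize a per-epoch loss curve as [nan_ratio, improvement, up_ratio].

    Hypothetical feature set chosen for illustration only.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    finite = [x for x in losses if math.isfinite(x)]
    # Share of epochs whose loss was NaN or infinite.
    nan_ratio = 1.0 - len(finite) / len(losses)
    if len(finite) < 2:
        return [nan_ratio, 0.0, 0.0]
    # Relative drop from the first to the last finite loss, clipped to [-1, 1].
    raw = (finite[0] - finite[-1]) / (abs(finite[0]) + 1e-12)
    improvement = max(-1.0, min(1.0, raw))
    # Share of consecutive epochs in which the loss increased.
    ups = sum(1 for a, b in zip(finite, finite[1:]) if b > a)
    up_ratio = ups / (len(finite) - 1)
    return [nan_ratio, improvement, up_ratio]


def fit_centroids(runs):
    """Learn one mean feature vector per fault label from (losses, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for losses, label in runs:
        feats = extract_features(losses)
        if label not in sums:
            sums[label] = [0.0] * len(feats)
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def diagnose(losses, centroids):
    """Return the fault label whose centroid is closest to this run's features."""
    feats = extract_features(losses)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


# Toy labeled runs (fabricated for illustration, not data from the paper).
NAN = float("nan")
TRAINING_RUNS = [
    ([1.0, 3.0, 9.0, NAN, NAN], "learning_rate_too_high"),
    ([2.0, 2.5, 1.8, 2.9, 2.2], "learning_rate_too_high"),
    ([2.30, 2.29, 2.29, 2.28, 2.28], "learning_rate_too_low"),
    ([1.00, 0.99, 0.99, 0.98, 0.98], "learning_rate_too_low"),
    ([2.3, 1.2, 0.7, 0.4, 0.3], "no_fault"),
    ([1.0, 0.5, 0.3, 0.2, 0.15], "no_fault"),
]

centroids = fit_centroids(TRAINING_RUNS)
print(diagnose([1.5, 4.0, float("inf"), NAN, NAN], centroids))
```

The sketch covers only the diagnosis step. Locating the diagnosed fault in the DL program, for example by pointing to the statement that sets the learning rate, would be a separate step that is not shown here.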