DeepFD: Automated Fault Diagnosis and Localization for Deep Learning Programs

As Deep Learning (DL) systems are widely deployed in mission-critical applications, debugging such systems becomes essential. Most existing work identifies and repairs suspicious neurons in the trained Deep Neural Network (DNN), which, unfortunately, might be a detour. Specifically, several studies have reported that many unsatisfactory behaviors actually originate from faults residing in DL programs. Moreover, locating faulty neurons is not actionable for developers, whereas locating the faulty statements in DL programs gives developers more useful information for debugging. Although a few recent studies pinpoint faulty statements in DL programs or faulty training settings (e.g., a learning rate that is too large), they are mainly built on predefined rules, which leads to many false alarms or false negatives, especially when the faults are beyond their capabilities. In view of these limitations, in this paper we propose DeepFD, a learning-based fault diagnosis and localization framework that maps the fault localization task to a learning problem. In particular, it infers the suspicious fault types by monitoring runtime features extracted during DNN model training and then locates the diagnosed faults in DL programs. It overcomes the limitations by identifying the root causes of faults in DL programs instead of neurons, and by diagnosing the faults with a learning approach instead of a set of hard-coded rules. The evaluation shows the potential of DeepFD. It correctly diagnoses 52% of the faulty DL programs, compared with about half that rate (27%) achieved by the best state-of-the-art work. For fault localization, DeepFD also outperforms existing work, correctly locating the faults in 42% of the faulty programs, which almost doubles the best result (23%) achieved by existing work.
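The idea of diagnosing faults from runtime features can be illustrated with a minimal sketch. This is not DeepFD's implementation: the three features, the fault-type labels, the toy loss traces, and the nearest-centroid classifier are all assumptions chosen for brevity, standing in for whatever features and learned model the framework actually uses.

```python
import math
from statistics import mean, pstdev


def extract_features(losses):
    """Summarise one run's per-epoch loss trace as (nan_ratio, improvement, fluctuation).

    These features are illustrative assumptions, not DeepFD's feature set.
    """
    finite = [x for x in losses if math.isfinite(x)]
    nan_ratio = 1.0 - len(finite) / len(losses)
    if len(finite) < 2:
        return (nan_ratio, 0.0, 0.0)
    # Relative drop from the first to the last finite loss value.
    improvement = (finite[0] - finite[-1]) / (abs(finite[0]) + 1e-12)
    # Spread of epoch-to-epoch changes, scaled by the mean loss.
    deltas = [b - a for a, b in zip(finite, finite[1:])]
    fluctuation = pstdev(deltas) / (abs(mean(finite)) + 1e-12)
    return (nan_ratio, improvement, fluctuation)


class NearestCentroidDiagnoser:
    """Hypothetical stand-in for a learned diagnosis model."""

    def fit(self, traces, labels):
        grouped = {}
        for trace, label in zip(traces, labels):
            grouped.setdefault(label, []).append(extract_features(trace))
        # One centroid per fault type: the per-feature mean over its runs.
        self.centroids = {
            label: tuple(mean(column) for column in zip(*feats))
            for label, feats in grouped.items()
        }
        return self

    def predict(self, trace):
        feats = extract_features(trace)
        return min(
            self.centroids,
            key=lambda label: math.dist(feats, self.centroids[label]),
        )


# Toy, made-up training runs labelled with hypothetical fault types.
NAN = float("nan")
training_traces = [
    [2.3, 1.5, 1.0, 0.7, 0.5, 0.4],
    [2.0, 1.4, 0.9, 0.6, 0.45, 0.35],
    [2.3, 5.1, 1.9, 7.4, 2.8, 6.0],
    [2.3, 9.0, NAN, NAN, NAN, NAN],
    [2.3, 2.29, 2.28, 2.27, 2.26, 2.25],
    [2.3, 2.295, 2.29, 2.285, 2.28, 2.275],
]
training_labels = [
    "no_fault",
    "no_fault",
    "learning_rate_too_large",
    "learning_rate_too_large",
    "learning_rate_too_small",
    "learning_rate_too_small",
]

diagnoser = NearestCentroidDiagnoser().fit(training_traces, training_labels)
print(diagnoser.predict([2.3, 4.8, 2.1, 6.9, 3.0, 5.5]))
```

In this sketch the diagnosis is learned from labelled runs rather than written as threshold rules, which is the contrast the abstract draws. Mapping a diagnosed fault type back to a statement in the program (the localization step) is not shown.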

