Deep neural networks (DNNs) are becoming an integral part of most software systems. Previous work has shown that DNNs have bugs. Unfortunately, existing debugging techniques do not support localizing DNN bugs because model behavior is poorly understood: the entire DNN model appears as a black box. To address these problems, we propose an approach that automatically determines whether a model is buggy and identifies the root causes. Our key insight is that historic trends in the values propagated between layers can be analyzed to detect and localize faults. To that end, we first enable dynamic analysis of deep learning applications, either by converting the application into an imperative representation or by using a callback mechanism. Both mechanisms allow us to insert probes that enable dynamic analysis over the traces produced by the DNN while it is being trained on the training data. We then conduct dynamic analysis over the traces to identify the faulty layer that causes the error. We propose an algorithm that identifies root causes by capturing any numerical error, monitoring the model during training, and determining the relevance of every layer to the DNN outcome. We have collected a benchmark of 40 buggy models and their patches, containing real errors in deep learning applications from Stack Overflow and GitHub. Our benchmark can be used to evaluate automated debugging tools and repair techniques. We have evaluated our approach using this DNN bug-and-patch benchmark, and the results show that our approach is much more effective than the existing debugging approach in the state-of-the-practice Keras library. Our approach detected faults in 34 of 40 cases, whereas the best debugging approach provided by Keras detected 32 of 40. Our approach localized 21 of 40 bugs, whereas Keras did not localize any.
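To make the probe-and-trace idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a hypothetical `Probe` class whose `on_step_end` method is called once per training step with the outputs of each layer, and a hypothetical helper `first_numerical_fault` that scans the recorded trace for the earliest non-finite value. It covers only the numerical-error part of the analysis; layer relevance is not modeled.

```python
import math


class Probe:
    """Hypothetical callback-style probe that records per-layer outputs."""

    def __init__(self):
        # One entry per training step; each entry is a list of
        # (layer_name, values) pairs in forward order.
        self.trace = []

    def on_step_end(self, layer_outputs):
        # Copy the values so later mutation by the caller cannot alter the trace.
        self.trace.append([(name, list(vals)) for name, vals in layer_outputs])


def first_numerical_fault(trace):
    """Return (step, layer_name) of the earliest NaN or infinity, or None."""
    for step, layers in enumerate(trace):
        for layer_name, values in layers:
            if any(not math.isfinite(v) for v in values):
                return step, layer_name
    return None


# Illustrative, made-up values: dense_2 overflows at step 1 and the
# error reaches dense_1 at step 2.
probe = Probe()
probe.on_step_end([("dense_1", [0.2, 0.5]), ("dense_2", [1.1, 0.9])])
probe.on_step_end([("dense_1", [0.3, 0.4]), ("dense_2", [float("inf"), 0.8])])
probe.on_step_end([("dense_1", [float("nan"), 0.1]), ("dense_2", [float("nan"), 0.7])])

print(first_numerical_fault(probe.trace))  # (1, 'dense_2')
```

Reporting the earliest offending step and layer, rather than the last one, matters because a numerical error tends to spread to other layers in later steps, which hides where it started.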