Deep learning (DL) techniques have proven effective in many challenging tasks and have become widely adopted in practice. However, previous work has shown that DL libraries, the basis for building and executing DL models, contain bugs that can cause severe consequences. Unfortunately, existing testing approaches still cannot comprehensively exercise DL libraries: they rely on existing trained models and detect bugs only in the model inference phase. In this work, we propose Muffin to address these issues. Muffin applies a specifically designed model fuzzing approach that generates diverse DL models to explore the target library, instead of relying only on existing trained models. Muffin also makes differential testing feasible in the model training phase by tailoring a set of metrics that measure inconsistencies between different DL libraries. In this way, Muffin can exercise the library code more thoroughly and detect more bugs. To evaluate the effectiveness of Muffin, we conduct experiments on three widely used DL libraries. The results show that Muffin detects 39 new bugs in the latest release versions of TensorFlow, CNTK, and Theano.
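To make the idea of differential testing in the training phase concrete, here is a minimal Python sketch under stated assumptions. It assumes the same generated model has already been run for one training step on two libraries, and that each library's intermediate results have been flattened into float lists keyed by stage. The stage names (`forward`, `loss`, `backward`), the mean-absolute-difference metric, the NaN rule, and the default tolerance are all hypothetical illustrations, not the metrics Muffin actually defines.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical per-stage results for one training step of the same model on
# one library, e.g. {"forward": [...], "loss": [...], "backward": [...]}.
StageOutputs = Dict[str, List[float]]


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized float sequences."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_inconsistencies(
    lib_a: StageOutputs, lib_b: StageOutputs, threshold: float = 1e-3
) -> Dict[str, float]:
    """Return each shared stage whose outputs differ by more than `threshold`.

    `threshold` is an illustrative tolerance, not a value taken from Muffin.
    A NaN score is also flagged, on the (assumed) view that NaN appearing in
    one library's calculation is itself a symptom worth reporting.
    """
    flagged: Dict[str, float] = {}
    for stage in lib_a.keys() & lib_b.keys():
        score = mean_abs_diff(lib_a[stage], lib_b[stage])
        if math.isnan(score) or score > threshold:
            flagged[stage] = score
    return flagged


if __name__ == "__main__":
    # Made-up numbers standing in for two libraries' results on one model.
    lib_a_out = {"forward": [0.10, 0.20], "loss": [0.50], "backward": [0.01, -0.02]}
    lib_b_out = {"forward": [0.10, 0.20], "loss": [0.50], "backward": [0.01, 0.30]}
    print(find_inconsistencies(lib_a_out, lib_b_out))
```

With these made-up inputs only the `backward` stage is reported, which illustrates why checking the training phase can expose bugs that an inference-only comparison of forward outputs would miss.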