Deep-learning (DL) compilers such as TVM and TensorRT are increasingly used to optimize deep neural network (DNN) models to meet performance, resource-utilization, and other requirements. Bugs in these compilers can produce optimized models whose semantics differ from those of the original models, yielding incorrect results that undermine the correctness of downstream applications. However, finding bugs in these compilers is challenging due to their complexity. In this work, we propose a new fuzz-testing approach for finding bugs in deep-learning compilers. Our core approach uses (i) lightweight operator specifications to generate diverse yet valid DNN models, allowing us to exercise a large part of the compiler's transformation logic; (ii) a gradient-based search process for finding model inputs that avoid any floating-point exceptional values during model execution, reducing the chance of missed bugs or false alarms; and (iii) differential testing to identify bugs. We implemented this approach in NNSmith, which has found 65 new bugs in the last seven months for TVM, TensorRT, ONNXRuntime, and PyTorch. Of these, 52 have been confirmed and 44 have been fixed by the project maintainers.
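To make points (ii) and (iii) concrete, the following is a minimal sketch, not NNSmith's actual implementation or API: a toy PyTorch model with one "vulnerable" operator (log), a hypothetical `search_finite_input` routine that gradient-descends on a penalty until the operator's argument stays in its valid domain (so execution produces no NaN/Inf), and a hypothetical `differential_test` that compares eager PyTorch against the same model exported to ONNX and run under ONNX Runtime. All class and function names here are illustrative assumptions.

```python
import numpy as np
import torch
import onnxruntime as ort


class TinyModel(torch.nn.Module):
    """Stand-in for a randomly generated DNN with a 'vulnerable' operator:
    log() yields -inf/NaN unless its argument stays strictly positive."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        h = self.linear(x)
        return torch.log(h)  # invalid if any element of h is <= 0


def search_finite_input(model, shape, steps=500, lr=0.1, eps=1e-3):
    """Hypothetical gradient-based search (point ii): nudge a random input so
    the vulnerable operator's argument stays inside its valid domain."""
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        h = model.linear(x)
        # Penalty is zero once every element of h exceeds eps, i.e. log is safe.
        penalty = torch.relu(eps - h).sum()
        if penalty.item() == 0.0:
            return x.detach()
        opt.zero_grad()
        penalty.backward()
        opt.step()
    return None  # give up; a fuzzer would regenerate the model or reseed


def differential_test(model, x, rtol=1e-4, atol=1e-4):
    """Differential testing (point iii): compare eager PyTorch against the
    exported ONNX model executed by ONNX Runtime."""
    expected = model(x).detach().numpy()
    torch.onnx.export(model, (x,), "tiny.onnx", input_names=["x"])
    sess = ort.InferenceSession("tiny.onnx", providers=["CPUExecutionProvider"])
    actual = sess.run(None, {"x": x.numpy()})[0]
    np.testing.assert_allclose(actual, expected, rtol=rtol, atol=atol)


if __name__ == "__main__":
    model = TinyModel()
    x = search_finite_input(model, (4, 8))
    if x is not None:
        differential_test(model, x)  # an AssertionError here flags a potential bug
```

In the sketch the penalty is hand-written for a single operator; NNSmith instead derives such validity constraints from its operator specifications, so the search generalizes to arbitrary generated models.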