Software engineers develop, fine-tune, and deploy deep learning (DL) models. They use and reuse models in a variety of development frameworks and deploy them on a range of runtime environments. In this diverse ecosystem, engineers use DL model converters to move models from frameworks to runtime environments. However, errors in converters can compromise model quality and disrupt deployment. The failure frequency and failure modes of DL model converters are unknown. In this paper, we conduct the first failure analysis of DL model converters. Specifically, we characterize failures in model converters associated with ONNX (Open Neural Network eXchange). We analyze past failures in the ONNX converters in two major DL frameworks, PyTorch and TensorFlow. We report the symptoms, causes, and locations of these failures (for N=200 issues), as well as trends over time. We also evaluate present-day failures by converting 8,797 models, including both real-world and synthetically generated instances. The consistent result from both parts of the study is that DL model converters commonly fail by producing models that exhibit incorrect behavior: 33% of past failures and 8% of converted models fell into this category. Our results motivate future research on making DL software simpler to maintain, extend, and validate.