The tremendous success of Deep Learning (DL) has significantly boosted the number of open-source DL frameworks hosted on GitHub. Performance and accuracy bugs, among others, are critical factors that affect the reputation of these DL frameworks; therefore, understanding how such bugs are discovered and investigated in practice is important. In this paper, we conduct an exploratory study on the nature of reporting performance and accuracy bugs for DL frameworks, aiming to improve our knowledge of this topic. Our study covers the 10 most popular open-source DL frameworks on GitHub (e.g., TensorFlow, Keras, and PyTorch), from which we sample 664 representative performance and accuracy bug reports out of a total population of 22,522. Through systematic analysis of these samples, our key findings are: (1) low speed is the primary reason that a performance-related bug report is submitted, but we see no consistent pattern for accuracy-related ones; (2) most of the reports concern issues encountered in the training stage; (3) only a small proportion of the reports provide insufficient information to investigate; (4) the majority of the performance and accuracy bug reports (from 69% to 100%) are not related to an actual bug or are regarded as unclassified; (5) around 50% of the performance and accuracy bug reports that indeed reveal bugs are not resolved by direct patches. Based on these findings, we discuss a set of actionable implications for researchers, maintainers, and report submitters on this subject. To promote open science, the labeled dataset has been made publicly available at https://tinyurl.com/4x3tap9w.