Artificial intelligence (AI) systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech recognition, and image processing. The advancement of these AI systems is inseparable from open-source software (OSS). This paper presents an empirical study of the issues reported in open-source AI repositories, with the aim of helping developers understand the problems that arise when employing AI systems. We collect 576 repositories from the PapersWithCode platform and retrieve 24,953 issues from them using the GitHub REST API. Our empirical study includes three phases. First, we manually analyze these issues to categorize the problems that developers are likely to encounter in open-source AI repositories. Specifically, we provide a taxonomy of 13 categories related to AI systems. The two most common categories are runtime errors (23.18%) and unclear instructions (19.53%). Second, we find that 67.5% of issues are closed and that half of these closed issues are resolved within four days. Moreover, issue management features, e.g., labeling and assigning, are not widely adopted in open-source AI repositories: only 7.81% of repositories label issues and only 5.9% assign issues to assignees. Finally, we empirically show that employing GitHub issue management features and writing issues with detailed descriptions facilitate the resolution of issues. Based on our findings, we make recommendations to help developers better manage the issues of open-source AI repositories and improve their quality.
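As an illustration of the second phase, the share of closed issues and the median time to close can be computed from issue records shaped like those returned by the GitHub REST API. The following is a minimal sketch, not the study's actual pipeline: the function names `parse_ts` and `issue_stats` are hypothetical, and it assumes each record carries the `state`, `created_at`, and `closed_at` fields, with pull requests (which the API also lists as issues) marked by a `pull_request` key.

```python
from datetime import datetime
from statistics import median


def parse_ts(ts):
    """Parse a GitHub-style UTC timestamp such as '2021-03-01T12:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def issue_stats(records):
    """Return the closed ratio and median days-to-close for issue records.

    Hypothetical helper: `records` is a list of dicts in the shape of
    GitHub REST API issue objects. Pull requests are skipped because the
    issues endpoint returns them alongside real issues.
    """
    issues = [r for r in records if "pull_request" not in r]
    closed = [r for r in issues if r["state"] == "closed" and r.get("closed_at")]

    # Resolution time in days for every closed issue.
    days_to_close = [
        (parse_ts(r["closed_at"]) - parse_ts(r["created_at"])).total_seconds() / 86400
        for r in closed
    ]

    return {
        "closed_ratio": len(closed) / len(issues) if issues else 0.0,
        "median_days_to_close": median(days_to_close) if days_to_close else None,
    }
```

Applied to the issues of every collected repository, a summary of this kind would yield figures comparable to the reported closed share and median resolution time, although the study's exact filtering rules are not stated here.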