What Do Users Ask in Open-Source AI Repositories? An Empirical Study of GitHub Issues

Artificial Intelligence (AI) systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech recognition, and image processing. The advancement of these AI systems is inseparable from open-source software (OSS). This paper presents an empirical study of the issues reported in open-source AI repositories, to help developers understand the problems that arise when employing AI systems. We collect 576 repositories from the PapersWithCode platform and retrieve 24,953 issues from them using the GitHub REST APIs. Our empirical study has three phases. First, we manually analyze these issues to categorize the problems that developers are likely to encounter in open-source AI repositories, and provide a taxonomy of 13 categories related to AI systems. The two most common categories are runtime errors (23.18%) and unclear instructions (19.53%). Second, we find that 67.5% of issues are closed, and that half of these closed issues are resolved within four days. Moreover, issue management features, e.g., labels and assignees, are not widely adopted in open-source AI repositories: only 7.81% of repositories label issues and only 5.9% assign issues to assignees. Finally, we empirically show that employing GitHub issue management features and writing issues with detailed descriptions facilitate the resolution of issues. Based on our findings, we make recommendations to help developers better manage the issues of open-source AI repositories and improve their quality.
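As an illustration of the second phase, the closed-issue ratio and the median time to resolution could be computed roughly as follows. This is a minimal sketch, not the authors' pipeline: it assumes the issues have already been downloaded as JSON objects in the shape returned by the GitHub REST API (fields `state`, `created_at`, `closed_at`, and a `pull_request` key on pull requests). The function name `summarize_issues` and the sample data are hypothetical.

```python
from datetime import datetime
from statistics import median


def parse_ts(ts: str) -> datetime:
    # GitHub REST API timestamps are ISO 8601 in UTC, e.g. "2022-01-05T12:00:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def summarize_issues(issues):
    """Return (closed_fraction, median_days_to_close) for a list of issue dicts.

    Hypothetical helper for illustration; the paper does not publish this code.
    """
    # The issues endpoint also returns pull requests, marked by a "pull_request" key.
    only_issues = [i for i in issues if "pull_request" not in i]
    closed = [
        i for i in only_issues
        if i["state"] == "closed" and i.get("closed_at")
    ]
    closed_fraction = len(closed) / len(only_issues) if only_issues else 0.0
    # Resolution time in days, from creation to closing.
    days = [
        (parse_ts(i["closed_at"]) - parse_ts(i["created_at"])).total_seconds() / 86400
        for i in closed
    ]
    median_days = median(days) if days else None
    return closed_fraction, median_days


# Made-up sample records, only to show the expected input shape.
sample = [
    {"state": "closed", "created_at": "2022-01-01T00:00:00Z",
     "closed_at": "2022-01-03T00:00:00Z"},
    {"state": "closed", "created_at": "2022-01-01T00:00:00Z",
     "closed_at": "2022-01-07T00:00:00Z"},
    {"state": "open", "created_at": "2022-01-02T00:00:00Z", "closed_at": None},
    {"state": "closed", "created_at": "2022-01-02T00:00:00Z",
     "closed_at": "2022-01-03T00:00:00Z", "pull_request": {}},
]

closed_fraction, median_days = summarize_issues(sample)
print(closed_fraction, median_days)
```

Whether the study excluded pull requests or measured resolution time in exactly this way is not stated in the abstract, so both choices here are assumptions.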
