Social network research has focused on hyperlink graphs, bibliographic citations, friend/follow patterns, influence spread, and the like. Large software repositories also form a highly valuable networked artifact, usually a collection of packages, their developers, the dependencies among them, and bug reports. This "social network of code" is rarely studied by social network researchers. We introduce two new problems in this setting, both well motivated in the software engineering community but not closely studied by social network scientists. The first is to identify the packages most likely to be troubled by bugs in the immediate future, and which therefore demand the greatest attention. The second is to recommend developers to packages for the next development cycle. Simple autoregression can be applied to historical data for both problems, but we propose a novel method that integrates network-derived features and demonstrate that it brings additional benefits. Apart from formalizing these problems and proposing new baseline approaches, we prepare and contribute a substantial dataset connecting multiple attributes, built from the long-term history of 20 releases of Ubuntu, growing to over 25,000 packages with their dependency links, maintained by over 3,800 developers, with over 280,000 bug reports.