Software development is becoming increasingly open and collaborative with the advent of platforms such as GitHub. Given GitHub's crucial role, there is a need to better understand and model its dynamics as a social platform. Previous work has mostly considered the dynamics of traditional social networking sites such as Twitter and Facebook. We propose GitEvolve, a system that predicts the evolution of GitHub repositories and the different ways in which users interact with them. To this end, we develop an end-to-end multi-task sequential deep neural network that, given some seed events, simultaneously predicts which user group will next interact with a given repository, what the type of the interaction is, and when it happens. To facilitate learning, we use graph-based representation learning to encode relationships between repositories. We map users to groups by modelling common interests, both to better predict popularity and to generalize to unseen users during inference. We introduce an artificial event type to better model the varying levels of activity across repositories in the dataset. The proposed multi-task architecture is generic and can be extended to model information diffusion in other social networks. In a series of experiments, we demonstrate the effectiveness of the proposed model using multiple metrics and baselines. Qualitative analysis of the model's ability to predict popularity and forecast trends supports its applicability.
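To make the three simultaneous predictions concrete, the following minimal sketch shows one way a multi-task objective over user group, event type, and event time could be combined. It is not the GitEvolve implementation: the abstract does not state the loss functions, so cross-entropy for the two classification tasks, squared error for the time prediction, the equal task weights, and all names (`Event`, `multitask_loss`, `time_delta`) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Event:
    """One interaction with a repository (all field names are hypothetical)."""
    user_group: int    # index of the user group that acted
    event_type: int    # index of the interaction type
    time_delta: float  # time since the previous event (unit is an assumption)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability that softmax(logits) assigns to the target class."""
    return -math.log(softmax(logits)[target])


def multitask_loss(
    group_logits: Sequence[float],
    type_logits: Sequence[float],
    predicted_delta: float,
    target: Event,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of the three per-task losses (the weighting is an assumption)."""
    group_loss = cross_entropy(group_logits, target.user_group)
    type_loss = cross_entropy(type_logits, target.event_type)
    time_loss = (predicted_delta - target.time_delta) ** 2
    w_group, w_type, w_time = weights
    return w_group * group_loss + w_type * type_loss + w_time * time_loss


# Usage: score one predicted next event against the event that actually occurred.
target = Event(user_group=2, event_type=0, time_delta=1.5)
loss = multitask_loss([0.1, 0.2, 3.0], [2.0, 0.5], 1.0, target)
```

In a sequential model, a loss of this shape would be computed at every step of the event sequence and averaged, so that a single shared representation is trained on all three tasks at once.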