Open source software (OSS) forms the backbone of modern technology infrastructure and attracts millions of developers as contributors. Recommending appropriate development tasks to these developers is both critical and challenging, because it requires accounting for the developers' interests as well as the semantic features of the project code. In this paper, we formulate the novel problem of code recommendation, whose purpose is to predict the future contribution behaviors of developers given their interaction history, the semantic features of source code, and the hierarchical file structures of projects. To capture the complex interactions among the multiple parties in this setting, we propose CODER, a novel graph-based code recommendation framework for open source software developers. CODER jointly models microscopic user-code interactions and macroscopic user-project interactions via a heterogeneous graph, and bridges the two levels of information through aggregation on file-structure graphs that reflect the project hierarchy. Moreover, because reliable benchmarks are lacking, we construct three large-scale datasets to facilitate future research in this direction. Extensive experiments show that CODER achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation. We will release all the datasets, code, and utilities for data retrieval upon the acceptance of this work.
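As a rough illustration of how file-level and project-level signals might be bridged through a file-structure hierarchy, the following minimal sketch pools file embeddings bottom-up into a project embedding and scores both against a user embedding. It is not the CODER implementation: the mean pooling, the dot-product scoring, the toy embeddings, and the names `aggregate_tree` and `score` are all assumptions made for this example.

```python
# Hypothetical sketch, NOT the CODER model: mean pooling and dot-product
# scoring are stand-ins chosen for illustration only.

def aggregate_tree(children, leaf_emb, node):
    """Return the embedding of `node` by bottom-up mean pooling.

    A file (leaf) returns its own embedding; a directory returns the
    element-wise mean of its children's embeddings.
    """
    if node in leaf_emb:
        return leaf_emb[node]
    vecs = [aggregate_tree(children, leaf_emb, child) for child in children[node]]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def score(user_emb, item_emb):
    """Dot product between a user embedding and an item embedding."""
    return sum(u * x for u, x in zip(user_emb, item_emb))


# Toy project hierarchy: proj/ contains src/ and README.md; src/ contains two files.
children = {"proj": ["src", "README.md"], "src": ["a.py", "b.py"]}

# Toy file embeddings (in practice these would come from a code encoder).
leaf_emb = {"a.py": [1.0, 0.0], "b.py": [0.0, 1.0], "README.md": [1.0, 1.0]}

# Macroscopic view: a single embedding for the whole project.
project_emb = aggregate_tree(children, leaf_emb, "proj")

# Microscopic view: rank individual files for one user.
user_emb = [2.0, 1.0]
ranked = sorted(leaf_emb, key=lambda f: score(user_emb, leaf_emb[f]), reverse=True)
project_score = score(user_emb, project_emb)
```

In this toy setup the same user embedding is compared against individual files (the microscopic level) and against the pooled project embedding (the macroscopic level), which is the kind of two-level link the file-structure aggregation is meant to provide.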