Accurate assessment of developers' domain expertise is important for assigning the right candidate to contribute to a project or to fill a job role. Because potential candidates can come from a large pool, automating this assessment is a desirable goal. While previous methods have had some success within a single software project, assessing a developer's domain expertise from contributions across multiple projects is more challenging. In this paper, we employ doc2vec to represent the domain expertise of developers as embedding vectors. These vectors are derived from different sources that contain evidence of developers' expertise, such as the descriptions of repositories to which they contributed, their issue resolving history, and the API calls in their commits. We name this approach dev2vec and demonstrate its effectiveness in representing the technical specialization of developers. Our results indicate that encoding the expertise of developers in an embedding vector outperforms state-of-the-art methods and improves the F1-score by up to 21%. Moreover, our findings suggest that the "issue resolving history" of developers is the most informative source for representing their domain expertise in embedding spaces.
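The evidence sources named in the abstract (repository descriptions, issue resolving history, and API calls) have to be turned into token sequences before a doc2vec model can embed them. The following is a minimal sketch of that preparation step under stated assumptions, not the authors' pipeline: the `evidence` data, the developer names, the `tokenize` rule, and the choice of one document per (developer, source) pair are all hypothetical and serve only as illustration.

```python
import re
from collections import namedtuple

# Stand-in for a doc2vec-style tagged document: a token list plus a list of tags.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])


def tokenize(text):
    """Lowercase and split on anything that is not a-z, 0-9, or underscore.

    This preprocessing rule is an assumption made for illustration.
    """
    return [tok for tok in re.split(r"[^a-z0-9_]+", text.lower()) if tok]


def build_documents(evidence):
    """Turn {developer: {source: [texts]}} into one tagged document
    per (developer, source) pair, in sorted order."""
    docs = []
    for dev, sources in sorted(evidence.items()):
        for source, texts in sorted(sources.items()):
            words = [tok for text in texts for tok in tokenize(text)]
            if words:  # skip sources with no usable text
                docs.append(TaggedDoc(words=words, tags=[f"{dev}/{source}"]))
    return docs


# Hypothetical evidence for two developers (invented data, not from the paper).
evidence = {
    "alice": {
        "repo_descriptions": ["A fast HTTP server for Python"],
        "issues": ["Fix memory leak in request parser"],
        "api_calls": ["socket.accept asyncio.run"],
    },
    "bob": {
        "repo_descriptions": ["GPU kernels for image filtering"],
        "issues": [],  # no issue history available for this developer
        "api_calls": ["cudaMalloc cudaMemcpy"],
    },
}

docs = build_documents(evidence)
for doc in docs:
    print(doc.tags[0], doc.words)
```

Keeping one document per source, rather than one merged document per developer, would make it possible to compare how informative each source is, as the abstract does. Each `TaggedDoc` has the same `words` and `tags` fields that a doc2vec implementation such as gensim's `TaggedDocument` expects, so the list could be handed to such a model for training.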