Background: Accurate representation of developer expertise has long been an important research problem. While a number of studies have proposed novel methods of representing expertise within individual projects, these methods are difficult to apply at the ecosystem level. However, as the focus of software development shifts from monolithic to modular, a method of representing developers' expertise across the entire open source software (OSS) ecosystem becomes necessary when, for example, a project tries to find new maintainers and looks for developers with relevant skills. Aim: We aim to address this knowledge gap by proposing and constructing the Skill Space, in which each API, developer, and project is represented, and we postulate how the topology of this space should reflect what developers know (and what projects need). Method: We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers and, based on that data, employ Doc2Vec embeddings to obtain vector representations of APIs, developers, and projects. We then evaluate whether these embeddings reflect the postulated topology of the Skill Space by predicting which new APIs developers use, which new projects they join, and whether or not their pull requests get accepted. We also check how the developers' representations in the Skill Space align with their self-reported API expertise. Result: Our results suggest that the proposed embeddings in the Skill Space appear to satisfy the postulated topology. We hope that such representations may aid in the construction of signals that increase the trust in (and efficiency of) open source ecosystems at large, and may aid investigations of other phenomena related to developer proficiency and learning.
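To make the Method concrete, the idea of placing APIs and developers in one space and predicting which new API a developer is likely to use can be sketched as follows. This is a toy stand-in, not the authors' pipeline: it replaces Doc2Vec with simple co-occurrence count vectors, and the developer names, API lists, and function names (`api_vectors`, `developer_vector`, `rank_new_apis`) are hypothetical, introduced only for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: APIs found in the files each developer changed.
DEV_APIS = {
    "alice": ["numpy", "pandas", "scipy"],
    "bob": ["numpy", "pandas", "matplotlib"],
    "carol": ["flask", "jinja2", "requests"],
    "dave": ["flask", "requests"],
}


def api_vectors(dev_apis):
    """Represent each API as a sparse vector of co-occurrence counts.

    Assumption: co-occurrence counts stand in for learned Doc2Vec vectors.
    """
    vecs = {}
    for apis in dev_apis.values():
        for a in apis:
            vec = vecs.setdefault(a, Counter())
            for b in apis:
                vec[b] += 1  # self-counts included, so an API is close to itself
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def developer_vector(apis, vecs):
    """Place a developer in the same space by summing the vectors of their APIs."""
    total = Counter()
    for a in apis:
        total.update(vecs[a])
    return total


def rank_new_apis(dev, dev_apis, vecs):
    """Rank APIs the developer has not used by similarity to the developer vector."""
    known = set(dev_apis[dev])
    dvec = developer_vector(dev_apis[dev], vecs)
    scores = {a: cosine(dvec, v) for a, v in vecs.items() if a not in known}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    vecs = api_vectors(DEV_APIS)
    print(rank_new_apis("dave", DEV_APIS, vecs))
```

In this sketch, a developer's nearest unused APIs are the ones that co-occur with what they already use, which mirrors the postulated topology in a simplified form. A real implementation would train embeddings on the full set of extracted APIs rather than on hand-written lists.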