Representation of Developer Expertise in Open Source Software

Background: Accurate representation of developer expertise has always been an important research problem. While a number of studies have proposed novel methods of representing expertise within individual projects, these methods are difficult to apply at the ecosystem level. However, with the focus of software development shifting from monolithic to modular, a method of representing developers' expertise in the context of OSS development as a whole becomes necessary when, for example, a project tries to find new maintainers and looks for developers with relevant skills. Aim: We aim to address this knowledge gap by proposing and constructing the Skill Space, in which each API, developer, and project is represented, and by postulating how the topology of this space should reflect what developers know (and what projects need). Method: We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers and, based on that data, employ Doc2Vec embeddings to obtain vector representations of APIs, developers, and projects. We then evaluate whether these embeddings reflect the postulated topology of the Skill Space by predicting which new APIs developers use, which new projects they join, and whether or not their pull requests get accepted. We also check how the developers' representations in the Skill Space align with their self-reported API expertise. Result: Our results suggest that the proposed embeddings in the Skill Space appear to satisfy the postulated topology. We hope that such representations may aid in the construction of signals that increase trust in (and the efficiency of) open source ecosystems at large, and may aid investigations of other phenomena related to developer proficiency and learning.
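The idea of placing developers and projects in one shared space can be illustrated with a minimal sketch. It does not use Doc2Vec or World of Code data: as a plainly simpler stand-in, each developer and project is represented by a bag-of-APIs count vector, and closeness is measured with cosine similarity. All names and data below (`api_usage`, `dev_alice`, `proj_stats`, `rank_projects`, and so on) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: each developer or project is treated as a "document"
# whose "words" are the APIs (imported packages) found in the files it changed.
api_usage = {
    "dev_alice": ["numpy", "pandas", "scipy", "numpy"],
    "dev_bob": ["react", "express", "lodash"],
    "proj_stats": ["numpy", "scipy", "statsmodels"],
    "proj_web": ["react", "lodash", "webpack"],
}

# Stand-in for an embedding: a sparse count vector over APIs.
# (The abstract describes Doc2Vec embeddings; this is a simpler substitute.)
vectors = {name: Counter(apis) for name, apis in api_usage.items()}


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_projects(developer):
    """Rank projects by similarity to a developer, most similar first."""
    dev_vec = vectors[developer]
    scores = [
        (name, cosine(dev_vec, vec))
        for name, vec in vectors.items()
        if name.startswith("proj_")
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for dev in ("dev_alice", "dev_bob"):
        print(dev, rank_projects(dev))
```

In this toy setting, a developer whose changes touch numerical APIs lands closer to the statistics project than to the web project, which is the kind of topology the abstract postulates for the Skill Space. A learned embedding such as Doc2Vec would additionally place related but non-identical APIs near each other, which raw counts cannot do.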

