Software-related platforms let their users collaboratively label software entities with topics. Tagging software repositories with relevant topics can facilitate various downstream tasks. For instance, a correct and complete set of topics assigned to a repository can increase its visibility, which in turn improves the outcome of tasks such as browsing, searching, navigating, and organizing repositories. Unfortunately, assigned topics are usually highly noisy, and some repositories do not have well-assigned topics. There have therefore been efforts to recommend topics for software projects; however, the semantic relationships among these topics have not been exploited so far. We propose two recommender models for tagging software projects that incorporate the semantic relationships among topics. Our approach has two main phases. (1) We first take a collaborative approach to curate a dataset of quality topics specifically for the domain of software engineering and development. We also enrich this data with the semantic relationships among these topics and encapsulate them in a knowledge graph we call SED-KGraph. (2) We then build two recommender systems. The first operates solely on the list of original topics assigned to a repository and the relationships specified in our knowledge graph. The second assumes that no topics are available for a repository, and therefore predicts the relevant topics from both the textual information of a software project and SED-KGraph. We built SED-KGraph in a crowd-sourced project with 170 contributors from both academia and industry. The experimental results indicate that our solutions outperform baselines that neglect the semantic relationships among topics by at least 25% and 23% in terms of the ASR and MAP metrics, respectively.
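As a rough illustration of the idea behind the first recommender (using only a repository's assigned topics plus the relationships in a knowledge graph), the following Python sketch ranks topics that are one hop away from the assigned ones. It is not the authors' implementation: the graph contents, the relation names, the function `recommend_topics`, and the vote-counting heuristic are all hypothetical and chosen only to show how semantic links between topics might feed a recommendation.

```python
from collections import Counter

# Hypothetical toy graph: topic -> list of (relation, related_topic).
# Neither the topics nor the relation names are taken from SED-KGraph.
TOY_KG = {
    "flask": [("is-a", "web-framework"), ("written-in", "python")],
    "django": [("is-a", "web-framework"), ("written-in", "python")],
    "pytorch": [("is-a", "deep-learning-library"), ("written-in", "python")],
}


def recommend_topics(assigned, kg, top_k=3):
    """Rank topics reachable in one hop from the assigned topics.

    Assumed heuristic: a candidate gets one vote for every assigned
    topic that links to it; topics already assigned are skipped.
    """
    assigned_set = set(assigned)
    votes = Counter()
    for topic in assigned_set:
        for _relation, neighbour in kg.get(topic, []):
            if neighbour not in assigned_set:
                votes[neighbour] += 1
    # Sort by vote count (descending), then alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, _count in ranked[:top_k]]


if __name__ == "__main__":
    print(recommend_topics(["flask", "pytorch"], TOY_KG))
```

In this toy setup, a repository tagged with `flask` and `pytorch` would be offered `python` first, because both assigned topics link to it. A baseline that ignores relationships among topics has no such signal to draw on.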