The growing availability of open source projects has made it easier for developers to reuse existing software artifacts and build new software on top of them. However, the notion of similarity is hard to pin down because it varies from developer to developer. Some developers search for repositories with similar source code, while others look for repositories with similar requirements or issues. Existing approaches tend to find similar projects by comparing artifacts of the same type: source code to source code, API usage to API usage, documentation to documentation, and so on. Even when two artifacts of the same type are dissimilar, two artifacts of different types may still be similar. Hence, in this paper, we aim to answer the question: can we find similarity between software repositories through dissimilar artifacts? To this end, we conduct an experiment to find similarities among three repositories (two similar projects and one different project) by comparing both similar and dissimilar artifacts (documentation, commits, and source code). We observed similarities between dissimilar artifacts such as commits, source code, and README files in the context of both similar and different repositories.
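To make the idea of comparing dissimilar artifacts concrete, the following is a minimal sketch and not the method used in the paper, since the abstract does not name a similarity measure. It assumes a simple bag-of-words cosine similarity, and the function names (`cosine_similarity`, `cross_artifact_similarity`), the artifact labels, and the toy repository contents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty artifact is treated as similar to nothing
    return dot / (norm_a * norm_b)


def cross_artifact_similarity(repo_a, repo_b):
    """Score every artifact of repo_a against every artifact of repo_b.

    Each repo is a dict mapping an artifact kind (e.g. "readme", "commits",
    "source") to its text, so the result covers same-type pairs as well as
    cross-type pairs such as ("readme", "commits").
    """
    return {
        (kind_a, kind_b): cosine_similarity(text_a, text_b)
        for kind_a, text_a in repo_a.items()
        for kind_b, text_b in repo_b.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data standing in for real repository artifacts.
    repo_a = {
        "readme": "A small library to parse JSON files",
        "commits": "fix parser crash on empty JSON input",
    }
    repo_b = {
        "readme": "Fast JSON parser for large files",
        "source": "def parse(json_text): return load(json_text)",
    }
    for pair, score in sorted(cross_artifact_similarity(repo_a, repo_b).items()):
        print(pair, round(score, 3))
```

A real study would likely need a richer text representation and preprocessing tailored to each artifact type. The sketch only shows the shape of the comparison: every artifact of one repository is scored against every artifact of the other, regardless of type.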