Software engineering researchers need software artifacts, both to study their characteristics and to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. The dataset supports many different analyses, such as API usage, usage inputs, or novel observations about the test suites of clients and libraries. DUETS is meant to support both static and dynamic analysis. Consequently, every library and client in the dataset compiles correctly, is executable, and has a passing test suite. The dataset is composed of open-source projects that have more than five stars on GitHub. The final dataset contains 395 libraries and 2,874 clients. Additionally, we provide the raw data used to create the dataset, including 34,560 pom.xml files and the complete file lists of 34,560 projects. The dataset can be used to study how clients use libraries, or simply as a list of software projects that build successfully. The clients' test suites can also serve as an additional verification step for code transformation techniques that modify the libraries.