LiDetector: License Incompatibility Detection for Open Source Software

Open-source software (OSS) licenses dictate the conditions that must be followed to reuse, distribute, and modify the software. Apart from widely used licenses such as the MIT License, developers may also write their own licenses (called custom licenses), whose wording is more flexible. This variety makes it difficult to understand licenses and to judge whether they are compatible. To avoid financial and legal risks, it is essential to ensure license compatibility when integrating third-party packages or reusing licensed code. In this work, we propose LiDetector, an effective tool that extracts and interprets OSS licenses (both official and custom licenses) and detects incompatibility among them. Specifically, LiDetector introduces a learning-based method to automatically identify meaningful license terms in an arbitrary license and employs a Probabilistic Context-Free Grammar (PCFG) to infer the rights and obligations needed for incompatibility detection. Experiments demonstrate that LiDetector outperforms existing methods, with 93.28% precision for term identification and 91.09% accuracy for right and obligation inference, and that it detects incompatibility effectively, with a 10.06% false-positive (FP) rate and a 2.56% false-negative (FN) rate. Furthermore, our large-scale empirical study with LiDetector on 1,846 projects reveals that 72.91% of the projects suffer from license incompatibility, including incompatibilities that involve popular licenses such as the MIT License and the Apache License. We highlight lessons learned from the perspectives of different stakeholders and make all related data and the replication package publicly available to facilitate follow-up research.
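To make the final detection step concrete, the following is a minimal sketch, not LiDetector's actual implementation. It assumes each license has already been reduced to a mapping from a license term to an inferred attitude (`CAN`, `CANNOT`, or `MUST`), and that two licenses are incompatible on a term when one prohibits what the other permits or requires. The term names, the `Attitude` enum, and the function names are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Attitude(Enum):
    """Hypothetical attitude labels for a license term (an assumption)."""
    CAN = "can"        # the term is a granted right
    CANNOT = "cannot"  # the term is prohibited
    MUST = "must"      # the term is an obligation


def conflicts(a: Attitude, b: Attitude) -> bool:
    """Assumed conflict rule: a prohibition clashes with a right or an obligation."""
    pair = {a, b}
    return Attitude.CANNOT in pair and pair != {Attitude.CANNOT}


def find_incompatibilities(project: dict, component: dict) -> list:
    """Return the sorted terms on which the two licenses conflict.

    Only terms mentioned by both licenses are compared; how unmentioned
    terms should be treated is left open in this sketch.
    """
    shared_terms = project.keys() & component.keys()
    return sorted(t for t in shared_terms if conflicts(project[t], component[t]))


# Hypothetical example: a permissive project license and a custom
# component license that forbids sublicensing.
project_license = {
    "distribute": Attitude.CAN,
    "sublicense": Attitude.CAN,
    "include_notice": Attitude.MUST,
}
component_license = {
    "distribute": Attitude.CAN,
    "sublicense": Attitude.CANNOT,
}

print(find_incompatibilities(project_license, component_license))
```

In this sketch the only reported conflict is on `sublicense`, because the project license grants a right that the component license prohibits. In a real pipeline, the term-to-attitude mappings would come from the term identification and PCFG-based inference stages described above.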
