CENTRIS: A Precise and Scalable Approach for Identifying Modified Open-Source Software Reuse


February 11, 2021


Seunghoon Woo, Sunghan Park, Seulbae Kim, Heejo Lee, Hakjoo Oh

From arXiv. To appear in the 43rd International Conference on Software Engineering (ICSE 2021).

Open-source software (OSS) is widely reused because it makes software development more convenient and efficient. Despite these evident benefits, unmanaged OSS components can introduce threats such as vulnerability propagation and license violation. However, identifying reused OSS components is challenging because the reused OSS is predominantly modified and nested. In this paper, we propose CENTRIS, a precise and scalable approach for identifying modified OSS reuse. By segmenting an OSS code base and detecting the reuse of only the unique part of the OSS, CENTRIS can precisely identify modified OSS reuse in the presence of nested OSS components. For scalability, CENTRIS eliminates redundant code comparisons and accelerates the search using hash functions. When we applied CENTRIS to 10,241 widely used GitHub projects, comprising 229,326 versions and 80 billion lines of code, we observed that modified OSS reuse is the norm in software development, occurring 20 times more frequently than exact reuse. Nonetheless, CENTRIS identified reused OSS components with 91% precision and 94% recall in less than a minute per application on average, whereas a recent clone detection technique, which does not take modified and nested OSS reuse into account, hardly reached 10% precision and 40% recall.
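The idea of matching only the unique part of each OSS component, using hashes to speed up comparison, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`fingerprint`, `unique_parts`, `detect_reuse`), the use of SHA-256 over whitespace-stripped function text, the rule that a function is "unique" when no other known component contains it, and the 0.5 threshold are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib


def fingerprint(function_source: str) -> str:
    """Hash a function after removing all whitespace (hypothetical normalization)."""
    normalized = "".join(function_source.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def unique_parts(components: dict[str, list[str]]) -> dict[str, set[str]]:
    """For each component, keep only the hashes that no other component contains."""
    hashed = {name: {fingerprint(f) for f in funcs} for name, funcs in components.items()}
    unique = {}
    for name, hashes in hashed.items():
        others = set().union(*(h for n, h in hashed.items() if n != name))
        unique[name] = hashes - others
    return unique


def detect_reuse(
    target_functions: list[str],
    components: dict[str, list[str]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> dict[str, float]:
    """Report components whose unique part is sufficiently present in the target."""
    target = {fingerprint(f) for f in target_functions}
    reused = {}
    for name, hashes in unique_parts(components).items():
        if not hashes:
            continue  # nothing unique to match against
        ratio = len(hashes & target) / len(hashes)
        if ratio >= threshold:
            reused[name] = ratio
    return reused


# Toy data: libB nests one function that also exists in libA.
components = {
    "libA": [
        "int add(int a,int b){return a+b;}",
        "int sub(int a,int b){return a-b;}",
        "int shared(){return 0;}",
    ],
    "libB": [
        "int mul(int a,int b){return a*b;}",
        "int shared(){return 0;}",
    ],
}

# The target reuses libA with modifications: add is reformatted, sub is changed.
target = [
    "int add(int a, int b) { return a + b; }",
    "int sub(int a,int b){return a-b-1;}",
    "int shared(){return 0;}",
    "int main(){return 0;}",
]

result = detect_reuse(target, components)
print(result)
```

In this toy case only libA is reported. Without the unique-part filter, libB would also match half of its functions through the shared one, which is the kind of false positive that nested components cause. Comparing set intersections of hashes, instead of comparing function text pairwise, is what keeps the lookup cheap.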

