Using open source code is common practice in modern software development. However, reusing others' code gives bad actors access to a wide community of developers and, through it, to the products that depend on that code. Such attacks are categorized as supply chain attacks. Recent years have seen a growing number of supply chain attacks that exploit open source during software development, relying on package download and installation procedures, whether automatic or manual. Over the years, many approaches have been proposed for detecting vulnerable packages; approaches for detecting malicious code within packages, however, are less common. Detection approaches can be broadly categorized as dynamic analyses, which execute the code, and static analyses, which do not. Here, we introduce the Malicious Source code Detection using Transformers (MSDT) algorithm. MSDT is a novel static analysis method based on deep learning that detects real-world cases of code injection into source code packages. In this study, we used MSDT on a dataset of over 600,000 different functions: we embedded the functions, applied a clustering algorithm to the resulting vectors, and identified malicious functions as the outliers. We evaluated MSDT's performance through extensive experiments and demonstrated that our algorithm can detect functions injected with malicious code, with precision@k values of up to 0.909.
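To make the evaluation idea concrete, the following is a minimal sketch of ranking function embeddings by an outlier score and computing precision@k. It is not the MSDT implementation. The embeddings are hand-made 2-D toy vectors rather than transformer outputs, the labels are invented for illustration, and distance from the centroid stands in for the clustering-based outlier detection, whose specific algorithm is not named here. All function and variable names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def outlier_scores(vectors):
    """Score each vector by its Euclidean distance from the centroid.

    Assumption: a simple stand-in for clustering-based outlier detection.
    """
    c = centroid(vectors)
    return [math.dist(v, c) for v in vectors]


def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring items that are labeled 1 (malicious)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    return sum(labels[i] for i in top_k) / k


# Toy 2-D "embeddings" (assumption: real embeddings would come from a
# transformer model and have hundreds of dimensions).
embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [0.1, 0.1], [-0.1, 0.0], [0.0, -0.1],
    [3.0, 3.0],    # hypothetical injected function
    [-2.5, 2.0],   # hypothetical injected function
]
labels = [0, 0, 0, 0, 0, 0, 1, 1]  # 1 = malicious (made-up ground truth)

scores = outlier_scores(embeddings)
p_at_2 = precision_at_k(scores, labels, k=2)
p_at_4 = precision_at_k(scores, labels, k=4)
print(f"precision@2 = {p_at_2}, precision@4 = {p_at_4}")
```

In this toy setting the two injected vectors lie far from the benign cluster, so they occupy the top two ranks; precision@k then drops as k grows past the number of malicious items.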