Anomalicious: Automated Detection of Anomalous and Potentially Malicious Commits on GitHub

From arXiv: 10 pages, 3 figures, 3 tables. To appear at the 2021 International Conference on Software Engineering (ICSE), Software Engineering in Practice (SEiP) track.

Security is critical to the adoption of open source software (OSS), yet few automated solutions currently exist to help detect and prevent malicious contributions from infecting open source repositories. On GitHub, a primary host of OSS, repositories contain not only code but also a wealth of commit-related and contextual metadata - what if this metadata could be used to automatically identify malicious OSS contributions? In this work, we show how to use only commit logs and repository metadata to automatically detect anomalous and potentially malicious commits. We identify and evaluate several relevant factors which can be automatically computed from this data, such as the modification of sensitive files, outlier change properties, or a lack of trust in the commit's author. Our tool, Anomalicious, automatically computes these factors and considers them holistically using a rule-based decision model. In an evaluation on a data set of 15 malware-infected repositories, Anomalicious showed promising results and identified 53.33% of malicious commits, while flagging less than 1% of commits for most repositories. Additionally, the tool found other interesting anomalies that are not related to malicious commits in an analysis of repositories with no known malicious commits.
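The abstract names three kinds of factors (modification of sensitive files, outlier change properties, and lack of trust in the commit's author) that are combined by a rule-based decision model. The following is a minimal sketch of that idea, not the tool's actual implementation: the `Commit` fields, the file suffixes, the z-score cutoff, the prior-commit threshold, and the "flag when at least two factors fire" rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List

# Every constant below is an illustrative assumption, not a setting
# taken from Anomalicious.
SENSITIVE_SUFFIXES = (".sh", ".yml", ".json")  # hypothetical "sensitive" file types
OUTLIER_Z = 2.0         # hypothetical z-score cutoff for change size
MIN_PRIOR_COMMITS = 3   # hypothetical trust threshold for an author
FLAG_AT = 2             # hypothetical: flag when this many factors fire


@dataclass
class Commit:
    sha: str
    author: str
    files: List[str]
    lines_changed: int


def touches_sensitive_file(commit: Commit) -> bool:
    """True if any changed path ends with a suffix treated as sensitive."""
    return any(path.endswith(SENSITIVE_SUFFIXES) for path in commit.files)


def is_outlier_change(commit: Commit, history: List[Commit]) -> bool:
    """True if the change is far larger than the repository's usual change size."""
    sizes = [c.lines_changed for c in history]
    if len(sizes) < 2:
        return False  # too little history to call anything an outlier
    spread = pstdev(sizes)
    if spread == 0:
        return commit.lines_changed != sizes[0]
    return (commit.lines_changed - mean(sizes)) / spread > OUTLIER_Z


def is_untrusted_author(commit: Commit, history: List[Commit]) -> bool:
    """True if the author has few earlier commits in this repository."""
    prior = sum(1 for c in history if c.author == commit.author)
    return prior < MIN_PRIOR_COMMITS


def triggered_factors(commit: Commit, history: List[Commit]) -> List[str]:
    """Return the names of the factors that fire for this commit."""
    checks = [
        ("sensitive_file", touches_sensitive_file(commit)),
        ("outlier_change", is_outlier_change(commit, history)),
        ("untrusted_author", is_untrusted_author(commit, history)),
    ]
    return [name for name, fired in checks if fired]


def is_anomalous(commit: Commit, history: List[Commit]) -> bool:
    """Rule-based decision: consider the factors together, not one at a time."""
    return len(triggered_factors(commit, history)) >= FLAG_AT
```

Under these assumptions, a 500-line change to `install.sh` by a first-time author in a repository of small commits would trigger all three factors and be flagged, while a routine 10-line change by a regular contributor would not. Requiring more than one factor is one simple way to keep the share of flagged commits low.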

