Security is critical to the adoption of open source software (OSS), yet few automated solutions currently exist to help detect and prevent malicious contributions from infecting open source repositories. On GitHub, a primary host of OSS, repositories contain not only code but also a wealth of commit-related and contextual metadata. What if this metadata could be used to automatically identify malicious OSS contributions? In this work, we show how to use only commit logs and repository metadata to automatically detect anomalous and potentially malicious commits. We identify and evaluate several relevant factors that can be automatically computed from this data, such as the modification of sensitive files, outlier change properties, or a lack of trust in the commit's author. Our tool, Anomalicious, automatically computes these factors and considers them holistically using a rule-based decision model. In an evaluation on a data set of 15 malware-infected repositories, Anomalicious showed promising results: it identified 53.33% of malicious commits while flagging fewer than 1% of commits in most repositories. Additionally, in an analysis of repositories with no known malicious commits, the tool found other interesting anomalies that are not related to malicious commits.
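To make the idea of a rule-based decision model over commit metadata concrete, the following is a minimal sketch in Python. It is not the Anomalicious implementation. The `Commit` fields, the `SENSITIVE_SUFFIXES` list, the z-score outlier test, and all thresholds are assumptions chosen for illustration. They only mirror the three kinds of factors named above: sensitive files, outlier change properties, and lack of trust in the author.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple

# Hypothetical suffixes for "sensitive" files; the abstract does not list them.
SENSITIVE_SUFFIXES = (".sh", ".yml", ".yaml", "package.json", "setup.py")


@dataclass
class Commit:
    # Illustrative subset of what a commit log provides.
    sha: str
    author: str
    files: List[str]
    lines_changed: int


def flag_commits(
    commits: List[Commit],
    min_prior_commits: int = 3,   # assumed trust threshold
    z_threshold: float = 3.0,     # assumed outlier threshold
    min_factors: int = 2,         # assumed rule: flag when >= 2 factors fire
) -> List[Tuple[str, List[str]]]:
    """Return (sha, fired factors) for commits that trip enough factors.

    `commits` is assumed to be in chronological order.
    """
    sizes = [c.lines_changed for c in commits]
    mu, sigma = mean(sizes), pstdev(sizes)
    prior_commits = {}  # author -> number of earlier commits seen
    flagged = []
    for c in commits:
        factors = []
        # Factor 1: the commit touches a sensitive file.
        if any(path.endswith(SENSITIVE_SUFFIXES) for path in c.files):
            factors.append("sensitive_file")
        # Factor 2: the change size is an outlier for this repository.
        if sigma > 0 and (c.lines_changed - mu) / sigma > z_threshold:
            factors.append("outlier_size")
        # Factor 3: the author has little history in this repository.
        if prior_commits.get(c.author, 0) < min_prior_commits:
            factors.append("untrusted_author")
        prior_commits[c.author] = prior_commits.get(c.author, 0) + 1
        # Holistic rule: no single factor is enough on its own.
        if len(factors) >= min_factors:
            flagged.append((c.sha, factors))
    return flagged


# Usage: twenty routine commits followed by one large commit from a new
# author that modifies an install script.
history = [Commit(f"a{i}", "alice", ["src/main.py"], 10) for i in range(20)]
history.append(Commit("m1", "mallory", ["install.sh"], 5000))
print(flag_commits(history))
```

Requiring several factors to fire together is one simple way to keep the share of flagged commits low, since a new contributor or a large refactoring alone does not raise an alert.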