Static analysis has established itself as a weapon of choice for detecting security vulnerabilities. Taint analysis in particular is a very general and powerful technique, where security policies are expressed in terms of forbidden flows, either from untrusted input sources to sensitive sinks (in integrity policies) or from sensitive sources to untrusted sinks (in confidentiality policies). The appeal of this approach is that the taint-tracking mechanism has to be implemented only once, and can then be parameterized with different taint specifications (that is, sets of sources and sinks, as well as any sanitizers that render otherwise problematic flows innocuous) to detect many different kinds of vulnerabilities. But while techniques for implementing scalable inter-procedural static taint tracking are fairly well established, crafting taint specifications is still more of an art than a science, and in practice tends to involve considerable manual effort. Past work has focussed on automated techniques for inferring taint specifications for libraries either from their implementation or from the way they tend to be used in client code. Among the latter, machine-learning-based approaches have shown great promise. In this work we present our experience combining an existing machine-learning approach to mining sink specifications for JavaScript libraries with manual taint modelling in the context of GitHub's CodeQL analysis framework. We show that the machine-learning component can successfully infer many new taint sinks that either are not part of the manual modelling or are not detected due to analysis incompleteness. Moreover, we present techniques for organizing sink predictions using automated ranking and code-similarity metrics that allow an analysis engineer to efficiently sift through large numbers of predictions to identify true positives.
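To make the idea of a single tracking mechanism parameterized by interchangeable taint specifications concrete, here is a minimal Python sketch. It is not CodeQL and not the approach described in this work: the toy straight-line program representation, the `Call`/`TaintSpec`/`find_flows` names, the function names in the example program (`readQuery`, `escapeSql`, `db.query`, `readPassword`, `console.log`), and the rule that unrecognized calls propagate taint from their arguments to their result are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Call:
    """One statement of a hypothetical toy program: target = func(args...)."""
    target: Optional[str]   # variable receiving the result (None if discarded)
    func: str               # name of the called function
    args: Tuple[str, ...]   # names of the argument variables


@dataclass(frozen=True)
class TaintSpec:
    """A taint specification: which calls introduce, consume, or remove taint."""
    sources: FrozenSet[str]
    sinks: FrozenSet[str]
    sanitizers: FrozenSet[str] = frozenset()


def find_flows(program: List[Call], spec: TaintSpec) -> List[Tuple[int, str]]:
    """Return (statement index, sink name) for every sink reached by tainted data.

    The tracking loop is fixed; only `spec` changes between vulnerability kinds.
    """
    tainted: Set[str] = set()
    violations: List[Tuple[int, str]] = []
    for i, call in enumerate(program):
        arg_tainted = any(a in tainted for a in call.args)
        # A sink receiving any tainted argument is a forbidden flow.
        if call.func in spec.sinks and arg_tainted:
            violations.append((i, call.func))
        if call.target is None:
            continue
        if call.func in spec.sources:
            tainted.add(call.target)        # sources produce tainted values
        elif call.func in spec.sanitizers:
            tainted.discard(call.target)    # sanitizers produce clean values
        elif arg_tainted:
            # Assumption: unknown calls pass taint from arguments to result.
            tainted.add(call.target)
        else:
            tainted.discard(call.target)
    return violations


# Hypothetical program mixing an injection-style flow and a secret-leak flow.
program = [
    Call("q", "readQuery", ()),          # untrusted input
    Call("s", "escapeSql", ("q",)),      # sanitized copy of q
    Call(None, "db.query", ("s",)),      # safe: sanitized before the sink
    Call("t", "concat", ("q",)),         # taint propagates through concat
    Call(None, "db.query", ("t",)),      # unsafe flow
    Call("p", "readPassword", ()),       # sensitive value
    Call(None, "console.log", ("p",)),   # leaks the sensitive value
]

# Integrity policy: untrusted input must not reach a query sink unsanitized.
sql_injection = TaintSpec(sources=frozenset({"readQuery"}),
                          sinks=frozenset({"db.query"}),
                          sanitizers=frozenset({"escapeSql"}))
# Confidentiality policy: sensitive data must not reach an untrusted sink.
secret_leak = TaintSpec(sources=frozenset({"readPassword"}),
                        sinks=frozenset({"console.log"}))

print(find_flows(program, sql_injection))  # [(4, 'db.query')]
print(find_flows(program, secret_leak))    # [(6, 'console.log')]
```

Only the `TaintSpec` value differs between the two checks, which is the reuse the abstract describes; the quality of the results then depends entirely on how complete the sets of sources, sinks, and sanitizers are. A real inter-procedural analysis additionally tracks flows through the heap, calls, and control flow, all of which this sketch ignores.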