Learning on heterogeneous graphs, which consist of different types of nodes and edges, can improve on the results of homogeneous graph techniques. One notable example of such graphs is the control-flow graph, which represents the possible execution flows of software code. Because such graphs capture more of the code's semantic information, developing techniques and tools for them can be highly beneficial for detecting software vulnerabilities and thereby improving software reliability. However, existing heterogeneous graph techniques remain insufficient for complex graphs in which the number of node and edge types is large and variable. This paper focuses on Ethereum smart contracts as a sample of software code represented by heterogeneous contract graphs, which are built from both control-flow graphs and call graphs and contain different types of nodes and links. We propose MANDO, a new heterogeneous graph representation for learning the structures of such heterogeneous contract graphs. MANDO extracts customized metapaths, which compose relational connections between different types of nodes and their neighbors. Moreover, it develops a multi-metapath heterogeneous graph attention network to learn multi-level embeddings of different types of nodes and their metapaths in the heterogeneous contract graphs. These embeddings capture the code semantics of smart contracts more accurately and support both fine-grained line-level and coarse-grained contract-level vulnerability detection. Our extensive evaluation on large smart contract datasets shows that MANDO improves on the vulnerability detection results of other techniques at the coarse-grained contract level. More importantly, it is the first learning-based approach capable of identifying vulnerabilities at the fine-grained line level, and it outperforms traditional code analysis-based vulnerability detection approaches by 11.35% to 70.81% in terms of F1-score.
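To make the notion of a metapath more concrete, here is a minimal Python sketch, assuming a toy directed graph whose node types (`ENTRY_POINT`, `EXPRESSION`, `IF`, `RETURN`) and edges are hypothetical and are not MANDO's actual contract-graph schema. A metapath is modeled as a sequence of node types, and the hypothetical helper `metapath_neighbors` returns the nodes reachable from a start node along a path whose node types match that sequence. It illustrates only the generic metapath-neighbor idea, not MANDO's extraction procedure or its attention network.

```python
from collections import defaultdict

# Toy heterogeneous graph (hypothetical): each node has a type, edges are directed.
# Node kinds are loosely inspired by control-flow-graph nodes, NOT MANDO's schema.
NODE_TYPES = {
    "n0": "ENTRY_POINT",
    "n1": "EXPRESSION",
    "n2": "IF",
    "n3": "EXPRESSION",
    "n4": "RETURN",
}
EDGES = [("n0", "n1"), ("n1", "n2"), ("n2", "n3"), ("n2", "n4"), ("n3", "n4")]


def build_adjacency(edges):
    """Map each node to the list of its successors in the directed graph."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def metapath_neighbors(start, metapath, node_types, adj):
    """Return the set of nodes reachable from `start` along a path whose
    node types match `metapath` position by position."""
    if node_types[start] != metapath[0]:
        return set()  # the start node must have the metapath's first type
    frontier = {start}
    for wanted_type in metapath[1:]:
        # Advance one hop, keeping only successors of the required type.
        frontier = {
            nxt
            for node in frontier
            for nxt in adj[node]
            if node_types[nxt] == wanted_type
        }
    return frontier


adj = build_adjacency(EDGES)
print(metapath_neighbors("n1", ("EXPRESSION", "IF", "RETURN"), NODE_TYPES, adj))  # → {'n4'}
```

Different metapaths over the same graph yield different neighbor sets (for example, `("EXPRESSION", "IF", "EXPRESSION")` from `n1` reaches `n3` instead), which is what lets a metapath-based model aggregate information separately per relation pattern.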