Recovering the call graphs of binary programs is crucial for inter-procedural analysis tasks and the applications built on them. One of the core challenges is recognizing the targets of indirect calls (i.e., indirect callees). Existing solutions all suffer from high false positive and false negative rates, which make the recovered call graphs inaccurate. In this paper, we propose a new solution, Callee, that combines transfer learning and contrastive learning. The key insight is that deep neural networks (DNNs) can automatically identify patterns concerning indirect calls, which can be more efficient than designing approximation algorithms or heuristic rules to handle the various cases. Inspired by advances in question-answering applications, we use contrastive learning to answer the callsite-callee question. However, one of the toughest challenges is that DNNs need large datasets to achieve high performance, while collecting large-scale indirect-call ground truths can be computationally expensive. Since direct calls and indirect calls share similar calling conventions, knowledge learned from direct calls can be transferred to indirect ones. Therefore, we leverage transfer learning to pre-train DNNs with easy-to-collect direct calls and then fine-tune the indirect-call DNNs. We evaluate Callee on several groups of targets, and the results show that our solution matches callsites to callees with an F1-measure of 94.6%, much better than state-of-the-art solutions. Further, we apply Callee to binary code similarity detection and hybrid fuzzing, and find that it can greatly improve their performance.
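The abstract frames callsite-callee matching as a contrastive question-answering task but does not give the loss or the matching rule. The sketch below is a generic illustration under stated assumptions, not Callee's implementation. It assumes each callsite and each candidate function has already been encoded into a fixed-length vector by some DNN. It scores pairs by cosine similarity, trains with an InfoNCE-style loss that treats the other callees in a batch as negatives, predicts callees with a similarity threshold, and reports the F1-measure over (callsite, callee) pairs. The function names, the temperature, and the 0.5 threshold are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def info_nce_loss(callsites: List[Vector], callees: List[Vector],
                  temperature: float = 0.1) -> float:
    """InfoNCE loss for a batch where callsites[i] is paired with callees[i].

    Every other callee in the batch serves as an in-batch negative.
    Hypothetical sketch: the encoder producing these vectors is assumed.
    """
    assert callsites and len(callsites) == len(callees)
    total = 0.0
    for i, cs in enumerate(callsites):
        logits = [cosine(cs, ce) / temperature for ce in callees]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(callsites)


def predict_callees(callsite: Vector, candidates: List[Vector],
                    threshold: float = 0.5) -> Set[int]:
    """Indices of candidates whose similarity exceeds a hypothetical threshold."""
    return {j for j, ce in enumerate(candidates) if cosine(callsite, ce) > threshold}


def f1_score(predicted: Set[Tuple[int, int]], truth: Set[Tuple[int, int]]) -> float:
    """F1-measure over (callsite, callee) pairs."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    callsites = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    callees = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]]
    # Correctly aligned pairs give a much lower loss than swapped pairs.
    print(info_nce_loss(callsites, callees[:2]))
    print(info_nce_loss(callsites, [callees[1], callees[0]]))
    predicted = {(i, j) for i, cs in enumerate(callsites)
                 for j in predict_callees(cs, callees)}
    print(f1_score(predicted, {(0, 0), (1, 1)}))
```

Under the transfer-learning idea in the abstract, one plausible reading is that the same objective is first optimized on abundant direct-call pairs (pre-training) and then on the smaller set of indirect-call pairs (fine-tuning); this sketch does not model the encoder or the training loop.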