As control-flow protection methods become widely deployed, it is increasingly difficult for attackers to corrupt control data to build attacks. Instead, data-oriented exploits, which modify non-control data for malicious goals, have been demonstrated to be both possible and powerful. To defend against data-oriented exploits, the first fundamental step is to identify non-control, security-critical data. However, previous works rely mainly on tedious human effort to identify critical data, an approach that neither scales to large applications nor ports easily to new programs. In this work, we investigate the application of deep learning to critical data identification. This work provides non-intuitive insights into (a) why straightforward ways of applying deep learning fail, and (b) how deep learning should be applied to identify critical data. Based on these insights, we have devised a non-intuitive method that combines Tree-LSTM models with a novel data-flow tree structure to effectively identify critical data from execution traces. The evaluation results show that our method achieves 87.47% accuracy and an F1 score of 0.9123, significantly outperforming the baselines. To the best of our knowledge, this is the first work to use a deep neural model to identify critical data in program binaries.
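For readers unfamiliar with Tree-LSTM, the node update of the Child-Sum Tree-LSTM (Tai et al., 2015) is sketched below as one plausible instantiation. The abstract does not state which Tree-LSTM variant is used, so the choice of the Child-Sum form is an assumption, as is the reading of node $j$ as a node of a data-flow tree with children $C(j)$. The symbols $x_j$ (node input), $h_j$ (hidden state), $c_j$ (memory cell), and the weights $W$, $U$, $b$ follow the standard formulation and are not taken from this work.

```latex
\begin{align}
\tilde{h}_j &= \sum_{k \in C(j)} h_k \\
i_j &= \sigma\!\left(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)}\right) \\
f_{jk} &= \sigma\!\left(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\right), \quad k \in C(j) \\
o_j &= \sigma\!\left(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)}\right) \\
u_j &= \tanh\!\left(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)}\right) \\
c_j &= i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k \\
h_j &= o_j \odot \tanh(c_j)
\end{align}
```

Because each child $k$ receives its own forget gate $f_{jk}$, a node can keep information from some children while discarding it from others. Under the assumptions above, the hidden state at the root of a tree would summarize the whole tree and could feed a binary classifier that labels data as critical or non-critical.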