In this paper, we address the problem of automatic repair of software vulnerabilities with deep learning. The major problem with data-driven vulnerability repair is that the few existing datasets of known confirmed vulnerabilities consist of only a few thousand examples. However, training a deep learning model often requires hundreds of thousands of examples. In this work, we leverage the intuition that the bug fixing task and the vulnerability fixing task are related, and that the knowledge learned from bug fixes can be transferred to fixing vulnerabilities. In the machine learning community, this technique is called transfer learning. In this paper, we propose an approach for repairing security vulnerabilities named VRepair which is based on transfer learning. VRepair is first trained on a large bug fix corpus and is then tuned on a vulnerability fix dataset, which is an order of magnitude smaller. In our experiments, we show that a model trained only on a bug fix corpus can already fix some vulnerabilities. Then, we demonstrate that transfer learning improves the ability to repair vulnerable C functions. We also show that the transfer learning model performs better than a model trained with a denoising task and fine-tuned on the vulnerability fixing task. To sum up, this paper shows that transfer learning works well for repairing security vulnerabilities in C compared to learning on a small dataset.
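The two-phase schedule described above (train on a large related corpus, then tune on a much smaller target dataset) can be illustrated with a deliberately tiny sketch. This is not VRepair's implementation: a one-dimensional linear model stands in for the neural model, and the synthetic tasks, the function names `train` and `mse`, and all hyperparameters are assumptions chosen only to show the training order.

```python
# Toy illustration only: a 1-D linear model y = w * x + b stands in for the
# neural model; every name, dataset, and hyperparameter here is hypothetical.

def train(params, data, lr=0.05, epochs=1):
    """Run plain SGD on squared error, starting from the given (w, b)."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def mse(params, data):
    """Mean squared error of the model (w, b) on a list of (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def target(x):
    """Related but distinct target task (analogue of vulnerability fixes)."""
    return 2.2 * x + 1.3


# Large source task (analogue of the bug fix corpus): y = 2.0 * x + 1.0
source_train = [(i / 100, 2.0 * (i / 100) + 1.0) for i in range(-100, 101)]

# Small target task: only four training examples are available.
target_train = [(x, target(x)) for x in (-1.0, -0.3, 0.4, 1.0)]
target_test = [(x, target(x)) for x in (-0.8, -0.4, 0.0, 0.4, 0.8)]

# Phase 1: train on the large source task only.
pretrained = train((0.0, 0.0), source_train, epochs=20)
# Phase 2: continue training from the phase 1 weights on the small target set.
fine_tuned = train(pretrained, target_train, epochs=5)
# Baseline: same small target set and budget, but starting from scratch.
scratch = train((0.0, 0.0), target_train, epochs=5)

for name, params in [("source only", pretrained),
                     ("source then target", fine_tuned),
                     ("target only", scratch)]:
    print(f"{name}: target test MSE = {mse(params, target_test):.4f}")
```

In this toy setting the three configurations mirror the comparison in the abstract: a model trained on the source task alone, the same model after tuning on the target task, and a model that only ever sees the small target set.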