In this technical report, we evaluate the performance of the ChatGPT and GPT-3 models on the task of vulnerability detection in code. The evaluation was conducted on our real-world dataset, using binary and multi-label classification of CWE vulnerabilities. We chose to evaluate these models because they have shown good performance on other code-related tasks, such as solving programming challenges and understanding code at a high level. However, we found that the ChatGPT model performed no better than a dummy classifier on both the binary and the multi-label classification tasks for code vulnerability detection.
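To make the comparison against a dummy classifier concrete, the following is a minimal sketch of how such a baseline can be scored next to a model on the binary task. It assumes a majority-class strategy, which the report does not specify. The labels and predictions are made-up toy values and do not reproduce any result from the report, and the helper names `majority_baseline` and `accuracy` are hypothetical.

```python
from collections import Counter


def majority_baseline(train_labels, n_samples):
    """Dummy classifier: predict the most frequent training label for every sample."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_samples


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: 1 = vulnerable, 0 = not vulnerable.
train_labels = [0, 0, 0, 1, 0, 1, 0, 0]
test_labels = [0, 1, 0, 0, 1, 0]

# Made-up predictions standing in for a language model's answers.
model_preds = [1, 1, 0, 1, 0, 0]

baseline_preds = majority_baseline(train_labels, len(test_labels))
baseline_acc = accuracy(test_labels, baseline_preds)
model_acc = accuracy(test_labels, model_preds)

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")
```

A model is only informative for the task if it clearly beats this kind of baseline. On imbalanced data, accuracy alone can flatter the dummy classifier, so a metric such as F1 may be a better basis for the comparison.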