LLM-based code agents (e.g., ChatGPT Codex) are increasingly deployed as detectors for code review and security auditing tasks. Although CoT-enhanced LLM vulnerability detectors are believed to be more robust against obfuscated malicious code, we find that their reasoning chains and semantic abstraction processes exhibit exploitable systematic weaknesses. These weaknesses allow attackers to covertly embed malicious logic, bypass code review, and propagate backdoored components throughout real-world software supply chains. To investigate this issue, we present CoTDeceptor, the first adversarial code obfuscation framework targeting CoT-enhanced LLM detectors. CoTDeceptor autonomously constructs evolving, hard-to-reverse, multi-stage obfuscation strategy chains that effectively disrupt CoT-driven detection logic. Experiments on malicious code provided by a security enterprise demonstrate that CoTDeceptor achieves stable and transferable evasion against state-of-the-art LLMs and vulnerability detection agents, bypassing 14 out of 15 vulnerability categories, compared to only 2 bypassed by prior methods. Our findings highlight potential risks in real-world software supply chains and underscore the need for more robust and interpretable LLM-powered security analysis systems.