We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models. Unlike existing work, we show that code obfuscation, a standard code transformation operation, provides a novel means to generate complementary `views' of code that enable us to achieve both robust and accurate code models. To the best of our knowledge, this is the first systematic study to explore and exploit the robustness and accuracy benefits of (multi-view) code obfuscations in code models. Specifically, we first adopt adversarial code as robustness-promoting views in CL at the self-supervised pre-training phase. This yields improved robustness and transferability for downstream tasks. Next, at the supervised fine-tuning stage, we show that adversarial training with a proper temporally-staggered schedule of adversarial code generation can further improve the robustness and accuracy of the pre-trained code model. Built on the above two modules, we develop CLAWSAT, a novel self-supervised learning (SSL) framework for code that integrates $\underline{\textrm{CL}}$ with $\underline{\textrm{a}}$dversarial vie$\underline{\textrm{w}}$s (CLAW) and $\underline{\textrm{s}}$taggered $\underline{\textrm{a}}$dversarial $\underline{\textrm{t}}$raining (SAT). Evaluating on three downstream tasks across Python and Java, we show that CLAWSAT consistently yields the best robustness and accuracy ($\textit{e.g.}$, 11$\%$ in robustness and 6$\%$ in accuracy on the code summarization task in Python). We additionally demonstrate the effectiveness of adversarial learning in CLAW by analyzing the characteristics of the loss landscape and the interpretability of the pre-trained models.
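The temporally-staggered schedule mentioned above can be illustrated with a minimal sketch: rather than regenerating costly adversarial code views at every training step, they are regenerated only every few steps and reused in between. The function names, the `period` parameter, and the cache-and-reuse structure below are illustrative assumptions, not the paper's actual implementation.

```python
def regenerate_adversarial(step, period=5):
    """Hypothetical helper: under a temporally-staggered schedule,
    fresh adversarial views are generated only every `period` steps."""
    return step % period == 0


def staggered_training(num_steps, period=5):
    """Sketch of a training loop that caches adversarial views between
    regenerations, amortizing the cost of adversarial code generation.
    Returns the number of regenerations performed (for illustration)."""
    cached_view = None
    regenerations = 0
    for step in range(num_steps):
        if cached_view is None or regenerate_adversarial(step, period):
            # Placeholder for an expensive attack (e.g., obfuscation-based
            # adversarial code generation) -- assumed, not the paper's API.
            cached_view = f"adv_view@{step}"
            regenerations += 1
        # ... fine-tune on the (clean, cached_view) pair here ...
    return regenerations
```

With `num_steps=10` and `period=5`, adversarial views are generated only twice (at steps 0 and 5) instead of ten times, which is the accuracy/cost trade-off a staggered schedule exploits.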